Hello,
Recently someone working at Yahoo emailed me about an old thread I started on the Apache Flink user mailing list. I replied to the email, but I also decided to turn the reply into a blog post, because it might help other people as well.
Hi,
I was able to get it working after tinkering with it. The issue was mainly a miscommunication: we didn't know for certain which authentication method we were using in AWS. We were using only the s3:// scheme.
Here are our configuration options:
On S3-compatible storage:
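Something along these lines in flink-conf.yaml should do it; the endpoint and keys below are placeholders for your own values:

```yaml
# flink-conf.yaml -- S3-compatible storage (e.g. MinIO); all values are placeholders
s3.endpoint: https://minio.example.com:9000   # your storage endpoint
s3.path.style.access: "true"                  # most S3-compatible stores need path-style access
s3.access-key: <your-access-key>
s3.secret-key: <your-secret-key>
```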
On S3 with AWS:
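Here the credentials provider is the important line; the checkpoint directory is just an illustrative place where $cluster_name shows up:

```yaml
# flink-conf.yaml -- AWS S3 with web identity tokens (IRSA)
# The provider reads the AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE
# environment variables, which EKS injects into the pod automatically.
fs.s3a.aws.credentials.provider: com.amazonaws.auth.WebIdentityTokenCredentialsProvider
state.checkpoints.dir: s3://$cluster_name/checkpoints   # illustrative; use your own bucket/path
```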
fs.s3a.aws.credentials.provider was the authentication method (credentials provider) that we were missing; it isn't mentioned in the Hadoop S3A docs[2], but it is in the AWS Java SDK docs[3][4]. AWS mounts the secrets inside the Flink pods, so using this provider should make it work without further configuration.
Note that flink-s3-fs-hadoop-1.13.2.jar needs to match your Flink version, and $cluster_name should be substituted with your own cluster/deployment name.
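For reference, on the official Flink Docker images the bundled plugin can be enabled through the ENABLE_BUILT_IN_PLUGINS environment variable, roughly like this:

```yaml
# Container env excerpt -- tells the official Flink Docker image to enable
# the bundled S3 plugin; adapt the jar name to your Flink version.
env:
  - name: ENABLE_BUILT_IN_PLUGINS
    value: flink-s3-fs-hadoop-1.13.2.jar
```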
That's pretty much it. I'm also attaching the Flink S3 docs[1] to the email. Thanks for reaching out, and I hope this helps you figure it out!
Best,
Denis
[1] docs/deployment/filesystems/s3/
[2] hadoop-aws/tools/hadoop-aws/index.html#S3A
[3] amazonaws/auth/AWSCredentialsProvider.html
[4] amazonaws/auth/WebIdentityTokenCredentialsProvider.html
As a side note, if you're using the Flink Kubernetes Operator to deploy your Flink job, you can set environment variables in the pod template instead of flink-conf.yaml, for example:
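A hypothetical FlinkDeployment excerpt, with placeholder names, to show where that lives:

```yaml
# Flink Kubernetes Operator manifest -- env vars set via the pod template
apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: my-flink-job                       # placeholder deployment name
spec:
  podTemplate:
    spec:
      containers:
        - name: flink-main-container       # the operator's main container
          env:
            - name: ENABLE_BUILT_IN_PLUGINS
              value: flink-s3-fs-hadoop-1.13.2.jar
```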
Thanks for reading!