Trouble loading parquet files from AWS S3 source

Summary

User is facing issues setting up access to an S3 bucket to load parquet files from AWS. Errors occur during connection testing despite having the correct IAM policy permissions.


Question

Is anyone using the S3 source to load parquet files from AWS? I have been trying for a while to set up access to the NY Taxis dataset at ‘arn:aws:s3:::nyc-tlc’ and it doesn’t work.

I have created the IAM policy with the ListBucket and GetObject permissions for the above resource.
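For reference, a policy of the kind described here can be sketched as below. This is a minimal sketch, not the connector guide's exact policy: the function name `read_only_policy` is hypothetical. It relies on the fact that `s3:ListBucket` applies to the bucket ARN itself, while `s3:GetObject` applies to the objects under it (`/*`). Note also that a policy attached to a user in your own account does not by itself grant access to a bucket owned by another account; the bucket's own policy has to allow it too.

```python
import json


def read_only_policy(bucket: str) -> dict:
    """Build a read-only IAM policy document for a single bucket (illustrative)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is a bucket-level action: the resource is the bucket ARN.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {
                # Reading is an object-level action: the resource is bucket/*.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            },
        ],
    }


if __name__ == "__main__":
    # Print the JSON so it can be pasted into the IAM policy editor.
    print(json.dumps(read_only_policy("nyc-tlc"), indent=2))
```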

When I test the connection, I always get errors.

If I enter the ARN in the bucket field, it complains that the format is not correct; if I enter just the bucket name, it says it cannot list files…
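The "format is not correct" message is consistent with the bucket field expecting a bare bucket name rather than an ARN, although that is an assumption about the connector and not something confirmed in this thread. Under that assumption, the bucket name can be derived from the ARN as in this small sketch (the helper `bucket_from_arn` is hypothetical):

```python
def bucket_from_arn(value: str) -> str:
    """Return the bare bucket name from either an S3 ARN or a plain bucket name."""
    prefix = "arn:aws:s3:::"
    if value.startswith(prefix):
        value = value[len(prefix):]
    # Drop any object key or wildcard suffix such as "/*".
    return value.split("/", 1)[0]
```

With this, `bucket_from_arn("arn:aws:s3:::nyc-tlc")` gives `nyc-tlc`, which is the value that would go in the bucket field.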

I am lost trying to understand what is exactly failing.
Any assistance is appreciated :slight_smile:



This topic has been created from a Slack thread to give it more visibility.
It will be in read-only mode here.


["s3-source", "parquet-files", "aws", "iam-policy", "connection-testing", "error"]

Yes of course. I set the Key ID and Secret.

The policy was created the same way it's explained in the guide for the connector, and I assigned the policy to the user that I got the key and secret from.

Can you try something like this?

```
"Action": [
    "s3:List*",
    "s3:Get*"
]
```
Still read-only, but with some extra actions

Are you able to list files using those credentials with the AWS CLI: `aws s3 ls s3://nyc-tlc`?

Unfortunately, I don't have the AWS CLI installed on my laptop because of a lack of permissions.
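Without the AWS CLI, a rough substitute check can be sketched in Python using only the standard library. This is not the same test as the CLI command above: it sends an unsigned (anonymous) `ListObjectsV2` request, so it only tells you whether the bucket allows public listing, not whether your key and secret work. The helper names are hypothetical, and the global `s3.amazonaws.com` endpoint is assumed.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# XML namespace used by S3 list responses.
S3_NS = "{http://s3.amazonaws.com/doc/2006-03-01/}"


def list_url(bucket: str, prefix: str = "", max_keys: int = 10) -> str:
    """Build an unsigned ListObjectsV2 URL for a bucket."""
    params = {"list-type": "2", "max-keys": str(max_keys)}
    if prefix:
        params["prefix"] = prefix
    return f"https://{bucket}.s3.amazonaws.com/?{urllib.parse.urlencode(params)}"


def parse_keys(xml_body) -> list:
    """Extract object keys from a ListBucketResult XML document (str or bytes)."""
    root = ET.fromstring(xml_body)
    return [el.text for el in root.iter(f"{S3_NS}Key")]


def list_public_bucket(bucket: str, prefix: str = "", max_keys: int = 10) -> list:
    """Anonymously list up to max_keys objects; raises HTTPError (e.g. 403) on refusal."""
    with urllib.request.urlopen(list_url(bucket, prefix, max_keys), timeout=10) as resp:
        return parse_keys(resp.read())


# Example call (needs network access):
# print(list_public_bucket("nyc-tlc"))
```

An HTTP 403 from this call would suggest the bucket does not permit anonymous listing, which is a different situation from a wrong connector setting.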

The policy update still hasn't worked.

This is my configuration:

It looks like you cannot make screenshots as well :wink:

One more thing came to mind: maybe there are some service control policies (SCPs) configured in your organization.
https://docs.aws.amazon.com/organizations/latest/userguide/orgs_manage_policies_scps.html

Maybe some access between different regions is not allowed.

No, the AWS account is my personal one. The bucket is supposed to be public and is on the marketplace.

PS: Indeed, I use Slack on mobile only, since it's blocked on the organization network.

I think this bucket is not available

I checked another public bucket and it works just fine

Uhm, but it's on the AWS marketplace. That is weird.

https://github.com/awslabs/open-data-registry/issues/1418
https://github.com/awslabs/open-data-registry/issues/2280
Maybe it’s better to find another bucket with parquet files.

Yeah, something weird seems to be happening. Thank you, I will research.