S3 Source Bucket Size Check

Summary

Asking how large other users' S3 source buckets are, prompted by source check and sync timeouts in Airbyte on a bucket with 500k objects.


Question

Would anyone using an S3 source be able to tell me how big their buckets are? We're getting source check and sync timeouts on a bucket with 500k objects, which doesn't feel like it should be too large for Airbyte to ingest from.
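For comparing numbers, a full listing is also a rough proxy for the work the connector does on every sync. A minimal sketch with boto3 (the bucket name is a placeholder):

```python
import boto3

BUCKET = "my-source-bucket"  # placeholder bucket name

s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")

# ListObjectsV2 returns at most 1,000 keys per page, so a 500k-object
# bucket takes roughly 500 sequential API calls just to enumerate.
count = 0
for page in paginator.paginate(Bucket=BUCKET):
    count += page.get("KeyCount", 0)

print(f"{BUCKET}: {count} objects")
```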




["s3-source", "bucket-size", "source-check", "sync-timeouts", "500k-objects", "ingest"]

My biggest bucket is around 80k objects. I also had trouble with larger buckets because of how incremental loads were implemented (at least in an earlier version of the connector): there was no support for object key prefixes with timestamps, so it had to list the whole bucket to find new objects, which doesn't scale well.
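To illustrate what prefix support would buy you: if your keys embed a date (e.g. `raw/2024/01/15/events.json`), a listing restricted to date-derived prefixes only touches recent keys instead of paginating the entire bucket. A rough sketch under that assumed key layout (the bucket name, prefix scheme, and helper are illustrative, not what the connector actually does):

```python
from datetime import date, timedelta

import boto3

BUCKET = "my-source-bucket"          # placeholder bucket name
PREFIX_FORMAT = "raw/{d:%Y/%m/%d}/"  # assumes date-partitioned keys

s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")

def objects_since(cursor: date):
    """Yield objects written on or after `cursor` by listing only the
    per-day prefixes, rather than walking the whole bucket."""
    day = cursor
    while day <= date.today():
        prefix = PREFIX_FORMAT.format(d=day)
        for page in paginator.paginate(Bucket=BUCKET, Prefix=prefix):
            yield from page.get("Contents", [])
        day += timedelta(days=1)

# Example: only enumerate the last week's worth of objects.
for obj in objects_since(date.today() - timedelta(days=7)):
    print(obj["Key"], obj["LastModified"])
```

With this layout an incremental sync scales with the number of new objects per day, not with the total bucket size, which is exactly what a full-bucket listing fails to do at 500k+ objects.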