Why does the S3 destination generate one or multiple files for a single Postgres table?

  • Is this your first time deploying Airbyte?: No
  • OS Version / Instance: Helm deployment on K8S
  • Memory / Disk: Not limited (auto scaling of K8S nodes)
  • Deployment: Kubernetes
  • Airbyte Version: 0.50.3
  • Source name/version: Postgres 2.0.33
  • Destination name/version: S3 0.4.1
  • Step: The issue is happening during sync
  • Description:

We sync Postgres tables into Snowflake via an external Amazon S3 stage (the files are then ingested automatically into Snowflake by Snowpipe).

So far we have had a one-to-one relationship: each table corresponds to one S3 file.

Recently, we added a new table stream to an existing connection that already contained 9 streams.
This table contains only 3 rows and 4 columns (2 int, 2 datetime).

The issue is that syncing this new table produces 2 files. The two files seem to contain the data partitioned by one column.

We do not understand why this table is split into two files, whereas each of the other tables is synced into a single file.

What is strange is that if we add the same table stream to another Postgres connection (with only two streams), it produces only one file, as we expect.
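One way to see which streams were split in a given sync is to group the uploaded object keys by stream. This is a minimal sketch, assuming each key ends in `<namespace>/<stream>/<file name>` (any bucket path in front is ignored) and that it is run on the keys of a single sync only; the function names and example keys are hypothetical, not taken from the connector or from our bucket.

```python
from collections import defaultdict


def files_per_stream(keys):
    """Group S3 object keys by (namespace, stream).

    Assumption: each key ends in "<namespace>/<stream>/<file name>";
    any bucket path before that is ignored.
    """
    groups = defaultdict(list)
    for key in keys:
        parts = key.split("/")
        if len(parts) < 3:
            continue  # key does not match the assumed layout
        namespace, stream, file_name = parts[-3], parts[-2], parts[-1]
        groups[(namespace, stream)].append(file_name)
    return dict(groups)


def split_streams(keys):
    """Return only the streams that produced more than one file."""
    return {
        stream: sorted(files)
        for stream, files in files_per_stream(keys).items()
        if len(files) > 1
    }


# Hypothetical keys for one sync: the new table landed in two files.
keys = [
    "airbyte/public/orders/2023_09_01_1693526400000_0.jsonl",
    "airbyte/public/new_table/2023_09_01_1693526400000_0.jsonl",
    "airbyte/public/new_table/2023_09_01_1693526400000_1.jsonl",
]
print(split_streams(keys))
```

Streams that wrote a single file are filtered out, so an empty result means the one-file-per-table expectation held for that sync.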

Syncing into a single file is mandatory for us: we do a full refresh of the table and want all of its data to be ingested into Snowflake at the same time (within the same file), so that we can detect deleted records without any delay.
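To make the constraint concrete: with a full refresh, a row counts as deleted when its key is in the previous snapshot but not in the new one. The sketch below assumes each row is identified by a hypothetical integer `id` key (the function name and values are illustrative only) and shows why a snapshot split across files that are ingested at different times yields false deletions.

```python
def detect_deletes(previous_ids, snapshot_ids):
    """Ids present in the previous full snapshot but absent from the new one."""
    return set(previous_ids) - set(snapshot_ids)


# Hypothetical primary keys of the 3-row table.
previous = {1, 2, 3}
part_0 = [1, 2]  # first file of the split sync
part_1 = [3]     # second file, ingested later

# Only the first file has arrived: row 3 is wrongly reported as deleted.
print(detect_deletes(previous, part_0))           # {3}
# Both files ingested together: no false deletions.
print(detect_deletes(previous, part_0 + part_1))  # set()
```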

We did not find anything about this in the connector documentation, and we need to understand how to configure Airbyte to guarantee a one-to-one relationship (each table must generate exactly one file).

Can anyone explain this behaviour or point us to the right documentation?

Best regards