Error when writing data into Ceph S3 with Airbyte OSS on Kubernetes

Summary

An integrity check error (ETag mismatch) occurs when writing data into Ceph S3 using Airbyte OSS on Kubernetes. Downgrading the S3 destination connector did not resolve it: syncs from Postgres and MySQL sources still fail, and a smaller Google Sheets sync logs the same error while still showing as successful. The issue is suspected to involve multipart uploads.


Question

Hi all, I’ve started getting a brand new error when writing data into Ceph S3.

I am running Airbyte OSS 0.53.1 on Kubernetes v1.23.17, using Helm chart version 0.56.16 and S3 destination version 0.3.5.

The error always follows the same pattern, something like:

2024-05-22 13:18:02 destination > alex.mojaki.s3upload.IntegrityCheckException: File upload completed, but integrity check failed. Expected ETag: 6deb6916219dc856ffb2c7e5413094c4-1 but actual is

There is a known issue on GitHub that relates to this: https://github.com/airbytehq/airbyte/issues/36035
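For context on what the log line compares: the expected value ends in "-1", which matches the convention S3-compatible stores commonly use for multipart-upload ETags (MD5 over the concatenated binary MD5 digests of each part, followed by a dash and the part count). The Python sketch below assumes that convention; multipart_etag and integrity_check are hypothetical helpers for illustration, not the S3 destination's or the s3upload library's actual code.

```python
import hashlib


def multipart_etag(parts: list[bytes]) -> str:
    """Compute the ETag an S3-style store commonly reports for a multipart upload.

    Assumption: the store follows the widely observed AWS convention of
    MD5(concatenation of each part's binary MD5 digest) + "-" + part count.
    """
    digests = b"".join(hashlib.md5(part).digest() for part in parts)
    return f"{hashlib.md5(digests).hexdigest()}-{len(parts)}"


def integrity_check(expected: str, actual: str) -> bool:
    # Stores often return ETags wrapped in double quotes; strip them before comparing.
    return expected.strip('"') == actual.strip('"')


if __name__ == "__main__":
    parts = [b"hello world"]  # one part, so the ETag ends in "-1"
    expected = multipart_etag(parts)
    print(expected)
    print(integrity_check(expected, f'"{expected}"'))  # True
    print(integrity_check(expected, ""))  # False: an empty ETag never matches
```

If the store returns an empty or differently computed ETag for the completed upload, a comparison like this fails even when the uploaded data itself is intact.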

I followed a suggestion in that thread to downgrade the S3 destination to version 0.3.5. At first I thought this solved the problem, but jobs still fail when the source is Postgres or MySQL. They do not fail when the source is Google Sheets. However, the logs for the Google Sheets job still show the same error: the job only appears to succeed (the UI shows a green tick), but the logs indicate otherwise.

The Google Sheets sync was much smaller (53.38 KB | 211 records extracted | 211 records loaded | 1m 53s) than the Postgres or MySQL ones, so I suspect the problem is related to multipart uploads. When I reduced the block size and synced only one stream containing a single record from one of my Postgres sources, the job also showed as succeeded in the UI, even though the ETag error still appeared in the logs.
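One way to reason about the block size observation: if the destination sends every file through the multipart path (an assumption, not confirmed in this thread), the part count is the file size divided by the part size, rounded up, with a minimum of one. The hypothetical part_count helper below sketches that arithmetic. Under that assumption, the 53.38 KB Google Sheets file would be a one-part upload, which would be consistent with the "-1" suffix in the logged ETag.

```python
def part_count(object_size: int, part_size: int) -> int:
    """Number of parts a multipart upload needs for object_size bytes.

    Assumption: every upload goes through the multipart path, so even an
    empty or tiny object still produces exactly one part.
    """
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    # Integer ceiling division avoids floating-point rounding issues.
    return max(1, (object_size + part_size - 1) // part_size)


if __name__ == "__main__":
    MIB = 1024 * 1024
    # 5 MiB is the AWS S3 minimum size for every part except the last one.
    print(part_count(54_661, 5 * MIB))  # 1 (roughly the 53.38 KB Google Sheets sync)
    print(part_count(200 * MIB, 5 * MIB))  # 40
```

If this model holds, shrinking the block size would change how many parts a large sync uses, but it would not avoid the multipart path or its ETag check for small files.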

I am really uncertain about the root cause, since things were working before and there are many combinations of versions I could try (Airbyte Helm chart version, Airbyte Terraform version, S3 destination version, and even the source versions). Has anyone seen something similar who can point me in the right direction?



This topic has been created from a Slack thread to give it more visibility. It will be in read-only mode here.

