- Is this your first time deploying Airbyte?: No
- OS Version / Instance: Kubernetes
- Memory / Disk: unlimited (scalable cluster)
- Deployment: Kubernetes
- Airbyte Version: 0.35.64-alpha
- Source name/version: source-mssql 0.3.22
- Destination name/version: destination-s3 / 0.3.5
- Step: The issue happens during sync (when the sync succeeds after one or multiple attempts)
- Description: We have some connection syncs that sometimes succeed only after multiple failed attempts. I am fine with that (sync time is not important to us). However, the data is duplicated in the destination (S3 bucket) because data from failed attempts is not cleaned up by the S3 destination. I am wondering if this is a choice made by the devs who developed the connector, or something that will be fixed in future releases of the connector?
Hey, I have a couple of questions about this:
- Is there any reason why it succeeds only after multiple attempts?
- Also, you could use a custom dbt transformation if you can figure out a unique cursor so that dedup can happen, but this is not available out of the box.
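To illustrate what a cursor-based dedup would look like, here is a minimal Python sketch (not connector code): it keeps only the latest record per key, ordered by a cursor field. The field names `id` and `updated_at` are hypothetical placeholders; a dbt model would express the same logic in SQL.

```python
# Sketch: keep only the most recent record per primary key, using a
# cursor field to decide recency. Field names are illustrative only.

def dedup_latest(records, key_field="id", cursor_field="updated_at"):
    latest = {}
    for rec in records:
        k = rec[key_field]
        # Keep this record if it is the first seen for the key, or
        # if its cursor value is newer than what we kept so far.
        if k not in latest or rec[cursor_field] > latest[k][cursor_field]:
            latest[k] = rec
    return list(latest.values())
```

This only works when a unique key plus a monotonically increasing cursor exist, which is exactly the limitation raised below.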
- We get generic errors like these:
errors: $.flattening: is missing but it is required, $.format_type: does not have a value in the enumeration [CSV]
2022-06-10 02:37:53 INFO i.a.v.j.JsonSchemaValidator(test):56 - JSON schema validation failed. errors: $.format_type: does not have a value in the enumeration [JSONL]
We suspect that it is the MSSQL server that times out, because this happens only during syncs of large tables.
- I don’t think the S3 destination supports dbt transformations, and even if it did, it still wouldn’t be an option because the tables don’t have a field that can be used for dedup.
A solution for the S3 destination would be something like this: keep track of the files written during an attempt, and delete them if the attempt fails.
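The cleanup idea above can be sketched roughly as follows. This is a hypothetical Python illustration, not the connector's actual implementation: every object key written during an attempt is recorded, and on failure those keys are deleted so a retry does not leave duplicates behind. The `FakeS3` class is an in-memory stand-in for a real S3 client (boto3's S3 client exposes compatible `put_object`/`delete_object` methods).

```python
# Sketch of attempt-scoped cleanup for an S3-like destination.
# FakeS3 is an in-memory stand-in so the example is self-contained.

class FakeS3:
    def __init__(self):
        self.objects = {}  # (bucket, key) -> body

    def put_object(self, Bucket, Key, Body):
        self.objects[(Bucket, Key)] = Body

    def delete_object(self, Bucket, Key):
        self.objects.pop((Bucket, Key), None)


class AttemptScopedWriter:
    def __init__(self, client, bucket):
        self.client = client
        self.bucket = bucket
        self.written_keys = []  # keys written during the current attempt

    def write(self, key, body):
        self.client.put_object(Bucket=self.bucket, Key=key, Body=body)
        self.written_keys.append(key)

    def rollback(self):
        # Delete everything this attempt wrote, so a failed attempt
        # leaves no partial data in the bucket.
        for key in self.written_keys:
            self.client.delete_object(Bucket=self.bucket, Key=key)
        self.written_keys.clear()

    def commit(self):
        # On success, keep the files and just reset the tracking list.
        self.written_keys.clear()


def run_attempt(writer, records):
    """Write all records; on any failure, roll back this attempt's files."""
    try:
        for i, body in enumerate(records):
            writer.write(f"part-{i}.jsonl", body)
        writer.commit()
        return True
    except Exception:
        writer.rollback()
        return False
```

With this pattern, a retry that eventually succeeds leaves only the successful attempt's files in the bucket, which is the behavior being requested.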
Got it, this looks like a problem. Could you create an issue around this so that the team can give some suggestions, or we can get to a solution?