Handling Parsing Errors in Incremental Sync with SFTP Connector

Summary

User reports a parsing error during an incremental sync with the SFTP connector, which leads to duplicate records because the cursor is not updated. Suggests improvements to error handling and alerting.


Question

Hello fellow Airbytees…

I have a scenario where I’m loading a file using an incremental sync with the https://docs.airbyte.com/integrations/sources/sftp-bulk connector. The sync extracts x records but then fails with a parsing error. It still replicates these records to the destination, yet it does not advance the cursor field _ab_source_file_last_modified. As a result, the next sync re-loads the same records and duplicates them in the destination. In my particular case, the parsing error occurred only during the first sync and not during the subsequent one (a separate issue, but very strange). Furthermore, Airbyte does not generate any alert for a parsing error: the only way to detect it is to check the logs explicitly. This is a big gap in the monitoring and observability of Airbyte connections.
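The failure mode described above can be modelled in a few lines. This is a minimal sketch under simplifying assumptions, not Airbyte's implementation: `SourceFile`, `parse` and `naive_incremental_sync` are hypothetical names, a plain list stands in for an append-only destination, and an integer cursor stands in for `_ab_source_file_last_modified` that only advances if the whole sync succeeds.

```python
from dataclasses import dataclass


@dataclass
class SourceFile:
    name: str
    last_modified: int  # hypothetical stand-in for _ab_source_file_last_modified
    rows: list[str]


def parse(row: str) -> dict:
    # Hypothetical parser: a valid row is "id,value"; anything else fails.
    parts = row.split(",")
    if len(parts) != 2:
        raise ValueError(f"cannot parse row: {row!r}")
    return {"id": parts[0], "value": parts[1]}


def naive_incremental_sync(files: list[SourceFile], cursor: int,
                           destination: list[dict]) -> int:
    """Write records as they are parsed; return the new cursor only on success."""
    new_cursor = cursor
    for f in sorted(files, key=lambda s: s.last_modified):
        if f.last_modified <= cursor:
            continue  # already synced according to the cursor
        for row in f.rows:
            # Records written before a failure stay in the destination.
            destination.append(parse(row))
        new_cursor = max(new_cursor, f.last_modified)
    return new_cursor


def run_demo() -> tuple[list[str], int]:
    destination: list[dict] = []
    cursor = 0
    files = [SourceFile("data.csv", 1, ["1,a", "2,b", "broken", "4,d"])]
    try:
        cursor = naive_incremental_sync(files, cursor, destination)
    except ValueError:
        pass  # the sync fails and the cursor is NOT advanced
    # Next sync: the row now parses (e.g. fixed at source, newer last_modified).
    files = [SourceFile("data.csv", 2, ["1,a", "2,b", "3,c", "4,d"])]
    cursor = naive_incremental_sync(files, cursor, destination)
    return [r["id"] for r in destination], cursor


print(run_demo())  # (['1', '2', '1', '2', '3', '4'], 2)
```

The duplicated ids `1` and `2` are the records written during the failed sync: because the cursor stayed at its old value, the next sync treated the whole file as new and loaded them again.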

Given this context, what is the best way to manage this issue? In my opinion, it would make sense for Airbyte to stop replication entirely if any records fail parsing, and then generate an alert for the parsing error. This would allow the DataOps team to proactively fix the issue at the source and then re-sync the stream. In the case where partial records are synced (Airbyte’s current behaviour), how can we ensure that duplicate records are not loaded in the next sync? Airbyte does not update the cursor during syncs with failed parsing, which causes a full re-load during the next sync. Would it make sense for Airbyte to track the row number so that it can use an offset to load only the records from the failed row onward? This would prevent duplicates. Given Airbyte’s current behaviour and lack of alerting, what is the best way to handle this scenario?
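The two alternatives proposed above (stop and alert on any parse failure, or track a row offset so a retry resumes where the failed sync stopped) can be compared in one sketch. This is a hypothetical model, not the sftp-bulk connector's behaviour or Airbyte's state format: `sync`, `ParseError`, the `alert` callback and the per-file state of `(last_modified, next_row)` are illustrative names only. It assumes that a file whose `last_modified` is unchanged still has the same leading rows, so a stored offset stays valid.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SourceFile:
    name: str
    last_modified: int  # hypothetical stand-in for _ab_source_file_last_modified
    rows: list[str]


class ParseError(Exception):
    pass


def parse(row: str) -> dict:
    # Hypothetical parser: a valid row is "id,value".
    parts = row.split(",")
    if len(parts) != 2:
        raise ParseError(f"cannot parse row: {row!r}")
    return {"id": parts[0], "value": parts[1]}


# Hypothetical per-file state: file name -> (last_modified, index of next row to read).
State = dict[str, tuple[int, int]]


def sync(files: list[SourceFile], state: State, destination: list[dict],
         alert: Callable[[str], None], strict: bool) -> None:
    for f in sorted(files, key=lambda s: s.last_modified):
        seen_mtime, next_row = state.get(f.name, (-1, 0))
        if f.last_modified != seen_mtime:
            next_row = 0  # new or rewritten file: a stored offset is meaningless
        if next_row >= len(f.rows):
            continue  # nothing left to load from this file
        if strict:
            # Fail fast: parse the rest of the file before writing anything,
            # so a parse error leaves the destination and state untouched.
            try:
                records = [parse(r) for r in f.rows[next_row:]]
            except ParseError as e:
                alert(f"{f.name}: {e}")
                raise
            destination.extend(records)
            state[f.name] = (f.last_modified, len(f.rows))
        else:
            # Row-offset checkpoint: record progress after every written row,
            # so a retry resumes at the row that failed instead of row 0.
            for i in range(next_row, len(f.rows)):
                try:
                    record = parse(f.rows[i])
                except ParseError as e:
                    alert(f"{f.name} row {i}: {e}")
                    raise
                destination.append(record)
                state[f.name] = (f.last_modified, i + 1)


def run_offset_demo():
    dest, state, alerts = [], {}, []
    f = SourceFile("data.csv", 1, ["1,a", "2,b", "broken", "4,d"])
    try:
        sync([f], state, dest, alerts.append, strict=False)
    except ParseError:
        pass  # ids 1 and 2 written; state records offset 2
    f.rows[2] = "3,c"  # the row parses on retry; last_modified unchanged
    sync([f], state, dest, alerts.append, strict=False)
    return [r["id"] for r in dest], state, alerts


def run_strict_demo():
    dest, state, alerts = [], {}, []
    f = SourceFile("data.csv", 1, ["1,a", "broken"])
    try:
        sync([f], state, dest, alerts.append, strict=True)
    except ParseError:
        pass
    return dest, state, alerts


print(run_offset_demo()[0])  # ['1', '2', '3', '4']
print(run_strict_demo())  # ([], {}, ["data.csv: cannot parse row: 'broken'"])
```

The design choice is where the retry boundary sits. Strict mode makes each file all-or-nothing, matching the "stop replication and alert" proposal, at the cost of re-parsing the whole file on retry. Offset mode avoids re-loading rows already written, but only while the file is unchanged: once the file is fixed at source and gets a new last_modified, this sketch restarts at row 0, so those rows would be loaded again unless the destination deduplicates them.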



This topic has been created from a Slack thread to give it more visibility.
It is in read-only mode here.


Tags: incremental-sync, sftp-connector, parsing-error, cursor-field, duplicate-records, alerting, monitoring