Large Backfill with unreliable API

wjwatkinson · June 13, 2022, 1:40pm

I am trying to backfill a large number of records from a source that keeps returning retryable errors. I have increased the retry backoff and the number of retries, but even with 10 retries and an exponential backoff the errors continue.

My current setup is an incremental sync using the get_updated_state method. When it runs into these errors it retries, pulling the same records 3 times, and then commits whatever it had from the last sync. It seems like this will eventually sync all records, but it is not ideal, so I am trying to improve my connector code.

From the documentation, it looks like using small stream slices could help with this, but the documentation does not make clear exactly how. Would stream slices mean that a retry after an error does not pull the same records over again?
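To make the question concrete, here is a minimal sketch of what "small stream slices" could look like for a date-based cursor. It is plain Python and does not import the CDK; `date_slices`, the `from`/`to` keys, and the one-day window are all hypothetical choices for illustration. As far as I understand the Python CDK, a stream's `stream_slices()` method returns mappings like these, each one is handed to `request_params()` as `stream_slice`, and state is emitted once a slice has been read completely, so a failed sync would resume at the failed slice rather than at the start of the backfill.

```python
from datetime import date, timedelta
from typing import Iterator, Mapping


def date_slices(start: date, end: date, window_days: int = 1) -> Iterator[Mapping[str, str]]:
    """Yield half-open [from, to) date windows that cover start..end.

    Hypothetical helper: in a real connector this logic would live in the
    stream's stream_slices() method, with `start` taken from the saved state.
    """
    step = timedelta(days=window_days)
    cursor = start
    while cursor < end:
        # Clamp the last window so it never runs past `end`.
        upper = min(cursor + step, end)
        yield {"from": cursor.isoformat(), "to": upper.isoformat()}
        cursor = upper


if __name__ == "__main__":
    for s in date_slices(date(2022, 6, 1), date(2022, 6, 4)):
        print(s)
```

The smaller the window, the less data has to be re-read after a failure, at the cost of more requests overall.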

I am also wondering if updating to use the Incremental Mixin would help. This was not available when I initially created the source.

marcosmarxm · June 13, 2022, 10:31pm

Did you try to use checkpoint state? https://docs.airbyte.com/connector-development/cdk-python/incremental-stream#checkpointing-state

wjwatkinson · June 14, 2022, 11:54am

Yes, I am checkpointing state every 100 records.
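For readers following along, the idea behind checkpointing every N records can be sketched roughly as below. This is a stand-alone simulation, not CDK code: `read_with_checkpoints` and the `("RECORD", ...)` / `("STATE", ...)` tuples are hypothetical stand-ins for what the CDK does when `state_checkpoint_interval` is set. Note the assumption that records arrive in ascending cursor order; if they do not, a mid-stream checkpoint could skip records on resume.

```python
from typing import Any, Dict, Iterable, Iterator, Tuple

Message = Tuple[str, Dict[str, Any]]


def read_with_checkpoints(
    records: Iterable[Dict[str, Any]], cursor_field: str, interval: int
) -> Iterator[Message]:
    """Emit each record, plus a state message after every `interval` records.

    Assumes records are sorted ascending by `cursor_field`.
    """
    latest = None
    count = 0
    for rec in records:
        value = rec[cursor_field]
        latest = value if latest is None else max(latest, value)
        yield ("RECORD", rec)
        count += 1
        if count % interval == 0:
            # Everything up to `latest` has been emitted, so it is safe to save.
            yield ("STATE", {cursor_field: latest})
    # Final state for a trailing partial batch.
    if latest is not None and count % interval != 0:
        yield ("STATE", {cursor_field: latest})


if __name__ == "__main__":
    rows = [{"updated_at": i} for i in range(1, 6)]
    for msg in read_with_checkpoints(rows, "updated_at", interval=2):
        print(msg)
```

If the sync dies between two state messages, at most `interval` records need to be read again on the next run.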

marcosmarxm · June 14, 2022, 7:53pm

Doesn’t the API give you any information about when to retry, either in its responses or in its documentation?
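If the API does send a hint such as the standard `Retry-After` header, the wait time could be taken from it instead of from a fixed exponential schedule. The sketch below is an assumption about how that might look: `backoff_seconds` is a hypothetical helper, and whether this particular API sends `Retry-After` at all is unknown. In the Python CDK, the natural place to call something like it would be the stream's `backoff_time(response)` override, passing `response.headers`.

```python
from typing import Mapping, Optional


def backoff_seconds(headers: Mapping[str, str], default: Optional[float] = None) -> Optional[float]:
    """Return the wait in seconds suggested by a Retry-After header, else `default`.

    Only the delta-seconds form is handled; the HTTP-date form falls back to
    `default`. A plain dict lookup is case-sensitive, whereas the headers
    object of the `requests` library is case-insensitive.
    """
    raw = headers.get("Retry-After")
    if raw is None:
        return default
    try:
        # Never return a negative wait.
        return max(0.0, float(raw))
    except ValueError:
        return default


if __name__ == "__main__":
    print(backoff_seconds({"Retry-After": "30"}))
    print(backoff_seconds({}, default=5.0))
```

Returning `None` when no hint is present lets the caller fall back to its usual backoff policy.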

marcosmarxm · July 13, 2022, 12:00am

Hi there from the Community Assistance team.
We’re letting you know about an issue we discovered with the back-end process we use to handle topics and responses on the forum. If you experienced a situation where you posted the last message in a topic that did not receive any further replies, please open a new topic to continue the discussion. In addition, if you’re having a problem and find a closed topic on the subject, go ahead and open a new topic on it and we’ll follow up with you. We apologize for the inconvenience, and appreciate your willingness to work with us to provide a supportive community.
