Initial loading of a large data set

James_Kwon · June 23, 2022, 4:52pm

Hi. I asked this question on Slack and was directed to this forum. I’m looking to load a large data set of 90+ million rows. I’d rather not do this in one initial load, as it would impact the db server. I was wondering if there is a way to batch this into smaller data sets, using the cursor (a date) to limit each batch run to a range of dates. I’m told that is not possible, but that we could use the fetchSize parameter to run in small batches. My question about this: if fetchSize is set and we decide to stop the initial load after an hour while it still has a lot more rows to process, will rerunning the initial load be smart enough to resume where it left off, or will it need to start from scratch?

marcosmarxm · June 23, 2022, 4:58pm

Unfortunately no. It will restart from scratch. Something you can do is create a view and manage the load through the date range in the view's WHERE clause.
-- my_table_batch is a placeholder view name
create view my_table_batch as (select * from my_table where start_date >= '2021-01-01' and start_date <= '2021-02-01')

James_Kwon · June 23, 2022, 5:16pm

Would that still work? Would the process have a problem with the missing older data as the date is increased in subsequent loads? Or is the cursor only used to look for data newer than the cursor value?

marcosmarxm · June 23, 2022, 5:22pm

The cursor only looks for new data after the current value.
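
To make the view approach concrete, here is a rough sketch of how widening the view between syncs interacts with a cursor that only reads rows newer than its saved value. It is not how Airbyte is implemented: an in-memory SQLite database stands in for the real source, `my_table_batch`, `set_view_upper_bound` and `incremental_sync` are made-up names, and the sample rows are invented. For simplicity the view only has an upper bound on start_date, which is moved forward before each run.

```python
import sqlite3
from datetime import date


def set_view_upper_bound(conn, upper):
    """Recreate the (hypothetical) view so it exposes rows up to `upper`, inclusive."""
    # SQLite does not allow bound parameters inside CREATE VIEW, so the date is
    # validated as an ISO date first and then inlined into the statement.
    upper_iso = date.fromisoformat(upper).isoformat()
    conn.execute("DROP VIEW IF EXISTS my_table_batch")
    conn.execute(
        "CREATE VIEW my_table_batch AS "
        f"SELECT * FROM my_table WHERE start_date <= '{upper_iso}'"
    )


def incremental_sync(conn, cursor_value):
    """Simulate one sync: read only rows newer than the saved cursor value."""
    if cursor_value is None:
        # First run: no cursor saved yet, so read everything the view exposes.
        rows = conn.execute(
            "SELECT id, start_date FROM my_table_batch ORDER BY start_date"
        ).fetchall()
    else:
        rows = conn.execute(
            "SELECT id, start_date FROM my_table_batch "
            "WHERE start_date > ? ORDER BY start_date",
            (cursor_value,),
        ).fetchall()
    # The new cursor is the largest start_date seen; unchanged if nothing was read.
    new_cursor = rows[-1][1] if rows else cursor_value
    return rows, new_cursor


# Invented sample data; SQLite stores the dates as ISO-8601 text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, start_date TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [(1, "2021-01-05"), (2, "2021-01-20"), (3, "2021-02-10"), (4, "2021-03-01")],
)

cursor_value = None
batch_sizes = []
for upper in ("2021-01-31", "2021-02-28", "2021-03-31"):
    set_view_upper_bound(conn, upper)  # widen the window before each run
    rows, cursor_value = incremental_sync(conn, cursor_value)
    batch_sizes.append(len(rows))

print(batch_sizes)    # [2, 1, 1]
print(cursor_value)   # 2021-03-01
```

Each run only picks up the rows that the wider view newly exposes, so the older data already loaded is not read again. One thing to watch in a real setup: with a strict "newer than" comparison, rows that share the exact cursor value but show up after a sync would be skipped, so it is safer to move the window forward in whole date steps.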

James_Kwon · June 23, 2022, 5:51pm

K. Thanks. I will give it a shot.

marcosmarxm · July 13, 2022, 12:00am

Hi there from the Community Assistance team.
We’re letting you know about an issue we discovered with the back-end process we use to handle topics and responses on the forum. If you experienced a situation where you posted the last message in a topic that did not receive any further replies, please open a new topic to continue the discussion. In addition, if you’re having a problem and find a closed topic on the subject, go ahead and open a new topic on it and we’ll follow up with you. We apologize for the inconvenience, and appreciate your willingness to work with us to provide a supportive community.
