Source-hubspot contact_list_membership/contacts extraction performance optimization

  • Is this your first time deploying Airbyte?: No
  • Deployment: Kubernetes via helm chart
  • Airbyte Version: 0.39.42-alpha, chart version 0.34.2
  • Source name/version: Hubspot/0.1.80
  • Destination name/version: Redshift/0.3.47
  • Step: The issue is happening during incremental sync
  • Description:
    Hi Team,
    I have a basic question about the HubSpot connector's contacts stream. We have roughly 7M contact records. I started with an Incremental sync, and I need data going back to November 2021, so I set the source-hubspot start date to 01.11.2021. The sync has been running for the last 14 hours and is still going. When I checked the logs, I found that the bookmark is advancing only a few seconds at a time.
    Please check the log below, especially the timestamps: at 2022-08-11 05:50:26 the bookmark for 2022-05-30 was at 14:11:05, and about 5 hours later, at 10:36:47, the bookmark was still only at 2022-05-30T14:36:37.
    The detailed log excerpt follows. If required, I can also send the whole log.
2022-08-11 05:50:26 source > Advancing bookmark for contacts stream from 2022-05-30T14:11:05.249000+00:00 to 2022-05-30T14:11:21.301000+00:00
Log4j2Appender says: Advancing bookmark for contacts stream from 2022-05-30T14:11:05.249000+00:00 to 2022-05-30T14:11:21.301000+00:00
2022-08-11 05:50:26 source > Reading contacts associations of contact
Log4j2Appender says: Reading contacts associations of contact
.......
2022-08-11 05:53:30 source > Advancing bookmark for contacts stream from 2022-05-30T14:11:21.301000+00:00 to 2022-05-30T14:11:37.649000+00:00
Log4j2Appender says: Advancing bookmark for contacts stream from 2022-05-30T14:11:21.301000+00:00 to 2022-05-30T14:11:37.649000+00:00
.....
.....
.....
2022-08-11 10:36:47 source > Advancing bookmark for contacts stream from 2022-05-30T14:35:54.604000+00:00 to 2022-05-30T14:36:10.375000+00:00
Log4j2Appender says: Advancing bookmark for contacts stream from 2022-05-30T14:35:54.604000+00:00 to 2022-05-30T14:36:10.375000+00:00
2022-08-11 10:36:47 source > Reading contacts associations of contact
Log4j2Appender says: Reading contacts associations of contact
2022-08-11 10:36:49 source > Reading companies associations of contact
Log4j2Appender says: Reading companies associations of contact
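
To put a number on how slowly the bookmark moves, the "Advancing bookmark" lines can be compared against their own log timestamps. The following is only a rough sketch: the function name `bookmark_progress` and the regular expression are my own, and it assumes each relevant line starts with a `YYYY-MM-DD HH:MM:SS` timestamp as in the excerpt above (whatever sits between the timestamp and the message is ignored).

```python
import re
from datetime import datetime

# Matches lines such as:
# 2022-08-11 05:50:26 source > Advancing bookmark for contacts stream from <ts> to <ts>
# Lines without a leading timestamp (e.g. "Log4j2Appender says: ...") do not match.
BOOKMARK_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) .*?"
    r"Advancing bookmark for (\S+) stream from (\S+) to (\S+)"
)


def bookmark_progress(log_lines, stream="contacts"):
    """Return (wall_seconds, data_seconds) between the first and last bookmark line."""
    points = []
    for line in log_lines:
        match = BOOKMARK_RE.match(line)
        if match and match.group(2) == stream:
            logged_at = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            bookmark = datetime.fromisoformat(match.group(4))  # the "to" value
            points.append((logged_at, bookmark))
    if len(points) < 2:
        raise ValueError("need at least two bookmark lines")
    (first_wall, first_mark), (last_wall, last_mark) = points[0], points[-1]
    return (
        (last_wall - first_wall).total_seconds(),
        (last_mark - first_mark).total_seconds(),
    )


# The three bookmark lines quoted in this post.
SAMPLE = [
    "2022-08-11 05:50:26 source > Advancing bookmark for contacts stream from "
    "2022-05-30T14:11:05.249000+00:00 to 2022-05-30T14:11:21.301000+00:00",
    "2022-08-11 05:53:30 source > Advancing bookmark for contacts stream from "
    "2022-05-30T14:11:21.301000+00:00 to 2022-05-30T14:11:37.649000+00:00",
    "2022-08-11 10:36:47 source > Advancing bookmark for contacts stream from "
    "2022-05-30T14:35:54.604000+00:00 to 2022-05-30T14:36:10.375000+00:00",
]

wall_s, data_s = bookmark_progress(SAMPLE)
print(f"{wall_s:.0f} s of sync time covered {data_s:.0f} s of contact updates")
```

On these three lines, about 4 h 46 min of sync time moved the bookmark forward by just under 25 minutes of HubSpot data, i.e. roughly 11.5 seconds of wall clock per second of source data.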
 

I want to understand why this checkpoint is progressing so slowly. Is this a bug or the expected behavior?
What is the best way to backfill millions of old records from HubSpot?
How can I improve the extraction performance?
Should I start with a full import and then move to Incremental?

For contact_list_membership: only the Full Refresh sync mode is available, and we have millions of records. This stream also keeps running and running until a Try-able API exception is eventually triggered, and then the sync fails after some time.

Many Thanks

Do you mind sharing the complete log? I found another HubSpot-related issue on GitHub: https://github.com/airbytehq/airbyte/issues/11252, where the user was talking about 32MM records. If I'm not wrong, after a certain number of pages HubSpot asks you to refresh the request. The state update looks awkward too.
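
To illustrate what "refreshing the request" after a number of pages could look like, here is a generic sketch. It is not the connector's actual code and it does not call the HubSpot API: `fetch_page`, `page_size`, `max_window` and the record shape (`id`, `updated_at`) are all hypothetical, and it assumes pages come back sorted by `updated_at` and that offsets beyond `max_window` are not allowed.

```python
from typing import Callable, Dict, Iterator, List, Set

Record = Dict[str, int]  # hypothetical shape: {"id": ..., "updated_at": ...}
# Hypothetical page fetcher: (since, offset, limit) -> records with
# updated_at >= since, sorted by updated_at, sliced by offset/limit.
FetchPage = Callable[[int, int, int], List[Record]]


def read_with_window_reset(
    fetch_page: FetchPage, start: int, page_size: int, max_window: int
) -> Iterator[Record]:
    """Page through results; when the window cap is reached, restart the
    query from the last seen updated_at instead of paging any deeper."""
    since, offset = start, 0
    boundary_ids: Set[int] = set()  # already yielded, updated_at == since
    last_ts, last_ids = since, boundary_ids
    while True:
        page = fetch_page(since, offset, page_size)
        if not page:
            return
        for rec in page:
            # After a restart, records at the boundary timestamp come back again.
            if rec["updated_at"] == since and rec["id"] in boundary_ids:
                continue
            if rec["updated_at"] != last_ts:
                last_ts, last_ids = rec["updated_at"], set()
            last_ids.add(rec["id"])
            yield rec
        offset += len(page)
        if offset + page_size > max_window:
            if last_ts == since:
                # The whole window shares one timestamp; restarting cannot advance.
                raise RuntimeError("window filled with a single timestamp")
            since, offset, boundary_ids = last_ts, 0, last_ids


# Tiny in-memory stand-in for the remote API, only for demonstration.
DATA = [{"id": i, "updated_at": i // 3} for i in range(50)]


def fake_fetch(since: int, offset: int, limit: int) -> List[Record]:
    assert offset + limit <= 10, "paged past the window cap"
    rows = [r for r in DATA if r["updated_at"] >= since]
    return rows[offset:offset + limit]


ids = [r["id"] for r in read_with_window_reset(fake_fetch, 0, page_size=5, max_window=10)]
print(len(ids), "records read without exceeding the window cap")
```

The point of the sketch is that every restart re-reads the records at the boundary timestamp, so a cap that is small relative to the data volume adds overhead and makes the bookmark advance in small steps. Whether that is what happens in this sync would need to be confirmed from the complete log.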