Is this your first time deploying Airbyte?: Yes
OS Version / Instance: (not filled in; the template examples are Ubuntu, Windows, or macOS)
Memory / Disk: (not filled in; e.g. 4 GB / 1 TB)
Deployment: Kubernetes deployment
Airbyte Version: 0.39.16
Source name/version: MSSQL 0.4.5
Destination name/version: Snowflake 0.4.29
Step: during sync (the CDC initial load)
I am working on MSSQL-to-Snowflake CDC jobs. We have big tables (700,000,653 rows), and the CDC transaction logs are retained for only ~3 days. The source connector processes around ~3k rows per second, so the initial load would take more than 3 days to complete, which means we may miss records during the incremental phase because the transaction logs expire after 3 days. We may need to load even bigger tables in the future.
How can we improve the CDC initial load speed?
Is there any workaround? I would appreciate your help with this issue.
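For context, a quick back-of-envelope calculation with the numbers quoted above shows how tight the window is (both the throughput and the retention figures are taken from this post):

```python
# Back-of-envelope estimate: how long the initial CDC snapshot takes
# at the observed throughput, versus the transaction log retention.
rows = 700_000_653      # table size quoted above
rows_per_sec = 3_000    # observed source connector throughput
retention_days = 3      # approximate CDC log retention

load_days = rows / rows_per_sec / 86_400  # 86,400 seconds per day
print(f"pure transfer time: ~{load_days:.1f} days; "
      f"log retention: ~{retention_days} days")
```

Even the raw transfer time is on the order of the retention window, so any overhead, backpressure, or restart pushes the load past it.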
Hey @sivankumar86, thanks for your post. As far as I know, improving source database performance is still something the team is working on: https://github.com/airbytehq/airbyte/issues/12532
Have you considered allocating more resources or increasing job parallelism by increasing the max number of sync workers? How has that impacted the load speed?
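For reference, sync parallelism and per-job resources are controlled through environment variables on the worker. A minimal sketch (the values here are placeholders, and where exactly you set them depends on whether you deploy via the Helm chart or raw manifests):

```yaml
# Illustrative env vars on the airbyte-worker container.
env:
  - name: MAX_SYNC_WORKERS                  # max concurrent sync jobs per worker
    value: "10"
  - name: JOB_MAIN_CONTAINER_MEMORY_LIMIT   # memory limit for job pods
    value: "8Gi"
```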
@sajarin Thanks for your reply. Good to know that it is in the backlog. I have configured just one table per connection, so it will run on a single worker (correct me if I am missing something). I configured an 8 GB limit for the k8s container, and when I checked the usage I noticed it was only using ~30% of that (~2 GB).
Hey @sivankumar86, I appreciate your patience with this issue. I don't know of a direct workaround for your specific problem. Your best bet might be to experiment with your Kubernetes configuration and increase the job parallelism, if you haven't done so already. I suspect this would improve the speed, but that is only a guess. If it doesn't help, I recommend creating an issue on GitHub so that the team can track it better.
You can find more information in another thread: https://discuss.airbyte.io/t/scaling-airbyte-on-k8s-increased-job-parallelism/826
As well as in our documentation: https://docs.airbyte.com/understanding-airbyte/jobs/#worker-parallelization
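If you're on a plain Kubernetes deploy, one way to apply a parallelism change is to set the variable on the worker deployment and restart it. A sketch only: the deployment name, namespace, and value below are assumptions, so adjust them to your install:

```shell
# Assumes the worker runs as deployment "airbyte-worker" in namespace "airbyte".
kubectl -n airbyte set env deployment/airbyte-worker MAX_SYNC_WORKERS=10
kubectl -n airbyte rollout restart deployment/airbyte-worker
```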