Timeline for Parallel Source Processing?

Jordonkopp · March 21, 2023, 11:46pm

Trying to track this / this to see where we can improve the performance of the tool.

Currently, full load times are extremely slow once we get to source tables with 1+ billion records.

Are there any timelines for when those tickets will be tackled or assigned? Even splitting tables into separate sources won't really help us here, since a single table with 1+ billion rows can take upwards of 24 hours to sync.

We are deploying an AWS-hosted K8s cluster.
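For a sense of scale, the figures above (roughly 1 billion rows in roughly 24 hours) imply an average rate of about 11,574 rows per second. The short Python sketch below works that out. The linear speed-up shown for N parallel workers is an idealized assumption that ignores contention at the source database, the network, and the destination, so treat it as an upper bound rather than a prediction.

```python
# Rough throughput implied by the figures in the post:
# ~1 billion rows taking ~24 hours to sync.
ROWS = 1_000_000_000
HOURS = 24

def rows_per_second(rows: float, hours: float) -> float:
    """Average sustained rate needed to move `rows` rows in `hours` hours."""
    return rows / (hours * 3600)

rate = rows_per_second(ROWS, HOURS)
print(f"{rate:,.0f} rows/s")  # prints: 11,574 rows/s

# Idealized assumption: N independent workers each sustain the same rate,
# with no shared bottleneck, so wall-clock time shrinks linearly with N.
for workers in (1, 4, 8):
    print(workers, "workers ->", round(HOURS / workers, 1), "hours")
```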

marcosmarxm · March 22, 2023, 3:03pm

Hello there! You are receiving this message because none of your fellow community members has stepped in to respond to your topic post. (If you are a community member and you are reading this response, feel free to jump in if you have the answer!) As a result, the Community Assistance Team has been made aware of this topic and will be investigating and responding as quickly as possible.
Some important considerations that will help you get your issue solved faster:

It is best to use our topic creation template; if you haven’t yet, we recommend posting a follow-up with the requested information. With that information, the team will be able to search for similar issues with connectors and the platform, and troubleshoot your specific question or problem, more quickly.
Make sure to upload the complete log file; a common investigation roadblock is that sometimes the error for the issue happens well before the problem is surfaced to the user, and so having the tail of the log is less useful than having the whole log to scan through.
Be as descriptive and specific as possible; when investigating it is extremely valuable to know what steps were taken to encounter the issue, what version of connector / platform / Java / Python / docker / k8s was used, etc. The more context supplied, the quicker the investigation can start on your topic and the faster we can drive towards an answer.
We in the Community Assistance Team are glad you’ve made yourself part of our community, and we’ll do our best to answer your questions and resolve the problems as quickly as possible. Expect to hear from a specific team member as soon as possible.

Thank you for your time and attention.
Best,
The Community Assistance Team

sajarin · March 23, 2023, 4:22pm

Hey Jordon,

Thanks for the post. I’ve gone ahead and forwarded this to our product team. You can view our current roadmap here: https://app.harvestr.io/roadmap/view/pQU6gdCyc/launch-week-roadmap

We’re still working on improving performance and hope to ship performance-related improvements where we can. I wish I had a better answer for you, but I hope this helps nonetheless.

Jordonkopp · March 28, 2023, 12:19am

@sajarin ,

Thanks for the reply; I appreciate the link to the roadmap. I do see a Postgres performance task slated for Q1, and I'm curious whether that's specific to the source or universal.

Apologies if I missed this in an FAQ somewhere, but where would be the best place to look in the connector code to add support for reading parallel chunks within a given connector? Short or long term, our team may be able to contribute to the project, but we would likely need some guidance on where to start.

Thanks,
Jordon Kopp
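One common way to get "parallel chunks" out of a single large table is to split it by primary-key range and read each range independently. The Python sketch below is a generic illustration of that technique, not Airbyte's connector API. It assumes an integer, roughly evenly distributed primary key, and `read_chunk` is a hypothetical callback standing in for a query such as `SELECT ... WHERE id >= lo AND id < hi`.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")

def split_key_range(min_id: int, max_id: int, n_chunks: int) -> List[Tuple[int, int]]:
    """Split the inclusive key range [min_id, max_id] into up to n_chunks
    contiguous, non-overlapping half-open ranges [lo, hi)."""
    if max_id < min_id or n_chunks < 1:
        return []
    total = max_id - min_id + 1
    size = -(-total // n_chunks)  # ceiling division so every key is covered
    ranges = []
    lo = min_id
    while lo <= max_id:
        hi = min(lo + size, max_id + 1)
        ranges.append((lo, hi))
        lo = hi
    return ranges

def read_in_parallel(
    ranges: List[Tuple[int, int]],
    read_chunk: Callable[[int, int], T],  # hypothetical per-range reader
    max_workers: int = 4,
) -> List[T]:
    """Run read_chunk(lo, hi) for each range on a thread pool.
    Results come back in the same order as `ranges`."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda r: read_chunk(*r), ranges))

if __name__ == "__main__":
    ranges = split_key_range(1, 10, 3)
    print(ranges)  # prints: [(1, 5), (5, 9), (9, 11)]
    # Stand-in reader: "reads" a range by returning how many keys it spans.
    counts = read_in_parallel(ranges, lambda lo, hi: hi - lo)
    print(sum(counts))  # prints: 10
```

Key-range splitting keeps each chunk's query index-friendly and lets a failed chunk be retried on its own. With a heavily skewed key distribution the chunks would be unequal in size, so an implementation may want to sample the key distribution before choosing boundaries.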

Related topics

Topic	Replies	Views	Activity
Postgres Source - Slow initial Load (Connector Questions & Issues: source-postgres, data-loading)	3	645	July 14, 2022
Very slow sync rate while ingesting clickhouse tables (Connector Questions & Issues: destination-bigquery, normalization, data-loading, connectors, kubernetes)	1	385	March 24, 2023
MySQL source connector performance (Connector Questions & Issues: source-mysql, data-loading)	6	3444	July 14, 2022
Source MSSQL - initial load is very slow (CDC run) (Connector Questions & Issues: source-microsoft-sql-server-mssql, data-loading, connectors)	3	1114	July 2, 2022
Syncing Table Performance Issue with Airbyte Debezium Connector (Connector Questions: logs, connector, postgres-destination, postgres-source, performance-issue)	0	79	October 25, 2024