Summary
An API endpoint whose requests depend on a parent stream's entities is slow when the requests run one by one. The user is looking for a way to parallelize the requests for faster execution.
Question
Hey! I'm sending requests to an API endpoint that has a parent stream (roughly: the parent stream has a list of entities, and this endpoint returns each entity's properties). The query takes ages running the requests 1-by-1, and I've been wondering whether there's a way to parallelise them. All of these requests are independent of each other, so it should be quite possible to run them side by side.
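Because the per-entity requests are independent and I/O-bound, a thread pool is the usual way to run them side by side. The sketch below is plain Python (not the Airbyte CDK or Connector Builder). It assumes a hypothetical `fetch_entity_properties` function that stands in for one child-endpoint request; the `time.sleep` only simulates network latency.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_entity_properties(entity_id: str) -> dict:
    """Hypothetical stand-in for one request to the child endpoint.

    A real connector would issue an HTTP GET here; the sleep simulates
    network latency so the effect of concurrency is visible.
    """
    time.sleep(0.1)
    return {"id": entity_id, "properties": {"name": f"entity-{entity_id}"}}


def fetch_all(entity_ids: list[str], max_workers: int = 8) -> list[dict]:
    """Fetch properties for every parent entity using a thread pool.

    The requests do not depend on each other, so they can overlap.
    executor.map returns results in the same order as entity_ids.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(fetch_entity_properties, entity_ids))
```

With 16 entity IDs and `max_workers=8`, the simulated requests finish in roughly two "rounds" of latency instead of sixteen. A real API will usually impose rate limits, so `max_workers` should be kept within what the endpoint tolerates.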
This topic has been created from a Slack thread to give it more visibility.
It is in read-only mode here.
Actually, this might just be the MAX_SYNC_WORKERS setting (as per the docs: https://docs.airbyte.com/understanding-airbyte/jobs#worker-parallization), and it wasn't 1-by-1 either; it just felt like that because each individual query is slow. So it appears to be an Airbyte configuration issue rather than a connector development issue.
Nope, I think those are platform settings, not related to connector threads.
Are you doing that in the Connector Builder?
We already have a Concurrent CDK in the Python CDK that provides an API for running a connector's syncs concurrently per partition, but we have not yet moved the low-code CDK to use it out of the box ;-(
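To illustrate what "concurrently per partition" means for a parent/child stream, here is a minimal plain-Python sketch. It is not the Concurrent CDK's API: `parent_partitions`, `read_partition`, and `read_concurrently` are hypothetical names. Each parent record becomes one partition, partitions are read in a thread pool, and records are yielded as soon as each partition finishes.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Iterable, Iterator


def parent_partitions(parent_records: Iterable[dict]) -> Iterator[dict]:
    """Hypothetical: turn each parent record into one partition (slice)."""
    for record in parent_records:
        yield {"parent_id": record["id"]}


def read_partition(partition: dict) -> list[dict]:
    """Hypothetical: read all child records for a single partition.

    A real connector would request (and paginate) the child endpoint here.
    """
    pid = partition["parent_id"]
    return [{"parent_id": pid, "property": f"p{pid}-{n}"} for n in range(2)]


def read_concurrently(parent_records: Iterable[dict], max_workers: int = 4) -> Iterator[dict]:
    """Read partitions in a thread pool, yielding records as partitions complete.

    Records from different partitions may arrive in any order, which is
    fine here because the partitions are independent.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(read_partition, p) for p in parent_partitions(parent_records)]
        for future in as_completed(futures):
            # .result() re-raises any exception raised in the worker thread
            yield from future.result()
```

For example, `list(read_concurrently([{"id": 1}, {"id": 2}, {"id": 3}]))` returns six records, but their order depends on which partition finishes first. Yielding per partition (rather than waiting for all of them) keeps memory bounded and lets downstream processing start early.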
We could tinker with it, but I'm not sure how much work it would be to put it together.
Also, thanks for hanging out with us here!
I’ll get your issues sorted soon!
Hey <@U069EMNRPA4>, yeah, I was using the Connector Builder. It looks like a roadmap item for the team, so definitely no hurry from my side (I've worked around this for the time being).
I'm quite literally pushing to start working on it next week. There's a bit of pushback because we have a few competing items.
I do have updates for all other threads we have going though!