Question about parent streams in custom connector builder UI (Airbyte Cloud)

Summary

The user has built a custom connector in Airbyte Cloud whose child stream can only pass one parent visit ID per API request. This one-at-a-time approach is inefficient and causes out-of-memory failures during production syncs. They are looking for a way to batch multiple IDs into a single request.


Question

Hi there, question about parent streams in the custom connector builder UI (Airbyte Cloud):

• I’m building a connector that reads visit records from the Pardot API (https://developer.salesforce.com/docs/marketing/pardot/guide/visits-v4.html#visit-query).
• This endpoint requires you to send a visit ID (or a list of visit IDs) as part of the request, and it returns all the information about those visits.
• I want to set it up so that this stream has a parent stream that returns visit IDs as part of its output, then uses those IDs as inputs to this stream.
• In theory I have something that works, but it can only take a single visit ID at a time, make an API request with it, and return its data.
• A few problems with this:
◦ This isn’t how the endpoint is designed to work – it’s designed to take in a list of IDs and process them all at once. The one-at-a-time method is extremely expensive and inefficient.
◦ Although this approach works during testing, in actual production runs it’s not uncommon for this visits stream to reach thousands, tens of thousands, sometimes even hundreds of thousands of rows per sync. Airbyte runs out of memory when trying to do this in production, and my jobs then appear to get stuck trying to free memory when negative memory is allocated. I have to go in and manually cancel the job; it won’t fail on its own (I can dig up some logs as an example).
◦ When running in production, it looks to me like the parent sync runs from beginning to end and gets marked as complete/success, then the child stream (i.e. this visits stream) starts running and essentially re-runs the parent stream in order to get all the partitioned input values. Is that accurate/intended behavior?
• Ultimately, my question is: can I design a stream with a parent stream that, rather than making one API call per parent partition value, pulls all values (or, more accurately, all unique new values found during the last incremental sync) for that parent stream column and passes them all into one single API call?
◦ If that isn’t possible, do you have any tips, best practices, or documentation to point me to so I can make the one-at-a-time approach run more efficiently and stop running out of memory?
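For reference, the batching described above can be sketched in plain Python. This is only an illustration of the idea, not something the low-code builder UI supports out of the box: the helper names (`chunk_ids`, `build_request_params`), the batch size, and the `visit_ids` query-parameter format are all assumptions to be checked against the Pardot visit-query docs.

```python
from typing import Iterator, List


def chunk_ids(ids: List[int], batch_size: int = 200) -> Iterator[List[int]]:
    """Group parent-stream visit IDs into fixed-size batches.

    Each batch becomes a single API request instead of one request
    per ID. batch_size=200 is a placeholder; the real per-request
    limit should come from the Pardot API documentation.
    """
    for start in range(0, len(ids), batch_size):
        yield ids[start:start + batch_size]


def build_request_params(id_batch: List[int]) -> dict:
    """Build query params for one batched visit-query request.

    The parameter name "visit_ids" and the comma-separated format
    are assumptions for illustration only.
    """
    return {"visit_ids": ",".join(str(i) for i in id_batch)}


# Example: 5 parent IDs with batch_size=2 -> 3 requests instead of 5.
parent_ids = [101, 102, 103, 104, 105]
batches = list(chunk_ids(parent_ids, batch_size=2))
requests = [build_request_params(b) for b in batches]
```

In the Python CDK (as opposed to the builder UI), this kind of chunking would typically live in an overridden `stream_slices`, so that each slice carries a batch of parent IDs rather than a single one; the builder’s parent-stream partitioning, as described in the question, emits one partition per parent value, which matches the one-call-per-ID behavior.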
Thank you for your help!



This topic has been created from a Slack thread to give it more visibility.
It will be in read-only mode here. Click here if you want to access the original thread.


