Question about incremental syncs for partitioned data

Summary

Inquiry about whether partitioned (child) streams set to full refresh only iterate over the records that an incremental parent stream fetches in Airbyte.


Question

Hello, I have a question about incremental syncs for partitioned data.

If my parent stream has a sync mode of “Incremental | Append + Deduped” and my partitioned streams have a sync mode of “Full refresh | Append”, will the partitioned streams only loop through the records gathered by the parent’s incremental sync?

The parent stream fetches orders, and the partitioned streams loop through each order, grab its id, and fetch the products by orderId.



This topic has been created from a Slack thread to give it more visibility.

["incremental-syncs", "partitioned-data", "sync-mode", "parent-stream", "partitioned-streams"]


How would I do this in the low-code UI builder?

<@U02T7NVJ6A3> Hi, you had answered my other question, do you know about this one?

Hi Travis, this is currently unsupported. The child stream will always iterate over all records in the parent stream (not just the “new” ones), because it is possible that there are new child records associated with a parent record, but that parent record doesn’t get updated. With the desired behavior you described, this situation would mean that we would not see those new child records.

But I will note down this feedback and add it to our backlog. Would you be interested in this feature despite the issue I noted above? Maybe you should be able to check a box that says something like “I’m aware that if parent records are not updated when there are new child records, those new child records will be missed”?
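To illustrate the concern raised above, here is a minimal sketch in plain Python. It is not Airbyte code: the `Order` class, the sample data, and the `child_records` helper are all hypothetical, and it only assumes that a product can be attached to an order without the order's own timestamp changing.

```python
from dataclasses import dataclass


@dataclass
class Order:
    id: int
    updated_at: str  # ISO-8601 UTC timestamp; same format, so strings compare correctly


# Hypothetical data: order 1 was last updated before the previous sync, but
# "gadget" was attached to it afterwards without touching the order itself.
ORDERS = [
    Order(id=1, updated_at="2024-01-01T00:00:00Z"),
    Order(id=2, updated_at="2024-01-03T00:00:00Z"),
]
PRODUCTS_BY_ORDER = {1: ["widget", "gadget"], 2: ["gizmo"]}
ALREADY_SYNCED = {(1, "widget")}  # child records from the previous run
LAST_SYNC_CURSOR = "2024-01-02T00:00:00Z"


def child_records(parents):
    """Return one (order_id, product) pair per product of each parent order."""
    return {(o.id, p) for o in parents for p in PRODUCTS_BY_ORDER[o.id]}


# Behavior described in the reply: partition over every parent record.
from_all_parents = child_records(ORDERS)

# Requested behavior: partition only over parents updated since the cursor.
from_updated_parents = child_records(
    [o for o in ORDERS if o.updated_at > LAST_SYNC_CURSOR]
)

# Child records that exist at the source but would never reach the destination.
missed = sorted(from_all_parents - from_updated_parents - ALREADY_SYNCED)
print(missed)
```

In this toy setup the only missed record is the product added to an order whose timestamp never moved, which is the case the proposed checkbox would ask users to accept.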

Yeah, that would be a nice feature, but I think you might be misunderstanding me.

What I am trying to do is have a parent stream that grabs orders on an incremental sync, so that each run only grabs the latest orders based on the updatedDateTime cursor field.

I know the partitioned streams will loop through each order and grab the orderId in order to make a call to /orders/orderId/products.

I figured putting “Full refresh | append” on my partitioned streams wouldn’t matter, since the parent was on an incremental sync and so wouldn’t grab any more orders than it had to. In practice, the orders stream did its job, but the partitioned stream first did a full refresh of its parent orders stream, grabbing orders we had already synced, and then looped through all of them to hit the partitioned stream endpoint.

What I would like to happen is for the partitioned streams to rely solely on the orders the parent stream grabs in its incremental sync.

So for example, if the orders stream grabs 5 new orders since the last updatedDateTime, the partitioned streams will only loop through those 5 orders, grab each orderId, and use it in the /orders/orderId/products API call.
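The flow described in this example could be sketched roughly as follows. This is plain Python rather than connector builder configuration; the sample orders and the function names `read_orders_incremental` and `product_request_paths` are hypothetical, and only the updatedDateTime cursor and the /orders/orderId/products path come from the thread.

```python
# Hypothetical order records; only the field names come from the thread.
ORDERS = [
    {"id": 101, "updatedDateTime": "2024-05-01T10:00:00Z"},
    {"id": 102, "updatedDateTime": "2024-05-02T09:30:00Z"},
    {"id": 103, "updatedDateTime": "2024-05-03T14:15:00Z"},
]


def read_orders_incremental(orders, cursor):
    """Return the orders updated after `cursor`, plus the advanced cursor."""
    new_orders = [o for o in orders if o["updatedDateTime"] > cursor]
    # If nothing is new, the cursor stays where it was.
    new_cursor = max((o["updatedDateTime"] for o in new_orders), default=cursor)
    return new_orders, new_cursor


def product_request_paths(orders):
    """Build one child-stream request path per parent order."""
    return [f"/orders/{o['id']}/products" for o in orders]


# Cursor saved by the previous run (assumed value for illustration).
state = "2024-05-01T23:59:59Z"

# The parent reads only new orders; the child loops over that same subset.
new_orders, state = read_orders_incremental(ORDERS, state)
paths = product_request_paths(new_orders)
print(paths)
print(state)
```

Here the child requests are derived from the output of the parent's incremental read, so order 101 is never revisited once the cursor has moved past it.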

Does that make sense? <@U02T7NVJ6A3> ^

Yes, that makes sense. Thank you for explaining your use case in detail. I will raise this request with the team.

Okay awesome thank you very much!