Summary
How to make partitioned streams share the same incremental sync logic as the parent stream in Airbyte
Question
Hello, I have a question about incremental syncs for partitioned data.
If my parent stream has a sync mode of “Incremental | Append + Deduped”, how would I make my partitioned streams share the same logic so that they only loop through the data pulled by the parent stream?
For reference, I’m pulling orders in the parent stream and using the orderIds in the partitioned streams to get products and shipping information.
This topic has been created from a Slack thread to give it more visibility.
To enable your partitioned streams to share the same logic as your parent stream with the “Incremental | Append + Deduped” sync mode, you need to ensure that they share the same cursor field. This is the field Airbyte uses to identify new or updated records in the source data. In your case, if you’re pulling orders in the parent stream and using the orderId in the partitioned streams to get products and shipping information, you would set the “Parent Key” to id and the “Current Parent Key Value Identifier” to order. This configuration is done in the Parent Stream component of the child stream. Here’s an example for a child stream that fetches the notes of each order:
- “Parent Key” is set to id
- “Current Parent Key Value Identifier” is set to order
- In the general section of the stream configuration, the “URL Path” is set to /orders/{{ stream_partition.order }}/notes
When a sync is triggered, the connector first fetches all records of the orders stream. The records look like this:
{ "id": 123, "currency": "EUR", "shipping_total": "12.23", ... }
{ "id": 456, "currency": "EUR", "shipping_total": "45.56", ... }
{ "id": 789, "currency": "EUR", "shipping_total": "78.89", ... }
To turn a record into a partition value, the “parent key” is extracted, resulting in the partition values 123, 456, and 789. These partition values produce the following requests to fetch the records of the notes stream:
curl -X GET https://example.com/wp-json/wc/v3/orders/123/notes
curl -X GET https://example.com/wp-json/wc/v3/orders/456/notes
curl -X GET https://example.com/wp-json/wc/v3/orders/789/notes
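The partition routing walked through above can be sketched in plain Python. This is not Airbyte's implementation: the function names `route_partitions` and `child_request_urls`, the `BASE_URL` constant, and the use of `str.format` in place of the `{{ stream_partition.order }}` Jinja placeholder are all assumptions made for illustration.

```python
from typing import Any, Dict, Iterable, Iterator, List

# Hypothetical sketch of what the Parent Stream partition router does; not Airbyte code.
PARENT_KEY = "id"  # "Parent Key": field read from each parent (orders) record
PARTITION_FIELD = "order"  # "Current Parent Key Value Identifier"
BASE_URL = "https://example.com/wp-json/wc/v3"  # assumed base URL, taken from the example
URL_PATH = "/orders/{order}/notes"  # stands in for /orders/{{ stream_partition.order }}/notes


def route_partitions(parent_records: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Turn each parent record into a stream_partition dict such as {"order": 123}."""
    for record in parent_records:
        yield {PARTITION_FIELD: record[PARENT_KEY]}


def child_request_urls(parent_records: Iterable[Dict[str, Any]]) -> List[str]:
    """Build one child (notes) request URL per partition."""
    return [BASE_URL + URL_PATH.format(**partition) for partition in route_partitions(parent_records)]


orders = [
    {"id": 123, "currency": "EUR", "shipping_total": "12.23"},
    {"id": 456, "currency": "EUR", "shipping_total": "45.56"},
    {"id": 789, "currency": "EUR", "shipping_total": "78.89"},
]

for url in child_request_urls(orders):
    print(url)
```

For products and shipping information, the same partition values would feed different URL paths; those endpoint paths are not given in the thread, so they are not shown here.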
For the partitioned streams to share the same incremental logic in a connector written with the Python CDK, you need to define your own stream_slices method on the child stream that reads the parent stream with SyncMode.incremental, so that only new or updated parent records produce child requests [^1^]. Note that Incremental sync cannot yet handle source schema changes, for example when a column is added, renamed, or deleted in an existing table. In that case, it is recommended to trigger a Full refresh - Overwrite sync so that the data is replicated correctly to the destination with the new schema [^2^].
[^1^]: Forum post: “How does sync mode work with sub stream / nested stream?”
[^2^]: Documentation > Using Airbyte > Core Concepts > Sync Modes > Incremental Sync - Append > Known Limitations (Part 2)
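A minimal sketch of a child stream_slices that reads its parent incrementally is shown below. It deliberately avoids importing the Airbyte CDK: `SyncMode` is a local stand-in enum, `read_records` uses a simplified signature, and `OrdersStream`, `OrderNotesStream`, the `updated_at` cursor field, and the sample dates are hypothetical.

```python
from enum import Enum
from typing import Any, Dict, Iterator, List, Mapping, Optional


class SyncMode(Enum):
    """Local stand-in for the CDK's SyncMode so this sketch runs without the CDK."""

    full_refresh = "full_refresh"
    incremental = "incremental"


class OrdersStream:
    """Hypothetical parent stream that reads from an in-memory list instead of an API."""

    cursor_field = "updated_at"  # assumed cursor field on order records

    def __init__(self, orders: List[Dict[str, Any]]) -> None:
        self._orders = orders

    def read_records(
        self, sync_mode: SyncMode, stream_state: Optional[Mapping[str, Any]] = None
    ) -> Iterator[Dict[str, Any]]:
        cutoff = (stream_state or {}).get(self.cursor_field)
        for order in self._orders:
            # In incremental mode, skip orders not updated after the saved cursor value.
            if sync_mode is SyncMode.incremental and cutoff is not None and order[self.cursor_field] <= cutoff:
                continue
            yield order


class OrderNotesStream:
    """Hypothetical child stream whose slices come from an incremental read of the parent."""

    def __init__(self, parent: OrdersStream) -> None:
        self.parent = parent

    def stream_slices(self, parent_state: Optional[Mapping[str, Any]] = None) -> Iterator[Dict[str, Any]]:
        # Read the parent with SyncMode.incremental so only new or updated orders become slices.
        for order in self.parent.read_records(SyncMode.incremental, stream_state=parent_state):
            yield {"order": order["id"]}


orders = [
    {"id": 123, "updated_at": "2024-01-01"},
    {"id": 456, "updated_at": "2024-02-01"},
    {"id": 789, "updated_at": "2024-03-01"},
]
notes = OrderNotesStream(OrdersStream(orders))
print(list(notes.stream_slices(parent_state={"updated_at": "2024-01-15"})))
# prints [{'order': 456}, {'order': 789}]
```

A real connector would also need to persist the parent cursor value (here updated_at) in state between syncs; this sketch takes it as an argument instead.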
<@U04SE163WC8> So does that mean that if the schema of the parent stream is different from the schema of the partitioned stream, I won’t be able to share the same incremental sync logic across both?
<@U04SE163WC8> How would I define the stream_slices on the child stream, reading from the parent stream using the Low Code UI Builder?
The Low-Code UI Builder in Airbyte is designed to simplify building connectors, but the available documentation does not give explicit instructions for defining stream_slices on a child stream that reads from a parent stream. The concept of stream_slices is explained in the documentation, however: a stream slice is a subset of the records in a stream and represents the smallest unit of data that can be re-synced. Once a full slice is read, an AirbyteStateMessage is output, causing state to be saved.
In the context of the Low-Code UI Builder, you might need to define the logic for generating stream slices in the connector’s YAML configuration file, but the exact steps or code for this are not covered in the referenced documentation. For more detailed guidance, refer to the Low-Code Connector Development documentation or ask the Airbyte community on Slack or the forum.
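To make the read-a-slice-then-checkpoint behavior concrete, here is a minimal Python sketch. It does not use Airbyte's message classes: the `RECORD`/`STATE` dicts are simplified stand-ins for Airbyte record messages and `AirbyteStateMessage`, and `read_with_checkpoints`, `read_notes`, the state shape, and the sample notes are hypothetical.

```python
from typing import Any, Callable, Dict, Iterable, Iterator


def read_with_checkpoints(
    slices: Iterable[Dict[str, Any]],
    read_slice: Callable[[Dict[str, Any]], Iterable[Dict[str, Any]]],
) -> Iterator[Dict[str, Any]]:
    """Yield RECORD messages for a slice, then one STATE message once that slice is fully read."""
    for stream_slice in slices:
        for record in read_slice(stream_slice):
            yield {"type": "RECORD", "record": record}
        # Checkpoint only after the whole slice has been read, so an interrupted
        # sync would only need to re-read the slice that was not finished.
        yield {"type": "STATE", "state": {"last_completed_slice": dict(stream_slice)}}


# Hypothetical notes keyed by order id, standing in for API responses.
notes_by_order = {123: ["packed"], 456: ["shipped", "delivered"]}


def read_notes(stream_slice: Dict[str, Any]) -> Iterable[Dict[str, Any]]:
    return [{"order": stream_slice["order"], "note": n} for n in notes_by_order[stream_slice["order"]]]


for message in read_with_checkpoints([{"order": 123}, {"order": 456}], read_notes):
    print(message["type"])
```

Emitting state per slice rather than per record keeps checkpoints aligned with the smallest re-syncable unit described above.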
<@U04SE163WC8> could you have someone from the community help me?
Absolutely, the Airbyte community is a great resource for getting help with your questions. You can post your questions on Airbyte’s GitHub Discussions or join the Airbyte community Slack to connect with other Airbyte users. If you require personalized support, you can reach out to the Airbyte sales team to inquire about Airbyte Enterprise. For more immediate assistance, Airbyte holds a Daily Office Hour to discuss issues. It’s a 45-minute meeting where the first 20 minutes are reserved for a weekly topic presentation about Airbyte concepts and the remaining 25 minutes are for general questions. Remember to review Airbyte’s Code of Conduct before interacting in the community.