Order of streams in a connection

Summary

Clarifies whether streams in a connection run in parallel or serially in a specific order, using the Confluence source as the example.


Question

Hi, do streams in a connection run in parallel, or serially in some order? For example, for the Confluence source, will my destination receive all Confluence pages first and then all Confluence spaces, or will I receive pages and spaces interleaved?




["streams", "connection", "parallel", "serial", "order", "confluence-source"]

Hmm, I don’t know enough to give you a confident answer yet. However, you can find more information about how Airbyte handles streams in a connection in the Airbyte documentation.

<@U01MMSDJGC9> wasn’t able to find this in the docs. Is this something you could answer or point to the right person to answer?

Today, API sources don’t read streams in parallel. A large test is underway to add the concurrency feature to the Stripe connector, with other connectors to be changed later. As for the order in which streams are ingested: as far as I remember, the connector takes the streams output and processes each stream in turn, and no ordering function is applied to that list.
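To make the serial behavior described above concrete, here is a minimal sketch in plain Python. It is not Airbyte's actual code: `read_streams_serially`, `read_pages`, and `read_spaces` are hypothetical names, and the sketch assumes each stream is read to completion before the next one starts, in whatever order the list happens to have.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Record = Dict[str, str]
StreamReader = Callable[[], Iterable[Record]]


def read_streams_serially(
    streams: List[Tuple[str, StreamReader]],
) -> Iterator[Tuple[str, Record]]:
    """Read each stream to completion, in list order, before starting the next.

    Hypothetical helper for illustration; no sorting is applied to `streams`.
    """
    for name, read in streams:
        for record in read():
            yield name, record


# Hypothetical stand-ins for the Confluence "pages" and "spaces" streams.
def read_pages() -> List[Record]:
    return [{"id": "p1"}, {"id": "p2"}]


def read_spaces() -> List[Record]:
    return [{"key": "ENG"}]


streams = [("pages", read_pages), ("spaces", read_spaces)]
emitted = [name for name, _ in read_streams_serially(streams)]
print(emitted)  # ['pages', 'pages', 'spaces']
```

Under these assumptions records are never interleaved, but which stream comes first depends only on the order of the list, which the consumer does not control.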

I see, OK. So if I depend on a particular ordering of streams (for example, I need to ingest “spaces” before “pages”), I would need to set up a separate connection for each stream and orchestrate them using something like Dagster? Is that right?
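The approach asked about here, one connection per stream run in a fixed order, could look roughly like the sketch below. It is orchestrator-agnostic and does not use the real Dagster or Airbyte APIs: `run_connections_in_order`, the `run_sync` callable, the connection names, and the `"succeeded"` status string are all assumptions made for illustration.

```python
from typing import Callable, List


def run_connections_in_order(
    connection_ids: List[str],
    run_sync: Callable[[str], str],
) -> List[str]:
    """Run one sync per connection, strictly in the given order.

    `run_sync` is a hypothetical callable that triggers a sync, blocks until
    it finishes, and returns a status string. A failure stops the chain so a
    dependent connection never runs against incomplete data.
    """
    completed: List[str] = []
    for connection_id in connection_ids:
        status = run_sync(connection_id)
        if status != "succeeded":
            raise RuntimeError(
                f"Sync for {connection_id!r} ended with status {status!r}; "
                "later connections were not started"
            )
        completed.append(connection_id)
    return completed


# Fake runner standing in for a real "trigger and wait" call.
def fake_run_sync(connection_id: str) -> str:
    return "succeeded"


order = run_connections_in_order(
    ["confluence-spaces", "confluence-pages"], fake_run_sync
)
print(order)  # ['confluence-spaces', 'confluence-pages']
```

In a real setup, `run_sync` would be replaced by whatever the chosen orchestrator provides for triggering a sync and waiting for it to finish.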

Why do you need to ingest spaces before pages? If pages depends on spaces, you can enable caching so that the request isn’t run twice.

In my case, I rely on having spaces fully ingested in my destination before pages are ingested, because for each page that I ingest, I look up permissions information based on the space it belongs to. To do that, I need to have all spaces already in my system. Does that make sense? I’m not sure caching would be helpful here.

<@U01MMSDJGC9> does that help clarify why I would need ordered streams?