Summary
Clarifying if streams in a connection run in parallel or serially in a specific order for the Confluence source.
Question
Hi, are streams in a connection run in parallel, or serially in some order? For example, for the Confluence source, will my destination receive all Confluence pages first and then all Confluence spaces, or will I receive pages and spaces interleaved?
This topic has been created from a Slack thread to give it more visibility.
["streams", "connection", "parallel", "serial", "order", "confluence-source"]
Hmm, I don’t know enough to give you a confident answer yet. However, you can find more information about how Airbyte handles streams in a connection in the Airbyte documentation.
<@U01MMSDJGC9> wasn’t able to find this in the docs. Is this something you could answer or point to the right person to answer?
Today, API sources don’t read streams in parallel. A large test is under way to implement the concurrency feature in the Stripe connector, and other connectors will be changed later. As for the order in which streams are ingested: as far as I remember, the connector uses the streams output (the list of streams) to process each stream, and no ordering function is applied to that list.
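To illustrate what serial, list-ordered processing would mean for the destination, here is a minimal Python sketch. It is not Airbyte's code: `read_serially` and the sample data are hypothetical, and it only assumes the behavior described above (streams are read one after another, in whatever order the list happens to have).

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def read_serially(
    streams: List[Tuple[str, Iterable[Dict]]],
) -> Iterator[Tuple[str, Dict]]:
    """Yield (stream_name, record) pairs one stream at a time.

    Hypothetical helper: streams are consumed in list order and no
    sorting is applied, so records from different streams never interleave.
    """
    for name, records in streams:
        for record in records:
            yield name, record


# Made-up sample data; the list order is simply whatever the source returns.
streams = [
    ("pages", [{"id": 1}, {"id": 2}]),
    ("spaces", [{"key": "ENG"}]),
]
emitted = [name for name, _ in read_serially(streams)]
print(emitted)  # ['pages', 'pages', 'spaces']
```

Under that assumption the destination would see every record of one stream before any record of the next, but which stream comes first depends on the list order, not on any dependency between streams.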
I see, ok. So if I depend on a particular ordering of streams (for example, I need to ingest “spaces” before “pages”), I would need to set up a different connection for each stream and orchestrate them using something like Dagster? Is that right?
Why do you need to ingest spaces before pages? If pages depends on spaces, you can enable caching so the request isn’t run twice.
In my case I rely on having spaces fully ingested in my destination before getting pages ingested, because for each page that I ingest, I look up permissions information based on the space it belongs to. To do that I need to have all spaces already in my system. Does that make sense? Not sure caching would be helpful here?
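One way to enforce the ordering requirement described above is to split the streams into separate connections and trigger them one after another from an orchestrator. The sketch below is a plain-Python outline of that idea, not a Dagster or Airbyte implementation: `run_syncs_in_order`, the `trigger_sync` and `get_status` callables, and the status strings are all assumptions that you would back with whatever API client or orchestrator you actually use.

```python
import time
from typing import Callable, List


def run_syncs_in_order(
    connection_ids: List[str],
    trigger_sync: Callable[[str], str],
    get_status: Callable[[str], str],
    poll_seconds: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Run one sync per connection, strictly in the given order.

    trigger_sync(connection_id) -> job_id and get_status(job_id) -> status
    are hypothetical callables supplied by the caller. The status values
    "succeeded", "failed" and "cancelled" are assumed, not taken from a
    specific API.
    """
    for connection_id in connection_ids:
        job_id = trigger_sync(connection_id)
        while True:
            status = get_status(job_id)
            if status == "succeeded":
                break  # safe to start the next connection
            if status in ("failed", "cancelled"):
                raise RuntimeError(
                    f"sync for {connection_id} ended with status {status}"
                )
            sleep(poll_seconds)  # still running, check again later
```

Calling it with the spaces connection first and the pages connection second means the pages sync only starts once the spaces sync has reported success, and a failed spaces sync stops the run before any pages are ingested.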
<@U01MMSDJGC9> does that help clarify why I would need ordered streams?