Single source feeding multiple destinations without reingesting


Exploring the possibility of using a single source to feed data to multiple destinations without the need to reingest for each destination.


hey is it possible to have a single source that feeds multiple destinations?
without having to reingest for every destination

This topic has been created from a Slack thread to give it more visibility.


["single-source", "multiple-destinations", "reingest", "data-integration"]

> without having to reingest for every destination
Hiya :wave: What are you trying to avoid by not re-ingesting per destination? Is the issue that pulling the data from your source multiple times is expensive?

yes. I want to avoid ingesting the same source separately for each workflow
it would be great to ingest from the src and run multiple destination connectors

for now, I am using intermediate storage. but ideally, we would like to avoid this since the raw data isn't needed after processing
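
To make the ask concrete, here is a minimal sketch of the behaviour described above: the source is read once and each record is handed to several destinations. This is plain Python for illustration only, not an Airbyte API; `read_source`, `fan_out`, and the list-backed destinations are hypothetical names.

```python
from typing import Callable, Iterable, Iterator, List

Record = dict
Writer = Callable[[Record], None]


def read_source() -> Iterator[Record]:
    """Hypothetical source: yields each record exactly once."""
    for i in range(3):
        yield {"id": i}


def fan_out(records: Iterable[Record], writers: List[Writer]) -> int:
    """Pass every record to every writer while iterating the source only once."""
    count = 0
    for record in records:
        for write in writers:
            write(record)
        count += 1
    return count


# Two hypothetical destinations, backed by in-memory lists.
warehouse: List[Record] = []
search_index: List[Record] = []
n = fan_out(read_source(), [warehouse.append, search_index.append])
```

Note that in this sketch all destinations advance in lockstep, so a slow or failing destination holds up the others, and nothing is stored in between.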

> avoid ingesting the same source separately
i think i’m asking something slightly more specific. can you tell me why you want to avoid ingesting the same source separately? Here are some example reasons:

  1. it puts a lot of load on my database to extract data from it, so i want to do it as infrequently as possible
  2. my data is ephemeral so i’d like to make sure i only have to pull it once.
  3. etc…

would you be able to clarify at that level of specificity, please?

we have some data sources that are used to hydrate other data sources
we’re currently storing this data in intermediate DBs
the data is used by multiple downstream workflows which means we have to scale up read replicas
there is the added complexity of TTLs, replication latency, etc

one alternative we have is to use Kafka as a generic storage platform to let downstream users ingest at their own pace
but there is an eng cost
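
A rough sketch of the "ingest at their own pace" idea, assuming an append-only log with one read offset per downstream consumer. `SharedLog` is a hypothetical in-memory stand-in used only to show the pattern; it is not a Kafka client and leaves out persistence, partitions, and retention.

```python
from collections import defaultdict
from typing import Dict, List


class SharedLog:
    """In-memory stand-in for a Kafka-like topic (hypothetical, illustration only)."""

    def __init__(self) -> None:
        self._records: List[dict] = []
        # Each consumer tracks its own position, starting at offset 0.
        self._offsets: Dict[str, int] = defaultdict(int)

    def append(self, record: dict) -> None:
        """Written once by the single ingestion job."""
        self._records.append(record)

    def poll(self, consumer: str, max_records: int = 10) -> List[dict]:
        """Return the next batch for this consumer and advance only its offset."""
        start = self._offsets[consumer]
        batch = self._records[start:start + max_records]
        self._offsets[consumer] = start + len(batch)
        return batch


log = SharedLog()
for i in range(5):
    log.append({"id": i})  # the source is ingested a single time

fast = log.poll("workflow_a", max_records=5)       # reads everything at once
slow = log.poll("workflow_b", max_records=2)       # reads in small batches
slow_next = log.poll("workflow_b", max_records=2)  # resumes where it left off
```

Because each consumer has its own offset, one workflow falling behind does not affect the others, which is the property the intermediate DBs and read replicas are currently providing.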

i was curious if there was a way to do this further upstream in airbyte which could simplify this process

awesome! super helpful. thanks for sharing. this is definitely not something we support natively now (outside of how you’re suggesting doing it by using an intermediate data store). it’s something we will address, but it’s probably not something we’re going to hit in the first half of the year since there is a workaround.

thanks for letting me know
is there an open proposal or issue i can follow?