Removing Docker containers from Airbyte Docker Compose file when not using scheduler

Summary

The user is asking which Docker containers can be removed from the Airbyte Docker Compose file when not using Airbyte’s scheduler and instead using Prefect Flow for orchestrating Airbyte and DBT tasks individually with custom event triggers to reduce resource footprint.


Question

Which Docker containers can I remove from the Docker Compose file if I’m not using Airbyte’s scheduler or its typing and deduping? I’m using a Prefect flow to orchestrate the Airbyte and dbt tasks individually with custom event triggers, and I’d like to lower Airbyte’s resource footprint.



This topic has been created from a Slack thread to give it more visibility.

["docker-containers", "remove", "airbyte", "scheduler", "prefect-flow", "dbt-tasks", "custom-event-triggers", "resource-footprint"]

The typing/deduping is handled in the destination/write pods, so there is nothing to eliminate there.

And while scheduling is handled by Temporal, that isn’t the only thing Temporal does, so I don’t think you can remove any of it without causing problems. But there shouldn’t be any real additional overhead from the unused functionality.

You CAN skip typing/deduping by unchecking the Create Final Tables option in the destination config (if you’re going to model off of the raw tables).

A lot of folks prefer to still run it and let that happen as part of the sync, since it’s MUCH faster than the old “Normalization” option (which was an internal dbt run). The plus side here is that you get a nice list of typing errors you can work with as part of your model run.

But it’s ultimately your call to make. There is some overhead, so you can decide what’s best and eliminate the final tables if they don’t make sense for your use case.

And on the scheduler, just set everything to Manual and you can invoke syncs over the API (that’s what we do from within our SaaS application).
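For reference, here is a rough sketch of what invoking a sync over the API could look like from a Prefect task or any other Python code. It assumes a self-hosted instance that exposes the `POST /api/v1/connections/sync` endpoint with a `connectionId` body; the base URL and connection ID are placeholders, and your deployment may also need auth headers, so check this against your Airbyte version before relying on it.

```python
import json
import urllib.request


def build_sync_request(base_url: str, connection_id: str) -> urllib.request.Request:
    """Build (but do not send) a POST request asking Airbyte to sync one connection.

    Assumption: the instance exposes POST /api/v1/connections/sync.
    """
    url = base_url.rstrip("/") + "/api/v1/connections/sync"
    body = json.dumps({"connectionId": connection_id}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def trigger_sync(base_url: str, connection_id: str, timeout: float = 30.0) -> dict:
    """Send the request and return the decoded JSON response."""
    request = build_sync_request(base_url, connection_id)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the connection set to Manual, something like `trigger_sync("http://localhost:8000", "<your-connection-id>")` would then be called from whatever event trigger you use.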

The typing and deduping is causing a real headache for us, as we have MySQL as a source and BigQuery as the destination.

The MySQL source is Debezium-based, so all BIT columns with values like 1 or 0 get converted to true or false. That is a hassle, because we would have to change all of our reporting views. Our custom solution (which was not real time, but a batch job every 10 minutes) loaded the data exactly as it was on the source, without changing the mapping.

Using dbt I’m able to solve this conversion issue, but not with typing and deduping, though I’m experimenting with modifying the destination-bigquery connector to handle this.

You could also just trigger a dbt job that normalizes it back to your needs. That functionality still exists; it’s just the built-in dbt-based normalization that was removed.
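As a rough illustration of that kind of normalization, the sketch below generates a BigQuery `SELECT` that casts boolean columns back to 0/1 integers with `CAST(... AS INT64)`. The helper name, table name, and column names are all made up for the example; in practice the resulting SQL would probably live in a dbt model rather than be built in Python.

```python
def build_bit_restore_sql(table: str, bit_columns: list[str], other_columns: list[str]) -> str:
    """Return a BigQuery SELECT that turns BOOL columns back into 0/1 integers.

    All names passed in are hypothetical; BigQuery casts TRUE to 1 and FALSE to 0.
    """
    # Columns that should pass through unchanged.
    select_items = [f"`{c}`" for c in other_columns]
    # Columns that were BIT on the source and arrived as BOOL.
    select_items += [f"CAST(`{c}` AS INT64) AS `{c}`" for c in bit_columns]
    joined = ",\n  ".join(select_items)
    return f"SELECT\n  {joined}\nFROM `{table}`"


if __name__ == "__main__":
    print(build_bit_restore_sql("my_project.my_dataset.orders", ["is_active"], ["id"]))
```

This keeps the reporting views unchanged, at the cost of maintaining the list of BIT columns yourself.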

(I agree some of the type mappings can be a little tricky. I understand the motivations, but I wish there was a way of specifying overrides in the schema details for simple cases like this)