Managing accumulation of data in 'airbyte_internal' tables

Summary

The user is seeking guidance on managing the accumulation of data in ‘airbyte_internal’ tables to avoid escalating storage costs. They are looking for best practices or configurations to ensure old data is removed or managed effectively.


Question

Hello,

I have an Airbyte connection with Klaviyo as the source and BigQuery as the destination.
I’ve noticed that ‘airbyte_internal’ tables keep accumulating data with each sync and don’t delete old data.

How can I manage or handle the accumulation of data in ‘airbyte_internal’ to avoid escalating storage costs? Are there best practices or configurations to ensure old data is removed or managed effectively?

Thanks in advance for any insights or suggestions!



This topic has been created from a Slack thread to give it more visibility.

["airbyte-internal", "data-accumulation", "storage-costs", "best-practices", "data-management"]

The airbyte_internal tables are part of how Typing and Deduping works. They support features such as schema evolution and soft resets of the data. Unlike the final tables, this data is not deduplicated; instead, it has a partition for each time a sync adds records, holding the records from that sync. I wouldn’t recommend removing these tables, as doing so may limit your ability to do soft resets or schema evolution in the future (although this matters less for a Full Refresh stream than for an Incremental one). In theory Airbyte should recreate this data if it doesn’t exist, but you’d lose some ability to audit, and it may also force a full resync of the data (which is slow on large endpoints).

Also, BigQuery storage is cheap ($0.02 USD/GiB/month; the first 10 GiB each month is free). And because these tables are ingestion-time partitioned, the storage cost is cut in half (to $0.01/GiB/month; called “long-term storage” or “LTS”) for each partition once it has gone 90 days without being modified. (So if you had 1 TB of storage and 90% of it were LTS, you’d be paying only about $11 USD/month, minus the free tier credit.)
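
To make the arithmetic concrete, here is a rough sketch in Python. It assumes the rates quoted in this thread ($0.02 and $0.01 per GiB per month) and treats 1 TB as roughly 1000 GiB. The function name and the way the free allowance is applied (to active storage first) are my own assumptions for illustration, not BigQuery's exact billing logic, so check the current pricing page for your region before relying on the numbers.

```python
# Rates quoted in this thread (USD per GiB per month); actual prices
# may differ by region and over time.
ACTIVE_RATE = 0.02
LONG_TERM_RATE = 0.01


def monthly_storage_cost(total_gib, long_term_fraction, free_gib=0.0):
    """Estimate the monthly storage bill in USD.

    total_gib: total stored data in GiB.
    long_term_fraction: share of the data billed at the long-term rate (0..1).
    free_gib: free allowance in GiB. Assumption: it is deducted from
        active storage first, then from long-term storage.
    """
    if not 0.0 <= long_term_fraction <= 1.0:
        raise ValueError("long_term_fraction must be between 0 and 1")

    long_term_gib = total_gib * long_term_fraction
    active_gib = total_gib - long_term_gib

    # Apply the free allowance to the more expensive tier first.
    free_for_active = min(free_gib, active_gib)
    active_gib -= free_for_active
    long_term_gib = max(0.0, long_term_gib - (free_gib - free_for_active))

    return active_gib * ACTIVE_RATE + long_term_gib * LONG_TERM_RATE


if __name__ == "__main__":
    # The example from this thread: ~1 TB, 90% in long-term storage.
    print(f"{monthly_storage_cost(1000, 0.9):.2f}")               # 11.00
    print(f"{monthly_storage_cost(1000, 0.9, free_gib=10):.2f}")  # 10.80
```

Under these assumptions, 100 GiB of active storage costs $2 and 900 GiB of long-term storage costs $9, which is where the $11 figure comes from.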

Great!
Thank you for the insights.

Hello all, where can I find more documentation related to the airbyte_internal tables?