Summary
Inquiring about reducing the size of the airbyte_internal schema in Redshift
Question
Hi Team!
I am currently using Airbyte to ingest data from various relational databases into a cloud warehouse (Redshift). Earlier today, I was checking the table sizes and row counts in the cluster and found that the “airbyte_internal” schema takes up more space than the others.
Is there any way to reduce this, or can the data in the airbyte_internal schema be truncated? Kindly advise.
This topic has been created from a Slack thread to give it more visibility.
["airbyte-internal-schema", "redshift", "data-ingestion", "table-size", "row-count"]
Hi hareesh. We create raw tables in the airbyte_internal schema for typing and deduping the final tables you see. Relevant docs: https://docs.airbyte.com/using-airbyte/core-concepts/typing-deduping. We keep historical data in the raw tables to enable schema evolution if the source schema changes in the future.
<@U05L8MN8H9S> - Thanks for the response. If a sync is no longer in use, or the source data is no longer required, do we still need to keep these raw tables? Can we go ahead and drop the tables associated with inactive connections/syncs?
<@U07A4UAEGR4> Yes, you are free to delete them if you are not syncing them anymore. The table names will look like this: <target_final_schema>_raw__stream_<stream_name>.
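For anyone scripting this cleanup, here is a minimal Python sketch that builds the raw table names from the pattern quoted above and prints DROP TABLE statements to review before running them in Redshift. The helper names (raw_table_name, drop_statements) and the example schema and stream names are hypothetical, and the sketch assumes the raw tables live in the airbyte_internal schema and follow the quoted pattern exactly. Actual table names may differ, so it is worth checking them against the tables that really exist in the schema first.

```python
import re

# Schema named in the thread; adjust if your destination uses a different raw schema.
RAW_SCHEMA = "airbyte_internal"

# Conservative identifier check so nothing unexpected ends up in the generated SQL.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def raw_table_name(final_schema: str, stream_name: str) -> str:
    """Build a raw table name as <target_final_schema>_raw__stream_<stream_name>."""
    for part in (final_schema, stream_name):
        if not _IDENT.match(part):
            raise ValueError(f"unexpected identifier: {part!r}")
    return f"{final_schema}_raw__stream_{stream_name}"


def drop_statements(final_schema: str, stream_names: list[str]) -> list[str]:
    """Return one DROP TABLE statement per stream of an inactive connection."""
    return [
        f'DROP TABLE IF EXISTS {RAW_SCHEMA}."{raw_table_name(final_schema, s)}";'
        for s in stream_names
    ]


if __name__ == "__main__":
    # Hypothetical final schema and streams; replace with those of the inactive sync.
    for stmt in drop_statements("public", ["orders", "customers"]):
        print(stmt)
```

The script only prints SQL and never connects to the cluster, so the statements can be reviewed before anything is dropped.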