Reducing size of airbyte_internal schema in Redshift

slack-user-airbyte · August 17, 2024, 6:10am

Summary

Inquiring about reducing the size of the airbyte_internal schema in Redshift

Question

Hi Team!

I am currently using Airbyte to ingest data from various relational databases into a cloud warehouse (Redshift). Earlier today, I was checking the table sizes and row counts in the cluster and found that the “airbyte_internal” schema takes up more space than the others.

Is there any way to reduce this, or can the data in the airbyte_internal schema be truncated? Kindly advise.

This topic has been created from a Slack thread to give it more visibility.

Tags: airbyte-internal-schema, redshift, data-ingestion, table-size, row-count

slack-user-airbyte · August 19, 2024, 6:18am

Hi hareesh. We create raw tables in the airbyte_internal schema; they are used for typing and deduping the final tables you see. Relevant docs: https://docs.airbyte.com/using-airbyte/core-concepts/typing-deduping. We keep historical data in the raw tables to enable schema evolution if the source schema changes in the future.

slack-user-airbyte · September 22, 2024, 6:17am

<@U05L8MN8H9S> - Thanks for the response. If a sync is no longer in use or the source data is no longer required, do we still need to keep these raw tables? Can we go ahead and drop the tables associated with inactive connections/syncs?

slack-user-airbyte · October 29, 2024, 6:20am

<@U07A4UAEGR4> Yes, you are free to delete them if you are not syncing them anymore. The table names will look like this: <target_final_schema>_raw__stream_<stream_name>.
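
For anyone landing here later, below is a rough sketch of how the cleanup could be scripted. It only builds SQL strings and does not connect to anything. The naming pattern comes from the reply above; everything else is an assumption made for illustration: the helper names (`raw_table_name`, `drop_statement`), the example schema and stream names, and the conservative identifier check. The size query uses Redshift's `SVV_TABLE_INFO` system view, where `size` is reported in 1 MB blocks. Review the generated statements and confirm the connection really is inactive before running anything.

```python
import re

# Schema where Airbyte keeps its raw tables (as named in this thread).
RAW_SCHEMA = "airbyte_internal"

# Lists tables in the raw schema, largest first.
# SVV_TABLE_INFO reports "size" in 1 MB blocks and "tbl_rows" as the row count.
SIZE_QUERY = f"""
SELECT "table", size AS size_mb, tbl_rows
FROM svv_table_info
WHERE "schema" = '{RAW_SCHEMA}'
ORDER BY size DESC;
"""

# Assumption: only plain identifiers are accepted here, to keep the
# generated SQL safe. Real stream names may need different handling.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def raw_table_name(final_schema: str, stream_name: str) -> str:
    """Build a raw table name following the pattern quoted in the thread."""
    for part in (final_schema, stream_name):
        if not _IDENTIFIER.match(part):
            raise ValueError(f"unexpected identifier: {part!r}")
    return f"{final_schema}_raw__stream_{stream_name}"


def drop_statement(final_schema: str, stream_name: str) -> str:
    """Return a DROP TABLE statement for one stream's raw table."""
    table = raw_table_name(final_schema, stream_name)
    return f'DROP TABLE IF EXISTS {RAW_SCHEMA}."{table}";'


if __name__ == "__main__":
    print(SIZE_QUERY)
    # Hypothetical streams from a connection that is no longer syncing.
    for schema, stream in [("public", "orders"), ("public", "customers")]:
        print(drop_statement(schema, stream))
```

Running `SIZE_QUERY` first shows which raw tables actually account for the space, so only the tables belonging to retired connections need to be dropped.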
