Destination BigQuery - Deduped + history generates excessive processing costs

Is this your first time deploying Airbyte: No
OS Version / Instance: Debian 10 (buster) on AWS EC2, 8 cores, 16GB RAM
Deployment: Docker
Airbyte Version: 0.39.32-alpha
Source name: MySQL (0.6.7) - Incremental
Destination: BigQuery (1.1.11) - Deduped + history

Step: Writing data to BigQuery in incremental + deduped history mode

The dbt models used to normalize, historize, and merge the data into the destination BigQuery table process about 6 times as much data as the table itself contains. For example, if my table is 20 GB, the dbt models process about 120 GB:

2022-09-01 15:29:45 normalization > 15:29:45  1 of 3 START view model _airbyte_airbyte_mysql.incremental_model_stg................................................ [RUN]
2022-09-01 15:29:46 normalization > 15:29:46  1 of 3 OK created view model _airbyte_airbyte_mysql.incremental_model_stg........................................... [OK in 0.78s]
2022-09-01 15:29:46 normalization > 15:29:46  2 of 3 START incremental model airbyte_mysql.incremental_model_scd.................................................. [RUN]
2022-09-01 15:31:57 normalization > 15:31:57  2 of 3 OK created incremental model airbyte_mysql.incremental_model_scd............................................. [MERGE (24.5m rows, 124.1 GB processed) in 130.71s]
2022-09-01 15:31:57 normalization > 15:31:57  3 of 3 START incremental model airbyte_mysql.incremental_model...................................................... [RUN]
2022-09-01 15:32:06 normalization > 15:32:06  3 of 3 OK created incremental model airbyte_mysql.incremental_model................................................. [MERGE (1.0k rows, 530.1 MB processed) in 8.99s]

This is not acceptable cost-wise, since BigQuery charges per GB processed.
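
To put a number on this, here is a minimal Python sketch that sums the "processed" figures from the normalization log above and compares them to the table size. It assumes the log units are binary multiples (1 GB = 1024 MB), and the price per TB is a placeholder that I made up for illustration, so check the current BigQuery on-demand pricing before relying on the dollar figure. The names `processed_gb` and `estimate_cost_usd` are hypothetical helpers, not part of Airbyte or dbt.

```python
import re

# Matches the result summary printed by the normalization step, e.g.
# "[MERGE (24.5m rows, 124.1 GB processed) in 130.71s]".
PROCESSED = re.compile(r"([\d.]+) (KB|MB|GB|TB) processed")

# Assumption: the log uses binary multiples (1 GB = 1024 MB).
TO_GB = {"KB": 1 / 1024**2, "MB": 1 / 1024, "GB": 1.0, "TB": 1024.0}


def processed_gb(line: str) -> float:
    """Return the GB processed reported on a log line, or 0.0 if none."""
    match = PROCESSED.search(line)
    if match is None:
        return 0.0
    return float(match.group(1)) * TO_GB[match.group(2)]


def estimate_cost_usd(total_gb: float, usd_per_tb: float) -> float:
    """Estimate on-demand query cost; usd_per_tb must be supplied by the caller."""
    return total_gb / 1024 * usd_per_tb


# The two MERGE summaries from the sync shown above (dot padding removed).
LOG_LINES = [
    "2 of 3 OK created incremental model airbyte_mysql.incremental_model_scd "
    "[MERGE (24.5m rows, 124.1 GB processed) in 130.71s]",
    "3 of 3 OK created incremental model airbyte_mysql.incremental_model "
    "[MERGE (1.0k rows, 530.1 MB processed) in 8.99s]",
]

TABLE_SIZE_GB = 20.0        # approximate size of the destination table
ASSUMED_USD_PER_TB = 5.0    # placeholder price, NOT an official figure

total_gb = sum(processed_gb(line) for line in LOG_LINES)
ratio = total_gb / TABLE_SIZE_GB
cost = estimate_cost_usd(total_gb, ASSUMED_USD_PER_TB)

print(f"processed per sync: {total_gb:.1f} GB ({ratio:.1f}x the table size)")
print(f"estimated cost per sync at the assumed price: ${cost:.2f}")
```

Multiplying the per-sync estimate by the number of syncs per day gives a rough daily cost for this one table. Almost all of it comes from the `_scd` model's MERGE.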

Is there an easy way to fix this while still using incremental sync, without incurring such high costs? For example, by taking advantage of partitioned tables?

Hi @NahidOulmi, as a first step could you please update Airbyte and the BigQuery connector to the latest versions? Did this start happening recently after an upgrade or another change, or has it been this way from the beginning? I'm looking into this, but more details would help!