Is this your first time deploying Airbyte: No
OS Version / Instance: Debian 10 (buster) on AWS EC2, 8 cores, 16GB RAM
Deployment: Docker
Airbyte Version: 0.39.32-alpha
Source name: MySQL (0.6.7) - Incremental
Destination: BigQuery (1.1.11) - Deduped + history
Step: Writing data to BigQuery in incremental + deduped history mode
The dbt models used to normalize, historize and merge the data into the destination BigQuery table process about 6 times as much data as the table itself holds. For example, if my table is 20 GB, the dbt models process roughly 120 GB:
2022-09-01 15:29:45 normalization > 15:29:45 1 of 3 START view model _airbyte_airbyte_mysql.incremental_model_stg................................................ [RUN]
2022-09-01 15:29:46 normalization > 15:29:46 1 of 3 OK created view model _airbyte_airbyte_mysql.incremental_model_stg........................................... [OK in 0.78s]
2022-09-01 15:29:46 normalization > 15:29:46 2 of 3 START incremental model airbyte_mysql.incremental_model_scd.................................................. [RUN]
2022-09-01 15:31:57 normalization > 15:31:57 2 of 3 OK created incremental model airbyte_mysql.incremental_model_scd............................................. [MERGE (24.5m rows, 124.1 GB processed) in 130.71s]
2022-09-01 15:31:57 normalization > 15:31:57 3 of 3 START incremental model airbyte_mysql.incremental_model...................................................... [RUN]
2022-09-01 15:32:06 normalization > 15:32:06 3 of 3 OK created incremental model airbyte_mysql.incremental_model................................................. [MERGE (1.0k rows, 530.1 MB processed) in 8.99s]
This is not acceptable cost-wise, as BigQuery charges by GB processed.
Is there an easy way to fix this while keeping incremental sync, without incurring such high costs? For example, by taking advantage of partitioned tables?
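To put a number on the overhead, here is a rough Python sketch that sums the "processed" volumes reported in the normalization log lines above and compares the total with the table size. The function name `gb_processed`, the 1024-based unit conversion, and the 20 GB table size (taken from my example above) are assumptions for illustration only.

```python
import re

# Matches the volume dbt reports for a finished model, e.g.
# "[MERGE (24.5m rows, 124.1 GB processed) in 130.71s]"
PROCESSED_RE = re.compile(r"rows, ([\d.]+) (KB|MB|GB|TB) processed")

# Assumption: units are 1024-based.
UNIT_TO_GB = {"KB": 1 / 1024**2, "MB": 1 / 1024, "GB": 1.0, "TB": 1024.0}


def gb_processed(log_text: str) -> float:
    """Sum every 'processed' volume found in the log text, in GB."""
    total = 0.0
    for match in PROCESSED_RE.finditer(log_text):
        value, unit = float(match.group(1)), match.group(2)
        total += value * UNIT_TO_GB[unit]
    return total


# The two MERGE lines from the log above, shortened.
log = """
2 of 3 OK created incremental model airbyte_mysql.incremental_model_scd [MERGE (24.5m rows, 124.1 GB processed) in 130.71s]
3 of 3 OK created incremental model airbyte_mysql.incremental_model [MERGE (1.0k rows, 530.1 MB processed) in 8.99s]
"""

TABLE_SIZE_GB = 20  # approximate size of the destination table (assumption)

total = gb_processed(log)
print(f"{total:.1f} GB processed, {total / TABLE_SIZE_GB:.1f}x the table size")
```

Almost all of the volume comes from the MERGE into the `_scd` model, so that is the step where partitioning would need to help.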