High costs due to data load into BigQuery

  • Is this your first time deploying Airbyte?: Yes
  • OS Version / Instance: Debian 11
  • Memory / Disk: 50Gb
  • Deployment: Docker
  • Airbyte Version: 0.39.1-alpha
  • Source name/version: postgresql 0.4.16, salesforce 1.0.9
  • Destination name/version: BigQuery 1.1.6
  • Step: during sync

I set up connections to import data from PostgreSQL and Salesforce every 5 minutes.
PostgreSQL needs to sync ~50 tables; Salesforce ~25 tables.
Each table is set up in either incremental or deduped + history mode.

I noticed very high charges from BigQuery. When digging deeper, I found “get” queries each time a table has to be updated. My guess is that Airbyte somehow needs to look up the latest _airbyte_emitted_at in order to fetch only the newest data.

My questions:

  • Is my assumption right? How exactly does the update work?
  • Is there a way to reduce the cost (other than syncing every 24 hours instead of every 5 minutes)?

Thanks

Hey, for incremental mode we ideally don’t need any “get” queries, because we don’t have to fetch the existing records. For dedup mode, though, we need to read the data from the start, because duplicate records could be anywhere in it.
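To make the difference concrete, here is a minimal Python sketch of the two modes. It is only an illustration under my own assumptions, not Airbyte's actual normalization code: the names `Record`, `incremental_append`, `dedup_latest`, and `rows_scanned_per_day` are hypothetical, and the model assumes every sync re-reads the full history of each dedup table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    pk: int           # primary key used for deduplication (hypothetical field)
    payload: str
    emitted_at: int   # stand-in for _airbyte_emitted_at


def incremental_append(existing_row_count, new_records):
    """Append-only load: new rows are added, no existing rows are read."""
    rows_read = 0
    return existing_row_count + len(new_records), rows_read


def dedup_latest(history):
    """Keep the latest record per key.

    Any row in the history may share a key with a newer row,
    so every row has to be read.
    """
    latest = {}
    rows_read = 0
    for rec in history:
        rows_read += 1
        current = latest.get(rec.pk)
        if current is None or rec.emitted_at > current.emitted_at:
            latest[rec.pk] = rec
    return sorted(latest.values(), key=lambda r: r.pk), rows_read


def rows_scanned_per_day(history_rows, dedup_tables, sync_interval_minutes):
    """Rough estimate, assuming each sync rescans each dedup table in full."""
    syncs_per_day = (24 * 60) // sync_interval_minutes
    return history_rows * dedup_tables * syncs_per_day


if __name__ == "__main__":
    history = [Record(1, "a", 1), Record(2, "b", 2), Record(1, "a2", 3)]
    deduped, rows_read = dedup_latest(history)
    print(deduped, rows_read)
    print(rows_scanned_per_day(1000, 25, 5))
```

Under these assumptions, a 5-minute schedule means 288 syncs per day, so the amount of data read for dedup tables grows with both table size and sync frequency, while append-only tables add no read cost of this kind.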

Is this helpful?

It definitely is helpful. Thanks a lot.