High costs due to data load into BigQuery

  • Is this your first time deploying Airbyte?: Yes
  • OS Version / Instance: Debian 11
  • Memory / Disk: 50Gb
  • Deployment: Docker
  • Airbyte Version: 0.39.1-alpha
  • Source name/version: postgresql 0.4.16, salesforce 1.0.9
  • Destination name/version: BigQuery 1.1.6
  • Step: during sync

I set up connections to import data from PostgreSQL and Salesforce every 5 minutes.
PostgreSQL needs to sync ~50 tables; Salesforce ~25 tables.
Each table is set up either in incremental mode or in dedup + history mode.

I noticed very high charges from BigQuery. When digging deeper, I found “get” queries each time a table has to be updated. My guess is that Airbyte somehow needs to look up the latest _airbyte_emitted_at in order to retrieve the newest data.

My questions:

  • Is my assumption right? How exactly does the update work?
  • Is there a way to reduce the cost (other than syncing every 24h instead of every 5 min)?

Thanks

Hey, for incremental mode we ideally don’t need any “get” queries, since we don’t have to fetch the existing records. For dedup, however, we need to read the data from the start, because duplicate records could be anywhere in the table.
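
To get a rough feel for why sync frequency matters so much in dedup mode, here is a minimal back-of-the-envelope sketch in Python. It is not how Airbyte computes anything: it simply assumes that every dedup sync reads each table in full once and that BigQuery bills by bytes scanned. The function names, the table sizes, and the `price_per_tib` value are all hypothetical; plug in your own numbers and your actual pricing.

```python
# Rough estimate only. Assumption: each dedup sync scans every table in full once.
# All names and numbers here are illustrative, not taken from Airbyte or BigQuery.

TIB = 2 ** 40  # bytes in one tebibyte


def syncs_per_day(interval_minutes: int) -> int:
    """Number of syncs that run in 24 hours for a given interval."""
    return (24 * 60) // interval_minutes


def daily_scanned_bytes(table_bytes: list[int], interval_minutes: int) -> int:
    """Bytes scanned per day if every sync reads every table in full."""
    return sum(table_bytes) * syncs_per_day(interval_minutes)


def daily_cost(table_bytes: list[int], interval_minutes: int, price_per_tib: float) -> float:
    """Estimated daily cost; price_per_tib is whatever your billing plan charges."""
    return daily_scanned_bytes(table_bytes, interval_minutes) / TIB * price_per_tib


if __name__ == "__main__":
    # Hypothetical example: 10 dedup tables of 1 GiB each.
    tables = [2 ** 30] * 10
    for interval in (5, 60, 24 * 60):
        scanned_gib = daily_scanned_bytes(tables, interval) / 2 ** 30
        print(f"every {interval} min: {syncs_per_day(interval)} syncs, {scanned_gib:.0f} GiB scanned per day")
```

Under these assumptions a 5-minute schedule runs 288 syncs a day, so it scans 288 times as much data as a daily sync for the same tables. Tables in plain incremental mode would not be part of this estimate.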

Is this helpful?


It definitely is helpful. Thanks a lot!
