Replicating 500 GB PostgreSQL to BigQuery with Fresh Data and Cost Concerns


The user wants to replicate a 500 GB PostgreSQL database to BigQuery with data that is at most about 15 minutes stale. They are concerned about cost and ask how the staging table is merged into the destination table, and what precautions are taken to minimize cost and ensure proper partition pruning.
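To see why pruning matters at this scale, here is a back-of-the-envelope cost sketch, assuming on-demand pricing (the $5/TB rate used below is a hypothetical figure; check current BigQuery pricing) and a sync every 15 minutes:

```python
def daily_merge_cost_usd(
    scanned_gb_per_run: float,
    runs_per_day: int = 96,      # one sync every 15 minutes
    usd_per_tb: float = 5.0,     # assumed on-demand rate; verify against current pricing
) -> float:
    """Rough daily cost of the MERGE step under on-demand pricing."""
    return scanned_gb_per_run / 1024 * runs_per_day * usd_per_tb

# Without partition pruning, every MERGE rescans the full 500 GB destination:
full_scan = daily_merge_cost_usd(500)   # roughly $234/day
# With pruning, each run only touches the partitions in the batch, say 5 GB:
pruned = daily_merge_cost_usd(5)        # roughly $2.34/day
```

The numbers are illustrative, but the two-orders-of-magnitude gap is why the partition-pruning question is the right one to ask.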



I tried the ask-ai, but it didn’t have an answer for me :slightly_smiling_face:. I’d like to replicate a 500 GB PostgreSQL database to BigQuery. I’d like the data to be relatively fresh (no more than 15 minutes old), but I’m worried about the cost. If I understand the steps correctly, the PostgreSQL data is first copied to GCS and then imported into a staging table using a load job. How is the staging table merged into the destination table? What precautions are taken to minimize cost and make sure proper partition pruning is applied?
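For context, the merge step described above is typically a BigQuery `MERGE` statement, and pruning happens only if the target side of the `ON` clause constrains the partitioning column with literal bounds. A minimal sketch of building such a statement follows; the table and column names are hypothetical, and the actual statement a replication tool generates may differ:

```python
from datetime import date

def build_merge_sql(
    dest: str,
    staging: str,
    min_day: date,
    max_day: date,
) -> str:
    """Build a BigQuery MERGE that upserts staged rows into a destination
    table partitioned on `event_date`, restricting the scan of the
    destination to the date range the staged batch actually touches.

    The literal BETWEEN bounds on the target's partitioning column are
    what allow BigQuery to prune partitions; without them, every run
    would rescan the whole destination table. Schema (id, payload,
    event_date) is a hypothetical example.
    """
    return f"""
MERGE `{dest}` AS d
USING `{staging}` AS s
ON d.id = s.id
   -- literal bounds on the partitioning column enable pruning
   AND d.event_date BETWEEN DATE '{min_day}' AND DATE '{max_day}'
WHEN MATCHED THEN
  UPDATE SET payload = s.payload, event_date = s.event_date
WHEN NOT MATCHED THEN
  INSERT ROW
""".strip()
```

A run covering a single day's batch would call it with matching bounds, e.g. `build_merge_sql("proj.ds.events", "proj.ds.events_staging", date(2024, 1, 1), date(2024, 1, 2))`, and the emitted SQL can be executed with any BigQuery client.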


This topic has been created from a Slack thread to give it more visibility.
It will be in read-only mode here.


["replicate", "postgresql", "bigquery", "data-freshness", "cost-concerns", "staging-table", "destination-table", "partition-pruning"]