Moving large data from PostgreSQL Aurora to BigQuery on Kubernetes

Summary

The user is facing challenges moving around 17GB of data from PostgreSQL Aurora in AWS to BigQuery using Airbyte on Kubernetes. They are seeking advice on how to move more than 20GB of data from a PostgreSQL table.


Question

Hello team, I have a problem moving around 17GB of data from a PostgreSQL Aurora database in AWS to BigQuery. My Airbyte deployment is on Kubernetes.
I’ve seen in the documentation that up to 500GB of data can be moved. Has anyone faced this problem, or how have you solved moving more than 20GB from a PostgreSQL table?



This topic has been created from a Slack thread to give it more visibility.
It will be in read-only mode here.


["postgresql-aurora", "bigquery", "airbyte", "kubernetes", "data-migration"]

I am using the PostgreSQL Aurora engine in AWS as the source, and our destination is Google BigQuery. I have checked all the configurations and they look correct.

We gained a few insights from our environment. We resolved this issue by incrementally increasing the resource limits in our Kubernetes environment, but we noticed another problem: raising the limits affects the sizing of the virtual machines and can increase costs. So, instead of just tweaking the limits on every pod, we tried a different strategy: extract the bulk of the information in one initial sync, and have the later jobs extract smaller portions of data.
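For reference, the per-job container resources in an Airbyte Kubernetes deployment are controlled through the `JOB_MAIN_CONTAINER_*` environment variables. The snippet below is a minimal sketch; the numbers are examples rather than recommendations, and where these variables are set (Helm values, a kustomize overlay, or a `.env` file) depends on how Airbyte is deployed.

```yaml
# Minimal sketch: environment variables that size Airbyte job pods
# (the source/destination containers spawned per sync).
# Values are illustrative only; tune them for your cluster.
JOB_MAIN_CONTAINER_CPU_REQUEST: "1"
JOB_MAIN_CONTAINER_CPU_LIMIT: "2"
JOB_MAIN_CONTAINER_MEMORY_REQUEST: "2Gi"
JOB_MAIN_CONTAINER_MEMORY_LIMIT: "4Gi"
```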

What’s the error when trying to move this data?

I have this log, but I didn’t find the actual error in it.

And the log isn’t really helpful in identifying what the error is.

I can add more logs:

```
  "failureOrigin" : "source",
  "internalMessage" : "Source process exited with non-zero exit code 3",
  "externalMessage" : "Something went wrong within the source connector",
  "metadata" : {
    "attemptNumber" : 4,
    "jobId" : 29984,
    "connector_command" : "read"
  },
  "stacktrace" : "io.airbyte.workers.internal.exception.SourceException: Source process exited with non-zero exit code 3\n\tat io.airbyte.workers.general.BufferedReplicationWorker.readFromSource(BufferedReplicationWorker.java:369)\n\tat io.airbyte.workers.general.BufferedReplicationWorker.lambda$runAsyncWithHeartbeatCheck$3(BufferedReplicationWorker.java:243)\n\tat java.base/java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1804)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)\n\tat java.base/java.lang.Thread.run(Thread.java:1589)\n",
  "timestamp" : 1706027760665
} ]
```

That’s odd - it looks like the GCS storage times out in the first log. I’m mostly on AWS, but I’ve seen that happen intermittently on other projects.

Exit code 3 might point to a path not being found… does the BigQuery destination have options for the filename pattern? For long-running jobs I had an issue with files being overwritten, which was resolved by adding the {timestamp} variable to the pattern.
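If the destination (or its GCS staging configuration) exposes a file name pattern option, the suggestion above would look roughly like this. The key name `file_name_pattern` and the extra placeholders are assumptions for illustration; only {timestamp} is taken from the comment above, so check the connector docs for the exact field and supported variables.

```yaml
# Illustrative only: a staging file name pattern that includes {timestamp}
# so long-running jobs don't overwrite earlier files.
file_name_pattern: "{date}_{timestamp}_{part_number}"
```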

I’d double-check permissions on the GCP side as well; they can be a pain.
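As a starting point for that permissions check, these are roles commonly granted to the service account used by the BigQuery destination. This is an assumption based on typical setups, not an authoritative list; verify against the connector documentation for your version.

```yaml
# Sketch: roles often granted to the Airbyte service account for a BigQuery
# destination. "service_account_roles" is an illustrative key, not an Airbyte
# setting.
service_account_roles:
  - roles/bigquery.dataEditor   # create and write tables in the target dataset
  - roles/bigquery.user         # run load/query jobs
  - roles/storage.objectAdmin   # read/write the staging bucket (GCS staging only)
```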