Summary
The user is experiencing an issue where the worker does not shut down after syncing 32GB of data from BigQuery to S3 in a local deployment of Airbyte. The worker continues to run indefinitely, even though the workload is completed and the connection is closed.
Question
Hi All, I'm trying to replicate BigQuery → S3 through a local deployment of Airbyte. The workload syncs 32GB, but the worker then doesn't shut down. This leads to the pattern below repeating forever (over 16 hours before I cancelled it):
```
2024-08-19 23:48:32 destination > INFO pool-2-thread-1 i.a.c.i.d.a.b.BufferManager(printQueueInfo):94 [ASYNC QUEUE INFO] Global: max: 3.48 GB, allocated: 190 MB (189.99992275238037 MB), %% used: 0.05331787752734885 | Queue `ga4_au_site_events_weekly`, num records: 0, num bytes: 0 bytes, allocated bytes: 0 bytes | State Manager memory usage: Allocated: 10 MB, Used: 0 bytes, percentage Used 0.0
2024-08-19 23:48:32 destination > INFO pool-5-thread-1 i.a.c.i.d.a.FlushWorkers(printWorkerInfo):127 [ASYNC WORKER INFO] Pool queue size: 0, Active threads: 1
2024-08-19 23:49:32 destination > INFO pool-2-thread-1 i.a.c.i.d.a.b.BufferManager(printQueueInfo):94 [ASYNC QUEUE INFO] Global: max: 3.48 GB, allocated: 190 MB (189.99992275238037 MB), %% used: 0.05331787752734885 | Queue `ga4_au_site_events_weekly`, num records: 0, num bytes: 0 bytes, allocated bytes: 0 bytes | State Manager memory usage: Allocated: 10 MB, Used: 0 bytes, percentage Used 0.0
2024-08-19 23:49:32 destination > INFO pool-5-thread-1 i.a.c.i.d.a.FlushWorkers(printWorkerInfo):127 [ASYNC WORKER INFO] Pool queue size: 0, Active threads: 1
2024-08-19 23:50:32 destination > INFO pool-2-thread-1 i.a.c.i.d.a.b.BufferManager(printQueueInfo):94 [ASYNC QUEUE INFO] Global: max: 3.48 GB, allocated: 190 MB (189.99992275238037 MB), %% used: 0.05331787752734885 | Queue `ga4_au_site_events_weekly`, num records: 0, num bytes: 0 bytes, allocated bytes: 0 bytes | State Manager memory usage: Allocated: 10 MB, Used: 0 bytes, percentage Used 0.0
2024-08-19 23:50:32 destination > INFO pool-5-thread-1 i.a.c.i.d.a.FlushWorkers(printWorkerInfo):127 [ASYNC WORKER INFO] Pool queue size: 0, Active threads: 1
2024-08-19 23:51:32 destination > INFO pool-5-thread-1 i.a.c.i.d.a.FlushWorkers(printWorkerInfo):127 [ASYNC WORKER INFO] Pool queue size: 0, Active threads: 1
2024-08-19 23:51:32 destination > INFO pool-2-thread-1 i.a.c.i.d.a.b.BufferManager(printQueueInfo):94 [ASYNC QUEUE INFO] Global: max: 3.48 GB, allocated: 190 MB (189.99992275238037 MB), %% used: 0.05331787752734885 | Queue `ga4_au_site_events_weekly`, num records: 0, num bytes: 0 bytes, allocated bytes: 0 bytes | State Manager memory usage: Allocated: 10 MB, Used: 0 bytes, percentage Used 0.0
2024-08-19 23:52:32 destination > INFO pool-2-thread-1 i.a.c.i.d.a.b.BufferManager(printQueueInfo):94 [ASYNC QUEUE INFO] Global: max: 3.48 GB, allocated: 190 MB (189.99992275238037 MB), %% used: 0.05331787752734885 | Queue `ga4_au_site_events_weekly`, num records: 0, num bytes: 0 bytes, allocated bytes: 0 bytes | State Manager memory usage: Allocated: 10 MB, Used: 0 bytes, percentage Used 0.0
2024-08-19 23:52:32 destination > INFO pool-5-thread-1 i.a.c.i.d.a.FlushWorkers(printWorkerInfo):127 [ASYNC WORKER INFO] Pool queue size: 0, Active threads: 1
2024-08-19 23:52:34 destination > INFO main i.a.c.i.d.a.FlushWorkers(close):243 Waiting for flush workers to shut down
```
I can see the workload is done and the connection closed:
```
2024-08-19 12:41:46 source > INFO i.a.c.i.s.r.AbstractDbSource(lambda$read$1):184 Closing database connection pool.
2024-08-19 12:41:46 source > INFO i.a.c.i.s.r.AbstractDbSource(lambda$read$1):186 Closed database connection pool.
2024-08-19 12:41:46 source > INFO i.a.c.i.b.IntegrationRunner(runInternal):231 Completed integration: io.airbyte.integrations.source.bigquery.BigQuerySource
2024-08-19 12:41:46 source > INFO i.a.i.s.b.BigQuerySource(main):221 completed source: class io.airbyte.integrations.source.bigquery.BigQuerySource
2024-08-19 12:41:46 replication-orchestrator > Stream status TRACE received of status: COMPLETE for stream analytics_230314869:ga4_au_site_events_weekly
2024-08-19 12:41:46 destination > INFO pool-3-thread-1 i.a.c.i.d.r.SerializedBufferingStrategy(flushSingleBuffer):112 Flushing buffer of stream ga4_au_site_events_weekly (36 MB)
2024-08-19 12:41:46 destination > INFO pool-3-thread-1 i.a.c.i.d.s.S3ConsumerFactory(flushBufferFunction$lambda$5):108 Flushing buffer for stream ga4_au_site_events_weekly ({FileUtils.byteCountToDisplaySize(writer.byteCount)}) to storage
2024-08-19 12:41:46 destination > INFO pool-3-thread-1 i.a.c.i.d.s.p.ParquetSerializedBuffer(flush):184 Finished writing data to e410b430-e631-4d4d-8687-19f91c3418ac16355437261490658850.parquet (36 MB)
```
But why is the worker not shutting down and completing the sync?
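For context, the final `FlushWorkers(close):243 Waiting for flush workers to shut down` message suggests the destination has finished flushing but is blocked waiting for its worker thread pool to terminate. The sketch below is a hypothetical simplification of that wait pattern (class and method names are mine, not the actual Airbyte code): if one pool thread never finishes, the loop logs forever, which matches the repeating log pattern above.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class FlushWorkerShutdownSketch {
    // Returns true once all worker threads have terminated, logging a
    // waiting message for each interval in which they have not.
    static boolean awaitFlushWorkers(ExecutorService pool) throws InterruptedException {
        pool.shutdown(); // stop accepting new tasks; running tasks continue
        while (!pool.awaitTermination(1, TimeUnit.SECONDS)) {
            // If a worker thread is blocked (e.g. on an operation that never
            // completes), awaitTermination keeps returning false and this
            // loop never exits -- the process hangs instead of shutting down.
            System.out.println("Waiting for flush workers to shut down");
        }
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        pool.submit(() -> System.out.println("flushing buffer"));
        // With only short-lived tasks, the pool terminates promptly.
        System.out.println("terminated: " + awaitFlushWorkers(pool));
    }
}
```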
Airbyte - 0.63.18
BigQuery - 0.4.2
S3 - 1.0.1
Please help. Thanks.
<br>
---
This topic has been created from a Slack thread to give it more visibility.
It will be on Read-Only mode here. [Click here](https://airbytehq.slack.com/archives/C021JANJ6TY/p1724112387016819) if you want
to access the original thread.
[Join the conversation on Slack](https://slack.airbyte.com)
<sub>
["bigquery", "s3", "airbyte", "worker-not-shutting-down", "sync-issue"]
</sub>