Worker Not Shutting Down After Syncing BigQuery to S3 in Airbyte Local Deployment

Summary

The worker does not shut down after syncing 32 GB of data from BigQuery to S3 in a local Airbyte deployment. It keeps running indefinitely, repeating the destination's flush-worker log lines, even though the source has completed and the database connection pool has been closed.


Question

Hi All, I'm trying to replicate BigQuery → S3 through a local deployment of Airbyte. The workload syncs 32 GB, then the worker doesn't shut down. This leads to the pattern below repeating forever (over 16 hours, at which point I cancelled):

```
2024-08-19 23:48:32 destination > INFO pool-2-thread-1 i.a.c.i.d.a.b.BufferManager(printQueueInfo):94 [ASYNC QUEUE INFO] Global: max: 3.48 GB, allocated: 190 MB (189.99992275238037 MB), %% used: 0.05331787752734885 | Queue `ga4_au_site_events_weekly`, num records: 0, num bytes: 0 bytes, allocated bytes: 0 bytes | State Manager memory usage: Allocated: 10 MB, Used: 0 bytes, percentage Used 0.0
2024-08-19 23:48:32 destination > INFO pool-5-thread-1 i.a.c.i.d.a.FlushWorkers(printWorkerInfo):127 [ASYNC WORKER INFO] Pool queue size: 0, Active threads: 1
2024-08-19 23:49:32 destination > INFO pool-2-thread-1 i.a.c.i.d.a.b.BufferManager(printQueueInfo):94 [ASYNC QUEUE INFO] Global: max: 3.48 GB, allocated: 190 MB (189.99992275238037 MB), %% used: 0.05331787752734885 | Queue `ga4_au_site_events_weekly`, num records: 0, num bytes: 0 bytes, allocated bytes: 0 bytes | State Manager memory usage: Allocated: 10 MB, Used: 0 bytes, percentage Used 0.0
2024-08-19 23:49:32 destination > INFO pool-5-thread-1 i.a.c.i.d.a.FlushWorkers(printWorkerInfo):127 [ASYNC WORKER INFO] Pool queue size: 0, Active threads: 1
2024-08-19 23:50:32 destination > INFO pool-2-thread-1 i.a.c.i.d.a.b.BufferManager(printQueueInfo):94 [ASYNC QUEUE INFO] Global: max: 3.48 GB, allocated: 190 MB (189.99992275238037 MB), %% used: 0.05331787752734885 | Queue `ga4_au_site_events_weekly`, num records: 0, num bytes: 0 bytes, allocated bytes: 0 bytes | State Manager memory usage: Allocated: 10 MB, Used: 0 bytes, percentage Used 0.0
2024-08-19 23:50:32 destination > INFO pool-5-thread-1 i.a.c.i.d.a.FlushWorkers(printWorkerInfo):127 [ASYNC WORKER INFO] Pool queue size: 0, Active threads: 1
2024-08-19 23:51:32 destination > INFO pool-5-thread-1 i.a.c.i.d.a.FlushWorkers(printWorkerInfo):127 [ASYNC WORKER INFO] Pool queue size: 0, Active threads: 1
2024-08-19 23:51:32 destination > INFO pool-2-thread-1 i.a.c.i.d.a.b.BufferManager(printQueueInfo):94 [ASYNC QUEUE INFO] Global: max: 3.48 GB, allocated: 190 MB (189.99992275238037 MB), %% used: 0.05331787752734885 | Queue `ga4_au_site_events_weekly`, num records: 0, num bytes: 0 bytes, allocated bytes: 0 bytes | State Manager memory usage: Allocated: 10 MB, Used: 0 bytes, percentage Used 0.0
2024-08-19 23:52:32 destination > INFO pool-2-thread-1 i.a.c.i.d.a.b.BufferManager(printQueueInfo):94 [ASYNC QUEUE INFO] Global: max: 3.48 GB, allocated: 190 MB (189.99992275238037 MB), %% used: 0.05331787752734885 | Queue `ga4_au_site_events_weekly`, num records: 0, num bytes: 0 bytes, allocated bytes: 0 bytes | State Manager memory usage: Allocated: 10 MB, Used: 0 bytes, percentage Used 0.0
2024-08-19 23:52:32 destination > INFO pool-5-thread-1 i.a.c.i.d.a.FlushWorkers(printWorkerInfo):127 [ASYNC WORKER INFO] Pool queue size: 0, Active threads: 1
2024-08-19 23:52:34 destination > INFO main i.a.c.i.d.a.FlushWorkers(close):243 Waiting for flush workers to shut down
```
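While that pattern repeats, it can help to check what the platform itself reports for the job. The sketch below is an assumption rather than something from the thread: it targets the internal config API of a default local Docker deployment, and the host, port, credentials, endpoint, and `JOB_ID` all need to be verified against your own instance.

```python
# Sketch under assumptions: a local Airbyte deployment exposes its internal
# config API at http://localhost:8000/api/v1 behind the default basic-auth
# credentials. Adjust AIRBYTE_API, AUTH, and JOB_ID for your setup.
import requests

AIRBYTE_API = "http://localhost:8000/api/v1"
AUTH = ("airbyte", "password")   # default local credentials; change if customised
JOB_ID = 123                     # hypothetical: the id of the stuck sync job from the UI

resp = requests.post(f"{AIRBYTE_API}/jobs/get", json={"id": JOB_ID}, auth=AUTH)
resp.raise_for_status()
job_info = resp.json()

# A job stuck like this usually still reports "running" even though the source
# has already closed its connection pool.
print("job status:", job_info["job"]["status"])
for attempt in job_info["attempts"]:
    print("attempt", attempt["attempt"]["id"], "->", attempt["attempt"]["status"])
```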

I can see the workload is done and the connection closed:

```
2024-08-19 12:41:46 source > INFO i.a.c.i.s.r.AbstractDbSource(lambda$read$1):184 Closing database connection pool.
2024-08-19 12:41:46 source > INFO i.a.c.i.s.r.AbstractDbSource(lambda$read$1):186 Closed database connection pool.
2024-08-19 12:41:46 source > INFO i.a.c.i.b.IntegrationRunner(runInternal):231 Completed integration: io.airbyte.integrations.source.bigquery.BigQuerySource
2024-08-19 12:41:46 source > INFO i.a.i.s.b.BigQuerySource(main):221 completed source: class io.airbyte.integrations.source.bigquery.BigQuerySource
2024-08-19 12:41:46 replication-orchestrator > Stream status TRACE received of status: COMPLETE for stream analytics_230314869:ga4_au_site_events_weekly
2024-08-19 12:41:46 destination > INFO pool-3-thread-1 i.a.c.i.d.r.SerializedBufferingStrategy(flushSingleBuffer):112 Flushing buffer of stream ga4_au_site_events_weekly (36 MB)
2024-08-19 12:41:46 destination > INFO pool-3-thread-1 i.a.c.i.d.s.S3ConsumerFactory(flushBufferFunction$lambda$5):108 Flushing buffer for stream ga4_au_site_events_weekly ({FileUtils.byteCountToDisplaySize(writer.byteCount)}) to storage
2024-08-19 12:41:46 destination > INFO pool-3-thread-1 i.a.c.i.d.s.p.ParquetSerializedBuffer(flush):184 Finished writing data to e410b430-e631-4d4d-8687-19f91c3418ac16355437261490658850.parquet (36 MB)
```
But why is the worker not shutting down and completing the sync?
Airbyte - 0.63.18
BigQuery source - 0.4.2
S3 destination - 1.0.1

Please help. Thanks.
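If the attempt really is wedged, it can be cancelled through the same internal API instead of being left to run for hours. The snippet below is a sketch under the same assumptions about host, port, credentials, and endpoint as above; verify them against your own instance.

```python
# Sketch under the same assumptions as the earlier snippet: cancel the stuck
# job via the internal config API of a local deployment.
import requests

AIRBYTE_API = "http://localhost:8000/api/v1"
AUTH = ("airbyte", "password")
JOB_ID = 123  # hypothetical id of the stuck sync job

resp = requests.post(f"{AIRBYTE_API}/jobs/cancel", json={"id": JOB_ID}, auth=AUTH)
resp.raise_for_status()
print(resp.json()["job"]["status"])  # expected to end up as "cancelled"
```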


---

This topic has been created from a Slack thread to give it more visibility.
It will be in read-only mode here. [Click here](https://airbytehq.slack.com/archives/C021JANJ6TY/p1724112387016819) if you want to access the original thread.

[Join the conversation on Slack](https://slack.airbyte.com)

<sub>
["bigquery", "s3", "airbyte", "worker-not-shutting-down", "sync-issue"]
</sub>