Issue with S3 to PostgreSQL Connection

Summary

User reports a failure in syncing data from S3 to PostgreSQL using Airbyte. The source connection processes the files correctly, but the destination remains empty, showing an error related to attempting to execute a ‘CREATE FUNCTION’ in a read-only transaction.


Question

Hi all, I have some trouble with a S3 (6627 files containing 10000 lines each) to postgresql connection.
The source is read as expected and airbyte_internal schema seems correct, with all the entries into public__raw__stream_xxx
However, the destination is empty. I relaunched several times the sync but no results.
I can see this error regarding the destination:

4-12-06 08:55:11 destination > ERROR HikariPool-1 connection adder c.z.h.p.HikariPool(createPoolEntry):483 HikariPool-1 - Error thrown while acquiring connection from data source org.postgresql.util.PSQLException: ERROR: cannot execute CREATE FUNCTION in a read-only transaction

Then I have these traces:
orchestrator-repl-job-95-attempt-0 main 2024-12-06 08:55:13 source > Finished listing objects from S3 for prefix=processed/output-part-. Found 6627 objects.
orchestrator-repl-job-95-attempt-0 main 2024-12-06 08:55:13 source > Finished listing objects from S3. Found 6627 objects total (6627 unique objects).
orchestrator-repl-job-95-attempt-0 main 2024-12-06 08:55:13 source > Marking stream xxxx as RUNNING
orchestrator-repl-job-95-attempt-0 main 2024-12-06 08:55:13 replication-orchestrator > Stream status TRACE received of status: RUNNING for stream xxxx
orchestrator-repl-job-95-attempt-0 main 2024-12-06 08:56:10 destination > INFO pool-3-thread-1 i.a.c.i.d.a.b.BufferManager(printQueueInfo):89 [ASYNC QUEUE INFO] Global: max: 296.96 MB, allocated: 10 MB (10.0 MB), %% used: 0.03367428551701215 | State Manager memory usage: Allocated: 10 MB, Used: 0 bytes, percentage Used 0.0
orchestrator-repl-job-95-attempt-0 main 2024-12-06 08:56:10 destination > INFO pool-6-thread-1 i.a.c.i.d.a.FlushWorkers(printWorkerInfo):127 [ASYNC WORKER INFO] Pool queue size: 0, Active threads: 0
orchestrator-repl-job-95-attempt-0 main 2024-12-06 08:57:10 destination > INFO pool-3-thread-1 i.a.c.i.d.a.b.BufferManager(printQueueInfo):89 [ASYNC QUEUE INFO] Global: max: 296.96 MB, allocated: 10 MB (10.0 MB), %% used: 0.03367428551701215 | State Manager memory usage: Allocated: 10 MB, Used: 0 bytes, percentage Used 0.0

Any help will be appreciated



This topic has been created from a Slack thread to give it more visibility.
It will be on Read-Only mode here. Click here if you want
to access the original thread.

Join the conversation on Slack

['s3-connector', 'postgresql-connector', 'data-sync', 'hikaricp', 'readonly-transaction', 'error']

Which destination connector are you using?

The destination is a postgresql connector.
I think I have several issues in fact:
• the destination database is too small in terms of storage, cpu and memory assigned to the db pods
• the source contains too much files to manage in one shot
I started to split the source, to process small group of files representing 5 millions of lines each.
For now, the results sounds better. Let see…