Summary
The user's pipeline syncs data from a REST API to S3 through a low-code custom connector. After the first requests succeeded, the job stalled and eventually failed with a source timeout. The logs show records being read up to 50,000, then more than two hours of silence before the source logged an HTTP 500 response from the API.
Question
I am facing an issue with a pipeline that syncs data from a REST API, built with the low-code custom connector, to S3.
The first few requests succeeded, but then the job got stuck, showed no logs for a very long time, and finally failed with a source timeout.
Here are the logs for reference:
```
2024-01-08 22:24:37 destination > 2024-01-08 22:24:37 INFO i.a.c.i.d.s.S3ConsumerFactory(lambda$onStartFunction$1):102 - Preparing storage area in destination completed.
2024-01-08 22:24:38 destination > 2024-01-08 22:24:38 INFO i.a.c.i.d.r.SerializedBufferingStrategy(lambda$getOrCreateBuffer$0):108 - Starting a new buffer for stream events (current state: 0 bytes in 0 buffers)
2024-01-08 22:24:38 destination > 2024-01-08 22:24:38 WARN o.a.h.u.NativeCodeLoader(<clinit>):60 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2024-01-08 22:25:02 INFO i.a.w.g.ReplicationWorkerHelper(internalProcessMessageFromSource):246 - Records read: 5000 (2 MB)
2024-01-08 22:25:29 INFO i.a.w.g.ReplicationWorkerHelper(internalProcessMessageFromSource):246 - Records read: 10000 (5 MB)
2024-01-08 22:25:56 INFO i.a.w.g.ReplicationWorkerHelper(internalProcessMessageFromSource):246 - Records read: 15000 (8 MB)
2024-01-08 22:26:24 INFO i.a.w.g.ReplicationWorkerHelper(internalProcessMessageFromSource):246 - Records read: 20000 (11 MB)
2024-01-08 22:26:50 INFO i.a.w.g.ReplicationWorkerHelper(internalProcessMessageFromSource):246 - Records read: 25000 (14 MB)
2024-01-08 22:27:16 INFO i.a.w.g.ReplicationWorkerHelper(internalProcessMessageFromSource):246 - Records read: 30000 (16 MB)
2024-01-08 22:27:43 INFO i.a.w.g.ReplicationWorkerHelper(internalProcessMessageFromSource):246 - Records read: 35000 (19 MB)
2024-01-08 22:28:09 INFO i.a.w.g.ReplicationWorkerHelper(internalProcessMessageFromSource):246 - Records read: 40000 (22 MB)
2024-01-08 22:28:36 INFO i.a.w.g.ReplicationWorkerHelper(internalProcessMessageFromSource):246 - Records read: 45000 (25 MB)
2024-01-08 22:29:02 INFO i.a.w.g.ReplicationWorkerHelper(internalProcessMessageFromSource):246 - Records read: 50000 (28 MB)
2024-01-09 00:51:07 source > Backing off _send(...) for 0.0s (airbyte_cdk.sources.streams.http.exceptions.UserDefinedBackoffException: Request URL: <URL>, Response Code: 500, Response Text: {
"error": "internalServerError",
"correlationId": "",
"requestId": "",
"message": "Internal Server Error"
})
2024-01-09 00:51:07 source > Retrying. Sleeping for 10.0 seconds
2024-01-09 00:51:07 destination > 2024-01-09 00:51:07 INFO i.a.c.i.d.b.BufferedStreamConsumer(periodicBufferFlush):258 - Periodic buffer flush started
2024-01-09 00:51:07 destination > 2024-01-09 00:51:07 INFO i.a.c.i.d.r.SerializedBufferingStrategy(flushAllBuffers):132 - Flushing all 1 current buffers (8 MB in total)
2024-01-09 00:51:07 destination > 2024-01-09 00:51:07 INFO i.a.c.i.d.r.SerializedBufferingStrategy(flushAllBuffers):136 - Flushing buffer of stream events (8 MB)
2024-01-09 00:51:07 destination > 2024-01-09 00:51:07 INFO i.a.c.i.d.s.S3ConsumerFactory(lambda$flushBufferFunction$2):119 - Flushing buffer for stream events (8 MB) to
```
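One way to pin down where the sync went quiet is to scan the log for large gaps between consecutive timestamps. The following is a minimal sketch using only the Python standard library; the function name `find_stalls` and the 5-minute threshold are illustrative assumptions, not part of Airbyte.

```python
import re
from datetime import datetime, timedelta

# Matches the leading "YYYY-MM-DD HH:MM:SS" timestamp on a log line.
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})")


def find_stalls(lines, threshold=timedelta(minutes=5)):
    """Return (gap, previous_line, next_line) for each silence longer than threshold.

    `find_stalls` is a hypothetical helper written for this example.
    """
    stalls = []
    previous = None  # (timestamp, line) of the last timestamped line seen
    for line in lines:
        match = TIMESTAMP.match(line)
        if not match:
            # Continuation lines (e.g. the JSON error body) carry no timestamp.
            continue
        current = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        if previous is not None and current - previous[0] > threshold:
            stalls.append((current - previous[0], previous[1], line))
        previous = (current, line)
    return stalls


# Three lines copied from the log excerpt in this post.
sample = [
    "2024-01-08 22:28:36 INFO i.a.w.g.ReplicationWorkerHelper(internalProcessMessageFromSource):246 - Records read: 45000 (25 MB)",
    "2024-01-08 22:29:02 INFO i.a.w.g.ReplicationWorkerHelper(internalProcessMessageFromSource):246 - Records read: 50000 (28 MB)",
    "2024-01-09 00:51:07 source > Backing off _send(...) for 0.0s (airbyte_cdk.sources.streams.http.exceptions.UserDefinedBackoffException: Request URL: <URL>, Response Code: 500, Response Text: {",
]

for gap, before, after in find_stalls(sample):
    print(f"silent for {gap}")
    print(f"  last line before: {before[:80]}")
    print(f"  first line after: {after[:80]}")
```

On these lines the helper reports a single gap of 2:22:05, between the 50000-record line and the first backoff message. Because the silence ended with the source's HTTP 500 backoff, the source's API request is a plausible place to look first, though the log alone does not prove it.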
How can I debug this? Is there any way to figure out what the process is doing while it is stuck?
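Since the source log references `airbyte_cdk`, the connector runs as a Python process, so one possible way to see what it is doing during a stall is Python's standard `faulthandler` module, which can write the stack of every thread to a file on a timer or on a signal. The sketch below assumes you can run the connector with a small hook added (for example locally or in a custom image); `enable_stall_dumps` is a hypothetical helper, not an Airbyte API.

```python
import faulthandler
import signal
import sys


def enable_stall_dumps(interval_seconds=300.0, file=sys.stderr):
    """Dump all thread stacks every `interval_seconds`, and on SIGUSR1 where available.

    `enable_stall_dumps` is a hypothetical helper written for this example.
    `file` must be a real file object with a file descriptor.
    """
    # Periodic dump: a watchdog thread writes every thread's traceback
    # each time the interval elapses, even if the main thread is blocked.
    faulthandler.dump_traceback_later(interval_seconds, repeat=True, file=file)

    # On-demand dump: `kill -USR1 <pid>` prints the stacks immediately.
    # faulthandler.register and SIGUSR1 are not available on Windows.
    if hasattr(faulthandler, "register") and hasattr(signal, "SIGUSR1"):
        faulthandler.register(signal.SIGUSR1, file=file, all_threads=True)


if __name__ == "__main__":
    import time

    enable_stall_dumps(interval_seconds=1.0)
    time.sleep(2.5)  # stand-in for a blocked HTTP request
    faulthandler.cancel_dump_traceback_later()
```

If the dumps repeatedly show the main thread inside a socket read, that would point at an API request that never returned. A tool that attaches to a running process from outside would serve the same purpose without code changes, if one is available in your environment.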
---
This topic has been created from a Slack thread to give it more visibility.
It is read-only here. [Click here](https://airbytehq.slack.com/archives/C021JANJ6TY/p1704973685563319) if you want to access the original thread.
[Join the conversation on Slack](https://slack.airbyte.com)
<sub>
["pipeline-sync-issue", "rest-api", "custom-connector", "s3", "source-timeout", "logs", "debugging"]
</sub>