Custom connector stuck on "Starting a new buffer for stream ..."

logs-33.txt (17.3 KB)

  • Is this your first time deploying Airbyte?: No
  • OS Version / Instance: Linux EC2 instance
  • Memory / Disk: 4 GB / 30 GB
  • Deployment: Docker
  • Airbyte Version: 0.39.28-alpha
  • Source name/version: Custom HTTP API connector
  • Destination name/version: S3
  • Step: During sync
  • Description:

Hi team, when I run a sync, the job gets stuck on INFO i.a.i.d.r.SerializedBufferingStrategy(lambda$addRecord$0):55 - Starting a new buffer for stream works (current state: 0 bytes in 0 buffers). I tried leaving it for over an hour and nothing happened.

I built a Python connector that grabs input data from S3, uses it to make requests to an API, and writes the output back to S3. I read rows from a CSV file in chunks with the awswrangler library. The connector works when I grab 200 rows at a time from the CSV, but sometimes the headers end up too large, so I had to reduce the chunk size to 150 rows (see log
logs-32.txt (125.5 KB)
). With 150 rows it always gets stuck on starting a new buffer. That is the only change between the two versions of the connector, and I have no clue why it would make Airbyte get stuck at that step.
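
For context, the read loop boils down to something like the minimal sketch below. The bucket path, chunk size, column name, filter style and mailto address are placeholders for illustration, not my real connector code:

    import awswrangler as wr
    import requests

    INPUT_PATH = "s3://my-bucket/input/dois.csv"  # placeholder input location
    CHUNK_SIZE = 150                              # rows per chunk (200 made the request too large)
    POLITE_MAILTO = "me@example.com"              # asks Crossref for the polite pool

    def read_records():
        # chunksize makes read_csv return an iterator of DataFrames instead of one big frame
        for chunk in wr.s3.read_csv(path=INPUT_PATH, chunksize=CHUNK_SIZE):
            # one /works request per chunk, filtering on all DOIs in that chunk
            doi_filter = ",".join(f"doi:{doi}" for doi in chunk["doi"])  # assumed column name
            resp = requests.get(
                "https://api.crossref.org/works",
                params={"filter": doi_filter, "rows": len(chunk), "mailto": POLITE_MAILTO},
                timeout=60,
            )
            resp.raise_for_status()
            for item in resp.json()["message"]["items"]:
                yield item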

I tried:

  • rebuilding the connector
  • retrying the job several times
  • reloading the connector (removing it from the database, re-uploading it to ECR, pulling the new version from there, etc.)
  • updating Airbyte to the newest version
  • scaling up the instance
  • recreating the connector from scratch, copying over only parts of the code
  • changing destination to another S3 location
  • running locally with the same inputs as on the Airbyte instance (works fine)
  • updating the S3 destination connector to the newest version
  • writing to a local JSON file on the EC2 volume (works)
  • deploying a fresh version of Airbyte on another instance and installing the connector there
  • changing the number of rows read from 150 to 160 (grasping at straws :sweat_smile:)

Hey @Arkadiusz_Grzedzinsk, welcome to the community and thanks for your post! It looks like you’re building a connector for the Crossref API. After reading through some of the API documentation, it seems like Crossref has three different pools based on authentication. Are you using the public version of the API, or are you using authentication of any kind?

For reference, I’m referring to this article: https://www.crossref.org/documentation/retrieve-metadata/rest-api/tips-for-using-the-crossref-rest-api/

Hi sajarin,

Thank you for looking into this. I use the polite version of the API, so with authentication.

In any case, I let the connector run for as long as it needed, and it turns out it takes over 2 hours before Airbyte writes the first file to S3. So while the sync looks ‘stuck’ on starting a new buffer, it is actually running and downloading data; it just doesn’t show any activity in the log. Nothing is really wrong, but the lack of log activity is misleading. The reason it seemed to work with 200 rows is that it would fail quickly, so it would also write files to S3 sooner and wouldn’t sit at the ‘starting the buffer’ stage for long.

2022-06-30 12:11:46 destination > 2022-06-30 12:11:46 INFO i.a.i.d.r.SerializedBufferingStrategy(lambda$addRecord$0):48 - Starting a new buffer for stream works (current state: 0 bytes in 0 buffers)
2022-06-30 14:37:41 destination > 2022-06-30 14:37:41 INFO i.a.i.d.r.SerializedBufferingStrategy(flushWriter):93 - Flushing buffer of stream works (200 MB)
2022-06-30 14:37:41 destination > 2022-06-30 14:37:41 INFO i.a.i.d.s.S3ConsumerFactory(lambda$flushBufferFunction$3):129 - Flushing buffer for stream works (200 MB) to storage
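
(That works out to roughly 200 MB accumulated over about 2.5 hours, i.e. on the order of 1–1.5 MB of records per minute, which is why the buffer takes so long to reach its flush size.)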

So the issue is solved, as there was no real issue with Airbyte :sweat_smile: It would be great if the logs could indicate more clearly that something is happening rather than just that a buffer is being started, because when the log stops there it looks like the buffer can’t start and nothing is happening.
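
In the meantime, a possible workaround on the connector side (just a sketch of the idea, not something my connector does today) is to log progress from the source after each chunk, so the sync log shows activity while the destination is still filling its buffer:

    import logging

    logger = logging.getLogger("airbyte")  # source log lines show up in the sync log

    def read_records_with_progress(chunks):
        records_emitted = 0
        for chunk in chunks:
            for record in chunk:
                records_emitted += 1
                yield record
            # one progress line per chunk instead of silence until the first flush
            logger.info("Emitted %d records so far", records_emitted)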

Hey @Arkadiusz_Grzedzinsk, I understand that the logging issue is a frustrating one; it’s hard to tell what is going on if the logs are unclear. I will investigate it further and create an issue if one doesn’t already exist. Nonetheless, I’m happy to hear you got it working. Thanks for working on this connector, we’re looking forward to reviewing and merging your contributions!
