Custom connector stuck on "Starting a new buffer for stream ..."

logs-33.txt (17.3 KB)

  • Is this your first time deploying Airbyte?: No
  • OS Version / Instance: Linux EC2 instance
  • Memory / Disk: 4 GB / 30 GB
  • Deployment: Docker
  • Airbyte Version: 0.39.28-alpha
  • Source name/version: Custom HTTP API connector
  • Destination name/version: S3
  • Step: During sync
  • Description:

Hi team, when I run a sync, the job gets stuck on INFO i.a.i.d.r.SerializedBufferingStrategy(lambda$addRecord$0):55 - Starting a new buffer for stream works (current state: 0 bytes in 0 buffers). I tried leaving it for over an hour and nothing happened.

I built a Python connector that grabs input data from S3, uses it to make requests to an API, and writes the output back to S3. I read rows from a CSV file in chunks with the awswrangler library. The connector works when I grab 200 rows at a time from the CSV, but sometimes the headers end up too large, so I had to reduce the chunk size to 150 rows (see log
logs-32.txt (125.5 KB)
). With 150 rows it always gets stuck on starting a new buffer. That is the only change between the two versions of the connector, and I have no clue why it would make Airbyte get stuck at that step.
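
For context, the read loop boils down to something like the minimal sketch below. The bucket path, chunk size, column name, filter style and mailto address are placeholders for illustration, not my real connector code:

    import awswrangler as wr
    import requests

    INPUT_PATH = "s3://my-bucket/input/dois.csv"  # placeholder input location
    CHUNK_SIZE = 150                              # rows per chunk (200 made the request too large)
    POLITE_MAILTO = "me@example.com"              # asks Crossref for the polite pool

    def read_records():
        # chunksize makes read_csv return an iterator of DataFrames instead of one big frame
        for chunk in wr.s3.read_csv(path=INPUT_PATH, chunksize=CHUNK_SIZE):
            # one /works request per chunk, filtering on all DOIs in that chunk
            doi_filter = ",".join(f"doi:{doi}" for doi in chunk["doi"])  # assumed column name
            resp = requests.get(
                "https://api.crossref.org/works",
                params={"filter": doi_filter, "rows": len(chunk), "mailto": POLITE_MAILTO},
                timeout=60,
            )
            resp.raise_for_status()
            for item in resp.json()["message"]["items"]:
                yield item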

I tried:

  • rebuilding the connector
  • retrying the job several times
  • reloading the connector (removing it from the database, re-uploading it to ECR, pulling the new version from there, etc.)
  • updating Airbyte to the newest version
  • scaling up the instance
  • recreating the connector from scratch, copying over only parts of the code
  • changing destination to another S3 location
  • running locally with the same inputs as on the Airbyte instance (works fine)
  • updating the S3 destination connector to the newest version
  • writing to a local JSON file on the EC2 volume (works)
  • deploying a fresh version of Airbyte on another instance and installing the connector there
  • changing the number of rows read from 150 to 160 (grasping at straws :sweat_smile:)

Hey @Arkadiusz_Grzedzinsk, welcome to the community and thanks for your post! It looks like you’re building a connector for the Crossref API. After reading through some of the API documentation, it seems like Crossref has three different pools based on authentication. Are you using the public version of the API, or are you using authentication of any kind?

For reference, I’m referring to this article: https://www.crossref.org/documentation/retrieve-metadata/rest-api/tips-for-using-the-crossref-rest-api/

Hi sajarin,

Thank you for looking into this. I use the polite version of the API, so with authentication.

In any case, I let the connector run for as long as it needed, and it turns out it takes over 2 hours before Airbyte writes the first file to S3. So while the sync looks ‘stuck’ on starting a new buffer, it is actually running and downloading data; it just doesn’t show any activity in the log. Nothing is really wrong, but the lack of log activity is misleading. The reason it seemed to work with 200 rows is that it would fail quickly, so it would also write files to S3 sooner and wouldn’t sit at the ‘starting the buffer’ stage for long.

2022-06-30 12:11:46 destination > 2022-06-30 12:11:46 INFO i.a.i.d.r.SerializedBufferingStrategy(lambda$addRecord$0):48 - Starting a new buffer for stream works (current state: 0 bytes in 0 buffers)
2022-06-30 14:37:41 destination > 2022-06-30 14:37:41 INFO i.a.i.d.r.SerializedBufferingStrategy(flushWriter):93 - Flushing buffer of stream works (200 MB)
2022-06-30 14:37:41 destination > 2022-06-30 14:37:41 INFO i.a.i.d.s.S3ConsumerFactory(lambda$flushBufferFunction$3):129 - Flushing buffer for stream works (200 MB) to storage
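
(That works out to roughly 200 MB accumulated over about 2.5 hours, i.e. on the order of 1–1.5 MB of records per minute, which is why the buffer takes so long to reach its flush size.)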

So the issue is solved, as there was no real issue with Airbyte :sweat_smile: It would be great if the logs could indicate more clearly that something is happening rather than just that a buffer is being started, because when the log stops there it looks like the buffer can’t start and nothing is happening.
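
In the meantime, a possible workaround on the connector side (just a sketch of the idea, not something my connector does today) is to log progress from the source after each chunk, so the sync log shows activity while the destination is still filling its buffer:

    import logging

    logger = logging.getLogger("airbyte")  # source log lines show up in the sync log

    def read_records_with_progress(chunks):
        records_emitted = 0
        for chunk in chunks:
            for record in chunk:
                records_emitted += 1
                yield record
            # one progress line per chunk instead of silence until the first flush
            logger.info("Emitted %d records so far", records_emitted)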

Hey @Arkadiusz_Grzedzinsk, I understand that the logging issue is a frustrating one; it’s hard to tell what is going on if the logs are unclear. I will investigate it further and create an issue if one doesn’t already exist. Nonetheless, I’m happy to hear you got it working. Thanks for working on this connector, we’re looking forward to reviewing and merging your contributions!
