Source S3 with CSV file: `CSV parser got out of sync with chunker`

  • Airbyte version: 0.35.27-alpha
  • OS Version / Instance: Ubuntu VM / GCP n1-standard-2 (2 vCPUs, 7.5 GB memory)
  • Deployment: Docker
  • Source Connector and version: S3 0.1.10
  • Destination Connector and version: BigQuery 0.6.7
  • Severity: Medium
  • Step where error happened: Loading data from source

Hi team! I am trying to run a sync from an S3 source to BigQuery. The data actually seems to be delivered successfully to the destination, but I am still seeing an error in the logs, and the app marks the sync as “Failed” and always retries twice before finally cancelling. The error I am seeing is:

pyarrow.lib.ArrowInvalid: CSV parser got out of sync with chunker

Does anyone have thoughts on what might be going on here & how to troubleshoot? I am running several other similar S3->BigQuery syncs from the same bucket (but with different file paths) without issue - only this one is throwing an error.

Solution for this case: increasing the block size in the S3 source's CSV settings seems to have resolved the error! I had to bump it from 10K to 1M.

Credits to Emily Cogsdill

