Source S3 with CSV file: `CSV parser got out of sync with chunker`

  • Airbyte version: 0.35.27-alpha
  • OS Version / Instance: Ubuntu VM / GCP n1-standard-2 (2 vCPUs, 7.5 GB memory)
  • Deployment: Docker
  • Source Connector and version: S3 0.1.10
  • Destination Connector and version: BigQuery 0.6.7
  • Severity: Medium
  • Step where error happened: Loading data from source

Hi team! I am trying to run a sync from an S3 source to BigQuery. The data seem to be delivered successfully to the destination, but I still see an error in the logs, and the app marks the sync as “Failed”. It always retries twice before finally cancelling. The error I am seeing is:

pyarrow.lib.ArrowInvalid: CSV parser got out of sync with chunker

Does anyone have thoughts on what might be going on here and how to troubleshoot it? I am running several other similar S3 -> BigQuery syncs from the same bucket (but with different file paths) without issue; only this one throws the error.

Solution for this case: increasing the block size seems to have resolved the error. It had to be bumped from 10K to 1M.

Credit to Emily Cogsdill.
