Issues with CSV ingestion from S3 to S3 connection


Facing slow syncs and datatype changes when ingesting CSV files over an S3 to S3 connection


Hello, I’m currently facing some issues when ingesting CSV files with an S3 to S3 connection.

  1. Very slow syncs: ingesting multiple files can sometimes take an extremely long time (hours for ~100 MB).
  2. Modified datatypes: when I try to use the transferred files afterwards, for example reading them with Spark, I get merge errors suggesting that columns contain different types.

Interested to hear if anyone else has run into this issue with S3 source connections.
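
One way to narrow down the second issue might be to check whether the same column gets a different inferred type in different files before handing them to Spark. The snippet below is only a rough sketch using the Python standard library. It assumes the transferred files are plain CSV with a header row and have already been read into strings. The helper names (`infer_type`, `infer_column_types`, `find_type_conflicts`) and the sample file names are made up for illustration, and the inference rules are much simpler than what Spark or the connector actually do.

```python
import csv
import io

# Ordering used to widen a column's type as more values are seen.
_RANK = {"empty": 0, "int": 1, "float": 2, "string": 3}


def infer_type(value: str) -> str:
    """Classify a single CSV cell as 'empty', 'int', 'float', or 'string'."""
    if value == "":
        return "empty"
    for caster, name in ((int, "int"), (float, "float")):
        try:
            caster(value)
            return name
        except ValueError:
            continue
    return "string"


def infer_column_types(csv_text: str) -> dict:
    """Return the widest type observed for each column of one CSV document."""
    types = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        for column, value in row.items():
            if column is None:  # extra cells beyond the header are ignored
                continue
            seen = infer_type(value or "")
            current = types.get(column, "empty")
            types[column] = max(current, seen, key=_RANK.get)
    return types


def find_type_conflicts(files: dict) -> dict:
    """Map each column whose inferred type differs between files to {file: type}."""
    per_column = {}
    for name, text in files.items():
        for column, kind in infer_column_types(text).items():
            per_column.setdefault(column, {})[name] = kind
    return {
        column: by_file
        for column, by_file in per_column.items()
        if len(set(by_file.values()) - {"empty"}) > 1
    }


# Hypothetical sample data: 'amount' is numeric in one file and not in the other.
files = {
    "part-0.csv": "id,amount\n1,10\n2,20\n",
    "part-1.csv": "id,amount\n3,30.5\n4,n/a\n",
}
print(find_type_conflicts(files))
```

With this sample data, `amount` is reported as `int` in `part-0.csv` and `string` in `part-1.csv`, while `id` is consistent and is not reported. If a check like this flags columns, reading the files with one explicit schema instead of per-file inference may avoid the merge errors, though that depends on the output format the destination actually writes.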

This topic has been created from a Slack thread to give it more visibility.

["csv-ingestion", "s3-connector", "slow-syncs", "datatype-modification", "spark", "merge-errors"]