Hi team,
- Is this your first time deploying Airbyte?: No
- OS Version / Instance: Ubuntu
- Memory / Disk: 2 vcpus, 8 GiB memory
- Deployment: Docker
- Airbyte Version: 0.36.3-alpha
- Source name/version: File (0.2.10)
- Destination name/version: Snowflake (0.4.24)
- Step: The issue is happening during sync
- Description:
I ran into an issue loading a file encoded in ISO-8859-1. It seems similar to GitHub issue #9059 in airbytehq/airbyte ("Source S3: error with encoding reading csv file").
My source is a CSV file on an SFTP server.
The first data row contains the character é, which is encoded as 0xE9.
I set the CSV reader options to {"encoding":"latin-1"}, but the encoding option does not seem to be used.
The log still shows a UTF-8 decode error:
2022-05-11 18:52:14 source > Failed to read data of PERMONLY at scp://Fuze_BI_PermOnly.CSV: UnicodeDecodeError('utf-8', b'"PERM_INV_NO","INVOICE_DATE","CLIENT_NO","CLIENT_NAME","PO","CANDIDATE_NAME","MANDATE","SALARY","BILLED_AMT","NET_MARGIN_PCT","TERRITORY","SALES_REP","SALES_REP_NAME","SALES_REP2","SALESREP2_NAME","SALES_PAID","RECRUITER","RECRUITER_NAME","RECRUITER_PAID","RECRUITER_2","RECRUITER_2_NAME","NOTES"\r\n"043573","2019/01/15","01022","AESP GROUP "," ","Vincent Godard "," Op\xe9rateur De Machine CNC BEAM"," .00"," 1365.00"," .00","QR","ME ","6443-CHENIER, MARIE-EVE "," "," "," ","PB","Pamela Badran "," "\r\n', 416, 417, 'invalid continuation byte')
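The same error can be reproduced outside Airbyte with a short Python sketch. It uses only a fragment of the row from the log; the helper name `try_decode` is made up for illustration. It suggests the bytes are being decoded as UTF-8 rather than with the configured encoding, though it does not show where in the connector that happens.

```python
# Fragment of the data row from the log; \xe9 is "é" in ISO-8859-1 (latin-1).
raw = b'" Op\xe9rateur De Machine CNC BEAM"'


def try_decode(data: bytes, encoding: str) -> str:
    """Return the decoded text, or a short description of the decode error."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return f"{exc.encoding} failed at byte {exc.start}: {exc.reason}"


utf8_result = try_decode(raw, "utf-8")      # fails: 0xE9 starts a multi-byte sequence in UTF-8
latin1_result = try_decode(raw, "latin-1")  # succeeds: every byte maps to one character

print(utf8_result)
print(latin1_result)
```

In UTF-8, 0xE9 announces a three-byte sequence, and the following byte (`r`) is not a valid continuation byte, which matches the message in the log.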
NB1: I am able to read the file correctly with pandas using this same CSV read option.
NB2: When I manually convert the file to UTF-8, it works fine (but that is not an option in production).
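For reference, the manual conversion in NB2 amounts to something like the sketch below. The function name `latin1_to_utf8` and the sample bytes are only for illustration, and it assumes the whole file really is ISO-8859-1.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode ISO-8859-1 bytes as UTF-8 (assumes the input is ISO-8859-1)."""
    return data.decode("latin-1").encode("utf-8")


# Illustrative sample: one quoted field containing "é" as the single byte 0xE9.
sample = b'"Op\xe9rateur De Machine CNC BEAM"\r\n'
converted = latin1_to_utf8(sample)

# After conversion "é" is the two-byte UTF-8 sequence 0xC3 0xA9,
# so a UTF-8 reader can decode the data without errors.
print(converted.decode("utf-8"))
```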