Help setting up s3 source

  • Is this your first time deploying Airbyte?: Yes
  • OS Version / Instance: Ubuntu
  • Memory / Disk: 8 GB / 30 GB
  • Deployment: docker
  • Airbyte Version: 0.35.65-alpha
  • Source name/version: s3
  • Destination name/version: redshift
  • Step: sync
  • Description: Need help figuring out configuration settings for s3 source.

I am trying to set up Airbyte for the first time. I have an S3 source and a Redshift destination configured, but I am running into problems during sync. I have tried a bunch of different settings but can't figure it out.

The current error is…

pyarrow.lib.ArrowInvalid: CSV parse error: Expected 1 columns, got 9

I assume this is because of some config setting. There are no columns per se in these files; they are JSONL files.

My configs…

encoding: utf-8
Delimiter: ,
Quote Char: "
Advanced Options: {"column_names": ["test_column"] }

The files I am trying to process are kept in this folder structure (in the s3 bucket)…

service-events/dt=2021-10-03/staging-firehose-datalake-stream-5-2021-10-03-16-45-26-b4fc185e-9133-4aaf-b6d4-f0616f0f2812

Here is content of one of these files…

{"userId":"615630dc81cd91175ee075fc","sourceLanguage":"en","targetLanguage":"de","sourcePhrase":"a bit","sourceRoot":"bit","sourcePrefix":"a","targetPhrase":"ein Bisschen","incrementalViews":1,"eventName":"translationViewed"}
{"userId":"615630dc81cd91175ee075fc","sourceLanguage":"en","targetLanguage":"de","sourcePhrase":"job","sourceRoot":"job","targetPhrase":"Arbeit","incrementalViews":1,"eventName":"translationViewed"}
{"userId":"615630dc81cd91175ee075fc","sourceLanguage":"en","targetLanguage":"de","sourcePhrase":"information","sourceRoot":"information","targetPhrase":"Information","incrementalViews":1,"eventName":"translationViewed"}
{"userId":"615630dc81cd91175ee075fc","sourceLanguage":"en","targetLanguage":"de","sourcePhrase":"example","sourceRoot":"example","targetPhrase":"Beispiel","incrementalViews":1,"eventName":"translationViewed"}
{"userId":"615630dc81cd91175ee075fc","sourceLanguage":"en","targetLanguage":"de","sourcePhrase":"greatly","sourceRoot":"greatly","targetPhrase":"sehr","incrementalViews":1,"eventName":"translationViewed"}
{"userId":"615630dc81cd91175ee075fc","sourceLanguage":"en","targetLanguage":"de","sourcePhrase":"example","sourceRoot":"example","targetPhrase":"Beispiel","incrementalViews":0,"eventName":"translationViewed"}
{"userId":"615630dc81cd91175ee075fc","sourceLanguage":"en","targetLanguage":"de","sourcePhrase":"day","sourceRoot":"day","targetPhrase":"Tag","incrementalViews":1,"eventName":"translationViewed"}
{"userId":"615630dc81cd91175ee075fc","sourceLanguage":"en","targetLanguage":"de","sourcePhrase":"functionality","sourceRoot":"functionality","targetPhrase":"Funktionalität","incrementalViews":1,"eventName":"translationViewed"}

Hi @arunsivasankaran,
The root cause of your problem is that your input files are not in CSV format but in JSONL format. The S3 source connector does not yet support this format.
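For reference, JSONL (newline-delimited JSON) means one complete JSON object per line. A minimal Python sketch of how such content is parsed (this is a standalone illustration, not connector code; `parse_jsonl` is a hypothetical helper name):

```python
import json

def parse_jsonl(text):
    """Parse JSONL content: each non-empty line is one JSON object."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]

# Two lines shaped like the sample records above:
sample = (
    '{"userId":"615630dc81cd91175ee075fc","eventName":"translationViewed"}\n'
    '{"userId":"615630dc81cd91175ee075fc","eventName":"translationViewed"}\n'
)
records = parse_jsonl(sample)  # one dict per line
```

A CSV reader has no notion of this line-per-record structure, which is why it chokes on the commas inside each object.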
There was a WIP PR that tried to implement a JSON parser: 🎉 Source S3: add json parser by Tutuchan · Pull Request #9117 · airbytehq/airbyte · GitHub

Feel free to take over if you want to finish this integration!

Can I not treat the data like a one-column CSV?

I would not recommend it, but you could try to use \n as the delimiter.

Does the delimiter not mean the separator between each column? Or am I misunderstanding the setting? I want to treat each line in the file as a single column, with the JSON string as the value.

A delimiter delimits columns: if you use `,` you will get 9 columns, since the first sample line contains 8 commas.
As your file is not a CSV, a hack could be to use \n so that each line is treated as both a single column and a single record. I'm not sure this works, since \n also delimits lines, and hence records. You'll also have to change the quoting setting and probably set the schema manually to something like: {"data": "string"}.

In a nutshell, through some hacking with the reader parameters you might achieve your desired output, but this is not really something I'd recommend.