- Source: source-s3/2.1.4
- Destination: destination-postgres:0.3.27
I’m trying to load CSVs with embedded JSON from S3. Here is a simplified example of the data:
```
messageid,Data Payload
1,"{""user name"":""John""}"
2,"{""user name"":""Paul""}"
```
I have a manually enforced schema as follows:
{ "messageid": "integer", "Data Payload": "object" }
I have enabled the “double quote” setting and am using `"` as the quote character. Here are the rest of the settings, as logged by the connection:
```
2023-05-05 11:26:40 INFO i.a.w.i.VersionedAirbyteStreamFactory(internalLog):317 - initialised stream with format:
{
  'encoding': 'utf8',
  'filetype': 'csv',
  'delimiter': ',',
  'block_size': 10000,
  'quote_char': '"',
  'double_quote': True,
  'infer_datatypes': False,
  'newlines_in_values': False,
  'additional_reader_options': '{"strings_can_be_null": true, "quoted_strings_can_be_null": true, "null_values": ["NA", "NULL", ""], "auto_dict_encode": true}'
}
```
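I also tried reproducing the read locally with PyArrow, on the assumption (possibly wrong) that source-s3 uses `pyarrow.csv` under the hood, mapping the logged settings onto its option classes. The payload column always comes back string-typed; as far as I can tell the CSV reader has no object/JSON type, so maybe that is the root of it:

```python
import io
import pyarrow.csv as pv

data = b'''messageid,Data Payload
1,"{""user name"":""John""}"
2,"{""user name"":""Paul""}"
'''

# Mirror the settings the connection logged above.
table = pv.read_csv(
    io.BytesIO(data),
    read_options=pv.ReadOptions(block_size=10000, encoding="utf8"),
    parse_options=pv.ParseOptions(
        delimiter=",", quote_char='"', double_quote=True, newlines_in_values=False
    ),
    convert_options=pv.ConvertOptions(
        strings_can_be_null=True,
        quoted_strings_can_be_null=True,
        null_values=["NA", "NULL", ""],
        auto_dict_encode=True,
    ),
)
print(table.schema)
# "messageid" infers as an integer, but "Data Payload" comes back as a
# (dictionary-encoded) string column -- never a nested/object type.
```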
Sadly, the “Data Payload” field ends up in the destination as a JSON string rather than a JSON object:
```
messageid,Data Payload
1,"{\"user name\":\"John\"}"
2,"{\"user name\":\"Paul\"}"
```
I’m not sure what else to try at this stage. Does anyone have an idea what I’m doing wrong?
Thanks, Mark