S3 CSV - trouble parsing object from embedded JSON string

  • Source: source-s3/2.1.4
  • Destination: destination-postgres:0.3.27

I’m trying to load CSVs with embedded JSON from S3. Here is a simplified example of the data:

messageid,Data Payload
1,"{""user name"":""John""}"
2,"{""user name"":""Paul""}"

I have a manually enforced schema as follows:

{   "messageid": "integer",   "Data Payload": "object" }

I have enabled the “double quote” setting and am using " as the quote character. Here are the rest of the settings, as logged by the connection:

2023-05-05 11:26:40 INFO i.a.w.i.VersionedAirbyteStreamFactory(internalLog):317 - initialised stream with format: 
{
  'encoding': 'utf8',
  'filetype': 'csv',
  'delimiter': ',',
  'block_size': 10000,
  'quote_char': '"',
  'double_quote': True,
  'infer_datatypes': False,
  'newlines_in_values': False,
  'additional_reader_options': '{"strings_can_be_null": true, "quoted_strings_can_be_null": true, "null_values": ["NA", "NULL", ""], "auto_dict_encode": true}'
}
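
To sanity-check the quoting outside of Airbyte, I tried reproducing the parse locally with pyarrow (I’m assuming the connector uses a pyarrow-style CSV reader under the hood, so treat this as a rough approximation rather than exactly what the connector does):

import io

import pyarrow as pa
import pyarrow.csv as pacsv

# Same simplified data as above: the JSON payload is quoted and the inner
# quotes are doubled, matching the double_quote setting.
data = (
    b'messageid,Data Payload\n'
    b'1,"{""user name"":""John""}"\n'
    b'2,"{""user name"":""Paul""}"\n'
)

table = pacsv.read_csv(
    io.BytesIO(data),
    parse_options=pacsv.ParseOptions(
        delimiter=",", quote_char='"', double_quote=True
    ),
    convert_options=pacsv.ConvertOptions(
        # CSV has no "object" type, so the closest I can declare here
        # is a plain string column for the payload.
        column_types={"messageid": pa.int64(), "Data Payload": pa.string()}
    ),
)

print(table.column("Data Payload")[0].as_py())
# -> {"user name":"John"}   (a Python str, not a dict)

So the quoting itself seems to be handled correctly and the embedded JSON comes out intact; it just stays a string at the CSV-parsing layer, and I expected the "object" type in my enforced schema to turn it into a real object downstream.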

Sadly, however, the “Data Payload” field ends up in the destination as a JSON string rather than a JSON object:

messageid,Data Payload
1,"{\"user name\":\"John\"}"
2,"{\"user name\":\"Paul\"}"

Not sure what else I can try at this stage. Does anyone have an idea what I’m doing wrong?

Thanks, Mark