Salesforce > BigQuery encoding problem

  • Is this your first time deploying Airbyte?: Yes
  • OS Version / Instance: Debian 11
  • Memory / Disk: 100 GB
  • Deployment: Docker
  • Airbyte Version: 0.39.1-alpha
  • Source name/version: Salesforce 1.0.9
  • Destination name/version: BigQuery 1.1.6
  • Step: The issue is happening during sync
  • Description:
    I'm having trouble with text encoding.
    First, I created a Salesforce to BigQuery connection, and everything worked fine.
    Unfortunately, I deleted this connection and had to recreate it. Now special characters are garbled.
    Besides, for the same original text (for example : “Vente à venir court terme”), the recorded data in BigQuery can vary (“Vente à venir court terme”, “Vente àvenir court terme”).

@data-evs Did you upgrade Airbyte or any connector (Salesforce/BigQuery) after deleting the connection? Can you check the records in the raw tables to see whether the text coming from Salesforce is correct?

Hi @marcosmarxm, I don’t think the connector or Airbyte was updated.
I’ve checked the raw tables, and the text is not correct.
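When checking raw records, one quick test is whether a value looks like UTF-8 bytes that were decoded as ISO-8859-1. The sketch below is only an illustration: `repair_mojibake` is a hypothetical helper, not part of Airbyte or either connector, and it assumes the garbled value still contains every original byte (including the non-breaking space that follows "Ã").

```python
def repair_mojibake(value: str) -> str:
    """Undo a UTF-8 -> ISO-8859-1 mis-decoding when the round trip is possible.

    Hypothetical helper for inspecting raw records; returns the value
    unchanged if it does not look like mis-decoded UTF-8.
    """
    try:
        # Recover the original bytes, then decode them as UTF-8.
        return value.encode("iso-8859-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text has characters outside ISO-8859-1, or the
        # recovered bytes are not valid UTF-8: leave it alone.
        return value


garbled = "Vente \u00c3\u00a0 venir court terme"  # "Ã" followed by a non-breaking space
print(repair_mojibake(garbled))
print(repair_mojibake("Vente \u00e0 venir court terme"))  # already correct, left as-is
```

A value where the non-breaking space has been lost (as in the second garbled variant quoted above) cannot be repaired this way, because the second byte of the character is gone; the helper returns it unchanged.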

Hello, it looks like the Salesforce connector doesn’t decode the records it receives.
See here https://github.com/airbytehq/airbyte/blob/612ade9238415cc91ff5f99b104ea4856f76227c/airbyte-integrations/connectors/source-salesforce/source_salesforce/streams.py#L51-L66

Can you check which encoding your Salesforce account is using?

Looking a bit further, the problem affects only one table (Task). In another table (Event) from the same connection, the encoding is correct!

In the log, I found this: “2022-07-15 17:00:15 source > Could not decode chunk. Falling back to ISO-8859-1 encoding. Error: ‘utf-8’ codec can’t decode byte 0xc3 in position 10095: unexpected end of data”
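The log message is consistent with a response being decoded chunk by chunk, where a chunk boundary falls in the middle of a multi-byte UTF-8 character: byte 0xc3 is the first of the two bytes that encode "à". The sketch below reproduces that situation in isolation. It is not the connector's actual code; `decode_with_fallback` and the chunk split are assumptions made for illustration.

```python
import codecs

text = "Vente \u00e0 venir court terme"  # "Vente à venir court terme"
data = text.encode("utf-8")

# Simulate a chunk boundary falling between the two bytes of "à" (0xC3 0xA0).
split_at = data.index(b"\xc3") + 1
chunks = [data[:split_at], data[split_at:]]


def decode_with_fallback(chunk: bytes) -> str:
    """Hypothetical per-chunk decoding with an ISO-8859-1 fallback."""
    try:
        return chunk.decode("utf-8")
    except UnicodeDecodeError as err:
        print(f"Could not decode chunk: {err}")
        return chunk.decode("iso-8859-1")


# Each chunk is decoded on its own, so the split character is garbled.
naive = "".join(decode_with_fallback(c) for c in chunks)

# An incremental decoder buffers the dangling byte until the next chunk arrives.
decoder = codecs.getincrementaldecoder("utf-8")()
incremental = "".join(decoder.decode(c) for c in chunks)
incremental += decoder.decode(b"", final=True)

print(repr(naive))
print(repr(incremental))
```

With per-chunk decoding, the first chunk fails with "unexpected end of data" and the fallback turns the two bytes of "à" into "Ã" plus a non-breaking space, which matches the garbled values seen in BigQuery. The incremental decoder returns the original text.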

It seems related to this topic: Problem in salesforce chunk decoding

This should be fixed in the latest version. Can you upgrade the connector to the latest version and try the sync again?

I did, but the problem is still there. When I reset the data, everything is corrupted, but new syncs are fine. It looks like there is one character in the Salesforce data that the connector dislikes, and it ruins the whole batch of data in the sync. For incremental updates, there’s no problem.

Hi.
I am the author of the following issue.

What is the status of this issue?

There is an open issue on GitHub to fix it: https://github.com/airbytehq/airbyte/issues/14659 I’ll get back to you with any updates.

Hello, version 1.0.25 of the Salesforce connector solves the issue. Can you try it?