Duplicate records with exactly matching _AIRBYTE_START_AT

  • Is this your first time deploying Airbyte?: No
  • OS Version / Instance: Ubuntu 20.04.4 LTS
  • Memory / Disk: 16 GB / 1 TB
  • Deployment: Docker
  • Airbyte Version: 0.39.20
  • Source name/version: Salesforce / 1.0.10
  • Destination name/version: Snowflake / 0.4.28
  • Step: Unknown
  • Description:

Whilst developing, we’ve realised that every SCD table contains a duplicate record for a given Airbyte run. Every record loaded in every table for that run is duplicated, with the table’s native fields absolutely identical and the Airbyte fields looking something like this:

_AIRBYTE_UNIQUE_KEY |_AIRBYTE_UNIQUE_KEY_SCD |_AIRBYTE_START_AT |_AIRBYTE_END_AT
123 |456 |2022-04-29 11:26:45.000 +0000 |2022-04-29 11:26:45.000 +0000
123 |789 |2022-04-29 11:26:45.000 +0000 |
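
For anyone wanting to check their own tables for the same pattern, here is a minimal sketch in Python. It assumes the SCD rows have already been exported into a list of dicts keyed by the column names above; the function name `find_duplicate_versions` is hypothetical, and the sample rows are simply the two shown in the table.

```python
from collections import defaultdict

# Sample rows copied from the table above; an empty end date is None.
rows = [
    {"_AIRBYTE_UNIQUE_KEY": "123", "_AIRBYTE_UNIQUE_KEY_SCD": "456",
     "_AIRBYTE_START_AT": "2022-04-29 11:26:45.000 +0000",
     "_AIRBYTE_END_AT": "2022-04-29 11:26:45.000 +0000"},
    {"_AIRBYTE_UNIQUE_KEY": "123", "_AIRBYTE_UNIQUE_KEY_SCD": "789",
     "_AIRBYTE_START_AT": "2022-04-29 11:26:45.000 +0000",
     "_AIRBYTE_END_AT": None},
]


def find_duplicate_versions(rows):
    """Group SCD keys by (unique key, start date) and keep groups with more than one row."""
    groups = defaultdict(list)
    for row in rows:
        key = (row["_AIRBYTE_UNIQUE_KEY"], row["_AIRBYTE_START_AT"])
        groups[key].append(row["_AIRBYTE_UNIQUE_KEY_SCD"])
    return {key: scd_keys for key, scd_keys in groups.items() if len(scd_keys) > 1}


duplicates = find_duplicate_versions(rows)
for (unique_key, start_at), scd_keys in duplicates.items():
    print(unique_key, start_at, scd_keys)
```

Any key that comes back from this check has more than one SCD version sharing the same start date, which is the symptom described here.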

What’s more, these datetimes pre-date the server’s existence (we think that the run was on 2022-06-20 and the server’s clock and Airbyte logs appear to give the correct time).

We’ve got no explanation other than some form of Airbyte bug / load failure. We’re obviously concerned that this could present issues and would like to understand what’s gone wrong.

Input welcome.

Thanks

Stuart
logs-49.txt (305.5 KB)

Pretty much as soon as I finished posting I noticed something potentially significant in the _AIRBYTE_EMITTED_AT and _AIRBYTE_NORMALIZED_AT fields.

_AIRBYTE_EMITTED_AT | _AIRBYTE_NORMALIZED_AT
2022-06-13 16:29:34.520 -0700 | 2022-06-20 12:32:13.848 +0000
2022-06-20 12:23:15.774 -0700 | 2022-06-20 12:32:13.848 +0000

It looks a bit like the data didn’t normalise on an old run and then both records were normalised on a second run.
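
The observation above can be checked with a short sketch, assuming the timestamps are available as strings in the format shown in the table. The variable names are hypothetical; the values are the two rows from the table.

```python
from datetime import datetime

# Timestamp layout as it appears in the table above.
FMT = "%Y-%m-%d %H:%M:%S.%f %z"

# (_AIRBYTE_EMITTED_AT, _AIRBYTE_NORMALIZED_AT) pairs copied from the table.
records = [
    ("2022-06-13 16:29:34.520 -0700", "2022-06-20 12:32:13.848 +0000"),
    ("2022-06-20 12:23:15.774 -0700", "2022-06-20 12:32:13.848 +0000"),
]

parsed = [
    (datetime.strptime(emitted, FMT), datetime.strptime(normalised, FMT))
    for emitted, normalised in records
]

# Time between the two emissions, and whether both rows share one normalisation timestamp.
emitted_gap = parsed[1][0] - parsed[0][0]
same_normalisation = parsed[0][1] == parsed[1][1]

print("Gap between emissions:", emitted_gap)
print("Normalised at the same instant:", same_normalisation)
```

The two rows were emitted almost a week apart but carry an identical normalisation timestamp, which is consistent with both being processed by a single normalisation run.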

Still not sure that I understand, but it might give some flavour to the problem.

Stuart

Hey, do you have the dedup feature enabled on this connection?

Hi. All set to dedup + history.

Hey, if normalisation didn’t run during the previous sync and a later sync did run normalisation, then yes, the dedup would have happened in that later run.
