Data inconsistency between Mixpanel and BigQuery ETLs

Summary

Investigating data inconsistency between the Airbyte-managed ETL and the Mixpanel-managed ETL for the Mixpanel to BigQuery integration.


Question

Need help with Mixpanel -> BigQuery.
We are seeing data inconsistency between the Airbyte-managed ETL and the Mixpanel-managed ETL. This is the query we use to count events missing from the Airbyte export:

```sql
-- Count, per hour, rows in app_events.mp_master_event that have no
-- matching insert_id in airbyte_us.export (anti-join via LEFT JOIN ... IS NULL).
SELECT
  EXTRACT(YEAR FROM a.time) AS year,
  EXTRACT(MONTH FROM a.time) AS month,
  EXTRACT(DAY FROM a.time) AS day,
  EXTRACT(HOUR FROM a.time) AS hour,
  COUNT(*) AS missing_count
FROM
  `app_events.mp_master_event` a
LEFT JOIN
  `airbyte_us.export` b
ON
  a.mp_insert_id = b.insert_id
WHERE
  b.insert_id IS NULL
  AND EXTRACT(YEAR FROM a.time) = 2024
  AND EXTRACT(MONTH FROM a.time) = 3
  AND EXTRACT(DAY FROM a.time) > 20
GROUP BY
  EXTRACT(YEAR FROM a.time),
  EXTRACT(MONTH FROM a.time),
  EXTRACT(DAY FROM a.time),
  EXTRACT(HOUR FROM a.time)
LIMIT 100;
```
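The query above only checks one direction (rows in `mp_master_event` with no match in the Airbyte export). When narrowing down a count mismatch, it can also help to look at the reverse direction and at duplicated insert IDs on either side. The following is a minimal Python sketch of that comparison on in-memory lists. The function name `compare_insert_ids` and the sample IDs are hypothetical, made up for illustration; in practice the two lists would come from the respective tables.

```python
from collections import Counter


def compare_insert_ids(mixpanel_ids, airbyte_ids):
    """Summarize how two lists of insert IDs differ.

    Hypothetical helper for illustration only; not part of Airbyte or Mixpanel.
    """
    mp_counts = Counter(mixpanel_ids)
    ab_counts = Counter(airbyte_ids)
    return {
        # IDs present on the Mixpanel side only (what the SQL query counts)
        "missing_in_airbyte": sorted(set(mp_counts) - set(ab_counts)),
        # IDs present on the Airbyte side only (the reverse direction)
        "missing_in_mixpanel": sorted(set(ab_counts) - set(mp_counts)),
        # IDs that occur more than once within a single side
        "duplicated_in_mixpanel": sorted(k for k, n in mp_counts.items() if n > 1),
        "duplicated_in_airbyte": sorted(k for k, n in ab_counts.items() if n > 1),
    }


# Made-up sample data, not taken from the thread
mixpanel_ids = ["a1", "b2", "c3", "c3", "d4"]
airbyte_ids = ["a1", "c3", "e5"]

report = compare_insert_ids(mixpanel_ids, airbyte_ids)
for name, ids in report.items():
    print(f"{name}: {ids}")
```

If both "missing" lists are non-empty, the two pipelines are probably not just lagging each other, and differences in filters or deduplication become more likely suspects.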


<br>

---

This topic has been created from a Slack thread to give it more visibility.
It will be in read-only mode here. [Click here](https://airbytehq.slack.com/archives/C027KKE4BCZ/p1711523294285109) if you want to access the original thread.

[Join the conversation on Slack](https://slack.airbyte.com)

<sub>
["mixpanel", "bigquery", "data-inconsistency", "etl", "airbyte-managed", "mixpanel-managed"]
</sub>

Without knowing which source version you’re using, which environment you’re on (Cloud vs. OSS), and which filters you’ve configured in your source, it’s nearly impossible to say why the counts are different.

source-mixpanel is also a community-supported source that seems to need some cleanup.

A more detailed GitHub issue with sync logs would be useful, and a pull request would be very welcome. Otherwise, feel free to reach out to our support team if you’re on Cloud.

Raised a support ticket, thanks.