Data mismatch between Klaviyo and Airbyte Shopify connector for user profiles

Summary

Investigating the discrepancy between the number of user profiles shown in Klaviyo and the number of customers pulled by the Airbyte Shopify connector


Question

Hi team, we are using the Shopify connector to pull data from Shopify tables. For one of our clients, Klaviyo shows 5.1 M user profiles, whereas the customer table pulled with the Airbyte Shopify connector contains only about 1.2 M records. Any idea what could be the reason for this large mismatch between the Klaviyo count and the data pulled by the Airbyte connector?



This topic has been created from a Slack thread to give it more visibility.

["data-mismatch", "klaviyo", "airbyte-shopify-connector", "user-profiles"]

Hmm, I don’t know enough to give you a confident answer yet. However, it’s worth noting that the Shopify connector in Airbyte is subject to Shopify’s rate limit restrictions. If the connector hits a 429 - Rate Limit Exceeded HTTP Error, it will continue the sync operation after a short backoff period. This could potentially affect the amount of data pulled during a single operation.

Also, for the smoothest experience with Incremental Refresh sync mode, it’s recommended to sync certain streams together. For example, the Order Refunds, Order Risks, and Transactions streams should be synced along with the Orders stream. If a child stream is synced independently of its parent stream, a full sync will occur, followed by a filtering out of records. This process may be less efficient compared to syncing child streams alongside their respective parent streams.

If you’re using OAuth2.0 authentication and encounter access errors, make sure you’ve followed this Shopify Article to request access to the client’s store first. Once access is granted, you should be able to proceed with OAuth2.0 authentication.

For more detailed troubleshooting, you can check out common issues for the Shopify source connector on the Airbyte Forum.
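To illustrate the backoff behaviour described above, here is a minimal sketch of retrying a request after a 429 response. It is not the connector's actual implementation; the function names `backoff_delay` and `get_with_backoff`, the base delay, and the cap are all assumptions chosen for illustration. It assumes the server may send a `Retry-After` header with the 429 and falls back to exponential backoff otherwise.

```python
import time
import urllib.error
import urllib.request


def backoff_delay(attempt, retry_after=None, base=1.0, cap=60.0):
    """Seconds to wait before retrying; `attempt` is 0-based (hypothetical helper)."""
    if retry_after is not None:
        # Prefer the server's hint when a Retry-After header is present.
        return min(float(retry_after), cap)
    # Otherwise double the wait on every attempt, up to the cap.
    return min(base * (2 ** attempt), cap)


def get_with_backoff(url, headers, max_attempts=5, sleep=time.sleep):
    """GET `url`, retrying only on HTTP 429 (illustrative, not Airbyte's code)."""
    for attempt in range(max_attempts):
        try:
            req = urllib.request.Request(url, headers=headers)
            with urllib.request.urlopen(req, timeout=30) as resp:
                return resp.read()
        except urllib.error.HTTPError as err:
            # Re-raise anything that is not a rate limit, or the final failure.
            if err.code != 429 or attempt == max_attempts - 1:
                raise
            sleep(backoff_delay(attempt, err.headers.get("Retry-After")))
```

A retry loop like this delays the request rather than dropping it, so under these assumptions rate limiting would slow a sync down without by itself reducing the number of records returned.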

<@U04SE163WC8> - the pipeline runs successfully. The sync completes and ends in a success state, but the amount of data pulled is less than what Klaviyo shows.

Based on the information provided in the extracts, it seems like you’re experiencing an issue where Airbyte is not pulling the complete data from Klaviyo. This could be due to a variety of reasons such as incorrect configuration, issues with the source connector, or problems with the sync process.

In one of the forum posts, a user reported a similar issue where Airbyte was able to ingest data from Klaviyo but it was doing a full refresh at every sync for all streams despite the configurations. This resulted in a large amount of data (8GB+) being synced each time. In another GitHub issue, there were logs related to the syncing process with Klaviyo and BigQuery. However, it’s not clear if there were any issues or errors that could have led to incomplete data being pulled.

Without more specific information or error logs, it’s hard to pinpoint the exact cause of your issue. I would recommend checking your configurations and ensuring that the source connector for Klaviyo is set up correctly. If the issue persists, you might want to consider reaching out to the Airbyte community on their forum or GitHub for further assistance.

<@U04G1FF2X29> Assuming Shopify is integrated with Klaviyo, all Shopify customers should be in Klaviyo (approximately; I’ve seen cases where a small number of profiles do not make it in for whatever reason), but it isn’t true that all Klaviyo profiles should be present in Shopify. There are lots of reasons for this: imported mailing lists, newsletters, server-side tracking services, etc.
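One way to test this explanation is to compare the two sets of email addresses rather than the raw totals. The sketch below assumes you have already exported the emails from both systems into Python iterables; `compare_profiles` and `normalize` are hypothetical helper names, and the sample addresses are made up.

```python
def normalize(email):
    """Lower-case and trim an email so the same address compares equal."""
    return email.strip().lower()


def compare_profiles(shopify_emails, klaviyo_emails):
    """Count overlap and one-sided records between two email exports."""
    # Skip empty values: some records may have no email at all.
    shopify = {normalize(e) for e in shopify_emails if e}
    klaviyo = {normalize(e) for e in klaviyo_emails if e}
    return {
        "in_both": len(shopify & klaviyo),
        "klaviyo_only": len(klaviyo - shopify),
        "shopify_only": len(shopify - klaviyo),
    }


# Made-up sample data: two Shopify customers, four Klaviyo profiles.
result = compare_profiles(
    ["a@example.com", "B@example.com "],
    ["a@example.com", "b@example.com", "list1@example.com", "list2@example.com"],
)
print(result)
```

If the explanation holds, `shopify_only` should be close to zero and `klaviyo_only` should account for most of the gap. A large `shopify_only` count would instead suggest that records are missing on one side.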

Thanks <@U03LQ4PR082>.

However <@U03LQ4PR082>, the difference is huge. Has anyone else reported this issue with the Shopify connector?

I don’t think this is an issue. If you want to check, try querying the Shopify customers endpoint yourself.
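As a rough sketch of that check, the snippet below asks the Shopify REST Admin API for the customer count directly, so it can be compared with the row count Airbyte loaded. The shop name, access token, and API version (`2024-01`) are placeholders and assumptions, and the helper names are hypothetical; the token is assumed to have permission to read customers.

```python
import json
import urllib.request

API_VERSION = "2024-01"  # assumption: use whichever version your store supports


def customers_count_url(shop, api_version=API_VERSION):
    """Build the customers count URL for a shop subdomain (hypothetical helper)."""
    return f"https://{shop}.myshopify.com/admin/api/{api_version}/customers/count.json"


def parse_count(body):
    """Extract the integer count from a response body such as {"count": 123}."""
    return int(json.loads(body)["count"])


def fetch_customer_count(shop, access_token, api_version=API_VERSION):
    """Call the count endpoint using an Admin API access token."""
    req = urllib.request.Request(
        customers_count_url(shop, api_version),
        headers={"X-Shopify-Access-Token": access_token},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return parse_count(resp.read())


# Example call (placeholders, not real credentials):
# print(fetch_customer_count("your-store", "shpat_..."))
```

If the number returned is close to the roughly 1.2 M records in the synced customer table, the connector is pulling everything Shopify has, and the extra Klaviyo profiles come from sources outside Shopify.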