- Is this your first time deploying Airbyte?: No
- OS Version / Instance: Ubuntu
- Memory / Disk: you can use something like 4Gb / 1 Tb
- Deployment: Kubernetes
- Airbyte Version: 0.39.25
- Source name/version: Mailchimp
- Destination name/version: Json destination
- Step: Run the Mailchimp source with the Json destination with more than 700 email events
- Description:
When I run the Mailchimp source to the JSON destination and compare the output with what the Mailchimp API returns, the output JSON is missing a lot of data. I spot-checked one campaign and found 61 events in the JSON output, while the Mailchimp email_activity endpoint returns around 707.
Hello, @murph! Could you please show me the Airbyte/server logs so I can see if you are getting any errors?
We aren’t getting any errors, but we were able to debug this a bit. It looks like every time you paginate the email activity endpoint, you’re also incrementing the since param: airbyte/streams.py at bfa54aca50115770530ca6fdff24d4125541d23b · airbytehq/airbyte · GitHub. This happens via the cursor_field: airbyte/streams.py at bfa54aca50115770530ca6fdff24d4125541d23b · airbytehq/airbyte · GitHub, which is the timestamp of the newest record: airbyte/streams.py at bfa54aca50115770530ca6fdff24d4125541d23b · airbytehq/airbyte · GitHub
That means that when we do an incremental sync we lose a lot of records. The records returned from the Mailchimp API are NOT sorted by timestamp, so the timestamp selection is completely arbitrary. I don’t think this is the intended behavior?
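To make the failure mode concrete, here is a minimal simulation of what I think is happening. It is not the connector's code: `fetch_page`, the integer timestamps, and the assumption that `since` is a strict "greater than" filter are all made up for illustration. It only shows what happens when `since` and `offset` both advance between pages on unsorted data.

```python
# Hypothetical stand-in for the email activity endpoint.
# Assumption: `since` filters strictly-newer records, and the
# response order is arbitrary (not sorted by timestamp).
def fetch_page(records, since, offset, count):
    matching = [ts for ts in records if ts > since]
    return matching[offset:offset + count]


def buggy_sync(records, page_size):
    """Advance BOTH `since` and `offset` after every page."""
    emitted = []
    since, offset = 0, 0
    while True:
        page = fetch_page(records, since, offset, page_size)
        if not page:
            break
        emitted.extend(page)
        since = max(since, max(page))  # cursor jumps to newest record seen
        offset += page_size            # ...while the offset also moves on
    return emitted


# Ten events with integer "timestamps", returned in arbitrary order.
RECORDS = [5, 1, 9, 3, 7, 2, 8, 4, 6, 10]

print(buggy_sync(RECORDS, page_size=3))  # [5, 1, 9]
```

The first page happens to contain timestamp 9, so the next request asks for records newer than 9 starting at offset 3, which matches nothing. Only 3 of the 10 events are emitted.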
It looks like you cannot sort what’s returned from the email activity endpoint, so this kind of checkpointing won’t work: https://mailchimp.com/developer/marketing/api/email-activity-reports/list-email-activity/
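One possible way around this, sketched under the same made-up assumptions (a hypothetical `fetch_page` helper, integer timestamps, `since` as a strict filter), would be to keep `since` fixed for the whole run, paginate by offset only, and move the checkpoint only after the last page has been read. I am not claiming this is how the connector should be fixed, just that it does not depend on sort order.

```python
# Hypothetical stand-in for the email activity endpoint (unsorted results).
def fetch_page(records, since, offset, count):
    matching = [ts for ts in records if ts > since]
    return matching[offset:offset + count]


def sync_with_deferred_checkpoint(records, cursor, page_size):
    """Hold `since` constant during the run; checkpoint once at the end."""
    emitted = []
    newest = cursor
    offset = 0
    while True:
        # `cursor` is NOT updated inside the loop, so every page is
        # taken from the same filtered result set.
        page = fetch_page(records, cursor, offset, page_size)
        if not page:
            break
        emitted.extend(page)
        newest = max(newest, max(page))  # track, but do not apply yet
        offset += page_size
    return emitted, newest  # caller saves `newest` as the next cursor


RECORDS = [5, 1, 9, 3, 7, 2, 8, 4, 6, 10]

emitted, next_cursor = sync_with_deferred_checkpoint(RECORDS, cursor=0, page_size=3)
print(len(emitted), next_cursor)  # 10 10
```

Running it again with `cursor=next_cursor` returns no records, which is what you would expect from an incremental sync with nothing new.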
Thanks for digging into this. You are right, this is definitely not the intended behavior. I’ve opened an issue on GitHub, and I or another team member will start work on this soon!
Thank you for creating that issue! I just wanted to check in to see when the issue will be prioritized?
@murph sorry for the wait, we have a few team members out this week. I asked one of my colleagues to set aside some time for this issue, so you’ll be hearing something soon!
Hi, just checking in on this. Has there been any movement?
Hi, Amanda! Thank you for your patience. No movement on this yet but I have a few debugging ideas.
Could you possibly update Airbyte to the latest version and try the sync once more? I have tried to replicate the issue on my end, but from what I can see the connector is working correctly: all records emitted by Mailchimp are being committed to JSON.
Did you use an incremental sync?
I’m not sure why you need to debug more. The linked code shows that you’re treating the data as if it were sorted, but the API response is not sorted. You’re also paginating in multiple ways at the same time.