Missing Mailchimp email activity data

  • Is this your first time deploying Airbyte?: No
  • OS Version / Instance: Ubuntu
  • Memory / Disk: 4 GB / 1 TB
  • Deployment: Kubernetes
  • Airbyte Version: 0.39.25
  • Source name/version: Mailchimp
  • Destination name/version: Json destination
  • Step: Run the Mailchimp source with the Json destination with more than 700 email events
  • Description:
    When I run the Mailchimp source with the JSON destination and compare the output against what the Mailchimp API returns, the output JSON is missing a lot of data. I spot-checked one campaign: the JSON output contains 61 events, while the Mailchimp email_activity endpoint returns around 707.

Hello, @murph! Could you please share the Airbyte server logs so I can check whether you are getting any errors?

We aren’t getting any errors, but we were able to debug this a bit. It looks like every time the connector paginates the email activity endpoint, it also increments the `since` param (airbyte/streams.py at bfa54aca50115770530ca6fdff24d4125541d23b · airbytehq/airbyte · GitHub) via the `cursor_field` (airbyte/streams.py at bfa54aca50115770530ca6fdff24d4125541d23b · airbytehq/airbyte · GitHub), which is the timestamp of the newest record (airbyte/streams.py at bfa54aca50115770530ca6fdff24d4125541d23b · airbytehq/airbyte · GitHub).

That means that when we do an incremental sync we lose a lot of records. The records returned by the Mailchimp API are NOT sorted by timestamp, so which records survive the cursor cutoff is effectively arbitrary. I don’t think this is the intended behavior?
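To illustrate the behavior described above, here is a hypothetical simulation (not Airbyte's actual code; all names are illustrative) of what happens when the `since` cursor is advanced on every page while the API returns records unsorted by timestamp:

```python
# Simulate a paginated API whose records are NOT sorted by timestamp,
# and a sync loop that (incorrectly) advances the `since` cursor after
# every page instead of only after the full stream is read.

def fetch_page(records, since, page_size=3):
    """Return one page of records with timestamp >= since (unsorted)."""
    eligible = [r for r in records if r["timestamp"] >= since]
    return eligible[:page_size]

def buggy_sync(records):
    since = 0
    seen = []
    while True:
        page = fetch_page(records, since)
        if not page:
            break
        seen.extend(page)
        # BUG: the cursor jumps to the newest timestamp on this page,
        # so unread records with earlier timestamps are skipped forever.
        since = max(r["timestamp"] for r in page) + 1
    return seen

# Timestamps arrive out of order, as with Mailchimp's endpoint.
records = [{"id": i, "timestamp": t} for i, t in
           enumerate([5, 1, 9, 2, 3, 8, 4, 7, 6, 0])]

synced = buggy_sync(records)
print(len(synced), "of", len(records), "records synced")  # → 3 of 10
```

The first page happens to contain the record with the newest timestamp, so the cursor leaps past everything else and the remaining seven records are never fetched, which matches the "61 of ~707 events" symptom.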

Looks like you cannot sort what’s returned from the email activity endpoint, so this kind of per-page checkpointing won’t work: https://mailchimp.com/developer/marketing/api/email-activity-reports/list-email-activity/
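One safe alternative when the API cannot sort by the cursor field is to paginate with an offset and advance the cursor only once, after every page has been consumed. A minimal sketch (illustrative only, not the fix that was eventually implemented in the connector):

```python
# Sketch: keep `since` fixed while paginating by offset, then
# checkpoint to the maximum timestamp seen across the whole stream.

def fetch_page(records, since, offset, page_size=3):
    """One page of records with timestamp >= since, paged by offset."""
    eligible = [r for r in records if r["timestamp"] >= since]
    return eligible[offset:offset + page_size]

def safe_sync(records, since=0):
    seen, offset = [], 0
    while True:
        page = fetch_page(records, since, offset)
        if not page:
            break
        seen.extend(page)
        offset += len(page)  # paginate; do NOT touch `since` here
    # Checkpoint only once the whole stream has been read.
    new_since = max((r["timestamp"] for r in seen), default=since)
    return seen, new_since

records = [{"id": i, "timestamp": t} for i, t in
           enumerate([5, 1, 9, 2, 3, 8, 4, 7, 6, 0])]
synced, cursor = safe_sync(records)
print(len(synced), "records synced; next cursor =", cursor)
# → 10 records synced; next cursor = 9
```

The trade-off is that a sync interrupted mid-stream cannot resume from a partial cursor and must re-read from the old `since`, but no records are lost.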

Thanks for digging into this - you’re right, this is definitely not the intended behavior. I’ve opened an issue on GitHub, and I or another team member will start work on it soon!


Thank you for creating that issue! I just wanted to check in: when will the issue be prioritized?

@murph sorry for the wait, we have a few team members out this week. I asked one of my colleagues to set aside some time for this issue, so you’ll be hearing something soon!