Source Mixpanel - Sync failing

Airbyte Version: 0.39.1-alpha
Source: Mixpanel (0.1.17)
Destination: Snowflake (0.4.29)

Hi all - we’re having some trouble establishing a successful connection with Mixpanel as a source. I’ve attached the logs from our latest attempt.

logs-387.txt (58.3 KB)

Thank you :smiley:

Hey @mattkohl-flex, thanks for your post and welcome to the community! Could you provide some more context on how you’ve deployed your Airbyte instance? I took a look at your logs and didn’t see any specific error. If you can share more about your environment and how many resources you’ve allocated to the instance, that may help us trace the issue.

Matt and I work together so I can answer that.
We recently upgraded, but still seem to be having problems with the Mixpanel connector.

Airbyte v0.39.32-alpha
Source MixPanel 0.1.17
Destination Snowflake 0.4.30

Deployment is via docker-compose on an EC2 (r5.2xlarge).

I’ve attached the latest logs from the mixpanel connector and from the airbyte server.
Last night I tried to run the connector. It seemed to run for a while, albeit extremely slowly (1MB/s). It ran a couple of times saying there was no data, then ran saying it was transferring data, and I left it to run overnight.
airbyte-server-logs.txt (24.8 KB)
mixpanel-connector-logs-485.txt (134.5 KB)

Then overnight the server became unresponsive, with the UI only showing that there was an issue connecting to the server. It stayed this way until I did a docker-compose restart. Interestingly, the Mixpanel docker containers stayed there until I killed them with docker kill (id).

I set the max memory usage parameters in the .env config to 28g, but from our server monitoring it looks like the EC2 never approached even 25% memory or CPU usage, so I don’t think it was a memory issue.

Hey @kyle-mackenzie-indee thanks for your patience and for posting here with the additional context. We’re trying to investigate this and it’s been difficult to trace.

Could you talk more about which streams you are trying to replicate from Mixpanel? Are you running any other connectors besides this Mixpanel one? If so, have you been experiencing performance issues with the other connectors as well?

Finally, could you try running a smaller sync by using a more recent start date value? Let us know if this shorter sync fails or if it succeeds.

Tried with export for 1 day lookback and 1 week start date value and it worked ok. Not too slow (13GB in 5h40m), about the same speed as we’ve seen for other connectors.

Tried again for 30 day lookback and 2 week start date value for export, funnel and revenue streams. Crashed the EC2. I thought it might be because we had the docker memory limit set too high in the .env file, so I removed that, but it crashed again.

Trying again with 7 day lookback, 2 week start date value for export, funnel and revenue streams. Will update if it seems ok.

What I’m wondering now is how to prevent it from crashing out if it’s just hitting memory limits. Is the issue that the data it’s trying to grab with a 30 day lookback is too big to fit into a single “batch” in the EC2 memory?
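
For illustration only: one common way to keep a long date range from landing in memory all at once is to split it into fixed-size windows and process them one at a time. The sketch below is hypothetical and is not the Mixpanel connector's actual code; `date_windows` and `window_days` are made-up names, and whether the connector batches this way is an assumption.

```python
from datetime import date, timedelta
from typing import Iterator, Tuple


def date_windows(start: date, end: date, window_days: int) -> Iterator[Tuple[date, date]]:
    """Yield inclusive (from_date, to_date) pairs that cover start..end."""
    if window_days < 1:
        raise ValueError("window_days must be at least 1")
    cursor = start
    while cursor <= end:
        # Clamp the last window so it never runs past the requested end date.
        window_end = min(cursor + timedelta(days=window_days - 1), end)
        yield cursor, window_end
        cursor = window_end + timedelta(days=1)


# A 30-day range split into 7-day windows: each request covers at most 7 days.
windows = list(date_windows(date(2022, 6, 1), date(2022, 6, 30), window_days=7))
for from_date, to_date in windows:
    print(from_date.isoformat(), "->", to_date.isoformat())
```

With smaller windows, the amount of data held per request is bounded by the window size rather than by the full lookback period, at the cost of more requests.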

We had two MixPanel connections running, one for each region we have it in. Both starting up from scratch here so likely downloading a lot.

Logs attached.
Connection 1:
logs-2625.txt (141.3 KB)
Connection 2:
logs-2626.txt (89.9 KB)

Here are the metrics in Datadog. You can see the first EC2 crash, then the second one almost straight after, when I rebooted and tried without the docker memory limit.

Hey @kyle-mackenzie-indee, are you still experiencing this issue with the Mixpanel connector? I believe the issue has to do with some of the rate-limiting policies Mixpanel enforces for their API. There’s a note about this in our docs that I initially missed: https://docs.airbyte.com/understanding-airbyte/basic-normalization/#normalization-metadata-columns. There’s also a note about high RAM usage which may be pertinent to your performance issues. Let me know your thoughts when you get a chance!