Airbyte Version: 0.39.1-alpha
Source: Mixpanel (0.1.17)
Destination: Snowflake (0.4.29)
Hi all - we’re having some trouble establishing a successful connection with Mixpanel as a source. I’ve attached the logs from our latest attempt.
logs-387.txt (58.3 KB)
Hey @mattkohl-flex, thanks for your post and welcome to the community! Could you provide some more context on how you’ve deployed your Airbyte instance? I took a look at your logs and there wasn’t any specific error. If you can tell us more about your environment and how many resources you’ve allocated to the instance, that may help us trace the issue.
Matt and I work together so I can answer that.
We recently upgraded, but still seem to be having problems with the Mixpanel connector.
Source MixPanel 0.1.17
Destination Snowflake 0.4.30
Deployment is via docker-compose on an EC2 (r5.2xlarge).
I’ve attached the latest logs from the mixpanel connector and from the airbyte server.
Last night I tried to run the connector. It seemed to run for a while, albeit extremely slowly (1MB/s). It ran a couple of times saying there was no data, then ran saying it was transferring data, and I left it to run overnight.
airbyte-server-logs.txt (24.8 KB)
mixpanel-connector-logs-485.txt (134.5 KB)
Then overnight the server was unresponsive, with the UI only showing that there was an issue connecting to the server. It stayed this way until I did a docker-compose restart. Interestingly the mixpanel docker containers stayed there until I killed them with docker kill (id)
I set the max memory usage parameters in the .env config to 28g, but from our server monitoring it looks like the EC2 never approached even 25% memory or CPU usage, so I don’t think it was a memory issue.
Hey @kyle-mackenzie-indee thanks for your patience and for posting here with the additional context. We’re trying to investigate this and it’s been difficult to trace.
Could you talk more about which streams you are trying to replicate from Mixpanel? Are you running any other connectors besides this Mixpanel one? If so, have you been experiencing performance issues with the other connectors as well?
Finally, could you try running a smaller sync by using a more recent start date value? Let us know if this shorter sync fails or if it succeeds.
Tried with export for 1 day lookback and 1 week start date value and it worked ok. Not too slow (13GB in 5h40m), about the same speed as we’ve seen for other connectors.
Tried again for 30 day lookback and 2 week start date value for export, funnel and revenue streams. Crashed the EC2. I thought it might be because we had the docker memory limit set too high in the .env file, so I removed that, but it crashed again.
Trying again with 7 day lookback, 2 week start date value for export, funnel and revenue streams. Will update if it seems ok.
What I’m wondering now is how to prevent it from crashing out if it’s just hitting memory limits. Is the issue that the data it’s trying to grab with a 30 day lookback is too big to fit into a single “batch” in the EC2 memory?
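One way to test whether the size of the window is the problem is to backfill in smaller date ranges and move the start date forward between runs, as with the 1 week and 2 week attempts above. The sketch below only shows how a long range could be cut into fixed-size windows. It is not how the Mixpanel connector batches requests internally; the function name `date_windows` and the example dates are made up for illustration.

```python
from datetime import date, timedelta
from typing import Iterator, Tuple


def date_windows(start: date, end: date, days: int) -> Iterator[Tuple[date, date]]:
    """Yield inclusive (window_start, window_end) pairs covering start..end.

    Hypothetical helper: each pair could be used as the date range for one
    smaller sync instead of requesting the whole range at once.
    """
    if days < 1:
        raise ValueError("days must be at least 1")
    cursor = start
    while cursor <= end:
        # The last window is clipped so it never runs past `end`.
        window_end = min(cursor + timedelta(days=days - 1), end)
        yield cursor, window_end
        cursor = window_end + timedelta(days=1)


if __name__ == "__main__":
    # Example dates are invented: a 2 week range split into 7 day windows.
    for window_start, window_end in date_windows(date(2022, 6, 1), date(2022, 6, 14), 7):
        print(window_start.isoformat(), "to", window_end.isoformat())
```

If a run over one small window succeeds while the full range crashes the host, that points at the amount of data requested per run rather than at the deployment itself.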
We had two MixPanel connections running, one for each region we have it in. Both starting up from scratch here so likely downloading a lot.
logs-2625.txt (141.3 KB)
logs-2626.txt (89.9 KB)
Here are the metrics in Datadog. You can see the first EC2 crash, then the second one almost straight after, when I rebooted and tried without the docker memory limit.
Hey @kyle-mackenzie-indee, are you still experiencing this issue with the Mixpanel connector? I believe the issue has to do with some of the rate-limiting policies Mixpanel enforces for their API. There’s a note about this in our docs that I initially missed: https://docs.airbyte.com/understanding-airbyte/basic-normalization/#normalization-metadata-columns. There’s also a note about high RAM usage, which may be pertinent to your performance issues. Let me know your thoughts when you get a chance!
Hey hey. So we managed to kind of get it working in that it completes runs.
But we’ve found that the incremental doesn’t seem to work. Could very well be an us issue.
We’re only pulling in the Export stream. It seems like we’re grabbing the entire dataset each time even though we have it set to incremental.
I believe incremental is supported for this stream.
But I can see in the raw JSON that we’re grabbing the same event repeatedly, with multiple duplicates. These rows have the same time field (the cursor in the replication settings), the same insert_id and the same data, but appear multiple times in the Airbyte output with different _airbyte_ab_id and _airbyte_emitted_at values. It looks like the same data is being grabbed each run, even though the connection state is getting updated.
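For anyone who wants to confirm the same symptom, here is a rough sketch of how the duplicates could be counted and collapsed once the raw records are loaded as dictionaries. It assumes each record carries the `insert_id`, `time`, `_airbyte_ab_id` and `_airbyte_emitted_at` fields mentioned above; the function names and the sample rows are invented for illustration, and this is not part of Airbyte or of the connector.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def duplicate_counts(rows: Iterable[dict]) -> Dict[Tuple, int]:
    """Return {(insert_id, time): count} for keys that appear more than once."""
    counts = Counter((row["insert_id"], row["time"]) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}


def dedupe_events(rows: Iterable[dict]) -> List[dict]:
    """Keep one row per (insert_id, time): the most recently emitted copy."""
    latest: Dict[Tuple, dict] = {}
    for row in rows:
        key = (row["insert_id"], row["time"])
        kept = latest.get(key)
        # Assumes _airbyte_emitted_at values are comparable, e.g. ISO 8601
        # strings in one consistent format, or datetime objects.
        if kept is None or row["_airbyte_emitted_at"] > kept["_airbyte_emitted_at"]:
            latest[key] = row
    return list(latest.values())


# Made-up sample: the same event emitted by two runs, plus one unique event.
sample_rows = [
    {"insert_id": "a1", "time": 1654041600, "_airbyte_ab_id": "x",
     "_airbyte_emitted_at": "2022-06-02T01:00:00Z"},
    {"insert_id": "a1", "time": 1654041600, "_airbyte_ab_id": "y",
     "_airbyte_emitted_at": "2022-06-03T01:00:00Z"},
    {"insert_id": "b2", "time": 1654045200, "_airbyte_ab_id": "z",
     "_airbyte_emitted_at": "2022-06-03T01:00:00Z"},
]

if __name__ == "__main__":
    print(duplicate_counts(sample_rows))
    print(len(dedupe_events(sample_rows)))
```

If `duplicate_counts` keeps growing after every run, the sync is re-reading old events. Deduplicating downstream only hides the symptom, it does not fix the incremental read.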
Thanks for the follow up. I don’t have an answer for you at the moment that explains the weird behavior with the export stream. There was an update to some aspects of the Mixpanel connector yesterday (including the export stream), so try upgrading the connector to the latest version, if you haven’t already done so, and rerun the sync. Report back if that did anything!
In the meantime, I’m going to try to duplicate this issue with our sandbox account and will report back with my findings. Thanks for being patient!
Looks to be working ok now after the upgrade. Export is working with incremental append, without repeating the full dataset each run. Thanks @sajarin and those who worked on the upgrade!