Error syncing MongoDB to Snowflake with Airbyte Cloud

Summary

Airbyte struggles and fails when syncing larger MongoDB collections to Snowflake, showing error ‘Destination process is still alive, cannot retrieve exit value’. User has tried changing Snowflake warehouse size without success.


Question

hi there
i’m trying to sync a mongodb db source to a snowflake db destination with Airbyte Cloud.
for small mongodb collections, it seems to work fine. but for bigger collections (not that big actually, about 2 GB, 1M documents), Airbyte seems to struggle and ends up failing (after 5 attempts) with the following error:

Destination process is still alive, cannot retrieve exit value
how can i fix this?
thanks in advance

Notes:
at first I had an X-Small snowflake warehouse with a little bit of queuing. I’ve changed it to a Small: no more queuing, but the issue is still here
it also happens with only one collection selected for sync (one of the big ones, but still not that big imho, 1.9 GB, 188k documents)




["mongodb-connector", "snowflake-connector", "airbyte-cloud", "sync-error", "large-collections"]

hi <@U07FHPT0DDY>, could you share the log of this problematic sync with us?

or your workspace link

<@U0412N5EG7Q> Hi Tyler, could you help give Romain’s account more memory for its syncs? Thanks!

Done! You should be all set Romain, the memory allocation for the connection has been increased.

<@U07FHPT0DDY> please let us know how the syncs work afterwards :slightly_smiling_face: enjoy!

ok, i’m doing that right now

Hi
Thanks for your answer
I’ve shared the log in a Zendesk issue, but it contains some sensitive information so I would rather not share it here
but here’s my workspace link: https://cloud.airbyte.com/workspaces/30c1690d-5265-455c-b46d-c5eda4f6882d
the issue is with the connection df1a9e9f-3616-4f39-ad7e-83d39b666545

hi <@U071HD82KAB>
I had to create another test account since my trial period was almost finished and I couldn’t thoroughly test Airbyte due to this issue
Anyway, here’s the workspace link https://cloud.airbyte.com/workspaces/72d2903a-7116-4a14-bab8-340f4cf1ba37
I’m just trying to sync one collection here (1.9 GB, 188k documents), the smaller ones seem to work fine. I’m testing Airbyte to check if it would fit our needs, and so far, except for this issue (which is blocking for the moment), the user experience is ok.
Link to the job: https://cloud.airbyte.com/workspaces/72d2903a-7116-4a14-bab8-340f4cf1ba37/connections/db142d75-6cd7-4cea-8bd6-e669d4daffb4/job-history#15828221::2
it failed after 5 attempts. it seems to copy some rows into Airbyte internal stream tables, but not that many (about 2.6k rows, 44.2MB after 21min)
for what it’s worth, this job ran with a Medium Snowflake warehouse, which is quite beefy imho (for most of our usage, X-Small warehouses are fine, we sometimes use Small but Medium is way oversized)

Thanks Romain, I looked at the log, and see each attempt has the following error from destination:

2024-08-06 14:27:19 destination > Terminating due to java.lang.OutOfMemoryError: Java heap space

It seems OOM happened when writing to the destination. I’m from the source team, adding <@U0397RTD8E4> from the destination side to take another look
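
For anyone hitting the same symptom: the "Destination process is still alive" message did not name the cause here, the OutOfMemoryError line in the log did. Below is a minimal Python sketch for scanning a downloaded sync log for such lines. The line layout in the regex is an assumption inferred from the single log line quoted above, and `find_oom_lines` is a hypothetical helper name, not part of Airbyte.

```python
import re

# Assumed log line shape, inferred from the one line quoted in this thread:
# "<date> <time> <source|destination> > <message>"
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<origin>source|destination) > (?P<msg>.*)$"
)


def find_oom_lines(log_text):
    """Return (timestamp, origin, message) for each line reporting a Java OOM."""
    hits = []
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match and "java.lang.OutOfMemoryError" in match.group("msg"):
            hits.append((match.group("ts"), match.group("origin"), match.group("msg")))
    return hits


# Sample input built from the two messages quoted in this thread.
sample = (
    "2024-08-06 14:27:19 destination > Terminating due to "
    "java.lang.OutOfMemoryError: Java heap space\n"
    "Destination process is still alive, cannot retrieve exit value\n"
)

for ts, origin, msg in find_oom_lines(sample):
    print(f"{ts} [{origin}] {msg}")
```

The `origin` field indicates whether the source or the destination process ran out of heap, which is the detail that decided who needed to look at the problem in this thread.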

<@U07FHPT0DDY> since you are on Airbyte Cloud, please reach out to support - we’ve got a workflow to give your sync more memory as needed

Thanks for the answers both of you <@U071HD82KAB> and <@U0397RTD8E4>
I’ve already opened an issue on your zendesk, still waiting for an answer

could you send a link of your zendesk ticket? I can follow up with our support to give you larger memory

I think it’s this one: 7663

with roro@axeptio.eu as email

Thanks a lot ! I’ve just tried the collection I was testing, and it worked!
I’m trying with a bigger one right now, I’ll let you know how it goes

Well, now I have a new error:
Some streams either received an INCOMPLETE stream status, or did not receive a stream status at all: null.automator.jobs, null.audit io.airbyte.commons.exceptions.TransientErrorException: Some streams were unsuccessful due to a source error. See logs for details.
here’s the log link: https://cloud.airbyte.com/workspaces/72d2903a-7116-4a14-bab8-340f4cf1ba37/connections/db142d75-6cd7-4cea-8bd6-e669d4daffb4/job-history#15912715::4
I’ve just added a new collection to the schema to sync, and triggered a new sync

Hey Romain, can you please open a new ticket for the new issue you are encountering? Someone from support will assist as soon as possible on the new ticket!