Error creating connection between Mongo and Redshift

Summary

The user is encountering an error while trying to create a connection between Mongo and Redshift. The error message indicates a memory limit issue during aggregation in Mongo. The user is unsure if the configuration is needed in the Mongo instance or within Airbyte.


Question

Hey <@U0697SLH4TS> I’m trying to create a connection between Mongo and Redshift but am getting an error.
I haven’t set any configs within Airbyte except getting the successful response from the Source and Destination step and then selecting the source and destination from the setup steps. Does this seem like a config I need on my mongo instance or something within Airbyte?

Thanks in advance!

Internal message: com.mongodb.MongoCommandException: Command failed with error 292 (QueryExceededMemoryLimitNoDiskUseAllowed): 'PlanExecutor error during aggregation :: caused by :: Sort exceeded memory limit of 33554432 bytes, but did not opt in to external sorting. Aborting operation. Pass allowDiskUse:true to opt in.' on server {REMOVED}. The full response is {"ok": 0.0, "errmsg": "PlanExecutor error during aggregation :: caused by :: Sort exceeded memory limit of 33554432 bytes, but did not opt in to external sorting. Aborting operation. Pass allowDiskUse:true to opt in.", "code": 292, "codeName": "QueryExceededMemoryLimitNoDiskUseAllowed", "$clusterTime": {"clusterTime": {"$timestamp": {"t": 1724862367, "i": 4}}, "signature": {"hash": {"$binary": {"base64": "{REMOVED}", "subType": "00"}}, "keyId": {REMOVED}}}, "operationTime": {"$timestamp": {"t": 1724862367, "i": 4}}}



This topic has been created from a Slack thread to give it more visibility.
It will be on Read-Only mode here. Click here if you want
to access the original thread.

Join the conversation on Slack

["mongo", "redshift", "connection-error", "memory-limit", "config"]

Hey <@U07KAGFLJBT> Are you using a free version of MongoDB? I know that this can happen when an aggregation operation in MongoDB where the sort operation exceeds the memory limit of 32mb… specifically for in-memory sorting and external sorting when using the disk is not enabled.

I don’t believe setting the allowDiskUser option to true is available in MongoDB on the free tier. Below are the steps that should resolve this though:

Steps to Resolve the Issue:

Modify MongoDB Aggregation Query:
The key part of the error suggests adding allowDiskUse: true to your aggregation queries. This allows MongoDB to use disk space for sorting when the memory limit is exceeded.

To apply this, you would need to adjust the MongoDB queries within your Mongo instance or configure Airbyte to use queries with this option if possible.

Configuration in Airbyte:
Check MongoDB Source Settings: Airbyte allows you to set custom queries in some cases. You might be able to specify the allowDiskUse: true option in the MongoDB source configuration, depending on how your source connector is set up.
Custom Query Configuration: If Airbyte supports custom queries for MongoDB in your version, you can set this configuration there:

  "allowDiskUse": true
}```
*Advanced Options*: If this option is not available directly within Airbyte’s UI, consider creating a view in MongoDB that includes the allowDiskUse: true configuration and connect Airbyte to that view instead.

*Alternatively try to Optimize the Aggregation*:
Review the aggregation pipeline to minimize data size before the sort operation or consider optimizing the query to avoid hitting the memory limit.

It’s a paid instance of Mongo. Where should I be doing the aggregation?

So far my steps in Airbyte have been:

  1. Sources -> New Source -> Mongo
  2. Enter these connection fields (screenshot)
  3. “Setup Source” button which gave a successful connection response
  4. Destinations - > New Destination -> Redshift
  5. Enter these connection fields
  6. “Setup Destination” button which gave a successful connection response
  7. “Connections” -> “Create your first connection” -> Mongo source -> Redshift destination -> It starts trying to load the connection
    a. “We are fetching the schema of your data source.
    b. This should take less than a minute, but may take a few minutes on slow internet connections or data sources with a large amount of streams.”
  8. Then it gives the error
    I’m not configuring any kind of aggregation at the moment

If the aggregation query is being generated automatically by Airbyte, you might not have direct access to modify it within Airbyte’s interface. However, here are some approaches:

Creating a MongoDB View:

•	You can create a view in MongoDB that already has the allowDiskUse: true configuration and then point Airbyte to this view.
•	Example:
  { $match: { /* match conditions */ } },
  { $sort: { fieldName: 1 } }
], { allowDiskUse: true });```

If you are having difficulty with this and still having the issue it might be best to open up a ticket on github https://github.com/airbytehq/airbyte/issues

Maybe these might be helpful as well:
https://www.mongodb.com/community/forums/t/consume-change-streams-from-secondary-nodes/164101
Check size of OpLog: https://www.mongodb.com/docs/manual/tutorial/troubleshoot-replica-sets/#std-label-replica-set-troubleshooting-check-oplog-size

Thanks! I’ll take a look at these