Source MongoDB - Failed to fetch schema from DigitalOcean Managed MongoDB

  • Is this your first time deploying Airbyte?: No
  • OS Version / Instance: Ubuntu 20.04
  • Memory / Disk: 4GB / 50GB
  • Deployment: Docker Compose
  • Airbyte Version: 0.39.37-alpha
  • Source name/version: MongoDb 0.1.15
  • Destination name/version: BigQuery 1.1.11
  • Step:
  1. Add a new MongoDb source using DigitalOcean Managed DB over the VPC (chose "MongoDB Atlas" as the instance type, as that is the only one that works)
  2. Select the existing BigQuery destination
  3. Failed to fetch schema
  • Description:

I was trying to add a connection from a MongoDb that’s hosted on DigitalOcean Managed DB, while Airbyte is hosted on their Droplets (VMs) in the same region, so both are on the same VPC network. I added the new MongoDb source and then the destination, but however much I tried, it always stopped at “Failed to fetch schema. Try again later”.

Attached are the logs of airbyte-temporal, and worker.
temporal-logs.txt (71.6 KB)
worker-logs.txt (32.5 KB)

Hi @fauh45, thanks for your post. It seems like this might be related to this open issue: 🐛 Source MongoDB v2: Failed to fetch schema MondoDB Atlas · Issue #8564 · airbytehq/airbyte · GitHub

Hmm, the issue you linked seems to have a different error message in its logs than mine (?). Also, I’ve already used the newest MongoDB source version there. And the MongoDB database I connect to hasn’t even reached 1k documents yet.

Hey @fauh45, are you using nginx or another proxy? The default timeouts of a proxy like that can interfere with Airbyte, since fetching the schema from MongoDB can take some time. Other users have reported the same symptom, and it turned out to be caused by their proxy configuration.

Umm, as far as I know, neither Airbyte nor MongoDB is behind any proxy. But maybe I’ll try another MongoDB instance first.

Actually, I found a new error in the logs. I redid the connection and the setup went fine, but as it was fetching the schema, the logs showed:

airbyte-worker | 2022-07-25 03:53:31 INFO i.a.w.i.DefaultAirbyteStreamFactory(lambda$create$0):61 - com.mongodb.MongoCommandException: Command failed with error 16872 (Location16872): 'Invalid $project :: caused by :: '$' by itself is not a valid FieldPath' on server ******-mongodb-*********.mongo.ondigitalocean.com:27017. The full response is {"ok": 0.0, "errmsg": "Invalid $project :: caused by :: '$' by itself is not a valid FieldPath", "code": 16872, "codeName": "Location16872", "$clusterTime": {"clusterTime": {"$timestamp": {"t": 1658721211, "i": 1}}, "signature": {"hash": {"$binary": {"base64": "oLCX8Zp145uTrWSNBNIsluDvtjQ=", "subType": "00"}}, "keyId": 7114990681650626562}}, "operationTime": {"$timestamp": {"t": 1658721211, "i": 1}}}

Hey @fauh45, looks like another user is also reporting the same issue: https://github.com/airbytehq/airbyte/issues/14987. Looks like there is an issue with the parsing of the $ character according to MongoDB docs: https://www.mongodb.com/docs/manual/reference/operator/aggregation/literal/

Ahhh I see, let me try their workaround first.

Turns out that with my current settings, profiling is already set to 0. I even tried setting profiling to 1, then redid the connection setup and applied the workaround shown in the issue. Still nothing works.

Could the MongoDB version be causing this error? But hasn’t $project existed for a long time already? Maybe it’s just the parsing of the command generated by the source connector?

Did you try to drop the profiling collection from the Mongo database you are targeting?

db.system.profile.drop()
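If you want to confirm the profiler state before (and after) dropping that collection, the shell has built-in helpers for it. A small mongosh sketch, assuming you are connected to the target database:

```
// Check the current profiler configuration (level 0 = profiler off)
db.getProfilingStatus()

// Turn profiling off so system.profile is not recreated, then drop it
db.setProfilingLevel(0)
db.system.profile.drop()
```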

If that does not help, it might be a different collection causing the issue on your end.
Unfortunately, the Airbyte logs are not helpful in identifying the culprit.

There are 2 ways to figure it out:

The hard way:
If you run your own custom Mongo setup, you can locate the log output of mongod ( https://www.mongodb.com/docs/manual/reference/program/mongod/#std-option-mongod.--logpath ), tail that log stream, and filter for the $project call:

tail -f /replace/with/your/path/to/mongo.log | grep '$project'

If you are using a SaaS like Atlas, you should download the full log file and look through it: https://www.mongodb.com/docs/atlas/mongodb-logs/

Ultimately you will see the error in the log, identify the culprit collection, and then figure out what can be done about it.

The easy way:
Connect to MongoDB with the same credentials you give Airbyte and run

db.runCommand( { listCollections: 1.0, authorizedCollections: true, filter: {type: 'collection'} } ).cursor.firstBatch

which is exactly the same call Airbyte makes: airbytehq/airbyte/blob/436de264cbb9402cfb8d7b6b8d0cd996efc4f659/airbyte-integrations/connectors/source-mongodb-v2/src/main/java/io.airbyte.integrations.source.mongodb/MongoDbSource.java (can’t post link :frowning: )

If you are seeing a bunch of system collections (with “system.” in the name), that points to your problem: you need to set up a user with lower access rights to the target db, a “read” user or “dbOwner”.
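If you’d rather check programmatically than by eye, here is a small Node.js sketch that scans the firstBatch output of the listCollections call above for system collections (findSystemCollections is a name I made up, and the sample firstBatch below is fabricated for illustration):

```javascript
// Hypothetical helper: given the firstBatch array returned by the
// listCollections command, return the names of system collections.
function findSystemCollections(firstBatch) {
  return firstBatch
    .map((coll) => coll.name)
    .filter((name) => name.startsWith("system."));
}

// Example with a made-up firstBatch result:
const firstBatch = [
  { name: "users", type: "collection" },
  { name: "system.profile", type: "collection" },
  { name: "orders", type: "collection" },
];
console.log(findSystemCollections(firstBatch)); // [ 'system.profile' ]
```

If system collections do show up, one option is a dedicated user with only the “read” role on the target database, e.g. `db.createUser({user: "airbyte", pwd: "...", roles: [{role: "read", db: "yourdb"}]})` in mongosh (user name and db here are placeholders), which should keep those collections out of Airbyte’s listing.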