Failed to convert json to parquet and save

  • Is this your first time deploying Airbyte: No
  • OS Version / Instance: Ubuntu 20.04, EC2 t3a.xlarge
  • Memory / Disk: 16 GB / 120 GB
  • Deployment: Docker
  • Airbyte Version: 0.35.62-alpha
  • Source name/version: Mongodb-v2/dev (dev build from this fix: "🐛 Source MongoDB: Added support for Amazon DocumentDB with TLS" by awill1, Pull Request #10995, airbytehq/airbyte)
  • Destination name/version: S3/0.3.0
  • Description: I tried to ingest data from a MongoDB database into an S3 bucket using CDC mode. The job failed with tech.allegro.schema.json2avro.converter.AvroConversionException: Failed to convert JSON to Avro: Could not evaluate union, field win is expected to be one of these: NULL, STRING. If this is a complex type, check if offending field: win adheres to schema.
    The destination format is Parquet with Snappy compression.
    I guess the error occurs while converting JSON to Avro.
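
For readers unfamiliar with this error, here is a minimal sketch of what a "NULL, STRING" union check amounts to. It is plain Python, not the json2avro converter's actual code, and the function name `matches_union` is hypothetical. It assumes the schema declares `win` as null-or-string while the document holds a nested object, which is what the error message and the sample data in this thread suggest.

```python
def matches_union(value, allowed=("null", "string")):
    """Return True if value fits one of the allowed branches (null or string only)."""
    if value is None:
        return "null" in allowed
    if isinstance(value, str):
        return "string" in allowed
    # Anything else (dict, list, number, ...) fits neither branch.
    return False


# Hypothetical record shaped like the first sample collection.
record = {"win": {"version": "string", "updatedAt": "2009-12-12T01:28:44.545Z"}}

print(matches_union(record["win"]))  # nested object: no branch matches
print(matches_union(None))           # null branch matches
print(matches_union("1.0"))          # string branch matches
```

If this assumption holds, the nested object in `win` is what fails the check, rather than the date value inside it.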

logs-1.txt (59.4 KB)

@zigbang-yeezy did you read about the Avro conversion on our website? See "Json to Avro Conversion" in the Airbyte Documentation.

Also, is it possible to use the JSON format for now to unblock you?

Could you share the schema or some sample data so our team can reproduce the issue locally?

I’m sorry, I don’t understand that part of what you said.
Also, this exact problem occurs in two MongoDB collections.
Sample data from the first collection is as follows.

{
    "_id" : ObjectId("objectid..."),
    "win" : {
        "version" : "string",
        "updatedAt" : ISODate("2009-12-12T01:28:44.545Z")
    },
    "osx" : {
        "version" : "string",
        "updatedAt" : ISODate("2009-12-12T01:28:44.545Z")
    },
    "ios" : {
        "version" : "string",
        "updatedAt" : ISODate("2009-12-12T01:28:44.545Z")
    },
    "android" : {
        "version" : "string",
        "updatedAt" : ISODate("2009-12-12T01:28:44.545Z")
    }
}

So at first I thought the problem was caused by the date type.
However, the second collection where the problem occurred has no date type.
Here is sample data from the second collection.

{
    "_id" : ObjectId("---"),
    "createdAt" : double, // 1258314237257.0
    "abled" : boolean,
    "Name" : "string",
    "memo" : "string",
    "platform" : "string",
    "size" : int32,
    "version" : "string"
}

I think there is a problem with both the double type and the date type.

I just checked and found that other collections also have double fields, yet they sync without problems. The problem does not appear consistently.
It is still unclear where the problem occurs.
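
One way to narrow this down is to check whether any field holds different value types across documents. The sketch below is a hypothetical helper in plain Python, not part of Airbyte, and the two sample documents are made up for illustration. It assumes the documents have already been exported as Python dictionaries.

```python
from collections import defaultdict


def types_by_field(documents):
    """Collect the set of Python type names seen for each top-level field."""
    seen = defaultdict(set)
    for doc in documents:
        for key, value in doc.items():
            seen[key].add(type(value).__name__)
    return dict(seen)


def mixed_fields(documents):
    """Return only the fields whose values have more than one type."""
    return {
        key: sorted(names)
        for key, names in types_by_field(documents).items()
        if len(names) > 1
    }


# Made-up documents: createdAt is a double in one and a string in the other.
docs = [
    {"createdAt": 1258314237257.0, "size": 3},
    {"createdAt": "2009-11-15", "size": 4},
]

print(mixed_fields(docs))
```

Fields reported here are candidates for a schema mismatch, though a field with a single consistent type can still fail if the discovered schema declares a different type for it.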

What data type do the datetime fields have in Mongo?

My question is: can you use the JSON format for now? It's working, right?

@marcosmarxm
I’m sorry for the late reply.
And yes, the JSON format works, so to avoid this error the data is currently stored in JSON format and then processed for use.
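
As an illustration of the kind of post-processing this workaround can involve, here is a minimal sketch that serializes nested values to JSON strings so each field holds a scalar or a string. The function name `stringify_nested` and the sample line are hypothetical, and the layout of the JSON files written by the destination is not assumed here.

```python
import json


def stringify_nested(record):
    """Serialize nested objects and arrays to JSON strings; keep scalars as-is."""
    out = {}
    for key, value in record.items():
        if isinstance(value, (dict, list)):
            out[key] = json.dumps(value, sort_keys=True)
        else:
            out[key] = value
    return out


# Hypothetical input line shaped like the first sample collection.
line = '{"win": {"version": "1.0"}, "platform": "win"}'
flat = stringify_nested(json.loads(line))

print(flat)
```

The nested structure can be recovered later with `json.loads` on the stringified field.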

I created issue #12888 to track a fix for this. I’ll let you know when it’s solved.

