JSON to Avro converter: SchemaParseException

I’m developing a Java destination that uses the io.airbyte.integrations.destination.s3.avro.JsonToAvroSchemaConverter#getAvroSchema method to convert the incoming ConfiguredAirbyteCatalog into Avro schemas. But when I try to convert the JSON schemas of the Stripe source, I get the following error on the customers stream:

{"type":"TRACE","trace":{"type":"ERROR","emitted_at":1.655727518862E12,"error":{"message":"Something went wrong in the connector. See the logs for more details.","internal_message":"org.apache.avro.SchemaParseException: Can't redefine: address"

Basically, my destination iterates over the ConfiguredAirbyteCatalog and converts each stream’s JSON schema into an Avro schema. I’m running it with the following command:

docker run --rm -v $(pwd)/secrets:/secrets -v $(pwd)/sample_files:/sample_files airbyte/source-stripe read --config /secrets/stripe.json --catalog /sample_files/configured_catalog.json | docker run --rm -v $(pwd)/secrets:/secrets -v $(pwd)/sample_files:/sample_files airbyte/destination-my:dev write --config /secrets/myconfig.json --catalog /sample_files/configured_catalog.json

This happens because the Stripe customer JSON schema contains two identical “address” objects in two different places. The object is documented in the Stripe API reference under “The customer object”.
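The clash can be reproduced with Apache Avro alone. The sketch below assumes Avro is on the classpath and uses a hand-written Avro schema (not the converter’s actual output) that mimics the shape of the customers stream: a record named “address” at the root and another record named “address” inside “shipping”, neither with a namespace, so both resolve to the same full name. Parsing it is expected to fail with the same kind of “Can't redefine” error. The two records differ by one field here, in line with the later observation in this thread that changing one of the schemas does not make the error go away. The class and method names are illustrative only.

```java
import org.apache.avro.Schema;
import org.apache.avro.SchemaParseException;

public class AvroNameClashDemo {

  // Hand-written schema: two records named "address" without a namespace,
  // one at the root and one nested inside the "shipping" record.
  static final String CLASHING_SCHEMA =
      "{\"type\":\"record\",\"name\":\"customers\",\"fields\":["
      + "{\"name\":\"address\",\"type\":{\"type\":\"record\",\"name\":\"address\","
      + "\"fields\":[{\"name\":\"city\",\"type\":\"string\"}]}},"
      + "{\"name\":\"shipping\",\"type\":{\"type\":\"record\",\"name\":\"shipping\",\"fields\":["
      + "{\"name\":\"address\",\"type\":{\"type\":\"record\",\"name\":\"address\","
      + "\"fields\":[{\"name\":\"city\",\"type\":\"string\"},{\"name\":\"line2\",\"type\":\"string\"}]}}"
      + "]}}"
      + "]}";

  // Returns the parser's error message, or null if the schema parsed cleanly.
  static String tryParse(String json) {
    try {
      new Schema.Parser().parse(json);
      return null;
    } catch (SchemaParseException e) {
      return e.getMessage();
    }
  }

  public static void main(String[] args) {
    // Prints the SchemaParseException message raised for the duplicate name
    System.out.println(tryParse(CLASHING_SCHEMA));
  }
}
```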

How can I convert such a JSON schema to an Avro schema?

Hi @gargatuma,
Do you face the same problem with other sources?
Could you explain a bit more what you mean by

Customer JSON schema has two identical address objects used in two places

with an example? Do you mean that a nested key is also named “address” and this is causing the error you shared?
I would suggest implementing some upstream logic before the Avro conversion to rename the key if needed. But I’m surprised you would have to do this, because I don’t think our connectors that write Avro do anything like it.
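A minimal sketch of such upstream renaming, assuming the stream’s JSON schema is available as a Jackson ObjectNode and that Jackson databind is on the classpath. The helpers renameKey and renameRootProperty are hypothetical, not part of Airbyte. Jackson’s ObjectNode has no rename operation, so the node is rebuilt entry by entry to keep the original key order. Only the root-level “address” is renamed (to “address1”, the rename reported later in this thread); the nested shipping.address is left alone.

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.JsonNodeFactory;
import com.fasterxml.jackson.databind.node.ObjectNode;
import java.util.Iterator;
import java.util.Map;

public class RenamePropertyDemo {

  // Returns a copy of `node` with key `from` renamed to `to`, preserving key order.
  // Assumes `to` is not already a key in `node` (it would be overwritten).
  static ObjectNode renameKey(ObjectNode node, String from, String to) {
    ObjectNode result = JsonNodeFactory.instance.objectNode();
    Iterator<Map.Entry<String, JsonNode>> it = node.fields();
    while (it.hasNext()) {
      Map.Entry<String, JsonNode> entry = it.next();
      String key = entry.getKey().equals(from) ? to : entry.getKey();
      result.set(key, entry.getValue());
    }
    return result;
  }

  // Renames a root-level property of a JSON schema; nested properties are untouched.
  static void renameRootProperty(ObjectNode jsonSchema, String from, String to) {
    ObjectNode properties = (ObjectNode) jsonSchema.get("properties");
    jsonSchema.set("properties", renameKey(properties, from, to));
  }

  public static void main(String[] args) throws Exception {
    ObjectNode schema = (ObjectNode) new ObjectMapper().readTree(
        "{\"type\":\"object\",\"properties\":{"
        + "\"address\":{\"type\":[\"null\",\"object\"]},"
        + "\"shipping\":{\"type\":[\"null\",\"object\"],\"properties\":{"
        + "\"address\":{\"type\":[\"null\",\"object\"]}}}}}");
    renameRootProperty(schema, "address", "address1");
    System.out.println(schema);
  }
}
```

The trade-off of this approach is that the records themselves must be renamed the same way (for example by applying renameKey to each record’s ObjectNode before writing), otherwise the data no longer matches the schema.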

@gargatuma, the clashing field name problem should have been resolved already. The JSON to Avro schema converter uses the Avro namespace to distinguish fields that share the same name.
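To illustrate the mechanism with Avro alone: two records may share a short name as long as their full names (namespace plus name) differ. The sketch below assumes Avro is on the classpath and uses a hand-written schema, not the converter’s output; the namespace “shipping” is an illustrative choice and not necessarily what JsonToAvroSchemaConverter assigns. Both “address” records are identical, yet parsing succeeds because the nested one lives in a different namespace.

```java
import org.apache.avro.Schema;

public class AvroNamespaceDemo {

  static final String SCHEMA_WITH_NAMESPACE =
      "{\"type\":\"record\",\"name\":\"customers\",\"fields\":["
      + "{\"name\":\"address\",\"type\":{\"type\":\"record\",\"name\":\"address\","
      + "\"fields\":[{\"name\":\"city\",\"type\":\"string\"}]}},"
      + "{\"name\":\"shipping\",\"type\":{\"type\":\"record\",\"name\":\"shipping\",\"fields\":["
      // Same short name "address", but placed in the "shipping" namespace
      + "{\"name\":\"address\",\"type\":{\"type\":\"record\",\"name\":\"address\","
      + "\"namespace\":\"shipping\",\"fields\":[{\"name\":\"city\",\"type\":\"string\"}]}}"
      + "]}}"
      + "]}";

  static Schema parse() {
    return new Schema.Parser().parse(SCHEMA_WITH_NAMESPACE);
  }

  public static void main(String[] args) {
    Schema customers = parse();
    Schema rootAddress = customers.getField("address").schema();
    Schema shippingAddress = customers.getField("shipping").schema().getField("address").schema();
    System.out.println(rootAddress.getFullName());     // address
    System.out.println(shippingAddress.getFullName()); // shipping.address
  }
}
```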

Here is a test case where a nested field has the same name as a field in its parent. It is similar to Stripe’s address case:

Here is a test case where a nested array field has the same name as a field in its parent:

Could you paste the JSON schema you are using?

I’m using the source-stripe configured_catalog.json as the input ConfiguredAirbyteCatalog, and it is the conversion of the “customers” stream schema that fails. I don’t know exactly why it’s happening, but it’s caused by the “address” object, which is defined both at the root level and inside the “shipping” object.

Here is where I do the conversion:

  private Map<String, Schema> createCollections(ConfiguredAirbyteCatalog catalog) {
    // One Avro schema per stream, keyed by stream name
    Map<String, Schema> map = new HashMap<>();
    JsonToAvroSchemaConverter jsonToAvroSchemaConverter = new JsonToAvroSchemaConverter();

    for (ConfiguredAirbyteStream configuredAirbyteStream : catalog.getStreams()) {
      final AirbyteStream stream = configuredAirbyteStream.getStream();
      // Note: if stream.getNamespace() is null, this string becomes "com.example.data.null"
      final Schema avroSchema = jsonToAvroSchemaConverter.getAvroSchema(stream.getJsonSchema(), stream.getName(),
              "com.example.data." + stream.getNamespace());
      map.put(stream.getName(), avroSchema);
    }
    return map;
  }

In the case of the two “address” objects, not only their names but also their JSON schemas are 100% identical, whereas the “type” objects in the “array_with_same_object_name” test case have different schemas. Maybe the problem is defining objects with the same name and the same JSON schema in two places, because in such cases an Avro schema reference is usually used.
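For comparison, this is what a schema reference looks like in plain Avro: the record is defined once, and later occurrences refer to it by name instead of redefining it. The sketch assumes Avro is on the classpath and uses a hand-written schema; whether JsonToAvroSchemaConverter could emit references like this is an open question, not something the converter is known to do.

```java
import org.apache.avro.Schema;

public class AvroNamedReferenceDemo {

  static final String SCHEMA_WITH_REFERENCE =
      "{\"type\":\"record\",\"name\":\"customers\",\"fields\":["
      + "{\"name\":\"address\",\"type\":{\"type\":\"record\",\"name\":\"address\","
      + "\"fields\":[{\"name\":\"city\",\"type\":\"string\"}]}},"
      + "{\"name\":\"shipping\",\"type\":{\"type\":\"record\",\"name\":\"shipping\",\"fields\":["
      // Reuse the root-level definition by name instead of defining it again
      + "{\"name\":\"address\",\"type\":\"address\"}"
      + "]}}"
      + "]}";

  static Schema parse() {
    return new Schema.Parser().parse(SCHEMA_WITH_REFERENCE);
  }

  public static void main(String[] args) {
    Schema customers = parse();
    Schema rootAddress = customers.getField("address").schema();
    Schema shippingAddress = customers.getField("shipping").schema().getField("address").schema();
    // Both fields resolve to the single "address" definition
    System.out.println(rootAddress.equals(shippingAddress)); // true
  }
}
```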

I tried changing the JSON schema of one of the “address” objects, but the error persisted, so the problem lies only in their names.
When I renamed the top-level “address” to “address1”, the schema conversion completed without any exceptions.

@tuliren any progress on this?

Hey @gargatuma,
Would you feel comfortable opening a PR against the JsonToAvroSchemaConverter class and adding a test case similar to your schema?

@gargatuma, thank you for the information. That’s very helpful. Sorry I haven’t looked into the fix yet due to limited bandwidth. I will send an update later today or tomorrow.

@alafanechere Yes, I can do that.
@tuliren Thanks, I’ll be waiting.