Is this your first time deploying Airbyte?: No
OS Version / Instance: Ubuntu
Memory / Disk: something like 12 GB / 1 TB
Deployment: Kubernetes on GCP
Airbyte Version: 0.39.29-alpha
Source name/version: Microsoft SQL Server (MSSQL) - 0.4.8
Destination name/version: Google Cloud Storage (GCS) - 0.2.9
Step: Destination step, I guess
Description:
Failed to save CDC data to Parquet files in GCS.
Our problem seems to be the same as the following issue, which hasn't been updated in months. mp-pinheiro is a member of our team, so any post by him in that issue is relevant to this topic.
Would anyone know a solution?
(GitHub issue preview — opened 05:30 AM, 18 May 2022 UTC; labels: type/bug, community, needs-triage, connectors/source/mssql)
## Environment
- **Airbyte version**: 0.35.38-alpha
- **OS Version / Instance**: AWS EC2
- **Deployment**: Docker
- **Source Connector and version**: MSSQL - 0.3.22
- **Destination Connector and version**: Databricks built off commit 5ddef8639a88ca81e570fdd43830c89f1da0c266
- **Severity**: Medium
- **Step where error happened**: Sync job
## Current Behavior
When setting up a sync to use CDC incremental load from MSSQL to Databricks, I get a `Failed to convert JSON to Avro` error on a decimal field. However, when I run the sync using a full refresh, there is no issue. This error occurs on a few tables, but not all of them.
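For context, the union mismatch can be reproduced outside Airbyte with the `json2avro` converter that appears in the stack trace below. This is a minimal sketch, not a confirmed diagnosis: it assumes the CDC path hands the converter the decimal as the JSON string `"0.00"` (which the offending value quoted in the log suggests) against the `["null", "double"]` union; the field name and schema shape are taken from the error message.

```java
import java.nio.charset.StandardCharsets;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import tech.allegro.schema.json2avro.converter.JsonAvroConverter;

public class UnionMismatchRepro {
    public static void main(String[] args) {
        // Avro schema mirroring what the destination generates for a
        // nullable decimal column: a union of null and double.
        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Row\",\"fields\":["
            + "{\"name\":\"AmountAhead\",\"type\":[\"null\",\"double\"],"
            + "\"default\":null}]}");

        // Assumption: the CDC record carries the decimal as a JSON string,
        // which matches neither branch of the union.
        byte[] json = "{\"AmountAhead\":\"0.00\"}"
            .getBytes(StandardCharsets.UTF_8);

        // Throws tech.allegro.schema.json2avro.converter.AvroConversionException:
        // "Could not evaluate union, field AmountAhead is expected to be one
        // of these: NULL, DOUBLE" — the same failure as in the log.
        GenericData.Record record = new JsonAvroConverter()
            .convertToGenericDataRecord(json, schema);
        System.out.println(record);
    }
}
```

If that assumption holds, a full refresh would succeed because the non-CDC read path serializes the decimal as a JSON number, which does match the `DOUBLE` branch.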
## Expected Behavior
It should be able to sync the data in either full refresh or incremental mode.
## Logs
<details>
<summary>LOG</summary>
```
2022-05-18 04:52:45 destination > tech.allegro.schema.json2avro.converter.AvroConversionException: Failed to convert JSON to Avro: Could not evaluate union, field AmountAhead is expected to be one of these: NULL, DOUBLE. If this is a complex type, check if offending field (path: AmountAhead) adheres to schema: 0.00
2022-05-18 04:52:45 destination > at tech.allegro.schema.json2avro.converter.JsonGenericRecordReader.read(JsonGenericRecordReader.java:129) ~[converter-1.0.1.jar:?]
2022-05-18 04:52:45 destination > at tech.allegro.schema.json2avro.converter.JsonGenericRecordReader.read(JsonGenericRecordReader.java:118) ~[converter-1.0.1.jar:?]
2022-05-18 04:52:45 destination > at tech.allegro.schema.json2avro.converter.JsonAvroConverter.convertToGenericDataRecord(JsonAvroConverter.java:95) ~[converter-1.0.1.jar:?]
2022-05-18 04:52:45 destination > at io.airbyte.integrations.destination.s3.avro.AvroRecordFactory.getAvroRecord(AvroRecordFactory.java:39) ~[io.airbyte.airbyte-integrations.connectors-destination-s3-0.38.4-alpha.jar:?]
2022-05-18 04:52:45 destination > at io.airbyte.integrations.destination.s3.parquet.S3ParquetWriter.write(S3ParquetWriter.java:113) ~[io.airbyte.airbyte-integrations.connectors-destination-s3-0.38.4-alpha.jar:?]
2022-05-18 04:52:45 destination > at io.airbyte.integrations.destination.databricks.DatabricksStreamCopier.write(DatabricksStreamCopier.java:109) ~[io.airbyte.airbyte-integrations.connectors-destination-databricks-0.38.4-alpha.jar:?]
2022-05-18 04:52:45 destination > at io.airbyte.integrations.destination.jdbc.copy.CopyConsumerFactory.lambda$recordWriterFunction$0(CopyConsumerFactory.java:104) ~[io.airbyte.airbyte-integrations.connectors-destination-jdbc-0.38.4-alpha.jar:?]
2022-05-18 04:52:45 destination > at io.airbyte.integrations.destination.record_buffer.InMemoryRecordBufferingStrategy.lambda$flushAll$1(InMemoryRecordBufferingStrategy.java:86) ~[io.airbyte.airbyte-integrations.bases-base-java-0.38.4-alpha.jar:?]
2022-05-18 04:52:45 destination > at io.airbyte.integrations.base.sentry.AirbyteSentry.executeWithTracing(AirbyteSentry.java:54) ~[io.airbyte.airbyte-integrations.bases-base-java-0.38.4-alpha.jar:?]
2022-05-18 04:52:45 destination > at io.airbyte.integrations.destination.record_buffer.InMemoryRecordBufferingStrategy.flushAll(InMemoryRecordBufferingStrategy.java:82) ~[io.airbyte.airbyte-integrations.bases-base-java-0.38.4-alpha.jar:?]
2022-05-18 04:52:45 destination > at io.airbyte.integrations.destination.record_buffer.InMemoryRecordBufferingStrategy.addRecord(InMemoryRecordBufferingStrategy.java:65) ~[io.airbyte.airbyte-integrations.bases-base-java-0.38.4-alpha.jar:?]
2022-05-18 04:52:45 destination > at io.airbyte.integrations.destination.buffered_stream_consumer.BufferedStreamConsumer.acceptTracked(BufferedStreamConsumer.java:137) ~[io.airbyte.airbyte-integrations.bases-base-java-0.38.4-alpha.jar:?]
2022-05-18 04:52:45 destination > at io.airbyte.integrations.base.FailureTrackingAirbyteMessageConsumer.accept(FailureTrackingAirbyteMessageConsumer.java:50) ~[io.airbyte.airbyte-integrations.bases-base-java-0.38.4-alpha.jar:?]
2022-05-18 04:52:45 destination > at io.airbyte.integrations.base.IntegrationRunner.consumeWriteStream(IntegrationRunner.java:194) ~[io.airbyte.airbyte-integrations.bases-base-java-0.38.4-alpha.jar:?]
```
</details>
## Steps to Reproduce
1. Create a connection for a CDC-enabled table with a column of type `decimal(28,2)` (a T-SQL sketch follows this list)
2. Attempt to sync the table using the incremental sync mode
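A hypothetical setup for step 1, with the table and column names (`dbo.Orders`, `AmountAhead`) chosen purely for illustration; `sys.sp_cdc_enable_db` and `sys.sp_cdc_enable_table` are the standard SQL Server procedures for enabling CDC:

```sql
-- Illustrative table with the decimal precision from step 1.
CREATE TABLE dbo.Orders (
    Id INT PRIMARY KEY,
    AmountAhead DECIMAL(28, 2) NULL
);

-- Enable CDC at the database level, then on the table.
EXEC sys.sp_cdc_enable_db;
EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name   = N'Orders',
    @role_name     = NULL;

-- Seed a row with the value that appears in the error message.
INSERT INTO dbo.Orders (Id, AmountAhead) VALUES (1, 0.00);
```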
Anthony, for now the workaround is to use the JSON format instead of Parquet/Avro. I'll ask the team to investigate the issue further.
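For anyone applying that workaround, the output format is selected in the GCS destination settings. A rough sketch of the relevant part of the destination config, with the bucket values as placeholders (treat the exact field names as illustrative of the destination's spec rather than authoritative):

```json
{
  "gcs_bucket_name": "my-bucket",
  "gcs_bucket_path": "airbyte/raw",
  "format": {
    "format_type": "JSONL"
  }
}
```

JSONL sidesteps the problem because records are written as raw JSON lines, so no JSON-to-Avro schema conversion happens on the decimal field.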