Is this your first time deploying Airbyte?: No / Yes
OS Version / Instance: MacOS
Memory / Disk: you can use something like 4Gb / 1 Tb
Deployment: Docker
Airbyte Version: 0.39.20-alpha
Source name/version: S3 0.1.15
Destination name/version: S3 0.3.10
Step: The issue is happening during sync
I was doing a test run to read a Parquet file written by the Airbyte S3 destination using Dremio. For that, I put a sample .parquet file in my S3 source folder and ran a sync to another folder. Then I tried to read both the source file and the copied file from S3 using Dremio.
The source file is readable, but the file copied by Airbyte is not. From the Dremio logs I got this error:
Unable to coerce from the file’s data type “timestamp” to the column’s data type “int64” in table “2022_07_11_1657578941501_0. parquet”, column “_ab_source_file_last_modified.member0”
I attached the schema shown in my connector settings.
So should "_ab_source_file_last_modified" be a string?
Also, in Dremio's file preview feature, I can see that "_ab_source_file_last_modified" is detected as an Object. So what should the type of "_ab_source_file_last_modified" be?
Why is the Parquet file created by Airbyte not readable? Attached are the file and screenshots. Any help is appreciated.
These are the schema details I got from pyarrow.parquet:
- _airbyte_ab_id: string not null
- _airbyte_emitted_at: timestamp[ms, tz=UTC] not null
- extra: double
- mta_tax: double
- VendorID: int32
- ehail_fee: int32
- trip_type: double
- RatecodeID: double
- tip_amount: double
- fare_amount: double
- DOLocationID: int32
- PULocationID: int32
- payment_type: double
- tolls_amount: double
- total_amount: double
- trip_distance: double
- passenger_count: double
- store_and_fwd_flag: string
- _ab_source_file_url: string
- congestion_surcharge: double
- lpep_pickup_datetime: string
- improvement_surcharge: double
- lpep_dropoff_datetime: string
- _ab_source_file_last_modified: struct<member0: timestamp[us, tz=UTC], member1: string>
- child 0, member0: timestamp[us, tz=UTC]
- child 1, member1: string
- _airbyte_additional_properties: map<string, string ('_airbyte_additional_properties')>
- child 0, _airbyte_additional_properties: struct<key: string not null, value: string not null> not null
- child 0, key: string not null
- child 1, value: string not null
- -- schema metadata --
- parquet.avro.schema: '{"type":"record","name":"s3_taxi_data","fields":[{"' + 1750
- writer.model.name: 'avro'