Connection sync failed when saving parquet from mongodb to s3

Is this your first time deploying Airbyte: yes
OS Version / Instance: M1 MAC
Deployment: Docker
Airbyte Version: dev of 0.35.62-alpha
Source: mongodb-v2
Destination: S3
Description: The source and destination connections both work. However, when I run a sync on the connection, the attached error occurs and the sync fails. It may be some kind of JSON parse error. Could you give me any advice?
This only happens when I try to save the output as a Parquet or Avro file.
Other output formats (JSON, CSV, …) are stored in S3 without problems.

logs-16.txt (68.1 KB)

Hello @zigbang-yeezy, I saw that the source is a dev version. Did you change anything in the MongoDB connector?
The error message:

2022-04-04 07:44:59 destination > Cannot run program "chmod": error=0, Failed to exec spawn helper: pid: 79, exit value: 1
2022-04-04 07:44:59 destination > 	at java.lang.ProcessBuilder.start( ~[?:?]
2022-04-04 07:44:59 destination > 	at java.lang.ProcessBuilder.start( ~[?:?]
2022-04-04 07:44:59 destination > 	at io.airbyte.integrations.destination.record_buffer.SerializedBufferingStrategy.addRecord( ~[io.airbyte.airbyte-integrations.bases-base-java-0.35.61-alpha.jar:?]

This may be a permission issue. Do you have any other connection writing to the S3 destination? If not, could you create a PokeAPI to S3 connection to validate that the connection to your destination is working?

Yes, as you said, the source is using a dev version, because of a TLS connection issue with DocumentDB (🐛 Source MongoDB: Added support for Amazon DocumentDB with TLS by awill1 · Pull Request #10995 · airbytehq/airbyte · GitHub).
I’m just curious: if this is a permission problem, why can the data be saved in S3 as a CSV or JSON file?
Are specific permissions needed to create Parquet or Avro files?

I'm not completely sure about this, but Parquet and Avro use Hadoop to store data in S3, with compression and their own file organization, so maybe that is why you need other permissions to use them.

Well, I’d like to know how to set up those other permissions. Do you have a best practice or a document I can refer to?

Hi @zigbang-yeezy,
Did you follow the instructions on this issue to run Airbyte on Apple M1? The Cannot run program "chmod" error could be related to a lack of permissions, which might be solved by setting the following environment variable on your worker container: JAVA_OPTS:"-Djdk.lang.Process.launchMechanism=vfork"

I added the environment variable to the worker service in the docker-compose file as you suggested, but the “chmod” problem still occurs.

Hi @alafanechere ,
When I carried out the same process on Ubuntu, the permission problem was solved. This time there was a different error:
tech.allegro.schema.json2avro.converter.AvroConversionException: Failed to convert JSON to Avro: Could not evaluate union, field win is expected to be one of these: NULL, STRING. If this is a complex type, check if offending field: win adheres to schema.
It seems to be an error caused by the JSON to Avro conversion.
I will attach the log as a new topic and upload it again.
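
For readers hitting the same message, the union error above can be illustrated with a minimal Python sketch. It is not the json2avro converter's actual code: the function names and the sample documents are hypothetical, and the thread does not show what value `win` really holds in the source collection. The sketch only shows the general rule that a value must fit one branch of the union (here null or string), so a number or a nested object in `win` would be rejected.

```python
# Hypothetical illustration only; this is not how the json2avro library is implemented.

def matches_null_or_string(value):
    """Return True if value fits an Avro union of ["null", "string"]."""
    return value is None or isinstance(value, str)


def find_offending_fields(record, nullable_string_fields):
    """Return (field name, Python type name) pairs that fit neither union branch."""
    return [
        (name, type(record[name]).__name__)
        for name in nullable_string_fields
        if name in record and not matches_null_or_string(record[name])
    ]


# Assumed sample documents; the real values of `win` are not shown in the thread.
records = [
    {"win": "yes"},         # fits the string branch
    {"win": None},          # fits the null branch
    {"win": 3},             # an integer fits neither branch
    {"win": {"count": 3}},  # a nested object fits neither branch
]

for record in records:
    print(record, find_offending_fields(record, ["win"]))
```

Running a check like this over a sample of the source documents may help locate records where a field's actual type differs from the type in the inferred schema.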

Could you please comment on the GitHub issue that you encounter this problem on M1 but not on Ubuntu? We’re focusing this quarter on making sure we have good M1 support, so this kind of insight is really important for us.

Thank you for opening a new topic for your new problem 🙏