Summary
Guide on setting up a Parquet source file in Airbyte and troubleshooting the ‘no such file or directory’ error.
Question
Hi everyone,
I have a question: how do I set up the Parquet source file in Airbyte? I have followed the steps but still get the error `no such file or directory`. Please guide me.
What connector are you using?
S3 (https://docs.airbyte.com/integrations/sources/s3)?
Files (CSV, JSON, Excel, Feather, Parquet) (https://docs.airbyte.com/integrations/sources/file)?
Can you provide screenshots of your current configuration?
The connector I use is File (CSV, JSON, Excel, Feather, Parquet).
This is my current configuration:
What is your setup? Local machine with run-ab-platform.sh (docker compose)? abctl? helm charts?
Local machine with run-ab-platform.sh (docker compose)
I noticed that file synchronization between /tmp/airbyte_local on the local machine and the Docker volume doesn’t work as it used to after these changes: https://github.com/airbytehq/airbyte-platform/commit/901401a668738d7cb26c2821425e8a7432c26eb1
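First, create a temporary container with the Airbyte local-data volume mounted. A minimal sketch, assuming the volume is named `airbyte_local` (check `docker volume ls` for the actual name on your machine):
```
# list volumes to find the one Airbyte uses for local files
docker volume ls

# start a throwaway container with that volume mounted at /data
# (the volume name `airbyte_local` is an assumption; substitute the real one)
docker run -d --name temp-container -v airbyte_local:/data alpine tail -f /dev/null
```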
You can copy the parquet file to the Docker volume like this:
```
docker cp green_tripdata_2019-01.parquet temp-container:/data/green_tripdata_2019-01.parquet
docker exec temp-container ls /data
docker stop temp-container
docker rm temp-container
```
:warning: I purposely used a different path in the command above, because the volume can be mounted at whatever path you like.
Then you need to set the URL: `/local/green_tripdata_2019-01.parquet`
Yes, it has to start with `/local/`. This is also mentioned in the connector’s docs.
I tested it on macOS 14.5, Airbyte 0.63.4, Docker Desktop 4.31.0
Thank you for your answer. I’ll try it first, but I have another question: what if the Parquet data I want to retrieve is on a different server? What is the format for writing the URL, and which storage provider should I choose?
I have tried it, and now I get the error: internal error
What do you have in the stack trace and logs?
I can see `Could not find image: airbyte/source-file:0.5.3` in the stack trace/logs. Weird.
Can you run `docker pull airbyte/source-file:0.5.3` and try again?
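For example, pull the image and then confirm it is present locally:
```
# pull the connector image the platform could not find
docker pull airbyte/source-file:0.5.3

# confirm the image is now available locally
docker images airbyte/source-file
```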
What operating system do you use? Some Linux distribution?
I have also noticed that, and I have tried running the command; the image is there. Below is the capture:
It looks like the Docker containers have no access to the Docker socket /var/run/docker.sock.
Can you check what the result of `ls -l /var/run/docker.sock` is?
Here I found many different tips: https://stackoverflow.com/questions/51342810/how-to-fix-dial-unix-var-run-docker-sock-connect-permission-denied-when-gro
I don’t know if any of them will help, but you may check them.
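For example, one tip from that thread is adding your user to the docker group; whether it helps depends on your distribution and how Docker was installed:
```
# check who owns the Docker socket and which group has access to it
ls -l /var/run/docker.sock

# a common fix from the linked thread: add your user to the docker group
sudo usermod -aG docker $USER

# then log out and back in (or run `newgrp docker`) for the group change to take effect
```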
I’m not able to reproduce it as I’m running macOS
Okay, that’s fine. Can you explain the minimum requirements for Airbyte? And if I want to access Parquet from a different server, what is the URL format, and which storage provider do I choose?
What kind of minimum requirements? Memory/CPU/others?
As for the second question, I don’t know where you want to run your Airbyte or what infrastructure you have. For cloud storage you can find connectors for S3/GCS/Azure.
Yes, memory, disk, and CPU. The server is local, not in the cloud. Which of SSH, SCP or SFTP should I use?
disk - roughly 10GB for Docker images (Airbyte components, connectors), plus as much as needed for the synchronized data, plus some extra so you don’t run out of space
memory - it depends on how many connections and how much data are synchronized simultaneously; on my macOS, current usage is 2.94GB without any synchronization running, so the bare minimum seems to be 4GB, but I’d suggest 8GB or more (see the docker stats snapshot below)
cpu - 4 / 8 cores should be fine
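If you want to check what your own instance is currently using, a one-off snapshot is enough:
```
# print a single snapshot of CPU and memory usage for all running containers
docker stats --no-stream
```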
Regarding SSH, SCP or SFTP, check what works for you.
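Before configuring the connector, it may help to confirm the file is reachable from the machine running Airbyte. The host, user and path below are placeholders, not values from this thread:
```
# fetch the remote file once over SFTP to confirm the credentials and path work
sftp user@remote-host:/data/green_tripdata_2019-01.parquet

# or do the same check over SCP
scp user@remote-host:/data/green_tripdata_2019-01.parquet /tmp/
```
In the File source you would then pick the matching storage provider (SSH / SCP / SFTP) and, if I remember the connector spec correctly, fill in host, user and password there, with the URL field pointing at the path of the file on the remote machine.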
Hello, I want to ask: why, when I configure the MongoDB destination in Airbyte, do I always get a format specifier error `%s`?