Setting up Parquet Source File in Airbyte


Guide on setting up Parquet source file in Airbyte and troubleshooting ‘no such file or directory’ error.


Hi everyone,

I have a question: how do I set up a Parquet source file in Airbyte? I have followed the steps but still get the error `no such file or directory`. Please guide me.

This topic has been created from a Slack thread to give it more visibility.

["parquet-source-file", "airbyte", "error", "guide", "troubleshooting"]

What connector are you using?
S3
File (CSV, JSON, Excel, Feather, Parquet)

Can you provide screenshots of your current configuration?

The connector I use is File (CSV, JSON, Excel, Feather, Parquet).

This is my current configuration:

What is your setup? Local machine with (docker compose)? abctl? helm charts?

Local machine with (docker compose)

I noticed that file synchronization between `/tmp/airbyte_local` on the local machine and the Docker volume doesn’t work as it used to after those changes.

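You can copy the Parquet file to the Docker volume like this:

```
# temp-container must already be running with the Airbyte volume mounted at /data
docker cp green_tripdata_2019-01.parquet temp-container:/data/green_tripdata_2019-01.parquet
# List the directory to confirm the file arrived
docker exec temp-container ls /data
# Clean up the helper container
docker stop temp-container
docker rm temp-container
```
:warning: I purposely used a different path in the commands above, because the volume can be mounted at any path.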
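The commands above rely on a helper container that already has the volume mounted. A minimal sketch of the whole round trip follows, assuming a throwaway `busybox` container is acceptable. The function name `copy_to_volume` and the `DOCKER` override are made up for illustration, and the volume name is a parameter because it differs between installs (check `docker volume ls` for yours).

```shell
# Sketch: copy one file into a named Docker volume through a throwaway container.
copy_to_volume() {
    volume=$1                      # hypothetical: name of the Docker volume Airbyte uses
    file=$2                        # path of the file on the host
    docker_cmd=${DOCKER:-docker}   # set DOCKER=echo to print the commands instead
    name=$(basename "$file")

    # Start a helper container with the volume at /data, copy the file in,
    # then list /data to confirm it arrived. Stop at the first failure.
    $docker_cmd run -d --name temp-container -v "$volume:/data" busybox sleep 300 \
        && $docker_cmd cp "$file" "temp-container:/data/$name" \
        && $docker_cmd exec temp-container ls /data
    status=$?

    # Remove the helper container whether or not the copy succeeded.
    $docker_cmd rm -f temp-container >/dev/null 2>&1
    return "$status"
}
```

Calling it with `DOCKER=echo` set first prints the three Docker commands without running them, which is a cheap way to check the volume name and paths before touching anything.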

Then set the URL to `/local/green_tripdata_2019-01.parquet`.
Yes, it has to start with `/local/`. This is also mentioned in the connector's docs.
I tested it on macOS 14.5, Airbyte 0.63.4, and Docker Desktop 4.31.0.
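The `/local/` rule can be sketched as two small shell helpers. Both names (`to_local_url`, `is_local_url`) are hypothetical, and the sketch assumes the file sits at the root of the volume, as in the `docker cp` example.

```shell
# Build the connector URL for a file placed at the root of the volume.
to_local_url() {
    # Drop any leading directories so only the file name remains.
    name=$(basename "$1")
    printf '/local/%s\n' "$name"
}

# Succeed only if the URL starts with /local/ and names something after it.
is_local_url() {
    case "$1" in
        /local/?*) return 0 ;;
        *)         return 1 ;;
    esac
}
```

For example, `to_local_url /data/green_tripdata_2019-01.parquet` prints `/local/green_tripdata_2019-01.parquet`, while `is_local_url /tmp/airbyte_local/file.parquet` fails because the host path is not what the connector expects.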

Thank you for your answer. I’ll try it first, but I have another question: what if the Parquet data I want to retrieve is on a different server? What is the format for writing the URL, and which storage provider should I choose?

I have tried it, and now I get this error: internal error

What do you have in the stack trace and logs?

I can see `Could not find image: airbyte/source-file:0.5.3` in the stack trace/logs. Weird.

Can you try `docker pull airbyte/source-file:0.5.3` and try again?
What operating system do you use? Some Linux distribution?

I noticed that as well. I ran the command and the image is there. Below is the capture:

It looks like the Docker containers have no access to the Docker socket `/var/run/docker.sock`.

Can you check the result of `ls -l /var/run/docker.sock`?
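Besides `ls -l`, the same check can be scripted. This is a rough sketch with a hypothetical function name, `check_docker_sock`. It only tests whether the path is a Unix socket that the current user can read and write, and it runs wherever you call it, so it would need to be run inside the container to test the container's view.

```shell
# Report whether a path is a Unix socket the current user can read and write.
check_docker_sock() {
    sock=${1:-/var/run/docker.sock}
    if [ ! -S "$sock" ]; then
        echo "missing: $sock is not a socket"
        return 1
    fi
    if [ -r "$sock" ] && [ -w "$sock" ]; then
        echo "ok: $sock is accessible"
    else
        echo "denied: current user cannot read/write $sock"
        return 2
    fi
}
```

Run `check_docker_sock` with no argument to test the default path, or pass another path if the socket lives elsewhere on your system.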

I found many different tips here.
I don’t know if any of them will help, but you may want to check them.
I’m not able to reproduce it, as I’m running macOS.

Okay, that’s fine. Can you explain the minimum requirements for Airbyte? And if I want to access Parquet files on a different server, what is the URL format, and which storage provider should I choose?

What kind of minimum requirements? Memory/CPU/others?

As for the second question, I don’t know where you want to run Airbyte or what infrastructure you have. For cloud storage you can find connectors for S3/GCS/Azure.

Yes, memory, disk, and CPU. The server is local, not in the cloud. Which should I use: SSH, SCP, or SFTP?

disk - roughly 10 GB for Docker images (Airbyte components, connectors), plus as much as needed for synchronized data and some extra so you don't run out of space
memory - it depends on how many connections run and how much data is synchronized simultaneously; on my macOS machine current usage is 2.94 GB without any synchronization running, so the bare minimum seems to be 4 GB, but I’d suggest 8 GB or more
cpu - 4 to 8 cores should be fine

Regarding SSH, SCP, or SFTP, check what works for you.
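The figures above can be turned into a quick self-check. This is only a sketch of the rough guidance in this thread, not an official Airbyte requirement: the function name `check_resources` is made up, and treating fewer than 4 cores as below minimum is an assumption.

```shell
# Classify a machine against the rough figures from this thread.
# Arguments: memory in whole GB, number of CPU cores.
check_resources() {
    mem_gb=$1
    cores=$2
    if [ "$mem_gb" -lt 4 ] || [ "$cores" -lt 4 ]; then
        # Assumption: under 4 GB or under 4 cores counts as too small.
        echo "below minimum"
        return 1
    elif [ "$mem_gb" -lt 8 ]; then
        # 4 GB is the bare minimum mentioned; 8 GB or more is suggested.
        echo "minimum"
    else
        echo "recommended"
    fi
}
```

On Linux the core count can come from `nproc`, for example `check_resources 8 "$(nproc)"`. The memory figure has to be supplied by hand or from an OS-specific tool.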

Hello, I want to ask why I always get a format specifier error (`%s`) when I configure a MongoDB destination in Airbyte.
In the comments, people downgraded the connector to 0.1.9.