Trouble with GCS source connector and Globs pattern

Summary

The user cannot get the Globs pattern of Airbyte's GCS source connector to find any files in the bucket. The error message reports that no files were identified in the stream, either because the glob patterns matched nothing or because the container is empty.


Question

• I am having trouble with the GCS (Google Cloud Storage) source connector. I can’t get the “Globs” pattern to find any files in the bucket. Any suggestions? I am using the Airbyte Docker open source community version (0.50.43).
This is for CSV files in a GCS bucket, in a subfolder named contacts: contacts/acme__contacts.csv

Error:

```
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/file_based/availability_strategy/default_file_based_availability_strategy.py", line 77, in _check_list_files
    file = next(iter(stream.get_files()))
StopIteration

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/file_based/availability_strategy/default_file_based_availability_strategy.py", line 58, in check_availability_and_parsability
    file = self._check_list_files(stream)
  File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/file_based/availability_strategy/default_file_based_availability_strategy.py", line 79, in _check_list_files
    raise CheckAvailabilityError(FileBasedSourceError.EMPTY_STREAM, stream=stream.name)
airbyte_cdk.sources.file_based.exceptions.CheckAvailabilityError: No files were identified in the stream. This may be because there are no files in the specified container, or because your glob patterns did not match any files. Please verify that your source contains files last modified after the start_date and that your glob patterns are not overly strict. Contact Support if you need assistance.
stream=acme_contacts
```
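
Since the error points at the glob patterns, it can help to sanity-check candidate patterns against the object path offline. The following is only a rough sketch under assumed globstar-style semantics (`*` and `?` do not cross `/`, `**/` spans zero or more directory levels). The helper names `glob_to_regex` and `matches` are made up for illustration, and the connector's real matcher may behave differently, which may be exactly what the linked GitHub issue is about.

```python
import re


def glob_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a globstar-style pattern into a compiled regex.

    Assumed semantics (NOT taken from the connector's source code):
      *    any run of characters except "/"
      ?    exactly one character except "/"
      **/  zero or more whole directory levels
      **   (anywhere else) anything, including "/"
    """
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append(r"(?:[^/]+/)*")
            i += 3
        elif pattern.startswith("**", i):
            parts.append(r".*")
            i += 2
        elif pattern[i] == "*":
            parts.append(r"[^/]*")
            i += 1
        elif pattern[i] == "?":
            parts.append(r"[^/]")
            i += 1
        else:
            # Every other character must match literally.
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def matches(pattern: str, path: str) -> bool:
    """Return True if the whole path matches the pattern."""
    return glob_to_regex(pattern).fullmatch(path) is not None


if __name__ == "__main__":
    # Object path from the question; the candidate patterns are illustrative.
    object_path = "contacts/acme__contacts.csv"
    candidates = ["*.csv", "**/*.csv", "contacts/*.csv", "/contacts/*.csv"]
    for candidate in candidates:
        print(f"{candidate!r}: {matches(candidate, object_path)}")
```

Under these assumed semantics, `*.csv` and `/contacts/*.csv` do not match the nested object, while `contacts/*.csv` and `**/*.csv` do. If a pattern that matches here still finds nothing in Airbyte, the cause is more likely the connector itself or the start_date filter mentioned in the error.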


---

This topic was created from a Slack thread to give it more visibility.
It is read-only here. [Click here](https://airbytehq.slack.com/archives/C021JANJ6TY/p1705589933893289) if you want to access the original thread.


<sub>
["gcs-source-connector", "globs-pattern", "csv-files", "error", "airbyte-docker", "configuration-check-failed"]
</sub>

<@U06EFMS2VR6> did you figure this out? I’ve hit the same problem, and after a quick look at the source code it seems like unintentional behaviour (I’m not entirely sure, though, as I’m not very experienced with Airbyte): https://github.com/airbytehq/airbyte/issues/34459

<@U06AMCVK25N> Unfortunately, I was not able to figure this out. I tried another approach using the CSV connector, but it was a bit limiting: it required selecting one specific CSV file rather than using a Globs pattern for CSV files in a GCS bucket.


Meanwhile, I took a look at the source code, and I’ll try to propose a fix over the weekend. Thanks for the confirmation, though! :pray: