Configuration check failed for GCS source with Globs pattern issue

Summary

The user is facing an issue with setting up a GCS source due to the Globs pattern not finding any files in the specified bucket. The user is trying to access a compressed .CSV file in a subfolder ‘sales’ with the filename ‘salesreport_202401.zip’. They have attempted to use the pattern ‘sales/salesreport_*.zip’ without success.


Question

Configuration check failed while setting up the GCS source

bucket-pubsite_prod_rev_XXXXXXXXXXXXXXXXXXXX. I can’t get the “Globs” pattern to find any files in the bucket.

This is for a compressed .CSV file in a GCS bucket, in the subfolder sales: salesreport_202401.zip. I tried sales/salesreport_*.zip, but that doesn’t work.

```
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/file_based/availability_strategy/default_file_based_availability_strategy.py", line 81, in _check_list_files
    file = next(iter(stream.get_files()))
StopIteration

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/file_based/availability_strategy/default_file_based_availability_strategy.py", line 62, in check_availability_and_parsability
    file = self._check_list_files(stream)
  File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/file_based/availability_strategy/default_file_based_availability_strategy.py", line 83, in _check_list_files
    raise CheckAvailabilityError(FileBasedSourceError.EMPTY_STREAM, stream=stream.name)
airbyte_cdk.sources.file_based.exceptions.CheckAvailabilityError: No files were identified in the stream. This may be because there are no files in the specified container, or because your glob patterns did not match any files. Please verify that your source contains files last modified after the start_date and that your glob patterns are not overly strict. Contact Support if you need assistance.
stream=salesreport
```


---

This topic has been created from a Slack thread to give it more visibility.
It will be on Read-Only mode here. [Click here](https://airbytehq.slack.com/archives/C027KKE4BCZ/p1706554495650149) if you want to access the original thread.


<sub>
["configuration-check", "gcs-source", "globs-pattern", "compressed-csv", "sales-subfolder", "salesreport-202401.zip"]
</sub>

Based on the information in the extracts, the error means that no files were identified for the stream: either the specified GCS bucket contains no files, or the glob pattern did not match any of them. The extracts do not provide a direct solution, but they do suggest two things to check:

1. Verify that your source contains files last modified after the start_date.
2. Ensure that your glob patterns are not overly strict. In your case, check whether the pattern sales/salesreport_*.zip actually matches the full object paths in your GCS bucket.

If you’ve checked both and the issue persists, reach out to Airbyte support through their forum or Slack channel.
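One way to test whether a glob is too strict is to try it locally against the object names exactly as they appear in the bucket. The sketch below is a simplified, hand-written matcher, not the connector's own implementation, so details may differ. It assumes that `*` stays within one path segment and that `**/` may span zero or more folders. The object names in `keys` are hypothetical; replace them with the real paths copied from the GCS console.

```python
import re


def glob_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a glob into a regex (simplified; an assumption, not Airbyte's matcher).

    '*'   matches any characters except '/'
    '?'   matches a single character except '/'
    '**/' matches zero or more whole folders
    """
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append("(?:.*/)?")
            i += 3
        elif pattern.startswith("**", i):
            parts.append(".*")
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")
            i += 1
        elif pattern[i] == "?":
            parts.append("[^/]")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts) + r"\Z")


def matching_keys(pattern: str, keys: list) -> list:
    """Return the object names that the glob matches in full."""
    rx = glob_to_regex(pattern)
    return [k for k in keys if rx.match(k)]


# Hypothetical object names; use the full paths shown in your bucket.
keys = [
    "sales/salesreport_202401.zip",
    "sales/2024/salesreport_202401.zip",
    "Sales/salesreport_202401.zip",
    "salesreport_202401.zip",
]

for pattern in ("sales/salesreport_*.zip", "sales/**/salesreport_*.zip", "**/salesreport_*.zip"):
    print(pattern, "->", matching_keys(pattern, keys))
```

Under these assumptions, sales/salesreport_*.zip matches only a file sitting directly in a lowercase sales folder at the bucket root. A different folder case, an extra nesting level, or a missing folder prefix would all produce an empty result, and a broader pattern such as **/salesreport_*.zip can help narrow down which of these applies.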