Summary
The connector build fails because nltk cannot find the `punkt` resource (LookupError) and then tries to download it into `/nonexistent`, a directory it cannot create (PermissionError).
Question
I am seeing a similar issue with nltk failing to download a resource.
In my case, I am trying to build a connector that uses a `FileBasedSource`. The `unstructured_parser` tries to download `punkt` to the `/nonexistent` path, fails, and the build terminates.
```
...
LookupError:
**********************************************************************
  Resource punkt not found.
  Please use the NLTK Downloader to obtain the resource:
  >>> import nltk
  >>> nltk.download('punkt')
  For more information see: https://www.nltk.org/data.html
  Attempted to load tokenizers/punkt.zip
  Searched in:
    - '/nonexistent/nltk_data'
    - '/usr/local/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/local/lib/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
**********************************************************************
...
  File "/usr/local/lib/python3.10/site-packages/airbyte_cdk/sources/file_based/file_types/unstructured_parser.py", line 49, in <module>
    nltk.download("punkt")
...
  File "/usr/local/lib/python3.10/os.py", line 225, in makedirs
    mkdir(name, mode)
PermissionError: [Errno 13] Permission denied: '/nonexistent'
```
Any assistance would be appreciated.
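
For anyone landing here with the same traceback, here is a minimal sketch of one possible workaround, not a confirmed fix. The first entry in the search list, `/nonexistent/nltk_data`, looks like the build user's home directory expanded from `~/nltk_data`. If so, pointing nltk at a writable directory through its `NLTK_DATA` environment variable before `airbyte_cdk` is imported may avoid the failed `mkdir`. The helper name `ensure_writable_nltk_data` and the temp-directory default are assumptions made for illustration; neither comes from the CDK.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional


def ensure_writable_nltk_data(candidate: Optional[str] = None) -> str:
    """Create a writable nltk data directory and export it as NLTK_DATA.

    Hypothetical helper: the name and the temp-directory default are
    illustrative assumptions, not part of airbyte_cdk or nltk.
    """
    if candidate:
        target = Path(candidate)
    else:
        target = Path(tempfile.gettempdir()) / "nltk_data"
    # Create the directory up front so nothing has to fall back to ~/nltk_data.
    target.mkdir(parents=True, exist_ok=True)
    # nltk reads NLTK_DATA when it is first imported, so set it before that.
    os.environ["NLTK_DATA"] = str(target)
    return str(target)


# Call this before any import that pulls in nltk, for example before
# importing the file-based modules from airbyte_cdk.
data_dir = ensure_writable_nltk_data()
```

The ordering matters because `unstructured_parser.py` calls `nltk.download("punkt")` at import time, so the variable has to be in place first. Setting `NLTK_DATA` in the build environment itself (for example in the image definition) should have the same effect, assuming the chosen directory exists and is writable by the build user.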
<br>
---
This topic has been created from a Slack thread to give it more visibility.
It will be on Read-Only mode here. [Click here](https://airbytehq.slack.com/archives/C027KKE4BCZ/p1733339726564329) if you want
to access the original thread.
<sub>
['nltk', 'punkt', 'filebasedsource', 'connector-builder', 'permission-error']
</sub>