Summary
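When testing the S3 connector on Airbyte OSS (Docker deployment on EC2), the connection check failed. The log shows jemalloc and OpenBLAS thread-creation errors, followed by a traceback that ends in a KeyboardInterrupt while importing pyarrow and numpy. The check subprocess exited with code 130, and the platform raised a WorkerException because no status or failure reason was output.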
Question
Hello all, I am trying to set up an S3 connector on Airbyte OSS running on Docker, deployed on EC2 (Ubuntu 16.04.7 LTS, GNU/Linux 4.4.0-1128-aws x86_64). After setting up my source connectors (I am using Terraform), I get the error log below when I try to test the connector. I have tested the same deployment on another EC2 container; the only difference I can think of is that the other one is running Amazon Linux.
```
2024-03-14 21:47:43 platform >
2024-03-14 21:47:43 platform > Checking if airbyte/source-s3:4.5.10 exists...
2024-03-14 21:47:43 platform > airbyte/source-s3:4.5.10 was found locally.
2024-03-14 21:47:43 platform > Creating docker container = source-s3-check-6bdc4e7f-1dbd-444a-b706-ca1c305acd2c-0-pkztr with resources io.airbyte.config.ResourceRequirements@628e7773[cpuRequest=,cpuLimit=,memoryRequest=,memoryLimit=,additionalProperties={}] and allowedHosts null
2024-03-14 21:47:43 platform > Preparing command: docker run --rm --init -i -w /data/6bdc4e7f-1dbd-444a-b706-ca1c305acd2c/0 --log-driver none --name source-s3-check-6bdc4e7f-1dbd-444a-b706-ca1c305acd2c-0-pkztr --network host -v airbyte_workspace:/data -v /tmp/airbyte_local:/local -e DEPLOYMENT_MODE=OSS -e WORKER_CONNECTOR_IMAGE=airbyte/source-s3:4.5.10 -e AUTO_DETECT_SCHEMA=true -e LAUNCHDARKLY_KEY= -e SOCAT_KUBE_CPU_REQUEST=0.1 -e SOCAT_KUBE_CPU_LIMIT=2.0 -e FIELD_SELECTION_WORKSPACES= -e USE_STREAM_CAPABLE_STATE=true -e WORKER_ENVIRONMENT=DOCKER -e AIRBYTE_ROLE=dev -e APPLY_FIELD_SELECTION=false -e WORKER_JOB_ATTEMPT=0 -e OTEL_COLLECTOR_ENDPOINT=http://host.docker.internal:4317 -e FEATURE_FLAG_CLIENT=config -e AIRBYTE_VERSION=0.53.0 -e WORKER_JOB_ID=6bdc4e7f-1dbd-444a-b706-ca1c305acd2c airbyte/source-s3:4.5.10 check --config source_config.json
2024-03-14 21:47:43 platform > Reading messages from protocol version 0.2.0
2024-03-14 21:47:45 platform > <jemalloc>: arena 0 background thread creation failed (1)
2024-03-14 21:47:45 platform > OpenBLAS blas_thread_init: pthread_create failed for thread 1 of 2: Operation not permitted
2024-03-14 21:47:45 platform > OpenBLAS blas_thread_init: RLIMIT_NPROC -1 current, -1 max
2024-03-14 21:47:45 platform > Traceback (most recent call last):
2024-03-14 21:47:45 platform > File "/airbyte/integration_code/main.py", line 6, in <module>
2024-03-14 21:47:45 platform > from source_s3.run import run
2024-03-14 21:47:45 platform > File "/airbyte/integration_code/source_s3/run.py", line 13, in <module>
2024-03-14 21:47:45 platform > from source_s3.v4 import Config, Cursor, SourceS3, SourceS3StreamReader
2024-03-14 21:47:45 platform > File "/airbyte/integration_code/source_s3/v4/__init__.py", line 6, in <module>
2024-03-14 21:47:45 platform > from .cursor import Cursor
2024-03-14 21:47:45 platform > File "/airbyte/integration_code/source_s3/v4/cursor.py", line 11, in <module>
2024-03-14 21:47:45 platform > from airbyte_cdk.sources.file_based.stream.cursor import DefaultFileBasedCursor
2024-03-14 21:47:45 platform > File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/file_based/stream/__init__.py", line 1, in <module>
2024-03-14 21:47:45 platform > from airbyte_cdk.sources.file_based.stream.abstract_file_based_stream import AbstractFileBasedStream
2024-03-14 21:47:45 platform > File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/file_based/stream/abstract_file_based_stream.py", line 12, in <module>
2024-03-14 21:47:45 platform > from airbyte_cdk.sources.file_based.discovery_policy import AbstractDiscoveryPolicy
2024-03-14 21:47:45 platform > File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/file_based/discovery_policy/__init__.py", line 1, in <module>
2024-03-14 21:47:45 platform > from airbyte_cdk.sources.file_based.discovery_policy.abstract_discovery_policy import AbstractDiscoveryPolicy
2024-03-14 21:47:45 platform > File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/file_based/discovery_policy/abstract_discovery_policy.py", line 7, in <module>
2024-03-14 21:47:45 platform > from airbyte_cdk.sources.file_based.file_types.file_type_parser import FileTypeParser
2024-03-14 21:47:45 platform > File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/file_based/file_types/__init__.py", line 13, in <module>
2024-03-14 21:47:45 platform > from .parquet_parser import ParquetParser
2024-03-14 21:47:45 platform > File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/file_based/file_types/parquet_parser.py", line 11, in <module>
2024-03-14 21:47:45 platform > import pyarrow as pa
2024-03-14 21:47:45 platform > File "/usr/local/lib/python3.9/site-packages/pyarrow/__init__.py", line 65, in <module>
2024-03-14 21:47:45 platform > import pyarrow.lib as _lib
2024-03-14 21:47:45 platform > File "pyarrow/lib.pyx", line 24, in init pyarrow.lib
2024-03-14 21:47:45 platform > File "/usr/local/lib/python3.9/site-packages/numpy/__init__.py", line 130, in <module>
2024-03-14 21:47:45 platform > from numpy.__config__ import show as show_config
2024-03-14 21:47:45 platform > File "/usr/local/lib/python3.9/site-packages/numpy/__config__.py", line 4, in <module>
2024-03-14 21:47:45 platform > from numpy.core._multiarray_umath import (
2024-03-14 21:47:45 platform > File "/usr/local/lib/python3.9/site-packages/numpy/core/__init__.py", line 24, in <module>
2024-03-14 21:47:45 platform > from . import multiarray
2024-03-14 21:47:45 platform > File "/usr/local/lib/python3.9/site-packages/numpy/core/multiarray.py", line 10, in <module>
2024-03-14 21:47:45 platform > from . import overrides
2024-03-14 21:47:45 platform > File "/usr/local/lib/python3.9/site-packages/numpy/core/overrides.py", line 8, in <module>
2024-03-14 21:47:45 platform > from numpy.core._multiarray_umath import (
2024-03-14 21:47:45 platform > File "<frozen importlib._bootstrap>", line 203, in _lock_unlock_module
2024-03-14 21:47:45 platform > KeyboardInterrupt
2024-03-14 21:47:45 platform > Check connection job subprocess finished with exit code 130
2024-03-14 21:47:45 platform > Unexpected error while checking connection:
io.airbyte.workers.exception.WorkerException: Error checking connection status: no status nor failure reason were outputted
at io.airbyte.workers.WorkerUtils.throwWorkerException(WorkerUtils.java:269) ~[io.airbyte-airbyte-commons-worker-0.53.0.jar:?]
at io.airbyte.workers.general.DefaultCheckConnectionWorker.run(DefaultCheckConnectionWorker.java:120) ~[io.airbyte-airbyte-commons-worker-0.53.0.jar:?]
at io.airbyte.workers.general.DefaultCheckConnectionWorker.run(DefaultCheckConnectionWorker.java:44) ~[io.airbyte-airbyte-commons-worker-0.53.0.jar:?]
at io.airbyte.workers.temporal.TemporalAttemptExecution.get(TemporalAttemptExecution.java:142) ~[io.airbyte-airbyte-workers-0.53.0.jar:?]
at io.airbyte.workers.temporal.check.connection.CheckConnectionActivityImpl.lambda$runWithJobOutput$1(CheckConnectionActivityImpl.java:226) ~[io.airbyte-airbyte-workers-0.53.0.jar:?]
at io.airbyte.commons.temporal.HeartbeatUtils.withBackgroundHeartbeat(HeartbeatUtils.java:57) ~[io.airbyte-airbyte-commons-temporal-core-0.53.0.jar:?]
at io.airbyte.workers.temporal.check.connection.CheckConnectionActivityImpl.runWithJobOutput(CheckConnectionActivityImpl.java:211) ~[io.airbyte-airbyte-workers-0.53.0.jar:?]
at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103) ~[?:?]
at java.base/java.lang.reflect.Method.invoke(Method.java:580) ~[?:?]
at io.temporal.internal.activity.RootActivityInboundCallsInterceptor$POJOActivityInboundCallsInterceptor.executeActivity(RootActivityInboundCallsInterceptor.java:64) ~[temporal-sdk-1.22.3.jar:?]
at io.temporal.internal.activity.RootActivityInboundCallsInterceptor.execute(RootActivityInboundCallsInterceptor.java:43) ~[temporal-sdk-1.22.3.jar:?]
at io.temporal.internal.activity.ActivityTaskExecutors$BaseActivityTaskExecutor.execute(ActivityTaskExecutors.java:107) ~[temporal-sdk-1.22.3.jar:?]
at io.temporal.internal.activity.ActivityTaskHandlerImpl.handle(ActivityTaskHandlerImpl.java:124) ~[temporal-sdk-1.22.3.jar:?]
at io.temporal.internal.worker.ActivityWorker$TaskHandlerImpl.handleActivity(ActivityWorker.java:278) ~[temporal-sdk-1.22.3.jar:?]
at io.temporal.internal.worker.ActivityWorker$TaskHandlerImpl.handle(ActivityWorker.java:243) ~[temporal-sdk-1.22.3.jar:?]
at io.temporal.internal.worker.ActivityWorker$TaskHandlerImpl.handle(ActivityWorker.java:216) ~[temporal-sdk-1.22.3.jar:?]
at io.temporal.internal.worker.PollTaskExecutor.lambda$process$0(PollTaskExecutor.java:105) ~[temporal-sdk-1.22.3.jar:?]
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
at java.base/java.lang.Thread.run(Thread.java:1583) [?:?]
2024-03-14 21:47:45 platform >
2024-03-14 21:47:45 platform > ----- END CHECK -----
```
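The first errors in the log are the jemalloc and OpenBLAS messages about failed thread creation (`pthread_create failed ... Operation not permitted`), which appear before the Python traceback. The thread does not confirm that these are the root cause, so the following is only a minimal diagnostic sketch under that assumption. It tries to start a single thread and reports the kernel release and `RLIMIT_NPROC`, so the output can be compared between the Ubuntu host and the Amazon Linux host. The function name `probe_thread_creation` is hypothetical and not part of Airbyte.

```python
import platform
import resource
import threading


def probe_thread_creation() -> dict:
    """Try to start one thread and report kernel and process-limit details."""
    # RLIMIT_NPROC is the limit OpenBLAS prints in the log (-1 means unlimited).
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    report = {
        "kernel": platform.release(),
        "rlimit_nproc": (soft, hard),
        "thread_started": False,
        "error": None,
    }
    try:
        worker = threading.Thread(target=lambda: None)
        worker.start()
        worker.join()
        report["thread_started"] = True
    except RuntimeError as exc:
        # CPython raises RuntimeError when the OS refuses to create a thread.
        report["error"] = str(exc)
    return report


if __name__ == "__main__":
    print(probe_thread_creation())
```

One possible way to run it is inside the same connector image on each host, for example by saving it as `probe.py` and running `docker run --rm -i --entrypoint python airbyte/source-s3:4.5.10 - < probe.py`. This assumes `python` is on the image's path, which the `/usr/local/lib/python3.9` paths in the traceback suggest but do not guarantee. If the thread fails to start on one host only, the difference is more likely in the host's kernel or Docker setup than in the connector itself.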
---
This topic was created from a Slack thread to give it more visibility.
It is in read-only mode here. [Click here](https://airbytehq.slack.com/archives/C021JANJ6TY/p1710453774482929) if you want to access the original thread.
<sub>
["s3-connector", "airbyte-oss", "docker", "ec2", "pyarrow", "numpy", "workerexception"]
</sub>