Error Testing S3 Connector on Airbyte OSS with Docker

Summary

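The user hits an error when testing the S3 source connector on Airbyte OSS, deployed with Docker on an EC2 instance. While the connector imports its Python dependencies, jemalloc and OpenBLAS both fail to create threads (`pthread_create` fails with "Operation not permitted"). The import is then interrupted with a `KeyboardInterrupt` and exit code 130, and the platform reports that the check produced no connection status. The same deployment was tried on a second EC2 instance running Amazon Linux, so the cause likely lies in the host environment. OpenBLAS reports RLIMIT_NPROC as -1, which means unlimited. The error is also EPERM rather than the EAGAIN that `pthread_create` returns when a thread or process limit is hit. A permission or sandboxing difference between the hosts therefore looks more likely than a resource limit.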
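
One quick way to tell a resource limit apart from a sandboxing restriction is to try creating threads directly inside the connector image. The sketch below is an illustrative Python probe and is not part of Airbyte. The function names are hypothetical. It reports RLIMIT_NPROC, showing `resource.RLIM_INFINITY` (which OpenBLAS prints as -1) as "unlimited", and it records whether `threading.Thread.start()` succeeds.

```python
import resource
import threading


def describe_limit(value: int) -> str:
    """Render an rlimit value for humans; RLIM_INFINITY (printed as -1 by OpenBLAS) means no limit."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def probe_thread_creation(count: int = 2) -> dict:
    """Try to start `count` short-lived threads and report what happened.

    A RuntimeError("can't start new thread") here is the Python-level
    counterpart of the pthread_create failures that jemalloc and OpenBLAS log.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    started, error = [], None
    for _ in range(count):
        worker = threading.Thread(target=lambda: None)
        try:
            worker.start()
        except RuntimeError as exc:  # the OS refused to create the thread
            error = str(exc)
            break
        started.append(worker)
    for worker in started:
        worker.join()
    return {
        "rlimit_nproc_soft": describe_limit(soft),
        "rlimit_nproc_hard": describe_limit(hard),
        "threads_started": len(started),
        "error": error,
    }


if __name__ == "__main__":
    print(probe_thread_creation())
```

The traceback shows Python 3.9 under `/usr/local`. Assuming the image has `python` on its PATH, the probe can be piped in with something like `docker run --rm -i --entrypoint python airbyte/source-s3:4.5.10 - < probe.py`, where `probe.py` is a hypothetical file holding the script. Suppose the limits print as unlimited but `error` reads "can't start new thread". In that case, a resource limit is an unlikely explanation.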


Question

hello all :wave: I am trying to set up an S3 connector on Airbyte OSS, running on Docker on an EC2 instance (Ubuntu 16.04.7 LTS, GNU/Linux 4.4.0-1128-aws x86_64). After setting up my source connectors with Terraform, I get the error log below when I test the connection. I have tested the same deployment on another EC2 instance, and the only difference I can think of is that the other one runs Amazon Linux:
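
The two hosts run different OS releases, so their Docker engine versions probably differ as well. One commonly reported cause of "Operation not permitted" from `pthread_create` inside containers is an older Docker engine whose default seccomp profile rejects the `clone3` syscall. Newer glibc versions use that syscall to create threads. The Python sketch below reads the engine version with `docker version --format '{{.Server.Version}}'` and compares it with 20.10.10. Both that threshold and the helper names are assumptions for illustration. Check them against the Docker Engine release notes, because the host's libseccomp version may also matter.

```python
import re
import subprocess

# ASSUMPTION: Docker Engine releases before 20.10.10 are widely reported to ship a
# default seccomp profile that answers clone3 with EPERM instead of ENOSYS, so newer
# glibc cannot fall back to clone() and pthread_create fails. Verify against the
# Docker Engine release notes; the host's libseccomp version may also matter.
CLONE3_FIX_VERSION = (20, 10, 10)


def parse_version(text: str) -> tuple:
    """Turn strings such as '19.03.6' or '17.03.2-ce' into a comparable int tuple."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", text.strip())
    if match is None:
        raise ValueError(f"unrecognised Docker version: {text!r}")
    return tuple(int(part) for part in match.groups())


def engine_may_block_clone3(version_text: str) -> bool:
    """True if the engine predates the (assumed) clone3 seccomp fix."""
    return parse_version(version_text) < CLONE3_FIX_VERSION


def server_version() -> str:
    """Ask the local Docker daemon for its engine version (requires the docker CLI)."""
    result = subprocess.run(
        ["docker", "version", "--format", "{{.Server.Version}}"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()
```

Run `engine_may_block_clone3(server_version())` on each host. If it returns `True` only on the Ubuntu 16.04 host, the engine version is a plausible candidate for the difference between the two deployments.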

```
2024-03-14 21:47:43 platform >
2024-03-14 21:47:43 platform > Checking if airbyte/source-s3:4.5.10 exists...
2024-03-14 21:47:43 platform > airbyte/source-s3:4.5.10 was found locally.
2024-03-14 21:47:43 platform > Creating docker container = source-s3-check-6bdc4e7f-1dbd-444a-b706-ca1c305acd2c-0-pkztr with resources io.airbyte.config.ResourceRequirements@628e7773[cpuRequest=,cpuLimit=,memoryRequest=,memoryLimit=,additionalProperties={}] and allowedHosts null
2024-03-14 21:47:43 platform > Preparing command: docker run --rm --init -i -w /data/6bdc4e7f-1dbd-444a-b706-ca1c305acd2c/0 --log-driver none --name source-s3-check-6bdc4e7f-1dbd-444a-b706-ca1c305acd2c-0-pkztr --network host -v airbyte_workspace:/data -v /tmp/airbyte_local:/local -e DEPLOYMENT_MODE=OSS -e WORKER_CONNECTOR_IMAGE=airbyte/source-s3:4.5.10 -e AUTO_DETECT_SCHEMA=true -e LAUNCHDARKLY_KEY= -e SOCAT_KUBE_CPU_REQUEST=0.1 -e SOCAT_KUBE_CPU_LIMIT=2.0 -e FIELD_SELECTION_WORKSPACES= -e USE_STREAM_CAPABLE_STATE=true -e WORKER_ENVIRONMENT=DOCKER -e AIRBYTE_ROLE=dev -e APPLY_FIELD_SELECTION=false -e WORKER_JOB_ATTEMPT=0 -e OTEL_COLLECTOR_ENDPOINT=http://host.docker.internal:4317 -e FEATURE_FLAG_CLIENT=config -e AIRBYTE_VERSION=0.53.0 -e WORKER_JOB_ID=6bdc4e7f-1dbd-444a-b706-ca1c305acd2c airbyte/source-s3:4.5.10 check --config source_config.json
2024-03-14 21:47:43 platform > Reading messages from protocol version 0.2.0
2024-03-14 21:47:45 platform > <jemalloc>: arena 0 background thread creation failed (1)
2024-03-14 21:47:45 platform > OpenBLAS blas_thread_init: pthread_create failed for thread 1 of 2: Operation not permitted
2024-03-14 21:47:45 platform > OpenBLAS blas_thread_init: RLIMIT_NPROC -1 current, -1 max
2024-03-14 21:47:45 platform > Traceback (most recent call last):
2024-03-14 21:47:45 platform >   File "/airbyte/integration_code/main.py", line 6, in <module>
2024-03-14 21:47:45 platform >     from source_s3.run import run
2024-03-14 21:47:45 platform >   File "/airbyte/integration_code/source_s3/run.py", line 13, in <module>
2024-03-14 21:47:45 platform >     from source_s3.v4 import Config, Cursor, SourceS3, SourceS3StreamReader
2024-03-14 21:47:45 platform >   File "/airbyte/integration_code/source_s3/v4/__init__.py", line 6, in <module>
2024-03-14 21:47:45 platform >     from .cursor import Cursor
2024-03-14 21:47:45 platform >   File "/airbyte/integration_code/source_s3/v4/cursor.py", line 11, in <module>
2024-03-14 21:47:45 platform >     from airbyte_cdk.sources.file_based.stream.cursor import DefaultFileBasedCursor
2024-03-14 21:47:45 platform >   File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/file_based/stream/__init__.py", line 1, in <module>
2024-03-14 21:47:45 platform >     from airbyte_cdk.sources.file_based.stream.abstract_file_based_stream import AbstractFileBasedStream
2024-03-14 21:47:45 platform >   File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/file_based/stream/abstract_file_based_stream.py", line 12, in <module>
2024-03-14 21:47:45 platform >     from airbyte_cdk.sources.file_based.discovery_policy import AbstractDiscoveryPolicy
2024-03-14 21:47:45 platform >   File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/file_based/discovery_policy/__init__.py", line 1, in <module>
2024-03-14 21:47:45 platform >     from airbyte_cdk.sources.file_based.discovery_policy.abstract_discovery_policy import AbstractDiscoveryPolicy
2024-03-14 21:47:45 platform >   File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/file_based/discovery_policy/abstract_discovery_policy.py", line 7, in <module>
2024-03-14 21:47:45 platform >     from airbyte_cdk.sources.file_based.file_types.file_type_parser import FileTypeParser
2024-03-14 21:47:45 platform >   File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/file_based/file_types/__init__.py", line 13, in <module>
2024-03-14 21:47:45 platform >     from .parquet_parser import ParquetParser
2024-03-14 21:47:45 platform >   File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/file_based/file_types/parquet_parser.py", line 11, in <module>
2024-03-14 21:47:45 platform >     import pyarrow as pa
2024-03-14 21:47:45 platform >   File "/usr/local/lib/python3.9/site-packages/pyarrow/__init__.py", line 65, in <module>
2024-03-14 21:47:45 platform >     import pyarrow.lib as _lib
2024-03-14 21:47:45 platform >   File "pyarrow/lib.pyx", line 24, in init pyarrow.lib
2024-03-14 21:47:45 platform >   File "/usr/local/lib/python3.9/site-packages/numpy/__init__.py", line 130, in <module>
2024-03-14 21:47:45 platform >     from numpy.__config__ import show as show_config
2024-03-14 21:47:45 platform >   File "/usr/local/lib/python3.9/site-packages/numpy/__config__.py", line 4, in <module>
2024-03-14 21:47:45 platform >     from numpy.core._multiarray_umath import (
2024-03-14 21:47:45 platform >   File "/usr/local/lib/python3.9/site-packages/numpy/core/__init__.py", line 24, in <module>
2024-03-14 21:47:45 platform >     from . import multiarray
2024-03-14 21:47:45 platform >   File "/usr/local/lib/python3.9/site-packages/numpy/core/multiarray.py", line 10, in <module>
2024-03-14 21:47:45 platform >     from . import overrides
2024-03-14 21:47:45 platform >   File "/usr/local/lib/python3.9/site-packages/numpy/core/overrides.py", line 8, in <module>
2024-03-14 21:47:45 platform >     from numpy.core._multiarray_umath import (
2024-03-14 21:47:45 platform >   File "<frozen importlib._bootstrap>", line 203, in _lock_unlock_module
2024-03-14 21:47:45 platform > KeyboardInterrupt
2024-03-14 21:47:45 platform > Check connection job subprocess finished with exit code 130
2024-03-14 21:47:45 platform > Unexpected error while checking connection:
io.airbyte.workers.exception.WorkerException: Error checking connection status: no status nor failure reason were outputted
	at io.airbyte.workers.WorkerUtils.throwWorkerException(WorkerUtils.java:269) ~[io.airbyte-airbyte-commons-worker-0.53.0.jar:?]
	at io.airbyte.workers.general.DefaultCheckConnectionWorker.run(DefaultCheckConnectionWorker.java:120) ~[io.airbyte-airbyte-commons-worker-0.53.0.jar:?]
	at io.airbyte.workers.general.DefaultCheckConnectionWorker.run(DefaultCheckConnectionWorker.java:44) ~[io.airbyte-airbyte-commons-worker-0.53.0.jar:?]
	at io.airbyte.workers.temporal.TemporalAttemptExecution.get(TemporalAttemptExecution.java:142) ~[io.airbyte-airbyte-workers-0.53.0.jar:?]
	at io.airbyte.workers.temporal.check.connection.CheckConnectionActivityImpl.lambda$runWithJobOutput$1(CheckConnectionActivityImpl.java:226) ~[io.airbyte-airbyte-workers-0.53.0.jar:?]
	at io.airbyte.commons.temporal.HeartbeatUtils.withBackgroundHeartbeat(HeartbeatUtils.java:57) ~[io.airbyte-airbyte-commons-temporal-core-0.53.0.jar:?]
	at io.airbyte.workers.temporal.check.connection.CheckConnectionActivityImpl.runWithJobOutput(CheckConnectionActivityImpl.java:211) ~[io.airbyte-airbyte-workers-0.53.0.jar:?]
	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103) ~[?:?]
	at java.base/java.lang.reflect.Method.invoke(Method.java:580) ~[?:?]
	at io.temporal.internal.activity.RootActivityInboundCallsInterceptor$POJOActivityInboundCallsInterceptor.executeActivity(RootActivityInboundCallsInterceptor.java:64) ~[temporal-sdk-1.22.3.jar:?]
	at io.temporal.internal.activity.RootActivityInboundCallsInterceptor.execute(RootActivityInboundCallsInterceptor.java:43) ~[temporal-sdk-1.22.3.jar:?]
	at io.temporal.internal.activity.ActivityTaskExecutors$BaseActivityTaskExecutor.execute(ActivityTaskExecutors.java:107) ~[temporal-sdk-1.22.3.jar:?]
	at io.temporal.internal.activity.ActivityTaskHandlerImpl.handle(ActivityTaskHandlerImpl.java:124) ~[temporal-sdk-1.22.3.jar:?]
	at io.temporal.internal.worker.ActivityWorker$TaskHandlerImpl.handleActivity(ActivityWorker.java:278) ~[temporal-sdk-1.22.3.jar:?]
	at io.temporal.internal.worker.ActivityWorker$TaskHandlerImpl.handle(ActivityWorker.java:243) ~[temporal-sdk-1.22.3.jar:?]
	at io.temporal.internal.worker.ActivityWorker$TaskHandlerImpl.handle(ActivityWorker.java:216) ~[temporal-sdk-1.22.3.jar:?]
	at io.temporal.internal.worker.PollTaskExecutor.lambda$process$0(PollTaskExecutor.java:105) ~[temporal-sdk-1.22.3.jar:?]
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
	at java.base/java.lang.Thread.run(Thread.java:1583) [?:?]
2024-03-14 21:47:45 platform >
2024-03-14 21:47:45 platform > ----- END CHECK -----
2024-03-14 21:47:45 platform >
```
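
The clues in this log can be extracted mechanically, which helps when comparing runs from the two hosts. The following Python sketch is a hypothetical helper, not an Airbyte tool. It scans a check log for the jemalloc and OpenBLAS thread-creation failures, reads the RLIMIT_NPROC values, and decodes the exit code. It follows the common convention that a code above 128 means the process was stopped by signal `code - 128`. Under that convention, 130 is SIGINT, which matches the `KeyboardInterrupt` in the traceback.

```python
import re
import signal

# Markers copied from the failing check log.
THREAD_FAILURE_MARKERS = (
    "background thread creation failed",  # jemalloc
    "pthread_create failed",              # OpenBLAS
)
RLIMIT_RE = re.compile(r"RLIMIT_NPROC (-?\d+) current, (-?\d+) max")
EXIT_RE = re.compile(r"finished with exit code (\d+)")


def summarize_check_log(text: str) -> dict:
    """Pull the thread-creation clues and the exit status out of a check log."""
    summary = {"thread_failures": [], "nproc_unlimited": None, "exit_code": None, "signal": None}
    for line in text.splitlines():
        if any(marker in line for marker in THREAD_FAILURE_MARKERS):
            # Drop the "<timestamp> platform > " prefix when it is present.
            summary["thread_failures"].append(line.split("> ", 1)[-1].strip())
        if (m := RLIMIT_RE.search(line)):
            # OpenBLAS prints RLIM_INFINITY as -1, i.e. no process-count limit.
            summary["nproc_unlimited"] = m.group(1) == "-1" and m.group(2) == "-1"
        if (m := EXIT_RE.search(line)):
            code = int(m.group(1))
            summary["exit_code"] = code
            # Conventionally, exit status 128 + N means "terminated by signal N".
            if code > 128:
                try:
                    summary["signal"] = signal.Signals(code - 128).name
                except ValueError:
                    pass  # not a known signal number
    return summary
```

For the log above, the helper should report both thread-creation failures, `nproc_unlimited=True`, exit code 130, and `SIGINT`. A log from a host where the check succeeds would be expected to show no thread failures at all.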


---

This topic has been created from a Slack thread to give it more visibility. It is read-only here. [Click here](https://airbytehq.slack.com/archives/C021JANJ6TY/p1710453774482929) if you want to access the original thread.

[Join the conversation on Slack](https://slack.airbyte.com)

<sub>
['s3-connector', 'airbyte-oss', 'docker', 'ec2', 'terraform', 'error-log']
</sub>