I’m not sure the new config can be easily applied to an existing cluster. What do you think of running uninstall and then install?
Last week, when we tried to upgrade Airbyte, we uninstalled and reinstalled with abctl, and it filled our server memory and crashed it, so we haven’t tried that route with this new release yet.
Hm. The install crashed the server? or running syncs after the install? How much memory does the server have?
I currently don’t have access to that server (which crashed a week ago), but it was the production server and was amply provisioned.
<@U07FH2Y34A1>, I’m working with Nivi, <@U04GW684E2V>, and we are curious: when one syncs a CDC Airbyte connection, which pods are involved in actually executing the work? The worker plus what else? We keep getting a Java heap space error and have tried several configuration tweaks, but we can’t seem to get past it despite setting memory to 2+ GB. The error we get is always something like this:
- [ASYNC WORKER INFO] Pool queue size: 0, Active threads: 0
2024-10-03 21:40:00 source > Terminating due to java.lang.OutOfMemoryError: Java heap space
Things we have tried are:
- adjusting values.yaml (see below)
- updating the Airbyte Postgres tables: actor_definition for the mssql server source, and the general connection record (see below)
-- SQL Server source definition
psql -U airbyte -d db-airbyte -t -A -c "UPDATE actor_definition SET resource_requirements='{\"jobSpecific\": [{\"jobType\": \"sync\", \"resourceRequirements\": {\"cpu_limit\": \"2\", \"cpu_request\": \"2\", \"memory_limit\": \"10Gi\", \"memory_request\": \"2Gi\"}}]}' WHERE id = 'b5ea17b1-f170-46dc-bc31-cc744ca984c1';"
-- general connection information for the failing CDC connection
psql -U airbyte -d db-airbyte -t -A -c "UPDATE connection SET resource_requirements = '{\"cpu_limit\": \"0.5\", \"cpu_request\": \"0.5\", \"memory_limit\": \"10Gi\", \"memory_request\": \"1Gi\"}' WHERE id = '7a826998-1439-4a42-81ed-5a929a0774d6';"
How do we determine which config point/setting is not sufficient to overcome the cdc sync java heap space error?
Do connection-level settings take precedence over source-specific actor definitions? We are blocked from using Airbyte with CDC for this particular use case and are hoping it is a config point we can adjust to get to the finish line.
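For reference, here is one way we could double-check what is actually stored after those updates (just a sketch, reusing the same IDs and psql flags as the UPDATE statements above):
psql -U airbyte -d db-airbyte -t -A -c "SELECT resource_requirements FROM actor_definition WHERE id = 'b5ea17b1-f170-46dc-bc31-cc744ca984c1';"
psql -U airbyte -d db-airbyte -t -A -c "SELECT resource_requirements FROM connection WHERE id = '7a826998-1439-4a42-81ed-5a929a0774d6';"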
btw- We are on the latest abctl local install version v0.18.0
abctl status
INFO Using Kubernetes provider:
Provider: kind
Kubeconfig: /root/.airbyte/abctl/abctl.kubeconfig
Context: kind-airbyte-abctl
SUCCESS Found Docker installation: version 24.0.5
SUCCESS Existing cluster 'airbyte-abctl' found
INFO Found helm chart 'airbyte-abctl'
Status: deployed
Chart Version: 1.1.0
App Version: 1.1.0
INFO Found helm chart 'ingress-nginx'
Status: deployed
Chart Version: 4.11.2
App Version: 1.11.2
INFO Airbyte should be accessible via http://localhost:8000
values.yaml file
global:
  edition: "community"
  jobs:
    resources:
      limits:
        cpu: 1000m
        memory: 12Gi ## e.g. 500m
      requests:
        cpu: 500m
        memory: 2Gi
  env_vars:
    HTTP_IDLE_TIMEOUT: 1800s
webapp:
  ingress:
    annotations:
      kubernetes.io/ingress.class: internal
      nginx.ingress.kubernetes.io/proxy-body-size: 16m
      nginx.ingress.kubernetes.io/proxy-send-timeout: 1800
      nginx.ingress.kubernetes.io/proxy-read-timeout: 1800
airbyte-bootloader:
  resources:
    limits:
      cpu: 1000m
      memory: 12Gi ## e.g. 500m
    requests:
      cpu: 500m
      memory: 2Gi
worker:
  enabled: true
  # -- Number of worker replicas
  replicaCount: 1
  image:
    # -- The repository to use for the airbyte worker image.
    repository: airbyte/worker
    # -- The pull policy to use for the airbyte worker image.
    pullPolicy: IfNotPresent
  ## Worker resource requests and limits
  ## ref: http://kubernetes.io/docs/user-guide/compute-resources/
  ## We usually recommend not to specify default resources and to leave this as a conscious
  ## choice for the user. This also increases chances charts run on environments with little
  ## resources, such as Minikube. If you do want to specify resources, uncomment the following
  ## lines, adjust them as necessary, and remove the curly braces after 'resources:'.
  resources:
    # -- The resources limits for the worker container
    limits:
      memory: 10Gi
      cpu: 500m
    # -- The requested resources for the worker container
    requests:
      memory: 2Gi
      cpu: 250m
There can be a few pods involved in the lifecycle of a connector, but the main one doing the sync work starts with replication-job-
Which pod is giving you out of memory errors?
The error we receive doesn’t indicate which pod, as far as I can tell.
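One way to narrow it down is to check the last terminated state of the job pods (a sketch, reusing the docker-exec-into-kind pattern used elsewhere in this thread; note that a Java heap OutOfMemoryError inside the connector JVM may not show up as a Kubernetes OOMKilled, since the heap limit can be hit below the container limit):
docker exec -it airbyte-abctl-control-plane kubectl -n airbyte-abctl get pods -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[*].lastState.terminated.reason}{"\n"}{end}'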
<@U07FH2Y34A1> - Would adjusting the values.yaml for the “workload-launcher” section affect the replication-job- pod’s Java heap memory? Or is a database update needed to tweak the workload-launcher memory settings?
What does docker exec -it airbyte-abctl-control-plane kubectl -n airbyte-abctl get pods show?
The global.jobs.resources section affects the job pod resources. I’m less familiar with the database updates approach.
But if you update the global.jobs.resources values, you might need to restart the server and workload-launcher pods to pick up the new settings. We should try to track down one of the failing pods, though, and see what settings it has.
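A minimal sketch of that restart, assuming the deployments carry the same names as the pods listed below (airbyte-abctl-server and airbyte-abctl-workload-launcher):
docker exec -it airbyte-abctl-control-plane kubectl -n airbyte-abctl rollout restart deployment airbyte-abctl-server airbyte-abctl-workload-launcher
docker exec -it airbyte-abctl-control-plane kubectl -n airbyte-abctl rollout status deployment airbyte-abctl-server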
NAME READY STATUS RESTARTS AGE
airbyte-abctl-airbyte-bootloader 0/1 Completed 0 3d19h
airbyte-abctl-connector-builder-server-5c6d48b574-tpm7h 1/1 Running 4 (3d19h ago) 3d22h
airbyte-abctl-cron-5ddb45bc4d-swvcz 1/1 Running 4 (3d19h ago) 3d22h
airbyte-abctl-pod-sweeper-pod-sweeper-7cbbf9cf6d-gfczc 1/1 Running 8 (3d19h ago) 4d21h
airbyte-abctl-server-579f67894b-9m9ht 1/1 Running 5 (2d ago) 3d22h
airbyte-abctl-temporal-d858d6866-zt6s4 1/1 Running 5 (3d19h ago) 3d22h
airbyte-abctl-webapp-7f5b9b7654-8bmdg 1/1 Running 12 (3d19h ago) 3d22h
airbyte-abctl-worker-5989b87ccf-2q59w 1/1 Running 1 (3d19h ago) 3d19h
airbyte-abctl-workload-api-server-d64449cb8-czc2z 1/1 Running 4 (3d19h ago) 3d22h
airbyte-abctl-workload-launcher-65854658f9-d57k9 1/1 Running 4 (3d19h ago) 3d22h
airbyte-db-0 1/1 Running 8 (3d19h ago) 4d21h
airbyte-minio-0 1/1 Running 8 (3d19h ago) 4d21h
replication-job-13291-attempt-0 0/3 Completed 0 7m39s
replication-job-13292-attempt-0 0/3 Completed 0 7m39s
Whenever I’ve made changes to configs I have restarted the docker container
I don’t see any failed pods in there. Looks like replication-job-13292-attempt-0 is the most recent job. Did that succeed?
You could look at the details of that job with
docker exec -it airbyte-abctl-control-plane kubectl -n airbyte-abctl describe pod replication-job-13292-attempt-0
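If you only care about the resource settings that pod actually got, piping the same describe output through grep should be enough (a sketch):
docker exec -it airbyte-abctl-control-plane kubectl -n airbyte-abctl describe pod replication-job-13292-attempt-0 | grep -A 3 -E 'Limits|Requests'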
It will be a few days before the data is in the large-delta CDC scenario.
<@U07FH2Y34A1> the following is the only mention of the replication pod that I see in the logs of the failing job:
2024-10-04 23:08:22 INFO i.a.w.l.p.KubePodClient(launchReplication):84 - Launching replication pod: replication-job-13157-attempt-0 with containers:
2024-10-04 23:08:22 INFO i.a.w.l.p.KubePodClient(launchReplication):85 - [source] image: airbyte/source-mssql:4.1.14 resources: ResourceRequirements(claims=[], limits={memory=12Gi, cpu=1000m}, requests={memory=2Gi, cpu=500m}, additionalProperties={})
2024-10-04 23:08:22 INFO i.a.w.l.p.KubePodClient(launchReplication):86 - [destination] image: airbyte/destination-snowflake:3.5.0 resources: ResourceRequirements(claims=[], limits={memory=12Gi, cpu=1000m}, requests={memory=2Gi, cpu=500m}, additionalProperties={})
2024-10-04 23:08:22 INFO i.a.w.l.p.KubePodClient(launchReplication):87 - [orchestrator] image: airbyte/container-orchestrator:1.1.0 resources: ResourceRequirements(claims=[], limits={memory=12Gi, cpu=1000m}, requests={memory=2Gi, cpu=500m}, additionalProperties={})
Hm, ok, thanks. So it seems like the workload-launcher is starting the job containers with lots of memory (12Gi per container). Sounds like it’s the source (mssql) container that is running out of memory.
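If you want to watch per-container memory while a replication pod is running, kubectl top can help (a sketch; it assumes metrics-server is installed in the kind cluster, which it may not be by default, and the pod name here is just the one from earlier in the thread):
docker exec -it airbyte-abctl-control-plane kubectl -n airbyte-abctl top pod replication-job-13292-attempt-0 --containers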
Debugging the memory usage of the mssql connector is outside my area of knowledge. Maybe <@U02TQLBLDU4> can point you in the right direction?