Salesforce connections stuck and subsequent workflow runs failing

logs-209566.txt (25.4 KB)

  • Is this your first time deploying Airbyte?: No

  • OS Version / Instance: VM, Ubuntu 21.04

  • Memory / Disk: 32 GB / 1 TB

  • Deployment: Kubernetes

  • Airbyte Version: 0.39.3-alpha

  • Source name/version: Salesforce (1.0.2)

  • Destination name/version: S3 (0.2.13)

  • Step: Sync

  • Description: Hi team, we’re noticing connections stuck in progress, sometimes for more than a day (so far only Salesforce connections). While those connections are stuck, all subsequent workflow runs (Argo Workflows) fail on the sync step; cancelling the stuck connections from the Airbyte UI clears the issue for subsequent runs (a sketch of doing the same via the API is below). We’re not sure what causes this or how to reproduce it. Attaching a task log for reference.
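
For reference, a minimal sketch of cancelling a stuck job through the Airbyte API instead of the UI (the job ID, host, and port 8001 are placeholders/assumptions for a default airbyte-server deployment):

curl -X POST http://<airbyte-server-host>:8001/api/v1/jobs/cancel \
  -H "Content-Type: application/json" \
  -d '{"id": <stuck-job-id>}'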

Hi @dean,
Could you please upgrade your source and destination connectors to their latest versions:

  • 1.0.10 for Salesforce
  • 0.3.9 for S3

After the upgrade, let us know if the error persists.

The DEADLINE_EXCEEDED: deadline exceeded after 9.999970679s. [closed=[], open=[[remote_addr=airbyte-temporal-svc/100.69.93.195:7233]]] error I found in your log is usually related to a lack of memory. Could you try increasing the memory of your Kubernetes nodes and check whether the error persists?
It would also help if you could share the logs of the temporal pod, for example with the command below.
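
A quick way to grab them (assuming Airbyte runs in its own namespace; replace <namespace> accordingly):

kubectl logs deployment/airbyte-temporal -n <namespace> --tail=1000 > temporal.txt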

Thanks!

Hi @alafanechere

I’m on 0.39.3, so I’m guessing I can’t upgrade those connectors to the versions you suggested without upgrading to a newer Airbyte version. Can I update directly from 0.39.3 to 0.39.28?

As for Temporal, I’m attaching the log for today, when we again saw the same problem with hanging connections. Our current stable_with_resource_limits overlay looks like this:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: airbyte-db
spec:
  template:
    spec:
      containers:
        - name: airbyte-db-container
          resources:
            limits:
              cpu: 2
              memory: 2Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: airbyte-scheduler
spec:
  template:
    spec:
      containers:
        - name: airbyte-scheduler-container
          resources:
            limits:
              cpu: 2
              memory: 1Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: airbyte-worker
spec:
  template:
    spec:
      containers:
        - name: airbyte-worker-container
          resources:
            limits:
              cpu: 2
              memory: 2Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: airbyte-server
spec:
  template:
    spec:
      containers:
        - name: airbyte-server-container
          resources:
            limits:
              cpu: 1
              memory: 6Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: airbyte-temporal
spec:
  template:
    spec:
      containers:
        - name: airbyte-temporal
          resources:
            limits:
              cpu: 2
              memory: 512Mi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: airbyte-webapp
spec:
  template:
    spec:
      containers:
        - name: airbyte-webapp-container
          resources:
            limits:
              cpu: 1
              memory: 512Mi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: airbyte-minio
spec:
  template:
    spec:
      containers:
        - name: airbyte-minio
          resources:
            limits:
              cpu: 1
              memory: 512Mi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: airbyte-volume-db
spec:
  resources:
    requests:
      storage: 5Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: airbyte-minio-pv-claim
spec:
  resources:
    requests:
      storage: 70Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: airbyte-pod-sweeper
spec:
  template:
    spec:
      containers:
        - name: airbyte-pod-sweeper
          resources:
            limits:
              cpu: 0.5
              memory: 128Mi
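
For reference, actual usage against those limits can be checked with something like the following (assumes metrics-server is installed in the cluster; de is the namespace Airbyte runs in):

kubectl top pods -n de --containers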

Also, running kubectl describe on the node hosting the Airbyte namespace yields the following:

Name:               ip-10-10-121-232.ec2.internal
Roles:              node
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=c5a.2xlarge
                    beta.kubernetes.io/os=linux
                    dedicated=data_engineering
                    failure-domain.beta.kubernetes.io/region=us-east-1
                    failure-domain.beta.kubernetes.io/zone=us-east-1b
                    kops.k8s.io/instancegroup=data_engineering-b
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=ip-10-10-121-232.ec2.internal
                    kubernetes.io/os=linux
                    kubernetes.io/role=node
                    node-role.kubernetes.io/node=
                    node.kubernetes.io/instance-type=c5a.2xlarge
                    nodepool=data_engineering
                    topology.ebs.csi.aws.com/zone=us-east-1b
                    topology.kubernetes.io/region=us-east-1
                    topology.kubernetes.io/zone=us-east-1b
Annotations:        csi.volume.kubernetes.io/nodeid: {"ebs.csi.aws.com":"i-05d34811b3c76cdf1"}
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Fri, 24 Jun 2022 23:10:18 +0200
Taints:             dedicated=data_engineering:NoSchedule
Unschedulable:      false
Lease:
  HolderIdentity:  ip-10-10-121-232.ec2.internal
  AcquireTime:     <unset>
  RenewTime:       Tue, 28 Jun 2022 13:51:54 +0200
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Tue, 28 Jun 2022 13:48:50 +0200   Fri, 24 Jun 2022 23:10:18 +0200   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Tue, 28 Jun 2022 13:48:50 +0200   Fri, 24 Jun 2022 23:10:18 +0200   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Tue, 28 Jun 2022 13:48:50 +0200   Fri, 24 Jun 2022 23:10:18 +0200   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            True    Tue, 28 Jun 2022 13:48:50 +0200   Fri, 24 Jun 2022 23:10:48 +0200   KubeletReady                 kubelet is posting ready status. AppArmor enabled
Addresses:
  InternalIP:   10.10.121.232
  Hostname:     ip-10-10-121-232.ec2.internal
  InternalDNS:  ip-10-10-121-232.ec2.internal
Capacity:
  cpu:                8
  ephemeral-storage:  20145724Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             16195992Ki
  pods:               58
Allocatable:
  cpu:                7900m
  ephemeral-storage:  18566299208
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             16093592Ki
  pods:               58
System Info:
  Machine ID:                 ec2cf609e09606892b68fead901a074d
  System UUID:                ec2cf609-e096-0689-2b68-fead901a074d
  Boot ID:                    95cf5886-5c63-48ef-92c6-6760e1cd95b7
  Kernel Version:             5.13.0-1029-aws
  OS Image:                   Ubuntu 20.04.4 LTS
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  docker://20.10.9
  Kubelet Version:            v1.22.10
  Kube-Proxy Version:         v1.22.10
PodCIDR:                      100.96.63.0/24
PodCIDRs:                     100.96.63.0/24
ProviderID:                   aws:///us-east-1b/i-05d34811b3c76cdf1
Non-terminated Pods:          (14 in total)
  Namespace                   Name                                         CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                                         ------------  ----------  ---------------  -------------  ---
  de                          airbyte-pod-sweeper-64f9d9b5dd-fk4fs         500m (6%)     500m (6%)   128Mi (0%)       128Mi (0%)     61m
  de                          airbyte-scheduler-858d975c9f-9l9jl           2 (25%)       2 (25%)     1Gi (6%)         1Gi (6%)       61m
  de                          airbyte-temporal-759cd87c7b-vqxb6            2 (25%)       2 (25%)     512Mi (3%)       512Mi (3%)     61m
  de                          airbyte-webapp-6ccbf5757c-27rwp              1 (12%)       1 (12%)     512Mi (3%)       512Mi (3%)     61m
  de                          argo-workflows-server-6b88d6b678-zjvvf       50m (0%)      250m (3%)   256Mi (1%)       768Mi (4%)     3d14h
  de                          query-history-global-1656417000-840367556    50m (0%)      250m (3%)   64Mi (0%)        512Mi (3%)     6s
  de                          us-east-1-vector-847789cffb-vb9cf            200m (2%)     400m (5%)   256Mi (1%)       512Mi (3%)     3d14h
  kube-system                 aws-node-spkx9                               10m (0%)      0 (0%)      0 (0%)           0 (0%)         3d14h
  kube-system                 ebs-csi-node-8wd9g                           0 (0%)        0 (0%)      0 (0%)           0 (0%)         3d14h
  kube-system                 kube-proxy-ip-10-10-121-232.ec2.internal     100m (1%)     0 (0%)      0 (0%)           0 (0%)         3d14h
  monitoring                  logging-operator-fluentbit-vt6r7             100m (1%)     200m (2%)   50M (0%)         100M (0%)      3d14h
  monitoring                  loki-canary-qf7k8                            50m (0%)      250m (3%)   64Mi (0%)        64Mi (0%)      3d14h
  monitoring                  loki-promtail-lj6hn                          100m (1%)     0 (0%)      256Mi (1%)       256Mi (1%)     3d14h
  monitoring                  prom-stack-prometheus-node-exporter-kcm7z    50m (0%)      50m (0%)    64Mi (0%)        64Mi (0%)      3d14h
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests          Limits
  --------           --------          ------
  cpu                6210m (78%)       6900m (87%)
  memory             3338334336 (20%)  4663402752 (28%)
  ephemeral-storage  0 (0%)            0 (0%)
  hugepages-1Gi      0 (0%)            0 (0%)
  hugepages-2Mi      0 (0%)            0 (0%)
Events:
  Type     Reason               Age   From     Message
  ----     ------               ----  ----     -------
  Warning  FreeDiskSpaceFailed  60m   kubelet  failed to garbage collect required amount of images. Wanted to free 5116726067 bytes, but freed 2781091178 bytes
  Warning  FreeDiskSpaceFailed  40m   kubelet  failed to garbage collect required amount of images. Wanted to free 4176472883 bytes, but freed 2031066915 bytes
  Warning  FreeDiskSpaceFailed  30m   kubelet  failed to garbage collect required amount of images. Wanted to free 5228755763 bytes, but freed 1857015974 bytes
  Warning  FreeDiskSpaceFailed  20m   kubelet  failed to garbage collect required amount of images. Wanted to free 4112956211 bytes, but freed 2028062758 bytes


temporal.txt (343.5 KB)

Thank you for the details.
Please allocate more RAM to the airbyte-temporal pod; it is currently capped at 512Mi in your overlay.
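
A minimal sketch of the change in the same overlay (1Gi is a starting point to try, not a tuned value):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: airbyte-temporal
spec:
  template:
    spec:
      containers:
        - name: airbyte-temporal
          resources:
            limits:
              cpu: 2
              memory: 1Gi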

I can’t upgrade those connectors to the versions you suggested without upgrading to a newer Airbyte version

Nope, you can update a connector from the Settings page without requiring a full Airbyte upgrade.
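
If you prefer to do it programmatically, a rough sketch using the config API's source definition update endpoint (the definition ID is a placeholder you would look up first, e.g. from /api/v1/source_definitions/list, and port 8001 assumes a default airbyte-server):

curl -X POST http://<airbyte-server-host>:8001/api/v1/source_definitions/update \
  -H "Content-Type: application/json" \
  -d '{"sourceDefinitionId": "<salesforce-definition-id>", "dockerImageTag": "1.0.10"}'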

can I update directly from 0.39.3 to 0.39.28?

Yes, you can upgrade directly from 0.39.3 to our latest version if you like. Here is our upgrade guide.
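
For a kustomize-based deployment like yours, the upgrade is roughly the following (a sketch, assuming you deploy from a checkout of the airbytehq/airbyte repo using the stable-with-resource-limits overlay; adapt it to however you manage your manifests):

git checkout v0.39.28-alpha
kubectl apply -k kube/overlays/stable-with-resource-limits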
