Sync fails due to "Forbidden: pod updates may not change fields other than `spec.containers[*].image`, `spec.initContainers[*].image`, `spec.activeDeadlineSeconds` or `spec.tolerations` (only additions to existing tolerations)"

  • Is this your first time deploying Airbyte?: No
  • OS Version / Instance: AWS Linux
  • Memory / Disk: 2 M6 2XLarge
  • Deployment: Kubernetes
  • Airbyte Version: 0.40.4
  • Source name/version: Salesforce
  • Destination name/version: 1.0.15
  • Step: Starting a sync for a newly connected, large Salesforce account.
  • Description: The sync had been running for the last 12 hours and had moved almost 40 GB of data. Then, all of a sudden, it failed with “Failed to create pod orchestrator-repl-job-138756-attempt-0, pre-existing pod exists which didn’t advance out of the NOT_STARTED state”. I’ve attached the logs below for troubleshooting.
2022-10-20 03:08:52 INFO i.a.w.t.TemporalAttemptExecution(lambda$getWorkerThread$2):162 - Completing future exceptionally...
java.lang.RuntimeException: io.airbyte.workers.exception.WorkerException: Running the launcher replication-orchestrator failed
	at io.airbyte.workers.temporal.TemporalUtils.withBackgroundHeartbeat(TemporalUtils.java:341) ~[io.airbyte-airbyte-workers-0.40.4.jar:?]
	at io.airbyte.workers.temporal.sync.LauncherWorker.run(LauncherWorker.java:91) ~[io.airbyte-airbyte-workers-0.40.4.jar:?]
	at io.airbyte.workers.temporal.TemporalAttemptExecution.lambda$getWorkerThread$2(TemporalAttemptExecution.java:159) ~[io.airbyte-airbyte-workers-0.40.4.jar:?]
	at java.lang.Thread.run(Thread.java:1589) [?:?]
Caused by: io.airbyte.workers.exception.WorkerException: Running the launcher replication-orchestrator failed
	at io.airbyte.workers.temporal.sync.LauncherWorker.lambda$run$3(LauncherWorker.java:184) ~[io.airbyte-airbyte-workers-0.40.4.jar:?]
	at io.airbyte.workers.temporal.TemporalUtils.withBackgroundHeartbeat(TemporalUtils.java:336) ~[io.airbyte-airbyte-workers-0.40.4.jar:?]
	... 3 more
Caused by: io.airbyte.workers.exception.WorkerException: Failed to create pod orchestrator-repl-job-138756-attempt-0, pre-existing pod exists which didn't advance out of the NOT_STARTED state.
	at io.airbyte.workers.temporal.sync.LauncherWorker.lambda$run$3(LauncherWorker.java:155) ~[io.airbyte-airbyte-workers-0.40.4.jar:?]
	at io.airbyte.workers.temporal.TemporalUtils.withBackgroundHeartbeat(TemporalUtils.java:336) ~[io.airbyte-airbyte-workers-0.40.4.jar:?]
	... 3 more
Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: PUT at: https://172.20.0.1/api/v1/namespaces/default/pods/orchestrator-repl-job-138756-attempt-0. Message: Pod "orchestrator-repl-job-138756-attempt-0" is invalid: spec: Forbidden: pod updates may not change fields other than `spec.containers[*].image`, `spec.initContainers[*].image`, `spec.activeDeadlineSeconds` or `spec.tolerations` (only additions to existing tolerations)
  core.PodSpec{
  	Volumes: []core.Volume{
  		{Name: "airbyte-config", VolumeSource: {EmptyDir: &{Medium: "Memory"}}},
+ 		{
+ 			Name: "kube-api-access-2h6wc",
+ 			VolumeSource: core.VolumeSource{
+ 				Projected: &core.ProjectedVolumeSource{Sources: []core.VolumeProjection{...}, DefaultMode: &420},
+ 			},
+ 		},
  	},
  	InitContainers: nil,
  	Containers: []core.Container{
  		{
  			... // 3 identical fields
  			Args:       nil,
  			WorkingDir: "",
  			Ports: []core.ContainerPort{
  				{
  					Name:          "",
  					HostPort:      0,
- 					ContainerPort: 9880,
+ 					ContainerPort: 9877,
  					Protocol:      "TCP",
  					HostIP:        "",
  				},
  				{
  					Name:          "",
  					HostPort:      0,
- 					ContainerPort: 9000,
+ 					ContainerPort: 9878,
  					Protocol:      "TCP",
  					HostIP:        "",
  				},
  				{ContainerPort: 9879, Protocol: "TCP"},
  				{
  					Name:          "",
  					HostPort:      0,
- 					ContainerPort: 9878,
+ 					ContainerPort: 9000,
  					Protocol:      "TCP",
  					HostIP:        "",
  				},
  				{
  					Name:          "",
  					HostPort:      0,
- 					ContainerPort: 9877,
+ 					ContainerPort: 9880,
  					Protocol:      "TCP",
  					HostIP:        "",
  				},
  				{ContainerPort: 9000, Protocol: "TCP"},
  			},
  			EnvFrom: nil,
  			Env:     {{Name: "METRIC_CLIENT"}, {Name: "DD_AGENT_HOST"}, {Name: "DD_DOGSTATSD_PORT"}, {Name: "PUBLISH_METRICS", Value: "false"}, ...},
  			Resources: core.ResourceRequirements{
- 				Limits:   core.ResourceList{},
+ 				Limits:   nil,
- 				Requests: core.ResourceList{},
+ 				Requests: nil,
  			},
  			VolumeMounts: []core.VolumeMount{
  				{Name: "airbyte-config", MountPath: "/config"},
+ 				{
+ 					Name:      "kube-api-access-2h6wc",
+ 					ReadOnly:  true,
+ 					MountPath: "/var/run/secrets/kubernetes.io/serviceaccount",
+ 				},
  			},
  			VolumeDevices: nil,
  			LivenessProbe: nil,
  			... // 10 identical fields
  		},
  	},
  	EphemeralContainers: nil,
  	RestartPolicy:       "Never",
  	... // 4 identical fields
  	ServiceAccountName:           "airbyte-admin",
  	AutomountServiceAccountToken: &true,
- 	NodeName:                     "",
+ 	NodeName:                     "ip-10-0-99-102.us-west-2.compute.internal",
  	SecurityContext:              &{},
  	ImagePullSecrets:             nil,
  	... // 16 identical fields
  }
. Received status: Status(apiVersion=v1, code=422, details=StatusDetails(causes=[StatusCause(field=spec, message=Forbidden: pod updates may not change fields other than `spec.containers[*].image`, `spec.initContainers[*].image`, `spec.activeDeadlineSeconds` or `spec.tolerations` (only additions to existing tolerations)
  [PodSpec diff repeated verbatim; identical to the diff above]
, reason=FieldValueForbidden, additionalProperties={})], group=null, kind=Pod, name=orchestrator-repl-job-138756-attempt-0, retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, message=Pod "orchestrator-repl-job-138756-attempt-0" is invalid: spec: Forbidden: pod updates may not change fields other than `spec.containers[*].image`, `spec.initContainers[*].image`, `spec.activeDeadlineSeconds` or `spec.tolerations` (only additions to existing tolerations)
  [PodSpec diff repeated verbatim; identical to the diff above]
, metadata=ListMeta(_continue=null, remainingItemCount=null, resourceVersion=null, selfLink=null, additionalProperties={}), reason=Invalid, status=Failure, additionalProperties={}).
	at io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:682) ~[kubernetes-client-5.12.2.jar:?]
	at io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:661) ~[kubernetes-client-5.12.2.jar:?]
	at io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:612) ~[kubernetes-client-5.12.2.jar:?]
	at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:555) ~[kubernetes-client-5.12.2.jar:?]
	at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:518) ~[kubernetes-client-5.12.2.jar:?]
	at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleUpdate(OperationSupport.java:342) ~[kubernetes-client-5.12.2.jar:?]
	at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleUpdate(OperationSupport.java:322) ~[kubernetes-client-5.12.2.jar:?]
	at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleUpdate(BaseOperation.java:649) ~[kubernetes-client-5.12.2.jar:?]
	at io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation.lambda$replace$1(HasMetadataOperation.java:195) ~[kubernetes-client-5.12.2.jar:?]
	at io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation.replace(HasMetadataOperation.java:200) ~[kubernetes-client-5.12.2.jar:?]
	at io.fabric8.kubernetes.client.dsl.base.HasMetadataOperation.replace(HasMetadataOperation.java:141) ~[kubernetes-client-5.12.2.jar:?]
	at io.fabric8.kubernetes.client.utils.CreateOrReplaceHelper.replace(CreateOrReplaceHelper.java:69) ~[kubernetes-client-5.12.2.jar:?]
	at io.fabric8.kubernetes.client.utils.CreateOrReplaceHelper.createOrReplace(CreateOrReplaceHelper.java:61) ~[kubernetes-client-5.12.2.jar:?]
	at io.fabric8.kubernetes.client.dsl.base.BaseOperation.createOrReplace(BaseOperation.java:318) ~[kubernetes-client-5.12.2.jar:?]
	at io.fabric8.kubernetes.client.dsl.base.BaseOperation.createOrReplace(BaseOperation.java:83) ~[kubernetes-client-5.12.2.jar:?]
	at io.fabric8.kubernetes.client.dsl.base.BaseOperation.createOrReplace(BaseOperation.java:308) ~[kubernetes-client-5.12.2.jar:?]
	at io.fabric8.kubernetes.client.dsl.base.BaseOperation.createOrReplace(BaseOperation.java:83) ~[kubernetes-client-5.12.2.jar:?]
	at io.airbyte.workers.process.AsyncOrchestratorPodProcess.create(AsyncOrchestratorPodProcess.java:320) ~[io.airbyte-airbyte-workers-0.40.4.jar:?]
	at io.airbyte.workers.temporal.sync.LauncherWorker.lambda$run$3(LauncherWorker.java:149) ~[io.airbyte-airbyte-workers-0.40.4.jar:?]
	at io.airbyte.workers.temporal.TemporalUtils.withBackgroundHeartbeat(TemporalUtils.java:336) ~[io.airbyte-airbyte-workers-0.40.4.jar:?]
	... 3 more
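What the stack trace shows: `AsyncOrchestratorPodProcess.create` calls fabric8's `createOrReplace`, and because a pod with that name already exists, the client falls back to a PUT (replace). The Kubernetes API server rejects pod replacements that change anything beyond container images, `activeDeadlineSeconds`, and toleration additions, and the replacement spec here differs in server-populated fields (the diff shows `nodeName` cleared and the injected service-account volume missing). Here is a rough sketch of that validation rule; this is a hypothetical illustration, not the real API-server code:

```python
# Simplified sketch of the pod-update rule behind the 422 above: once a
# Pod exists, a PUT may only change a few spec fields, so replacing it
# with a freshly built spec fails whenever server-populated fields
# (nodeName, the injected service-account volume, ...) differ.

ALLOWED_ON_UPDATE = {"activeDeadlineSeconds", "tolerations"}  # plus container images

def forbidden_changes(existing_spec: dict, replacement_spec: dict) -> set:
    """Top-level spec fields that differ but are immutable on update."""
    keys = existing_spec.keys() | replacement_spec.keys()
    return {k for k in keys
            if existing_spec.get(k) != replacement_spec.get(k)
            and k not in ALLOWED_ON_UPDATE}

# Mirrors the diff in the log: the scheduler had set nodeName on the
# existing pod, while the replacement spec sends it empty.
existing = {"nodeName": "ip-10-0-99-102.us-west-2.compute.internal"}
replacement = {"nodeName": ""}
print(forbidden_changes(existing, replacement))  # -> {'nodeName'}
```

So the "Forbidden" error is a symptom: the real problem is the stale orchestrator pod that never left NOT_STARTED, which the worker then tried to overwrite.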

Hey, so was the data still pulled after the error, or was a new attempt created?

It triggered a new attempt and progressed along. I did notice that the streams we had finished syncing in the first attempt ended up being resynced from the start as well. It seems like the connector state is updated at the end of the sync rather than when each stream finishes syncing? (This is a tangential question.)
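That observation matches end-of-sync state checkpointing: if the connector only emits its state once the whole sync completes, a failed attempt leaves no checkpoint and the retry starts over. A toy sketch of the two strategies (function and state shapes are hypothetical; real Airbyte connectors emit `AirbyteStateMessage` records over stdout):

```python
# Contrast per-stream checkpointing with end-of-sync checkpointing.
# With per-stream state, a retry can skip streams already marked done;
# with end-of-sync state, a mid-sync failure loses all progress.

def sync(streams, saved_state, checkpoint_per_stream):
    state = dict(saved_state)
    synced = []
    for name in streams:
        if checkpoint_per_stream and state.get(name) == "done":
            continue                 # resume: skip checkpointed streams
        synced.append(name)          # (re)read the stream
        if checkpoint_per_stream:
            state[name] = "done"     # checkpoint as each stream finishes
    if not checkpoint_per_stream:
        state = {s: "done" for s in streams}  # state only at the very end
    return synced, state

# Attempt 1 fails after stream "a" finishes:
_, state = sync(["a"], {}, checkpoint_per_stream=True)
# The retry skips "a" and only syncs "b":
synced, _ = sync(["a", "b"], state, checkpoint_per_stream=True)
print(synced)  # -> ['b']
```

With end-of-sync checkpointing, the same retry would receive an empty state and resync both streams, which is the resync-from-scratch behavior described above.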

The core question I’m trying to figure out is why it errored in the first place. Our sources get pretty big (e.g. 50+ GB of data, roughly 10 hours of syncing), so any time a sync errors we end up having to redo another 10 hours of syncing. Let me know if there’s anything I can check! Appreciate your help.

The error itself looks new to me. My guess is that it’s something resource-related. If you run into this issue again, feel free to create a GitHub issue.

Ran another sync and hit the issue again on attempts 1 and 2. Will create a GitHub issue, thanks!

If there’s anything I can check or tweak, let me know as well. Appreciate it
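One thing worth checking before retrying: whether a leftover `orchestrator-repl-job-*` pod is still sitting in `Pending`, since the worker fails when it tries to replace such a pod rather than delete it. A hypothetical cleanup sketch; the decision logic is shown pure, and wiring it up would use the official `kubernetes` Python client (`read_namespaced_pod` / `delete_namespaced_pod`):

```python
# Hypothetical helper: decide whether a leftover orchestrator pod should
# be deleted before the next attempt. A pod that exists but never left
# Pending matches the "didn't advance out of NOT_STARTED" situation in
# the logs; anything Running or Succeeded should be left alone.

def should_delete_stale_pod(phase):
    """Delete only pods that exist but never started running."""
    return phase == "Pending"

print(should_delete_stale_pod("Pending"))  # -> True
print(should_delete_stale_pod("Running"))  # -> False
```

Manually deleting the stuck pod (`kubectl delete pod orchestrator-repl-job-138756-attempt-0`) before the retry should at least avoid the Forbidden replace, though it doesn't explain why the pod stalled in the first place.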