Step: The issue is happening during sync before parquet files are created on S3
Description: Hi team, we have a situation where we reset an Airbyte connection programmatically using an Argo Workflows template. The result is a cleared state in the Airbyte DB for that connection and the removal of all corresponding files on S3, which is effectively a full refresh.
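For context, the reset itself is roughly this (a minimal sketch of what our workflow step does; the host, connection ID, and bucket/prefix are placeholders, and we call the reset endpoint of Airbyte's config API):

```python
import requests
import boto3

AIRBYTE_API = "http://airbyte-server:8001/api/v1"  # placeholder host
CONNECTION_ID = "00000000-0000-0000-0000-000000000000"  # placeholder

# Clear the connection state in the Airbyte DB so the next sync
# behaves like a full refresh
resp = requests.post(f"{AIRBYTE_API}/connections/reset",
                     json={"connectionId": CONNECTION_ID})
resp.raise_for_status()

# Drop the parquet files the connection produced on S3
# (bucket and prefix are placeholders)
s3 = boto3.resource("s3")
s3.Bucket("my-data-lake").objects.filter(Prefix="salesforce/").delete()
```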
So far so good. After those files are deleted and the external table built on top of them on S3 is dropped, we sync this connection. With a cleared state, the first sync acts as a full refresh, so we obtain all records currently present in our Salesforce source. That also works fine: we can compare the result set in parquet with what we see in Salesforce, and it all checks out.
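For what it's worth, the comparison we run is along these lines (a sketch only; the object name, column names, and path are illustrative, and it assumes pyarrow and simple_salesforce):

```python
import pyarrow.parquet as pq
from simple_salesforce import Salesforce

# IDs that landed in parquet after the post-reset full refresh
# (path is a placeholder)
table = pq.read_table("s3://my-data-lake/salesforce/account/")
parquet_ids = set(table.column("Id").to_pylist())

# What Salesforce itself reports for the same object
sf = Salesforce(username="...", password="...", security_token="...")
sf_count = sf.query_all("SELECT COUNT() FROM Account")["totalSize"]

# After the full refresh these agree
print(len(parquet_ids), sf_count)
```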
Now this is where the problems start for us. We have an Argo workflow scheduled to run on an hourly basis, and it always syncs this connection in incremental mode because that's what is set in the config file for this connection. Since these are incremental syncs, we expect only updated records to be synced. However, what we're noticing is that for a very low number of records a parquet file will be created containing a record already synced in the full refresh after the connection was reset, and it will be identical to the one already synced, so much so that even the SystemModstamp is the same, which should never be the case for any record. So, for example, over 10 subsequent incremental syncs you will have some records that constantly reoccur in every sync, yet they are all identical to every previous run, all the way back to the original full refresh result set. Please advise!
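Here is roughly how we spot the recurring records (a sketch; the paths are placeholders for local copies of the sync outputs, and the column names match our Salesforce schema):

```python
import glob
import pandas as pd

# Load every incremental sync output since the reset, tagging each row
# with the file it came from (paths are placeholders)
frames = [pd.read_parquet(path).assign(sync_file=path)
          for path in sorted(glob.glob("syncs/*.parquet"))]
df = pd.concat(frames, ignore_index=True)

# A record re-emitted with the SAME SystemModstamp across syncs should
# never happen: incremental syncs should only carry rows whose
# SystemModstamp advanced since the previous run
dupes = df[df.duplicated(subset=["Id", "SystemModstamp"], keep=False)]
print(dupes.sort_values(["Id", "sync_file"]))
```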
Hi @dean, would you be able to update Airbyte to the latest version and double-check that the connectors are up to date as a first step? I’m looking into this for you!
I just tried to upgrade to 0.40.3 and I'm getting this error in Argo CD, which we use to sync deployments from GitHub to Kubernetes:
ComparisonError
rpc error: code = Unknown desc = kustomize build /tmp/git@github.com_firebolt-analytics_de-gitops/airbyte/dev failed exit status 1: Error: no matches for Id apps_v1_Deployment|~X|airbyte-scheduler; failed to find unique target for patch apps_v1_Deployment|airbyte-scheduler
Does this mean you guys changed something about the airbyte-scheduler in the new version?
I'm attaching our .env and kustomization.yaml files, which worked for us for every upgrade until now. In the meantime I'll keep investigating.
However, we're still unable to achieve pod stability: the bootloader pod seems to be recreated every few seconds for some reason, so I'll get back to you once we manage to handle that one.
Thank you for all the follow-ups; it's great that you were finally able to update! We are working on making deployment to Kubernetes easier and more streamlined, but it definitely still needs improvement.
I finally arrived at the issue you referred to last; it's the first bullet point there, regarding the Multi-Attach error for the server pod:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 8m1s default-scheduler Successfully assigned de/airbyte-server-64cd88ff8c-97xx4 to ip-10-10-112-5.ec2.internal
Warning FailedAttachVolume 7m59s attachdetach-controller Multi-Attach error for volume "pvc-e405467c-aaf3-4068-a759-61ee9100bc4a" Volume is already used by pod(s) airbyte-cron-c68b7946b-dj4qx
Warning FailedMount 3m43s kubelet Unable to attach or mount volumes: unmounted volumes=[airbyte-volume-configs], unattached volumes=[gcs-log-creds-volume kube-api-access-fpp22 airbyte-volume-configs]: timed out waiting for the condition
Warning FailedMount 85s (x2 over 5m58s) kubelet Unable to attach or mount volumes: unmounted volumes=[airbyte-volume-configs], unattached volumes=[airbyte-volume-configs gcs-log-creds-volume kube-api-access-fpp22]: timed out waiting for the condition
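For reference, this is how we track down which pod still has the volume attached before deleting it (a sketch using the official kubernetes Python client; the namespace and claim name are taken from the events above):

```python
from kubernetes import client, config

# List pods in the namespace that still mount the configs PVC, so the
# stale one (airbyte-cron in the events above) can be deleted and the
# server pod can attach the volume
config.load_kube_config()
v1 = client.CoreV1Api()

for pod in v1.list_namespaced_pod("de").items:
    for vol in pod.spec.volumes or []:
        pvc = vol.persistent_volume_claim
        if pvc and pvc.claim_name == "airbyte-volume-configs":
            print(pod.metadata.name, pod.status.phase)
```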