Dbt failure with tabular normalization PostgreSQL -> Snowflake

  • Is this your first time deploying Airbyte?: No
  • OS Version / Instance: Kubernetes / AWS EKS
  • Memory / Disk: whatever the default kube deployment spec has
  • Deployment: Are you using Docker or Kubernetes deployment? yes
  • Airbyte Version: What version are you using now? 0.40.3
  • Source name/version: PostGIS 13
  • Destination name/version: Snowflake
  • Step: The issue is happening during sync, creating the connection or a new source? sync
  • Description:

I originally set up airbyte in the default namespace and we got the src / dst connections working and I setup a sync successfully just using the tabular normalization.

Later we wanted to move it into its own namespace, so I tore down the old airbyte deployment and we setup a new one in the airbyte namespace. I setup the connections just fine the same way – same schema settings. I setup the sync, and the sync is failing with the following error:

2022-09-12 16:55:45 normalization-orchestrator > 
2022-09-12 16:55:45 normalization > All checks passed!
2022-09-12 16:55:46 normalization-orchestrator > 
2022-09-12 16:55:46 normalization > Forward dbt output logs to diagnose/debug errors (0):
2022-09-12 16:55:46 normalization-orchestrator > 
2022-09-12 16:55:46 normalization > Running with dbt=1.0.0
2022-09-12 16:55:46 normalization > Partial parse save file not found. Starting full parse.
2022-09-12 16:55:46 normalization > [WARNING]: Configuration paths exist in your dbt_project.yml file which do not apply to any resources.
There are 2 unused configuration paths:
- models.airbyte_utils.generated.airbyte_views
- models.airbyte_utils.generated.airbyte_incremental
2022-09-12 16:55:46 normalization > Found 4 models, 0 tests, 0 snapshots, 0 analyses, 572 macros, 0 operations, 0 seed files, 1 source, 0 exposures, 0 metrics
2022-09-12 16:55:47 normalization-orchestrator > (pod: airbyte / normalization-snowflake-normalize-73-0-qpbnw) - Closed all resources for pod
2022-09-12 16:55:47 normalization-orchestrator > Normalization executed in 27 seconds.
2022-09-12 16:55:47 normalization-orchestrator > Normalization Failed.
2022-09-12 16:55:47 normalization-orchestrator > Normalization summary: io.airbyte.config.NormalizationSummary@6d6039df[startTime=1663001719329,endTime=1663001747325,failures=[io.airbyte.config.FailureReason@611c3eae[failureOrigin=normalization,failureType=system_error,internalMessage=,externalMessage=Normalization failed during the dbt run. This may indicate a problem with the data itself.,metadata=io.airbyte.config.Metadata@13e00016[additionalProperties={attemptNumber=0, jobId=73, from_trace_message=true}],stacktrace=AirbyteDbtError: 
Encountered an error:
Compilation Error in macro list_relations_without_caching (macros/adapters/metadata.sql)
  
        Too many schemas in schema  "DEV_HOSPENG_REPORTING".PUBLIC! dbt can only get
        information about schemas with fewer than 10000 objects.

The error is strange because the source and destination databases are not large with not much data in them. I have searched this forum for that error and came up empty. I also searched the airbyte and dbt github repos and also could not find these errors.

Hi and welcome to the forum @davisford! Let me look into this and see what I can find.

Thanks – just for reference, I also changed the sync to only do a single table with incremental append, and it still failed with the same error.

I got some feedback from a colleague with extensive Kubernetes experience and they think the data is being stored in a leftover persistent volume from the previous deploy.

Could you run kubectl get pv to verify the age and reclaim policy of the persistent volume? There’re a few different things we can do depending on whether it’s Delete or Retain.

Thanks – I think I found the issue. We have thousands of these AIRBYTE_TMP* tables that it appears to have generated multiples of with these randomized three letter codes embedded in them.

Were these left over from your initial installation?

I am unsure – but they seem to keep getting created on each failed sync job which ran over the weekend. It generated over 30k tables!

@natalyjazzviolin can you help me understand how we can avoid this in the future… what was the root cause and how we can always prevent something like this from happening.

To summarize the history:

  1. Deploy airbyte in our k8s cluster with zero modifications to the deployment specs (i.e. clone airbyte repo and kubectl apply the specs as is.
  2. Setup source (PostgreSQL which is in the k8s cluster) and destination (Snowflake) – OK
  3. Setup a sync between the two
  4. Run it OK – all is well, the only transform we set it to was tabular normalization default
  5. Re-deploy airbyte specs in its own airbyte namespace in kubernetes.
  6. Delete the original airbyte deployment in the default namespace
  7. Setup source (PostgreSQL which is in the k8s cluster) and destination (Snowflake) – OK
  8. Setup a sync between the two and let it run (Friday)
  9. Come back and check Monday and we have thousands and thousands of temp tables and all the syncs are failing and every continued attempt seems to generate more temp tables.

It can’t be a Persistent Volume Claim / Persistent Volume issue because these deployments are in totally separate namespaces and have no access to each other’s stuff. So the PVC that may have been created by the original deployment in the default namespace would not have been accessible when we re-deployed airbyte in the airbyte namespace and it would have just generated a new one. The only thing is, the Snowflake environment already had a sync of the data and a bunch of _AIRBYTE_* tables…but for whatever reason, the new deployment didn’t understand and went haywire, it seems.

So the temp tables issue I can answer to right now - unfortunately when a sync fails, temp tables are not deleted. This is on our roadmap to fix:
https://github.com/airbytehq/airbyte/issues/4059

The only time they are deleted is when a successful sync runs.

Did you have a chance to run the kubectl get pv command?

Hi @natalyjazzviolin – we just did another clean re-deploy of airbyte.

➜ kc -n airbyte get pv | grep airbyte
pvc-863c5bd2-a25f-4712-962a-b06f2243dd84   1Gi        RWO            Delete           Bound    airbyte/airbyte-volume-db           gp2                               47m
pvc-ba66483a-1330-45ce-bd5d-4b5f5f71fedc   1Gi        RWO            Delete           Bound    airbyte/airbyte-minio-pv-claim      gp2                               47m
pvc-f1c16632-a9ef-4db4-ad0b-edcdb091cb2c   1Gi        RWO            Delete           Bound    airbyte/airbyte-volume-configs      gp2                               47m
➜ kc -n airbyte get pod
NAME                                                              READY   STATUS      RESTARTS   AGE
airbyte-db-b6f485dd7-z7rgp                                        1/1     Running     0          48m
airbyte-minio-fdbd85cf7-t2gx6                                     1/1     Running     0          48m
airbyte-pod-sweeper-64f9d9b5dd-jq9sb                              1/1     Running     0          48m
airbyte-server-7f4d95b468-rz47d                                   1/1     Running     0          48m
airbyte-temporal-695d468d7f-scfnp                                 1/1     Running     0          48m
airbyte-webapp-dd8b789c8-8fswr                                    1/1     Running     0          48m
airbyte-worker-5db6465868-4gmqc                                   1/1     Running     0          48m
destination-snowflake-check-1-0-atpoi                             0/4     Completed   0          39m
destination-snowflake-check-1-1-evvpd                             0/4     Completed   0          32m
destination-snowflake-check-1-2-emkss                             0/4     Completed   0          25m
destination-snowflake-write-1-0-pqerg                             0/5     Completed   0          39m
destination-snowflake-write-1-1-fhxbn                             0/5     Completed   0          32m
destination-snowflake-write-1-2-fmxxp                             0/5     Completed   0          24m
normalization-snowflake-normalize-1-0-msncc                       0/4     Completed   0          35m
normalization-snowflake-normalize-1-1-mbtse                       0/4     Completed   0          27m
normalization-snowflake-normalize-1-2-dfgcg                       0/4     Completed   0          20m
on-snowflake-check-0d318336-27f4-4491-8b9c-daf77a48d957-0-rzxnc   0/4     Completed   0          41m
orchestrator-norm-job-1-attempt-0                                 0/1     Completed   0          35m
orchestrator-norm-job-1-attempt-1                                 0/1     Completed   0          28m
orchestrator-norm-job-1-attempt-2                                 0/1     Completed   0          21m
orchestrator-repl-job-1-attempt-1                                 0/1     Completed   0          32m
orchestrator-repl-job-1-attempt-2                                 0/1     Completed   0          25m
postgres-discover-529a5be6-ffb7-413e-a27a-957165ad76e9-0-cdrki    0/4     Completed   0          40m
rce-postgres-check-11a0a1bc-738f-4a94-a793-032c804981b2-0-wgmoy   0/4     Completed   0          43m
source-postgres-check-1-0-kcaxx                                   0/4     Completed   0          40m
source-postgres-check-1-1-lixhu                                   0/4     Completed   0          32m
source-postgres-check-1-2-ugwrr                                   0/4     Completed   0          25m
source-postgres-read-1-0-sverf                                    0/4     Completed   0          39m
source-postgres-read-1-1-avwov                                    0/4     Completed   0          32m
source-postgres-read-1-2-xawpx                                    0/4     Completed   0          24m

Setting up a new sync from Postgres => Snowflake with standard tabular normalization it fails with the following:

Database Error in model TAX_SETTINGS (models/generated/airbyte_tables/PUBLIC/TAX_SETTINGS.sql)
  002003 (42S02): SQL compilation error:
  Object 'DEV_HOSPENG_REPORTING.PUBLIC._AIRBYTE_RAW_TAX_SETTINGS' does not exist or not authorized.
  compiled SQL at ../build/run/airbyte_utils/models/generated/airbyte_tables/PUBLIC/TAX_SETTINGS.sql

All the dbt model transforms appear to be failing to create the _airbyte_raw_ tables. Airbyte is creating the _airbyte_tmp_ tables so it wouldn’t seem to be a permissions problem. Can you help me understand what is wrong here?

Thank you!

It’s not just one table that fails here – they all fail the same way, i’m just posting one of the error messages

We can close this issue out. I resolved it. We had something set incorrectly in the destination configuration that was causing the above error.

Great to hear! Could you specify what setting was wrong so future readers might benefit?