Memory / Disk: whatever the default kube deployment spec has
Deployment: Kubernetes
Airbyte Version: 0.40.3
Source name/version: PostGIS 13
Destination name/version: Snowflake
Step: sync
Description:
I originally set up Airbyte in the default namespace; we got the source/destination connections working, and I successfully set up a sync using just the default tabular normalization.
Later we wanted to move Airbyte into its own namespace, so I tore down the old deployment and set up a new one in the airbyte namespace. I set up the connections the same way, with the same schema settings. The sync, however, is failing with the following error:
2022-09-12 16:55:45 normalization-orchestrator >
2022-09-12 16:55:45 normalization > All checks passed!
2022-09-12 16:55:46 normalization-orchestrator >
2022-09-12 16:55:46 normalization > Forward dbt output logs to diagnose/debug errors (0):
2022-09-12 16:55:46 normalization-orchestrator >
2022-09-12 16:55:46 normalization > Running with dbt=1.0.0
2022-09-12 16:55:46 normalization > Partial parse save file not found. Starting full parse.
2022-09-12 16:55:46 normalization > [WARNING]: Configuration paths exist in your dbt_project.yml file which do not apply to any resources.
There are 2 unused configuration paths:
- models.airbyte_utils.generated.airbyte_views
- models.airbyte_utils.generated.airbyte_incremental
2022-09-12 16:55:46 normalization > Found 4 models, 0 tests, 0 snapshots, 0 analyses, 572 macros, 0 operations, 0 seed files, 1 source, 0 exposures, 0 metrics
2022-09-12 16:55:47 normalization-orchestrator > (pod: airbyte / normalization-snowflake-normalize-73-0-qpbnw) - Closed all resources for pod
2022-09-12 16:55:47 normalization-orchestrator > Normalization executed in 27 seconds.
2022-09-12 16:55:47 normalization-orchestrator > Normalization Failed.
2022-09-12 16:55:47 normalization-orchestrator > Normalization summary: io.airbyte.config.NormalizationSummary@6d6039df[startTime=1663001719329,endTime=1663001747325,failures=[io.airbyte.config.FailureReason@611c3eae[failureOrigin=normalization,failureType=system_error,internalMessage=,externalMessage=Normalization failed during the dbt run. This may indicate a problem with the data itself.,metadata=io.airbyte.config.Metadata@13e00016[additionalProperties={attemptNumber=0, jobId=73, from_trace_message=true}],stacktrace=AirbyteDbtError:
Encountered an error:
Compilation Error in macro list_relations_without_caching (macros/adapters/metadata.sql)
Too many schemas in schema "DEV_HOSPENG_REPORTING".PUBLIC! dbt can only get
information about schemas with fewer than 10000 objects.
The error is strange because the source and destination databases are small, with very little data in them. I searched this forum for that error and came up empty; I also searched the airbyte and dbt GitHub repos and could not find it there either.
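For what it's worth, the error text itself says dbt can only enumerate schemas with fewer than 10,000 objects, so the message implies PUBLIC really does hold that many. A quick way to count the tables and views yourself (a sketch; the database/schema names are taken from the error above, swap in your own):

select table_type, count(*) as object_count
from DEV_HOSPENG_REPORTING.information_schema.tables
where table_schema = 'PUBLIC'
group by table_type;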
I got some feedback from a colleague with extensive Kubernetes experience, and they think the data is being stored in a leftover persistent volume from the previous deploy.
Could you run kubectl get pv to verify the age and reclaim policy of the persistent volume? There are a few different things we can do depending on whether it's Delete or Retain.
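For example (standard kubectl, nothing Airbyte-specific; the custom-columns form just narrows the output to the fields that matter here):

kubectl get pv
# or only the relevant fields:
kubectl get pv -o custom-columns=NAME:.metadata.name,RECLAIM:.spec.persistentVolumeReclaimPolicy,STATUS:.status.phase,CLAIM:.spec.claimRef.name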
Thanks – I think I found the issue. We have thousands of these AIRBYTE_TMP* tables; it appears to have generated multiple copies of them, each with a randomized three-letter code embedded in the name.
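In case anyone else needs to audit this, here is roughly how the leftovers can be found in Snowflake (a sketch; the database/schema are ours from the error above):

show tables like '%AIRBYTE_TMP%' in schema DEV_HOSPENG_REPORTING.PUBLIC;

-- or count them via information_schema:
select count(*) as leftover_tmp_tables
from DEV_HOSPENG_REPORTING.information_schema.tables
where table_schema = 'PUBLIC'
  and table_name ilike '%AIRBYTE_TMP%';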
@natalyjazzviolin can you help me understand how we can avoid this in the future? What was the root cause, and how can we reliably prevent something like this from happening again?
To summarize the history:
1. Deploy Airbyte in our k8s cluster with zero modifications to the deployment specs (i.e. clone the airbyte repo and kubectl apply the specs as-is; see the sketch after this list).
2. Set up the source (PostgreSQL, which is in the k8s cluster) and the destination (Snowflake) – OK
3. Set up a sync between the two
4. Run it – all is well; the only transform we enabled was the default tabular normalization
5. Re-deploy the Airbyte specs in their own airbyte namespace in Kubernetes
6. Delete the original Airbyte deployment in the default namespace
7. Set up the source (PostgreSQL, which is in the k8s cluster) and the destination (Snowflake) – OK
8. Set up a sync between the two and let it run (Friday)
9. Come back Monday to find thousands and thousands of temp tables; all the syncs are failing, and every further attempt seems to generate more temp tables.
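(For reference, step 1 was essentially the following – a sketch assuming the kustomize overlay the Airbyte repo shipped at the time; your checkout may differ:)

git clone https://github.com/airbytehq/airbyte.git
cd airbyte
# apply the stock kustomize overlay with no modifications
kubectl apply -k kube/overlays/stable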
It can’t be a Persistent Volume Claim / Persistent Volume issue, because these deployments are in totally separate namespaces and have no access to each other’s resources. The PVC that may have been created by the original deployment in the default namespace would not have been accessible when we re-deployed Airbyte in the airbyte namespace; it would simply have generated a new one. The only wrinkle is that the Snowflake environment already had a synced copy of the data and a bunch of _AIRBYTE_* tables…but for whatever reason the new deployment didn’t recognize them and went haywire, it seems.
The temp tables issue I can answer right now: unfortunately, when a sync fails, temp tables are not deleted. This is on our roadmap to fix: https://github.com/airbytehq/airbyte/issues/4059
They are only deleted when a sync completes successfully.
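In the meantime they have to be cleaned up by hand. One way to do that in Snowflake is to generate the DROP statements from information_schema and review them before executing anything (a sketch, reusing the schema from earlier in this thread):

-- generate (but do not yet execute) DROP statements for the leftover temp tables
select 'drop table if exists DEV_HOSPENG_REPORTING.PUBLIC."' || table_name || '";' as drop_stmt
from DEV_HOSPENG_REPORTING.information_schema.tables
where table_schema = 'PUBLIC'
  and table_name ilike '%AIRBYTE_TMP%';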
Did you have a chance to run the kubectl get pv command?
Setting up a new sync from Postgres => Snowflake with standard tabular normalization, it fails with the following:
Database Error in model TAX_SETTINGS (models/generated/airbyte_tables/PUBLIC/TAX_SETTINGS.sql)
002003 (42S02): SQL compilation error:
Object 'DEV_HOSPENG_REPORTING.PUBLIC._AIRBYTE_RAW_TAX_SETTINGS' does not exist or not authorized.
compiled SQL at ../build/run/airbyte_utils/models/generated/airbyte_tables/PUBLIC/TAX_SETTINGS.sql
All the dbt model transforms appear to be failing because the _airbyte_raw_ tables are missing. Airbyte is creating the _airbyte_tmp_ tables, so it wouldn’t seem to be a permissions problem. Can you help me understand what is wrong here?
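(For anyone else hitting this, a sketch of how to tell “does not exist” apart from “not authorized” – run these as the same role the Snowflake destination is configured with:)

show tables like '_AIRBYTE_RAW_TAX_SETTINGS' in schema DEV_HOSPENG_REPORTING.PUBLIC;
-- if the table shows up, check what the role can actually do with it:
show grants on table DEV_HOSPENG_REPORTING.PUBLIC._AIRBYTE_RAW_TAX_SETTINGS;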
Hi there, future reader here! I’m running into a similar problem. Could you share what it was that was “set incorrectly in the destination configuration”?