Issue with destination-postgresql connector staging table truncation

Summary

The user is experiencing an issue with the destination-postgresql connector where the staging table created by Airbyte gets truncated to 64 characters when the combined schema name + table name exceeds that length. This can lead to multiple streams writing to the same staging table and mixing up their data. The user has raised an open issue on GitHub and is seeking clarification on whether this logic is controlled by the destination connector or whether it is a more fundamental issue in Airbyte Core.


Question

Hi team, there is an issue we are seeing with the destination-postgresql connector (2.4.0) that I can't tell is isolated to it, or to how Airbyte stages and writes data that it pulls from a source. Basically, when the schema name + table name is longer than 64 characters, the staging table that the stream creates in airbyte_internal gets truncated to 64 characters. The problem is that when two streams truncate to the same string, both write their records to the same staging table, and both streams then try moving the data from that table into each of their final stream tables, leaving both streams a mess. I've documented what could be a better strategy in this open issue: https://github.com/airbytehq/airbyte/issues/45345. I'm open to helping implement a fix, but I'm not very familiar with this part of the architecture. Is this logic controlled by the destination, or is this something more fundamental to Airbyte Core that would need to be looked at?
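To make the collision concrete, here is a minimal sketch (not Airbyte's actual naming code; the class, method names, and the `_raw__stream_` prefix convention are illustrative assumptions) showing how naive truncation of long schema + table names to a 64-character limit can map two distinct streams onto the same staging table name, and how appending a short hash of the full name keeps truncated names unique:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class StagingNameDemo {
    // Identifier length limit the user observed in the connector.
    static final int MAX_IDENTIFIER_LENGTH = 64;

    // Naive strategy: hard-truncate the combined name. Two long names that
    // differ only past the cutoff collapse to the same staging table name.
    static String naiveName(String schema, String table) {
        String raw = schema + "_raw__stream_" + table;
        return raw.length() <= MAX_IDENTIFIER_LENGTH
                ? raw
                : raw.substring(0, MAX_IDENTIFIER_LENGTH);
    }

    // One possible fix: truncate, but reserve room for a short hash of the
    // full (untruncated) name so distinct streams stay distinct.
    static String hashedName(String schema, String table) {
        String raw = schema + "_raw__stream_" + table;
        if (raw.length() <= MAX_IDENTIFIER_LENGTH) {
            return raw;
        }
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(raw.getBytes(StandardCharsets.UTF_8))) {
                hex.append(String.format("%02x", b));
            }
            String suffix = hex.substring(0, 8); // 8 hex chars of the digest
            return raw.substring(0, MAX_IDENTIFIER_LENGTH - suffix.length() - 1)
                    + "_" + suffix;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is always available
        }
    }

    public static void main(String[] args) {
        String schema = "a_very_long_schema_name_used_in_production";
        String t1 = "customer_orders_enriched_with_shipping_details_v2";
        String t2 = "customer_orders_enriched_with_shipping_details_v3";

        // The names differ only in their final character, which falls past
        // the 64-character cutoff, so naive truncation collides:
        System.out.println(naiveName(schema, t1).equals(naiveName(schema, t2)));   // true

        // The hash suffix is computed over the full name, so the truncated
        // names remain distinct and still fit within the limit:
        System.out.println(hashedName(schema, t1).equals(hashedName(schema, t2))); // false
        System.out.println(hashedName(schema, t1).length() <= MAX_IDENTIFIER_LENGTH); // true
    }
}
```

The hash-suffix approach trades a few characters of readability at the end of the identifier for a guarantee that two different source streams can never share a staging table, which is the failure mode described above.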



This topic has been created from a Slack thread to give it more visibility. It is in read-only mode here.

Tags: issue, destination-postgresql-connector, staging-table, truncation, data-mix-up, open-issue, architecture

I've put up a PR with a proposed fix for issue 45345 (https://github.com/airbytehq/airbyte/issues/45345): PR 45941 - Destination Postgres - Fix conflicting relations and truncation causing streams to write raw streams to the same table (https://github.com/airbytehq/airbyte/pull/45941).

I've tried to document the PR, the approach, and a debate that I think could be had over whether the actual fix belongs deeper in the CDK or at the connector level. I'm also still working on getting all the unit tests passing, but I'm not a Java/Kotlin expert, and certainly not when the class hierarchy jumps from Kotlin to Java and back to Kotlin :slightly_smiling_face: Sharing here in the hope that the community or devs can help take this the rest of the way.