Handling upserts in Airbyte with Glue Catalog and Iceberg

Summary

Exploring whether, and how, Airbyte handles upserts directly into Iceberg when using the Glue Catalog.


Question

Hey everyone, I just found out that Airbyte supports the Glue Catalog and have been exploring it. I wanted to understand how Airbyte handles upserts. Does it even handle upserts into Iceberg directly?




["airbyte", "glue-catalog", "iceberg", "upserts"]

Hey <@U05JENRCF7C>, any idea on this?

Nope. I haven’t used the Iceberg format/connector with Airbyte. When in doubt, check the code: https://github.com/airbytehq/airbyte/tree/master/airbyte-integrations/connectors/destination-iceberg

Hey, thanks for the response. Yeah, I have already started diving into the code. I am running Airbyte locally to see how it actually works.
I am currently testing how the overwrite sync mode works. I found that Spark DataFrames support overwriting partitions, but running a MERGE INTO command is the suggested way to do an upsert. However, I found no such command in the connector code.
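
For reference, here is a minimal sketch of what an Iceberg upsert through Spark SQL could look like if you run it yourself after the sync. This is not something the connector does; `build_merge_sql` and the table, view, and key names are hypothetical, and it assumes a Spark session configured with the Iceberg SQL extensions.

```python
from typing import Sequence


def build_merge_sql(target: str, source: str, key_columns: Sequence[str]) -> str:
    """Build an Iceberg MERGE INTO statement that upserts `source` into `target`.

    Hypothetical helper for illustration: rows matching on all key columns
    are updated, all other rows are inserted.
    """
    if not key_columns:
        raise ValueError("at least one key column is required")
    # Join condition over every key column, e.g. "t.id = s.id AND t.day = s.day".
    on_clause = " AND ".join(f"t.{col} = s.{col}" for col in key_columns)
    return (
        f"MERGE INTO {target} t\n"
        f"USING {source} s\n"
        f"ON {on_clause}\n"
        "WHEN MATCHED THEN UPDATE SET *\n"
        "WHEN NOT MATCHED THEN INSERT *"
    )


if __name__ == "__main__":
    # Table, view, and key names below are made up for the example.
    print(build_merge_sql("glue.db.users", "updates", ["id"]))
    # With a configured SparkSession you would then run something like:
    #   spark.sql(build_merge_sql("glue.db.users", "updates", ["id"]))
```

`UPDATE SET *` and `INSERT *` only work when the source and target have matching columns, so the source view would need to be shaped like the target table first.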

One interesting observation: Airbyte writes the extracted data as a JSON text value. This was in the docs, but I did not pay attention to it. In practice this means I cannot directly query the data from Iceberg after it is synced, which has been disappointing.
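
A rough sketch of how the JSON text could be unpacked into regular columns after reading it back. The column names `_airbyte_ab_id`, `_airbyte_emitted_at`, and `_airbyte_data` are assumptions about the raw layout and should be checked against the actual table; `flatten_airbyte_row` is a hypothetical helper.

```python
import json
from typing import Any, Dict


def flatten_airbyte_row(row: Dict[str, Any]) -> Dict[str, Any]:
    """Expand the JSON text in `_airbyte_data` into top-level fields.

    Assumes (not confirmed from the connector) that each raw row carries
    `_airbyte_ab_id`, `_airbyte_emitted_at`, and a JSON string `_airbyte_data`.
    """
    payload = json.loads(row["_airbyte_data"])
    flat = {
        "_airbyte_ab_id": row.get("_airbyte_ab_id"),
        "_airbyte_emitted_at": row.get("_airbyte_emitted_at"),
    }
    # Payload fields are added last, so they win on any name collision.
    flat.update(payload)
    return flat


if __name__ == "__main__":
    # Made-up sample row for illustration only.
    raw = {
        "_airbyte_ab_id": "abc-123",
        "_airbyte_emitted_at": 1700000000000,
        "_airbyte_data": '{"id": 1, "name": "Ada"}',
    }
    print(flatten_airbyte_row(raw))
```

The same idea applies in Spark by parsing the JSON column with a schema and writing the result to a second, typed Iceberg table.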