Incremental sync from Postgres to S3 Data Lake with partition by date

slack-user-airbyte · May 14, 2024, 6:12pm

Summary

The user is asking whether it is possible to perform an incremental sync from a Postgres database to an S3 Data Lake with partitioning by date. They provided an example source table that is range-partitioned by date and described the Hive-style layout they want for the data in S3.

Question

Is it possible to run an incremental sync from Postgres to S3 Data Lake with partitioning by date? For example, I have this source table:

```sql
create table partitioned_table_cdc
(
    dt    timestamp not null,
    key   integer   not null,
    value integer,
    constraint pk_partitioned_table_cdc
        primary key (dt, key)
)
    partition by RANGE (dt);
```
It has two partitions with a few records each:
`partitioned_table_cdc_y2024m04d01`
`partitioned_table_cdc_y2024m04d02`

I want to write the data to S3 in Hive format:
```
partitioned_table_cdc/year=2024/month=04/day=01/*.parquet
partitioned_table_cdc/year=2024/month=04/day=02/*.parquet
```
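In the target layout above, each object's prefix is derived only from the date part of `dt`. The following is a minimal Python sketch of that mapping, assuming rows arrive as dicts whose `dt` value is a `datetime`. `hive_partition_prefix` and `group_rows_by_partition` are hypothetical helpers for illustration, not part of any Airbyte connector.

```python
from collections import defaultdict
from datetime import datetime


def hive_partition_prefix(table: str, dt: datetime) -> str:
    """Build a Hive-style year/month/day prefix from a row's dt value."""
    # Zero-pad month and day so prefixes match the layout, e.g. month=04/day=01.
    return f"{table}/year={dt.year:04d}/month={dt.month:02d}/day={dt.day:02d}"


def group_rows_by_partition(table: str, rows: list) -> dict:
    """Bucket rows by the partition prefix they would be written under."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[hive_partition_prefix(table, row["dt"])].append(row)
    # Plain dict keeps prefixes in first-seen order.
    return dict(buckets)


if __name__ == "__main__":
    rows = [
        {"dt": datetime(2024, 4, 1, 9, 30), "key": 1, "value": 10},
        {"dt": datetime(2024, 4, 1, 17, 0), "key": 2, "value": 20},
        {"dt": datetime(2024, 4, 2, 8, 15), "key": 1, "value": 11},
    ]
    for prefix, part in group_rows_by_partition("partitioned_table_cdc", rows).items():
        print(prefix, len(part))
```

Any time-of-day component is dropped, so all rows from one calendar day share a prefix. This mirrors how the source table's daily partitions line up with the desired `day=` folders.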


---

This topic has been created from a Slack thread to give it more visibility.
It is read-only here. [Click here](https://airbytehq.slack.com/archives/C021JANJ6TY/p1712125331600659) if you want to access the original thread.

[Join the conversation on Slack](https://slack.airbyte.com)


slack-user-airbyte · May 16, 2024, 1:40pm

I tried the AWS Datalake connector v0.1.6, but it didn't give me the results I wanted. On the AWS side I use the Glue Data Catalog and AWS Lake Formation. After an incremental sync I get a non-partitioned Parquet table with duplicated data. If I select the Lake Formation Governed Tables option, this error occurs:

```
Could not create table airbyte_test_mwIXw in database data_quality: InvalidInputException('An error occurred (InvalidInputException) when calling the CreateTable operation: Location for GOVERNED table is not registered.')
```
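Syncs that append rows can land the same primary key more than once, for example when a row is updated between syncs or re-read at the cursor boundary. One way to work around the duplication on the reading side is to collapse rows on the table's primary key `(dt, key)`. The following is a minimal sketch, assuming records are available in arrival order so that a later copy is the newer one. `dedupe_latest` is a hypothetical helper, not a connector feature.

```python
from datetime import datetime


def dedupe_latest(records, pk=("dt", "key")):
    """Return one record per primary key, keeping the last one seen.

    Assumes `records` is in arrival order, so a later copy of the same
    (dt, key) is treated as the newer version of that row.
    """
    latest = {}
    for rec in records:
        # Later duplicates overwrite earlier ones; the dict keeps first-seen key order.
        latest[tuple(rec[col] for col in pk)] = rec
    return list(latest.values())


if __name__ == "__main__":
    day = datetime(2024, 4, 1)
    synced = [
        {"dt": day, "key": 1, "value": 10},
        {"dt": day, "key": 2, "value": 20},
        {"dt": day, "key": 1, "value": 15},  # same key delivered again by a later sync
    ]
    print(dedupe_latest(synced))
```

This treats arrival order as the tie-breaker. If the landed files carry a sync or extraction timestamp, sorting on it before deduplicating would be a more robust choice.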

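The error says that the S3 location backing the governed table is not registered with Lake Formation. The following is a minimal sketch of registering it through boto3's Lake Formation `register_resource` call, assuming you know the bucket and prefix the connector writes to. `register_lake_location` is a hypothetical helper, and the caller still needs the appropriate Lake Formation and IAM permissions on that location.

```python
from typing import Any


def register_lake_location(lf_client: Any, bucket: str, prefix: str = "") -> str:
    """Register s3://bucket/prefix as a Lake Formation data location.

    `lf_client` is expected to be a boto3 Lake Formation client, e.g.
    boto3.client("lakeformation"); it is passed in so the call can be tested.
    """
    path = prefix.strip("/")
    # S3 ARNs carry no region or account ID: arn:aws:s3:::bucket[/prefix]
    arn = f"arn:aws:s3:::{bucket}/{path}" if path else f"arn:aws:s3:::{bucket}"
    # Let Lake Formation access the location through its service-linked role.
    lf_client.register_resource(ResourceArn=arn, UseServiceLinkedRole=True)
    return arn
```

To call it against AWS, use something like `register_lake_location(boto3.client("lakeformation"), "my-datalake-bucket", "data_quality")`, where the bucket and prefix are placeholders for your own values.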