Incremental Sync Issue with Custom Connector

Summary

A custom connector has an issue with incremental sync on dates with no data: when an API call returns empty results for a day, the sync skips the remaining dates instead of advancing the cursor one day at a time.


Question

Hello,
I have a sync problem with a custom connector. I have an incremental sync on a date, but on some days there is no data and therefore empty results. When syncing, as soon as an API call returns empty results for a day, all the remaining dates are skipped.



This topic has been created from a Slack thread to give it more visibility.
It will be in read-only mode here.


["incremental-sync", "custom-connector", "empty-results", "api-call"]

Based on the information provided in the extracts, it seems like you’re facing an issue with incremental syncs on your custom connector when there’s no data for certain days. This could be due to how the connector is handling the state or the cursor field for incremental syncs. In some cases, the connector might not be saving the state correctly, which could lead to skipping dates until it encounters a day with data. You can check the state after a sync in the Settings tab of the Connection page to see if your custom connector is saving the state correctly (source).

If your API doesn’t have a “last_modified” field or equivalent, you might need to implement a workaround:

- Fetch all the data on the first sync, then fetch the last month’s data on each subsequent sync and append the new data. This approach might not be ideal if there are no changes in the data (source).
- Pull data in overlapping time periods to prevent potential problems with incomplete data. For example, you could grab data every week with lastmodified = now - 8 days (source).
- If your API doesn’t allow filtering records by a date field and only provides a paginated “feed” of data ordered from newest to oldest, you could set the “API time filtering capabilities” option to “No filter”. The connector will then automatically request the next page of records until the cutoff datetime is encountered (source).

Please note that these are general suggestions and the exact solution might depend on the specifics of your API and connector. If you’re still facing issues, it might be helpful to post your question on the Airbyte forum or GitHub for more targeted assistance.
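If the connector is built with the Airbyte Python CDK, one pattern that addresses this is to slice the stream into one-day windows and advance the cursor per slice even when a day’s API call returns nothing, so an empty day moves the state forward by exactly one day rather than jumping ahead. Below is a minimal sketch of that pattern, not your connector’s actual code: the API URL, the `date` query parameter, and the record fields are hypothetical placeholders, and the exact base-class signatures may vary with your CDK version.

```python
# Minimal sketch: a per-day incremental stream that advances its cursor
# even when a day returns no records. Endpoint, query parameter and fields
# are hypothetical; adapt them to your source.
from datetime import date, timedelta

from airbyte_cdk.sources.streams import IncrementalMixin
from airbyte_cdk.sources.streams.http import HttpStream


class DailyRecords(HttpStream, IncrementalMixin):
    url_base = "https://api.example.com/"   # hypothetical API
    primary_key = "id"
    cursor_field = "date"

    def __init__(self, start_date: date, **kwargs):
        super().__init__(**kwargs)
        self.start_date = start_date
        self._cursor_value = None            # last fully synced day

    @property
    def state(self):
        return {self.cursor_field: (self._cursor_value or self.start_date).isoformat()}

    @state.setter
    def state(self, value):
        self._cursor_value = date.fromisoformat(value[self.cursor_field])

    def stream_slices(self, sync_mode=None, cursor_field=None, stream_state=None):
        # One slice per calendar day, starting right after the saved cursor.
        day = (self._cursor_value + timedelta(days=1)) if self._cursor_value else self.start_date
        while day <= date.today():
            yield {"date": day.isoformat()}
            day += timedelta(days=1)

    def path(self, stream_slice=None, **kwargs):
        return "records"                     # hypothetical endpoint

    def request_params(self, stream_state=None, stream_slice=None, next_page_token=None):
        return {"date": stream_slice["date"]}  # the API filters by this date

    def next_page_token(self, response):
        return None                          # assuming no pagination

    def parse_response(self, response, **kwargs):
        yield from response.json()           # may be an empty list on quiet days

    def read_records(self, sync_mode, cursor_field=None, stream_slice=None, stream_state=None):
        yield from super().read_records(sync_mode, cursor_field, stream_slice, stream_state)
        # Advance the cursor to this slice's day even if it yielded nothing,
        # so the next sync resumes from the following day instead of skipping ahead.
        slice_day = date.fromisoformat(stream_slice["date"])
        self._cursor_value = max(self._cursor_value or slice_day, slice_day)
```

The key detail is the last two lines of `read_records`: the state is checkpointed per daily slice regardless of whether any records came back, which is exactly the “increment by 1 day and don’t skip to today” behaviour described in the question.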

<@U04SE163WC8>
My API allows filtering by a date and only returns records from that date.

But on some days no records exist. In that case I want the datetime cursor to increment by 1 day and not skip to today’s date.


<@U04SE163WC8>
When synchronizing, Airbyte no longer creates new tables for array or JSONified data columns

A message from kapa.ai

Yes, you’re correct. As part of Airbyte’s normalization process, it no longer creates new tables for array or JSONified data columns. Instead, it transforms the data after a sync to denest columns into their own tables. This process is called Basic Normalization. For every connection, you can choose between two options:

1. Basic Normalization: Airbyte converts the raw JSON blob version of your data to the format of your destination. Note that not all destinations support normalization.
2. Raw data (no normalization): Airbyte places the JSON blob version of your data in a table called _airbyte_raw_<stream name>.

When Basic Normalization is enabled, Airbyte transforms data after the sync in a step called Basic Normalization, which structures data from the source into a format appropriate for consumption in the destination. For example, when writing data from a nested, dynamically typed source like a JSON API to a relational destination like Postgres, normalization is the process which un-nests JSON from the source into a relational table format which uses the appropriate column types in the destination.

Without Basic Normalization, your data will be written to your destination as one data column with a JSON blob that contains all of your data. This is the _airbyte_raw_ table that you may have seen before. If you have Basic Normalization enabled, Airbyte automatically uses this JSON blob to create a schema and tables with your data in mind, converting it to the format of your destination. This runs after your sync and may take a long time if you have a large amount of data synced. If you don’t enable Basic Normalization, you’ll have to transform the JSON data from that column yourself.

For more details, you can refer to the Basic Normalization documentation (source).
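As a concrete illustration of the raw blob versus the normalized output, here is a small hypothetical example. The `_airbyte_*` column names are the ones used by pre-Destinations-V2 raw tables; the stream name `users` and its fields are invented for illustration only.

```python
# Hypothetical example: one record of a "users" stream as it sits in the raw
# table _airbyte_raw_users, versus what Basic Normalization produces.
# Only the _airbyte_* column names come from Airbyte; everything else is made up.

raw_row = {
    "_airbyte_ab_id": "4f1c...",                 # unique id added by Airbyte
    "_airbyte_emitted_at": "2023-10-01T12:00:00Z",
    "_airbyte_data": {                           # the whole source record as one JSON blob
        "id": 42,
        "name": "Ada",
        "address": {"city": "Paris", "zip": "75001"},
    },
}

# After Basic Normalization, the blob is expanded into typed columns in a
# "users" table, and (pre-V2) nested objects were denested into sub-tables:
normalized_users_row = {"id": 42, "name": "Ada"}
normalized_users_address_row = {"city": "Paris", "zip": "75001"}
```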

<@U04SE163WC8> how to enable Typing and Deduping in Airbyte V2?

Typing and Deduping is a feature of Airbyte Destinations V2 and is the default method of transforming datasets within data warehouse and database destinations after they’ve been replicated. It transforms the raw data which is transmitted during a sync into easy-to-use final tables for database and data warehouse destinations. To enable Typing and Deduping, you need to use a destination connector that supports this feature. You can check each destination’s documentation to learn if Typing and Deduping is supported.

Here are some key features of Airbyte Destinations V2:

- One-to-one table mapping: Data in one stream will always be mapped to one table in your data warehouse. No more sub-tables.
- Improved per-row error handling with _airbyte_meta: Airbyte will now populate typing errors in the _airbyte_meta column instead of failing your sync. You can query these results to audit misformatted or unexpected data (see the sketch below).
- Internal Airbyte tables in the airbyte_internal schema: Airbyte will now generate all raw tables in the airbyte_internal schema. We no longer clutter your desired schema with raw data tables.
- Incremental delivery for large syncs: Data will be incrementally delivered to your final tables when possible. No more waiting hours to see the first rows in your destination table.

For more information on how typing and deduplication works, you can refer to the Typing and Deduping documentation (source). Please note that not all destinations support Typing and Deduping. For example, the “Destination Clickhouse” connector had its normalization process removed and will only support Typing and Deduping if there is sufficient interest (source).
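As a rough sketch of the auditing idea above: the query below assumes a Postgres destination, a placeholder final table, placeholder credentials, and that typing errors appear under an "errors" key inside the _airbyte_meta JSON column. Check your destination’s Typing and Deduping docs for the exact shape of that column before relying on this.

```python
# Sketch only: list rows whose typing produced errors, under the assumptions
# stated above (Postgres destination, "errors" key in _airbyte_meta,
# placeholder table and credentials).
import psycopg2

conn = psycopg2.connect(host="localhost", dbname="warehouse",
                        user="airbyte", password="...")  # placeholder credentials
with conn, conn.cursor() as cur:
    cur.execute(
        """
        SELECT _airbyte_raw_id, _airbyte_meta
        FROM public.users                              -- placeholder final table
        WHERE _airbyte_meta::jsonb -> 'errors' <> '[]'::jsonb
        """
    )
    for raw_id, meta in cur.fetchall():
        print(raw_id, meta)
```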

<@U04SE163WC8> how can I still have Basic Normalization?

To enable Basic Normalization in Airbyte, you can toggle it on or off in the “Normalization and Transformation” section when setting up your connection. This is an optional feature and can be configured during connection setup and afterwards in the Transformation tab; you would need to select the option “Normalized tabular data”.

Please note that not all destinations support Basic Normalization. The destinations that currently support it include BigQuery, MS SQL Server, MySQL, Postgres, Redshift, and Snowflake. For MySQL, the server must support the WITH keyword, which requires MySQL >= 8.0 or MariaDB >= 10.2.1.

When Basic Normalization is enabled, Airbyte transforms data after the sync in a step called Basic Normalization, which structures data from the source into a format appropriate for consumption in the destination. For example, when writing data from a nested, dynamically typed source like a JSON API to a relational destination like Postgres, normalization is the process which un-nests JSON from the source into a relational table format which uses the appropriate column types in the destination.

For more details, you can refer to the Basic Normalization documentation (source).

<@U04SE163WC8> I am on Airbyte V2. How can I normalize data being synced when Basic Normalization no longer exists?

In Airbyte V2, Basic Normalization is being replaced by Typing and Deduping as part of the new Destinations V2. This new feature transforms the raw data which is transmitted during a sync into easy-to-use final tables for database and data warehouse destinations.

Here are some key features of Airbyte Destinations V2:

- One-to-one table mapping: Data in one stream will always be mapped to one table in your data warehouse. No more sub-tables.
- Improved per-row error handling with _airbyte_meta: Airbyte will now populate typing errors in the _airbyte_meta column instead of failing your sync. You can query these results to audit misformatted or unexpected data.
- Internal Airbyte tables in the airbyte_internal schema: Airbyte will now generate all raw tables in the airbyte_internal schema. We no longer clutter your desired schema with raw data tables.
- Incremental delivery for large syncs: Data will be incrementally delivered to your final tables when possible. No more waiting hours to see the first rows in your destination table.

To enable Typing and Deduping, you need to use a destination connector that supports this feature. You can check each destination’s documentation to learn if Typing and Deduping is supported. For more information on how typing and deduplication works, you can refer to the Typing and Deduping documentation (source). Please note that not all destinations support Typing and Deduping. For example, the “Destination Clickhouse” connector had its normalization process removed and will only support Typing and Deduping if there is sufficient interest (source).

<@U04SE163WC8> tell me how I can unnest data after a sync with V2

<@U04SE163WC8> you didn’t answer correctly. I want to unnest data with V2. How can I do it?

I apologize for the confusion. In Airbyte Destinations V2, the automatic unnesting of JSON columns into separate tables is no longer performed. This is a significant change from the previous version, where Airbyte would unnest JSON columns into separate tables as part of the Basic Normalization process. In the new version, the data from one stream will be mapped to one table in your schema. For example, if you have a JSON column named address with properties city and zip, these will no longer be unnested into a separate public.users_address table with columns city and zip. Instead, the address column will be kept as a JSON column in the public.users table (source).

If you still need to unnest your data after it has been synced, you would need to perform this operation manually in your destination using the appropriate SQL commands or functions provided by your data warehouse or database system. For example, in BigQuery you can use the JSON_EXTRACT function to extract a specific property from a JSON string, or the JSON_EXTRACT_ARRAY function to extract an array from a JSON string. In Postgres, you can use the jsonb_array_elements function to unnest a JSON array.

Please note that the actual SQL commands or functions you need to use will depend on the specific requirements of your data and the capabilities of your data warehouse or database system.
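As a rough sketch of that manual unnesting for the address example above, assuming a Postgres destination: the table names (public.users, public.users_address), the presence of an id column on the parent table, and the connection parameters are placeholders; the ->> and jsonb operators are standard Postgres JSON functions.

```python
# Sketch only: manually unnest the `address` JSON column into its own table
# in a Postgres destination. Table names, the `id` column and the credentials
# are placeholders taken from the example above.
import psycopg2

conn = psycopg2.connect(host="localhost", dbname="warehouse",
                        user="airbyte", password="...")  # placeholder credentials
with conn, conn.cursor() as cur:
    cur.execute(
        """
        CREATE TABLE IF NOT EXISTS public.users_address AS
        SELECT
            id,                                        -- key back to the parent row
            address::jsonb ->> 'city' AS city,         -- extract JSON properties
            address::jsonb ->> 'zip'  AS zip
        FROM public.users
        WHERE address IS NOT NULL;
        """
    )
# For a JSON *array* column (one output row per element), Postgres'
# jsonb_array_elements(...) can be used in the SELECT instead.
```

You could run a statement like this as a post-sync step (for example from a dbt model or a scheduled script) to recreate the denested tables that Basic Normalization used to produce.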

<@U04SE163WC8>
What’s the last version of Airbyte without V2?

I’m sorry, but the provided extracts do not contain information about the last version of Airbyte without V2. For the most accurate information, please refer to the Airbyte Changelog or the Airbyte GitHub repository.