Using State from Existing Connection for New Connection in API Extraction

slack-user-airbyte · May 14, 2024, 4:01pm

Summary

Exploring the possibility of utilizing state from an existing connection to establish a new connection in API-based extractions to avoid a full sync.

Question

Can I use the state from an existing connection to create a new connection so that I don't have to run a full sync? This is for API-based extractions.

This topic has been created from a Slack thread to give it more visibility.
It will be in read-only mode here.

Tags: state, existing-connection, new-connection, API, extraction

slack-user-airbyte · May 16, 2024, 1:28pm

Yes, you can use the state from an existing connection to create a new connection and avoid a full sync. This is particularly useful for incremental syncs. Here are the steps you can follow:

1. Navigate to the existing connection you are duplicating and go to the Settings tab. Open the Advanced settings to see the connection state, which manages incremental syncs. Copy the state to your clipboard[^1^].
2. Go to your newly created connection, replace the state with the copied contents from the previous step, then click Update State. This will ensure historical data is not replicated again[^1^].
3. Enabling the connection will now provide a parallel copy of all streams in the updated format[^1^].
4. You can move your dashboards to rely on the new tables, then pause the out-of-date connection[^1^].

Please note that updates to the connection state should be handled with extreme care. Updates may break your syncs, requiring a reset to fix. Make changes only as directed by the Airbyte team[^5^].

[^1^]: Source
[^5^]: Source
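If you script step 2 instead of pasting by hand, the main thing to get right is pointing the copied state at the new connection without silently carrying over cursors for streams the new connection does not sync. The sketch below assumes a state shape modelled on Airbyte's Config API ConnectionState object (connectionId, stateType, and a streamState list keyed by streamDescriptor); the exact shape shown in the UI editor may differ by Airbyte version. The helper retarget_state, the connection IDs, and the stream names are hypothetical.

```python
import copy
import json


def retarget_state(copied_state, new_connection_id, new_stream_names):
    """Hypothetical helper: return a copy of an exported state aimed at a new connection.

    Assumed shape (modelled on the Config API ConnectionState object):
    {"connectionId": ..., "stateType": "stream",
     "streamState": [{"streamDescriptor": {"name": ...}, "streamState": {...}}]}
    """
    state = copy.deepcopy(copied_state)  # leave the original export untouched
    state["connectionId"] = new_connection_id

    if state.get("stateType") == "stream":
        copied_names = {
            entry["streamDescriptor"]["name"] for entry in state.get("streamState", [])
        }
        stale = copied_names - set(new_stream_names)
        if stale:
            # State for a stream the new connection does not sync suggests the
            # two connections do not match, so stop instead of guessing.
            raise ValueError(f"state for streams not in new connection: {sorted(stale)}")
    # Streams of the new connection with no copied entry get no cursor here,
    # so they would presumably start from scratch.
    return state


exported = json.loads(
    '{"connectionId": "old-connection-id", "stateType": "stream",'
    ' "streamState": [{"streamDescriptor": {"name": "users"},'
    ' "streamState": {"updated_at": "2024-05-01T00:00:00Z"}}]}'
)
retargeted = retarget_state(exported, "new-connection-id", ["users", "orders"])
print(json.dumps(retargeted, indent=2))
```

Failing loudly on mismatched streams follows the warning in the answer: a wrong state can break syncs and force a reset, so it is safer to stop than to write a partial or stale state.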

slack-user-airbyte · May 16, 2024, 1:28pm

<@U04SE163WC8> - Does Airbyte use parallel bursts when extracting data with the low-code approach?

slack-user-airbyte · May 16, 2024, 1:28pm

Hmm, I don’t know enough to give you a confident answer yet. The extracts provided do not contain specific information about whether Airbyte uses parallel bursts when extracting data with the low-code approach. However, there is an ongoing discussion about parallel data replication in Airbyte on their GitHub page. For more detailed information, you might want to check out the Airbyte Documentation.

slack-user-airbyte · May 16, 2024, 1:28pm

Does Airbyte use parallelization for data extraction and processing?

slack-user-airbyte · May 16, 2024, 1:28pm

<@U04SE163WC8> Does Airbyte use parallelisation for data extraction and processing?

slack-user-airbyte · May 16, 2024, 1:28pm

A message from kapa.ai

slack-user-airbyte · May 16, 2024, 1:28pm

Yes, Airbyte does use parallelization for data extraction and processing. This is achieved through the configuration of workers. Airbyte exposes several environment variables to change the maximum number of each type of worker allowed to run in parallel:

- MAX_SPEC_WORKERS: Maximum number of Spec workers allowed to run in parallel.
- MAX_CHECK_WORKERS: Maximum number of Check connection workers allowed to run in parallel.
- MAX_DISCOVERY_WORKERS: Maximum number of Discovery workers allowed to run in parallel.
- MAX_SYNC_WORKERS: Maximum number of Sync workers allowed to run in parallel.

The current default value for each of these environment variables is 5. Tweaking these values might help you run more jobs in parallel and increase the workload of your Airbyte instance.

In addition, there is an ongoing project around parallel data replication in Airbyte, which is focused on increasing replication performance. It covers two cases: a source with many streams where each individual stream is a manageable size, and a source that has one huge stream. For more information, you can refer to the Airbyte Documentation and the GitHub issue on parallel data replication.
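When editing a deployment's .env, it can help to check which caps will actually apply. The sketch below resolves them the way the answer describes: each variable if it is set, otherwise the default of 5. effective_worker_limits is a hypothetical helper, not part of Airbyte, and how your deployment reads these variables may depend on the Airbyte version and deployment method.

```python
import os

# The four worker-limit variables named in the answer, and the default it quotes.
WORKER_LIMIT_VARS = (
    "MAX_SPEC_WORKERS",
    "MAX_CHECK_WORKERS",
    "MAX_DISCOVERY_WORKERS",
    "MAX_SYNC_WORKERS",
)
DEFAULT_LIMIT = 5


def effective_worker_limits(env=None):
    """Hypothetical helper: resolve each cap from an env mapping, else the default."""
    env = os.environ if env is None else env
    limits = {}
    for name in WORKER_LIMIT_VARS:
        raw = env.get(name, "").strip()
        limits[name] = int(raw) if raw else DEFAULT_LIMIT
    return limits


# Example: an .env that raises only the sync-worker cap.
limits = effective_worker_limits({"MAX_SYNC_WORKERS": "10"})
print(limits["MAX_SYNC_WORKERS"], limits["MAX_CHECK_WORKERS"])  # 10 5
```

Note that these caps bound how many jobs run at once (for example, several syncs side by side); they do not by themselves make a single sync read its streams concurrently.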

slack-user-airbyte · May 16, 2024, 1:28pm

<@U04SE163WC8> Do you still use Octavia CLI?

slack-user-airbyte · May 16, 2024, 1:28pm

Yes, Octavia CLI is still used to manage Airbyte configurations. It provides commands to import, edit, and apply Airbyte resource configurations such as sources, destinations, and connections. It is particularly useful for version-controlling resource configurations, reviewing resource configuration changes, and copying resource configurations between Airbyte instances.

Octavia CLI also allows you to programmatically create resources when you have many connections or your connections contain several Airbyte streams. With it you can:

- define configurations in YAML programmatically,
- bootstrap an Octavia CLI project,
- import the configuration of an existing instance,
- create source, destination, and connection configurations in YAML,
- edit YAML configuration files,
- apply configurations to a different instance.

For more details, you can check the Octavia CLI documentation and join the octavia-cli channel in the Airbyte community Slack. Please note that Octavia CLI manages Airbyte Configurations as Code (CaC), not Airbyte Infrastructure as Code (IaC). Thus, Octavia CLI cannot provision an Airbyte instance.

Sources:
- Version control Airbyte configurations with Octavia CLI
- Octavia CLI use cases
- Programmatically create resources

slack-user-airbyte · May 16, 2024, 1:28pm

<@U04SE163WC8> - Do we have plans for Datadog as a destination?

slack-user-airbyte · May 16, 2024, 1:28pm

Based on the provided extracts, there is no explicit mention of plans to include Datadog as a destination in Airbyte. Currently, Airbyte integrates with Datadog as a source and for monitoring purposes.

For monitoring, Airbyte provides extensive logs for each connector and has an integration with Datadog that allows you to monitor and analyze your data pipelines within your Datadog dashboards at no additional cost. This integration is still in development and is expected to be available only to Airbyte Enterprise customers in the future (source).

As a source, the Datadog connector in Airbyte collects data from the Datadog API. It supports full refresh sync, incremental sync, and SSL connection (source).

For the most accurate and up-to-date information, you may want to check the Airbyte GitHub page or the Airbyte Documentation.

slack-user-airbyte · May 16, 2024, 1:29pm

<@U04SE163WC8> Can we mark a stream to be executed with parallel calls in the UI Builder?

slack-user-airbyte · May 16, 2024, 1:29pm

Based on the provided extracts, there is no explicit mention of a feature in the Airbyte UI Builder that allows marking a stream for execution with parallel calls. The UI Builder is designed to help you create and configure connectors, but the extracts do not describe any option for parallel execution of streams. However, there is an ongoing effort to implement a concurrency feature for API sources in Airbyte, as mentioned in the Support QA. This feature is being tested with the Stripe connector and may be extended to other connectors in the future. For the most accurate and up-to-date information, you may want to check the Airbyte GitHub page or the Airbyte Documentation.

slack-user-airbyte · May 16, 2024, 1:29pm

<@U04SE163WC8> - Is parallel stream execution possible?

slack-user-airbyte · May 16, 2024, 1:29pm

Yes, parallel stream execution is possible in Airbyte. However, according to the information currently available, API sources aren't parallel yet. There is ongoing work to implement a concurrency feature, which is being tested with the Stripe connector and may later be extended to other connectors. This means that in the future, multiple streams could be run in parallel, increasing the efficiency of data ingestion.

In terms of worker configuration, Airbyte lets you set the maximum number of each type of worker that can run in parallel: MAX_SPEC_WORKERS, MAX_CHECK_WORKERS, MAX_DISCOVERY_WORKERS, and MAX_SYNC_WORKERS. Adjusting these values can help you run more jobs in parallel and increase the workload of your Airbyte instance. For more information, you can refer to the Support QA and the Airbyte Documentation on Configuring Jobs & Workers.
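To make the idea of stream-level concurrency concrete, here is a generic sketch using only Python's standard library: independent streams are read on separate threads, and max_workers caps how many run at once. This is not the Python CDK's concurrency implementation or the Stripe work mentioned in the answer; read_stream and the stream names are hypothetical stand-ins for paginated API calls.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def read_stream(name, pages):
    """Hypothetical stand-in for paging through one API stream."""
    records = []
    for page in range(pages):
        time.sleep(0.01)  # simulate the latency of one HTTP request
        records.append({"stream": name, "page": page})
    return name, records


streams = {"customers": 3, "invoices": 5, "charges": 4}

# Streams are independent, so each can be read on its own thread;
# max_workers caps how many streams are in flight at the same time.
with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(read_stream, name, pages) for name, pages in streams.items()]
    results = dict(f.result() for f in futures)

for name, records in results.items():
    print(name, len(records))
```

Keeping the cap small is a deliberate choice for API sources: rate limits often apply to the whole account rather than per stream, so unbounded concurrency can trade speed for throttling errors.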

slack-user-airbyte · May 16, 2024, 1:30pm

Okay there’s a lot

Does low-code use concurrent execution? No, not yet. We will add it at some point, but there's no target date yet. We do have concurrency support in the Python CDK, though.
Datadog destination: not planned yet.

slack-user-airbyte · May 16, 2024, 1:30pm

kapa was keeping me company