The airbyte-database

This service is the internal database used by Airbyte to manage all connectors, states, jobs and connections.

You can split the database into two:

  • Configs: store data about connectors (sources and destination) and the information about the instance (workspace id, migrations

  • Jobs: anything related to the sync jobs

In this module Airbyte uses two tools: jOOQ and Flyway (which we talked previously)

:memo: jOOQ, is a library that helps developers interact with databases in a more convenient and type-safe way using the Java programming language. It provides a way to write database queries using Java code instead of writing raw SQL statements. In simple terms, jOOQ acts as a bridge between your Java code and the database. It allows you to define your database schema in Java classes, representing tables, columns, and relationships. This helps in maintaining a clear and structured representation of the database structure in your code.

Databases and Tables

There are a lot of tables inside the airbyte-db we’re going to talk about the most important ones. To extract this result I have a local setup and access it using Docker with following command:

docker exec -ti airbyte-db psql -U docker airbyte

\d will output all tables

\d actor give all the columns to the specified table

Schema Name Type Database
public active_declarative_manifest table configs
public actor table configs
public actor_catalog table configs
public actor_catalog_fetch_event table configs
public actor_definition table configs
public actor_definition_config_injection table configs
public actor_definition_workspace_grant table configs
public actor_oauth_parameter table configs
public airbyte_configs table configs
public airbyte_configs_migrations table configs
public airbyte_jobs_migrations table
public airbyte_metadata table jobs
public attempts table jobs
public connection table configs
public connection_operation table configs
public connector_builder_project table configs
public declarative_manifest table configs
public jobs table jobs
public normalization_summaries table jobs
public notification_configuration table configs
public operation table configs
public secrets table
public state table configs
public stream_reset table configs
public stream_stats table jobs
public sync_stats table jobs
public workspace table configs
public workspace_service_account table configs

Airbyte Database Diagram

This diagram was generated using Airbyte 0.44.5 version with DBeaver tool.

actor: stores all info about connectors created during the operation

actor_catalog: store the schema for source connectors

actor_definition: store the Airbyte connector catalog and their definitions

connection: all data about the connection created

jobs: store data about all jobs, if they succeeded or failed

attempts: each job will trigger some attempts

state: stores all states row individualy, which makes easier to read

operation: Airbyte calls Normalization and other tasks as operation.

secrets: OSS basic structure to store secrets

Guides

How to import/export Airbyte to a new instance (docker to docker deploy)

How to restore a deleted connection

Additional Resources:

  • Airbyte documentation about the database

  • (some thoughts) Is there a better way to keep the documentation updated reflect the current tables and columns?