Managing and Deleting Inactive and Deprecated Connections in Airbyte

Summary

Airbyte doesn’t delete connections but alters their status in the database. User facing challenges with UI loading due to a large number of connections, seeking guidance on safely removing unnecessary inactive and deprecated connections.


Question

Hello everyone :wave:

We have been using Airbyte for an extended period, and recently, we encountered a challenge with the UI taking a considerable time to load connections. To address this, we extended the gateway timeout, but the root cause appears to be the substantial number of connections.

Upon investigation, it came to our attention that Airbyte doesn’t delete connections. Instead, it alters their status in the database, transitioning between “active,” “inactive,” and “deprecated.”

Presently, we have 98 connections marked as “active,” but a whopping 1000 connections reside in the “deprecated” category. We are considering whether this abundance might be contributing to the delay in UI loading.

Our preference is to clean up (delete permanently) unnecessary inactive and deprecated connections. Is there a recommended, non-intrusive method to remove them without directly accessing the production database and risking unintended consequences?

If any of you have encountered a similar issue or have insights on how to efficiently manage this situation, your guidance would be highly appreciated.

Thank you in advance.



This topic has been created from a Slack thread to give it more visibility.
It will be on Read-Only mode here. Click here if you want to access the original thread.

Join the conversation on Slack

["airbyte", "connections", "ui-loading", "inactive-connections", "deprecated-connections", "database", "cleanup"]

You can use the Config API to delete connections:
https://airbyte-public-api-docs.s3.us-east-2.amazonaws.com/rapidoc-api-docs.html#post-/v1/connections/delete

or the OSS API: https://reference.airbyte.com/reference/deleteconnection

probably use the OSS API if you can

Or are you saying that these connections in deprecated status are impacting performance?

Because at the moment; I’m pretty sure nothing is ever deleted in the airbyte DB; its just marked deleted/deprecated/etc…

there is a method (that is never called) which does delete a connection: it looks something like this:

   * Deletes a connection (sync) and all of dependent resources (state and connection_operations).
   *
   * @param syncId - id of the sync (a.k.a. connection_id)
   * @throws IOException - error while accessing db.
   */
  @Override
  public void deleteStandardSync(UUID syncId) throws IOException {
    database.transaction(ctx -> {
      deleteConfig(NOTIFICATION_CONFIGURATION, NOTIFICATION_CONFIGURATION.CONNECTION_ID, syncId, ctx);
      deleteConfig(CONNECTION_OPERATION, CONNECTION_OPERATION.CONNECTION_ID, syncId, ctx);
      deleteConfig(STATE, STATE.CONNECTION_ID, syncId, ctx);
      deleteConfig(CONNECTION, CONNECTION.ID, syncId, ctx);
      return null;
    });
  }```

so from that, you can pretty much infer that you need to make a transaction that deletes the object from these 4 tables by connection id; however afaik there is no way to trigger this code in airbyte from any API:

DELETE FROM "notification_configuration" WHERE connection_id = 'connection-uuid';
DELETE FROM "connection_operation" WHERE connection_id = 'connection-uuid';
DELETE FROM "state" WHERE connection_id = 'connection-uuid';
DELETE FROM "connection" WHERE connection_id = 'connection-uuid';
COMMIT;```

maybe they are saving this for a periodic cleanup routine that is planed to run and clean up all deprecated resources past a certain threshold.

Thank you so much for your reply Justen. Yes, I think the deprecated connections are impacting performance because there are a lot of them

Airbyte general performance is normal, the only impact is on the UI when loading the connections. Maybe is not the deprecated connections, but I have no other hypothesis right now

I’m not a DBA, but possibly one or more of these tables is missing an index that would help it more efficiently enumerate active workflows. I checked the code an there is an index on status; but perhaps how it is being queried makes this index less useful. Would have to figure out what query is actually being performed and analyze it to learn more.

the use of jooq obfuscates the actual query; so may need to look at postgres logs if you have query logging enabled

        .select(
            CONNECTION.asterisk(),
            groupConcat(CONNECTION_OPERATION.OPERATION_ID).separator(OPERATION_IDS_AGG_DELIMITER).as(OPERATION_IDS_AGG_FIELD),
            SCHEMA_MANAGEMENT.AUTO_PROPAGATION_STATUS)
        .from(CONNECTION)

        // left join with all connection_operation rows that match the connection's id.
        // left join includes connections that don't have any connection_operations
        .leftJoin(CONNECTION_OPERATION).on(CONNECTION_OPERATION.CONNECTION_ID.eq(CONNECTION.ID))
        // The schema management can be non-existent for a connection id, thus we need to do a left join
        .leftJoin(SCHEMA_MANAGEMENT).on(SCHEMA_MANAGEMENT.CONNECTION_ID.eq(CONNECTION.ID))
        // join with source actors so that we can filter by workspaceId
        .join(ACTOR).on(CONNECTION.SOURCE_ID.eq(ACTOR.ID))
        .where(ACTOR.WORKSPACE_ID.eq(standardSyncQuery.workspaceId())
            .and(standardSyncQuery.destinationId() == null || standardSyncQuery.destinationId().isEmpty() ? noCondition()
                : <http://CONNECTION.DESTINATION_ID.in|CONNECTION.DESTINATION_ID.in>(standardSyncQuery.destinationId()))
            .and(standardSyncQuery.sourceId() == null || standardSyncQuery.sourceId().isEmpty() ? noCondition()
                : <http://CONNECTION.SOURCE_ID.in|CONNECTION.SOURCE_ID.in>(standardSyncQuery.sourceId()))
            .and(standardSyncQuery.includeDeleted() ? noCondition()
                : CONNECTION.STATUS.notEqual(
                    StatusType.deprecated)))

        // group by connection.id so that the groupConcat above works
        .groupBy(CONNECTION.ID, SCHEMA_MANAGEMENT.AUTO_PROPAGATION_STATUS)).fetch();

    final List&lt;UUID&gt; connectionIds = connectionAndOperationIdsResult.map(record -&gt; record.get(CONNECTION.ID));

    return getStandardSyncsFromResult(connectionAndOperationIdsResult, getNotificationConfigurationByConnectionIds(connectionIds));```
the code looks list this

that’s great, I will investigate that, it will be great if an index solves the problem. Thank you! :keanu-thanks:

if you find an index that speeds this up; it would probably be good to open a PR on http://github.com/airbytehq/airbyte-platform|github.com/airbytehq/airbyte-platform

I run into server slowness a lot, I suspect db contention especially if you have lots of list queries. lots of FTS most likely causing lock contention with updates

but depends on your use-case; if your configuration is mostly static; then I don’t know why there would be lock contention

I don’t think that’s the case, I don’t have a lot of list queries

~Actually, this may be N+1 query problem~ nvm