c.*,
string_agg(co.operation_id::text,',') as operation_ids_agg,
sm.auto_propagation_status
FROM
connection c
LEFT JOIN connection_operation co ON co.connection_id = c.id
LEFT JOIN schema_management sm ON sm.connection_id = c.id
JOIN actor a ON c.source_id = a.id
WHERE
a.workspace_id = '47d889b1-8b87-42ee-8082-f04e13765c7a'::uuid
AND c.status <> 'deprecated'
GROUP BY
c.id, sm.auto_propagation_status;```
The query should be something like this in Postgres.
My EXPLAIN plan for this ended up like this:
Group Key: c.id, sm.auto_propagation_status
-> Sort (cost=1924.96..1925.27 rows=124 width=1841) (actual time=1.584..1.595 rows=122 loops=1)
Sort Key: c.id, sm.auto_propagation_status
Sort Method: quicksort Memory: 240kB
-> Nested Loop Left Join (cost=6.86..1920.65 rows=124 width=1841) (actual time=0.070..1.432 rows=122 loops=1)
-> Nested Loop Left Join (cost=6.58..1873.01 rows=124 width=1837) (actual time=0.064..1.051 rows=122 loops=1)
-> Nested Loop (cost=6.43..1840.10 rows=124 width=1821) (actual time=0.061..0.945 rows=122 loops=1)
-> Bitmap Heap Scan on actor a (cost=6.14..717.10 rows=240 width=16) (actual time=0.052..0.312 rows=240 loops=1)
Recheck Cond: (workspace_id = '47d889b1-8b87-42ee-8082-f04e13765c7a'::uuid)
Heap Blocks: exact=226
-> Bitmap Index Scan on actor_workspace_id_idx (cost=0.00..6.08 rows=240 width=0) (actual time=0.024..0.024 rows=264 loops=1)
Index Cond: (workspace_id = '47d889b1-8b87-42ee-8082-f04e13765c7a'::uuid)
-> Index Scan using connection_source_id_idx on connection c (cost=0.28..4.67 rows=1 width=1821) (actual time=0.002..0.002 rows=1 loops=240)
Index Cond: (source_id = a.id)
Filter: (status <> 'deprecated'::status_type)
-> Index Scan using connection_operation_connection_id_idx on connection_operation co (cost=0.15..0.23 rows=4 width=32) (actual time=0.000..0.000 rows=0 loops=122)
Index Cond: (connection_id = c.id)
-> Index Scan using connection_idx on schema_management sm (cost=0.28..0.37 rows=1 width=20) (actual time=0.003..0.003 rows=1 loops=122)
Index Cond: (connection_id = c.id)
Planning Time: 0.577 ms
Execution Time: 1.794 ms```
Most of the cost/time is spent joining `actor` to `connection` – probably not a lot that can be done there given the current table architecture. Possibly, if the connections table already had `workspace_id` as a column, this would be faster?
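A rough sketch of that denormalization idea, purely hypothetical: the column, the index name, and the backfill all assume that `connection.source_id` always points at an `actor` row.

```sql
-- Hypothetical: denormalize workspace_id onto connection so the WHERE clause
-- can filter connection directly instead of going through actor first.
ALTER TABLE connection ADD COLUMN workspace_id uuid;

-- Backfill from the owning actor (assumes source_id always references actor.id).
UPDATE connection c
SET    workspace_id = a.workspace_id
FROM   actor a
WHERE  a.id = c.source_id;

-- Partial index so workspace lookups skip deprecated rows entirely.
CREATE INDEX connection_workspace_id_idx
    ON connection (workspace_id)
    WHERE status <> 'deprecated';
```

The real cost of this approach is that the application (or a trigger) would have to keep the new column in sync on every write.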
So yeah, I guess the only way to speed this up at the moment is to have fewer rows to scan – i.e. delete the connections and their associated source/dest configs.
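If pruning ends up being the route, the rough shape would be something like this (illustrative only: it assumes there is no ON DELETE CASCADE on the child tables, uses already-deprecated rows as the example target, and leaves the source/destination actor rows alone):

```sql
-- Illustrative cleanup: remove unwanted connections (here, ones already
-- marked deprecated) together with their child rows, inside one transaction.
BEGIN;

DELETE FROM connection_operation co
USING  connection c
WHERE  co.connection_id = c.id
  AND  c.status = 'deprecated';

DELETE FROM schema_management sm
USING  connection c
WHERE  sm.connection_id = c.id
  AND  c.status = 'deprecated';

DELETE FROM connection
WHERE  status = 'deprecated';

COMMIT;
```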
Yep, but the weird thing is: the query runs fast on the database. So maybe it's not the problem.
Group Key: c.id, sm.auto_propagation_status
-> Sort (cost=468.53..468.79 rows=102 width=1376) (actual time=20.420..20.436 rows=102 loops=1)
Sort Key: c.id, sm.auto_propagation_status
Sort Method: quicksort Memory: 194kB
-> Nested Loop (cost=314.28..465.13 rows=102 width=1376) (actual time=16.508..20.199 rows=102 loops=1)
-> Hash Right Join (cost=314.01..343.48 rows=102 width=1376) (actual time=16.490..16.608 rows=102 loops=1)
Hash Cond: (co.connection_id = c.id)
-> Seq Scan on connection_operation co (cost=0.00..24.91 rows=991 width=32) (actual time=0.816..1.680 rows=991 loops=1)
-> Hash (cost=312.73..312.73 rows=102 width=1360) (actual time=14.693..14.695 rows=102 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 141kB
-> Hash Right Join (cost=282.20..312.73 rows=102 width=1360) (actual time=11.782..14.606 rows=102 loops=1)
Hash Cond: (sm.connection_id = c.id)
-> Seq Scan on schema_management sm (cost=0.00..27.71 rows=1071 width=20) (actual time=0.009..2.663 rows=1071 loops=1)
-> Hash (cost=280.93..280.93 rows=102 width=1356) (actual time=11.765..11.766 rows=102 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 140kB
-> Seq Scan on connection c (cost=0.00..280.93 rows=102 width=1356) (actual time=0.337..11.614 rows=102 loops=1)
Filter: (status <> 'deprecated'::status_type)
Rows Removed by Filter: 1092
-> Index Scan using actor_pkey on actor a (cost=0.28..1.19 rows=1 width=16) (actual time=0.035..0.035 rows=1 loops=102)
Index Cond: (id = c.source_id)
Filter: (workspace_id = '26204905-8ef0-47c4-a86a-a3e8cc3950aa'::uuid)
Planning Time: 8.428 ms
Execution Time: 20.961 ms```
Yeah, possibly not the DB; the time could be tied up inside the app then. Good thing you tested it; we would have been going down a rabbit hole for minimal gain.
I kind of still suspect an N+1 query problem.
The web endpoint responds with a lot more data than exists in the results of this query, so it has to hydrate it from somewhere.
Also, the sheer size of the data it responds with can contribute to slowness.
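One way to test the N+1 suspicion, assuming the pg_stat_statements extension is available on the config database, is to reset its counters, load the slow page once, and then look for a cheap statement with a huge call count (column names below are the PostgreSQL 13+ ones):

```sql
-- An N+1 pattern usually shows up as one small per-row query executed
-- hundreds of times while a single page renders.
SELECT calls,
       round(mean_exec_time::numeric, 2)  AS mean_ms,
       round(total_exec_time::numeric, 2) AS total_ms,
       left(query, 120)                   AS query
FROM   pg_stat_statements
ORDER  BY calls DESC
LIMIT  20;
```

Running `SELECT pg_stat_statements_reset();` just before loading the page makes the counts much easier to attribute.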
I've also noticed that sometimes the DB queries return quickly, and other times they are painfully slow (which suggests contention with a write lock).
you may want to try running that query periodically and seeing what kind of variance you get in the execution times
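For that periodic check, a plain psql session is probably enough; this is just a sketch using a trimmed-down variant of the query above, re-run every 60 seconds:

```sql
-- \timing prints client-side elapsed time; \watch re-runs the buffered query
-- on an interval, so latency spikes show up as outliers across iterations.
\timing on

EXPLAIN (ANALYZE, BUFFERS)
SELECT c.id
FROM   connection c
JOIN   actor a ON c.source_id = a.id
WHERE  a.workspace_id = '47d889b1-8b87-42ee-8082-f04e13765c7a'::uuid
  AND  c.status <> 'deprecated'
\watch 60
```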
Sure, I'll let you know!
There is a related issue about the number of connections: https://github.com/airbytehq/airbyte/issues/27364
And another interesting discovery: when I hit this endpoint, https://airbyte.loadsmart.io/api/v1/connections/list, the response is fast, which supports the conclusion that the database answers fast (right?)
However, this endpoint is slow (~2 min) to list the connections:
https://airbyte.loadsmart.io/api/v1/web_backend/connections/list
It seems that when the frontend is not involved, the performance is good.
Yeah, the web backend hydrates connections with more info. I think the response embeds SVG images too.
Still, I think the DB does have latency spikes.
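If those spikes turn out to be lock-related, this shows who is waiting and who is holding the lock at the moment a slow run happens (PostgreSQL 9.6+, which added pg_blocking_pids):

```sql
-- Sessions currently waiting on a lock, paired with the sessions blocking them.
SELECT waiting.pid              AS waiting_pid,
       left(waiting.query, 80)  AS waiting_query,
       blocking.pid             AS blocking_pid,
       blocking.state           AS blocking_state,
       left(blocking.query, 80) AS blocking_query
FROM   pg_stat_activity AS waiting
JOIN   LATERAL unnest(pg_blocking_pids(waiting.pid)) AS b(pid) ON true
JOIN   pg_stat_activity AS blocking ON blocking.pid = b.pid
WHERE  waiting.wait_event_type = 'Lock';
```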
Yeah, I will run that query periodically
Thanks a lot for now