HTTP 504 Error During Schema Discovery in Airbyte

Summary

A user encounters an HTTP 504 error when discovering schemas with the MS SQL connector in Airbyte deployed on Kubernetes, against a schema with a large number of tables (~900). The error persists despite increasing the HTTP and read timeouts, while smaller schemas work fine.


Question

Hi,

I am getting an HTTP 504 error while selecting streams / discovering schemas in Airbyte. It is able to connect and fetch data from the same source, but it throws the error while creating a new connection.

  1. Airbyte deployed on Kubernetes; Airbyte version: 1.1.0; Helm chart version: 1.1.1
  2. MS SQL connector version: 4.1.15

I have tried the parameters from https://docs.airbyte.com/enterprise-setup/scaling-airbyte#schema-discovery-timeouts:

```yaml
server:
  extraEnvs:
    - name: HTTP_IDLE_TIMEOUT
      value: 10m
    - name: READ_TIMEOUT
      value: 10m
```

I still get the HTTP 504 error after almost exactly 60 seconds. A different schema in the same database with fewer tables works fine; the failing schema has around 900 tables, and discovery fails with the HTTP 504 after roughly 60 seconds.

Can anyone please help me out? It's pretty urgent, as I am at the production stage. In dev it was working fine since it has fewer tables.

Logs from the pod created by Airbyte while discovering the schema:

```
2024-11-04 18:46:04 INFO i.a.w.i.VersionedAirbyteStreamFactory(internalLog):308 - INFO main c.z.h.HikariDataSource(close):351 HikariPool-1 - Shutdown completed.
2024-11-04 18:46:04 INFO i.a.w.i.VersionedAirbyteStreamFactory(internalLog):308 - INFO main c.z.h.HikariDataSource(<init>):79 HikariPool-2 - Starting...
2024-11-04 18:46:04 INFO i.a.w.i.VersionedAirbyteStreamFactory(internalLog):308 - INFO main c.z.h.HikariDataSource(<init>):81 HikariPool-2 - Start completed.
2024-11-04 18:46:04 INFO i.a.w.i.VersionedAirbyteStreamFactory(internalLog):308 - INFO main i.a.c.i.b.IntegrationRunner(runInternal):224 Completed integration: io.airbyte.cdk.integrations.base.ssh.SshWrappedSource
2024-11-04 18:46:04 INFO i.a.w.i.VersionedAirbyteStreamFactory(internalLog):308 - INFO main i.a.i.s.m.MssqlSource(main):577 completed source: class io.airbyte.integrations.source.mssql.MssqlSource
2024-11-04 18:46:04 INFO i.a.c.ConnectorMessageProcessor(updateConfigFromControlMessage):231 - Checking for optional control message...
2024-11-04 18:46:04 INFO i.a.c.ConnectorMessageProcessor(setOutput):176 - Writing catalog result to API...
2024-11-04 18:46:05 INFO i.a.c.ConnectorMessageProcessor(setOutput):180 - Finished writing catalog result to API.
2024-11-04 18:46:05 INFO i.a.c.ConnectorWatcher(saveConnectorOutput):162 - Writing output of b5ea17b1-f170-46dc-bc31-cc744ca984c1_4a1e274f-a661-441b-a3ce-8b08c0912ecd_0_discover to the doc store
2024-11-04 18:46:06 INFO i.a.c.ConnectorWatcher(markWorkloadSuccess):167 - Marking workload b5ea17b1-f170-46dc-bc31-cc744ca984c1_4a1e274f-a661-441b-a3ce-8b08c0912ecd_0_discover as successful
2024-11-04 18:46:06 INFO i.a.c.ConnectorWatcher(exitProperly):215 - Deliberately exiting process with code 0.
2024-11-04 18:46:06 INFO i.a.c.i.LineGobbler(voidCall):166 -
2024-11-04 18:46:06 INFO i.a.c.i.LineGobbler(voidCall):166 - ----- END DISCOVER -----
2024-11-04 18:46:06 INFO i.a.c.i.LineGobbler(voidCall):166 -
```


---

This topic has been created from a Slack thread to give it more visibility.
It will be on Read-Only mode here. [Click here](https://airbytehq.slack.com/archives/C021JANJ6TY/p1730746054364089) if you want
to access the original thread.

[Join the conversation on Slack](https://slack.airbyte.com)

<sub>
['http-504-error', 'schema-discovery', 'kubernetes', 'ms-sql-connector', 'timeout-parameters']
</sub>

What are you using to expose Airbyte . . . e.g. ingress/load balancer?

And if this is cloud hosted, please include details on the Cloud and how k8s is deployed (e.g. GKE Autopilot, AWS EKS, etc.)


Thanks mate, I got the solution by adding these annotations to the web app ingress. BTW, I have deployed it on AWS EKS:

```yaml
nginx.ingress.kubernetes.io/proxy-send-timeout: "600"
nginx.ingress.kubernetes.io/proxy-read-timeout: "600"
```
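For context, here is a minimal sketch of where those annotations sit on the webapp Ingress, assuming the ingress-nginx controller; the resource name, host, backend service name, and port are illustrative placeholders, not taken from the thread.

```yaml
# Minimal sketch of an Airbyte webapp Ingress with the nginx proxy timeouts raised to 600s.
# Assumptions: ingress-nginx controller; name, host, service, and port below are placeholders.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: airbyte-webapp            # placeholder name
  annotations:
    nginx.ingress.kubernetes.io/proxy-send-timeout: "600"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "600"
spec:
  ingressClassName: nginx
  rules:
    - host: airbyte.example.com   # placeholder host
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: airbyte-airbyte-webapp-svc   # placeholder; use your chart's webapp service name
                port:
                  number: 80
```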

Perfect; glad you got it sorted!

This is the most common cause, and it usually shows up during connection check because that's the only time the end user is actively waiting on the platform to provision the node, schedule the pod, let it initialize, and then run the check. In normal jobs this all happens in the background, so timeouts aren't an issue. But the config varies wildly depending on the type of ingress/load balancer being used, so it's hard to publish good instructions that work for everyone :upside_down_face:
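For example, if an AWS ALB is fronting the cluster instead of nginx, the equivalent knob is the load balancer's idle timeout. A hedged sketch, assuming the AWS Load Balancer Controller manages the Ingress (values are illustrative):

```yaml
# Sketch only: raising the ALB idle timeout via an AWS Load Balancer Controller annotation.
# 600s mirrors the nginx example above; tune to your environment.
metadata:
  annotations:
    alb.ingress.kubernetes.io/load-balancer-attributes: idle_timeout.timeout_seconds=600
```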
