Destination - ElasticSearch via AWS OpenSearch: 429 Too Many Requests

  • Is this your first time deploying Airbyte?: Yes
  • OS Version / Instance: MacOS but using docker example
  • Memory / Disk: Docker Resources - CPUs: 8, Memory: 8GB, Swap: 1 GB
  • Deployment: Docker
  • Airbyte Version: 0.40.0-alpha
  • Source name/version: Snowflake 6.30.0
  • Destination name/version: ElasticSearch 7.10
  • Step: The issue happens during a sync of a little over 7K rows, but a sync of 60 rows succeeds
  • Description: I suspect this may be a configuration issue, either in Airbyte (to throttle the requests) or on the AWS side (to allow more throughput), but I'm not sure where. I am trying to sync a table from Snowflake to an Elasticsearch instance via AWS OpenSearch. The table initially had over a million rows, but I kept downsizing it until the sync succeeded at 60 rows. The threshold is somewhere between 60 rows, where it finally succeeded, and 7K, where it still failed. The following is a snippet from the logs around the error:
2022-09-07 22:23:52 source > 2022-09-07 22:23:52 INFO i.a.i.s.r.AbstractRelationalDbSource(queryTableFullRefresh):35 - Queueing query for table: RISKEVENTS_FLAT
2022-09-07 22:23:52 source > 2022-09-07 22:23:52 INFO i.a.d.j.s.AdaptiveStreamingQueryConfig(initialize):38 - Set initial fetch size: 10 rows
2022-09-07 22:23:53 source > 2022-09-07 22:23:53 INFO i.a.d.j.s.TwoStageSizeEstimator(getTargetBufferByteSize):72 - Max memory limit: 6262095872, JDBC buffer size: 1073741824
2022-09-07 22:23:54 INFO i.a.w.g.DefaultReplicationWorker(lambda$getReplicationRunnable$6):329 - Records read: 1000 (1 MB)
2022-09-07 22:23:55 INFO i.a.w.g.DefaultReplicationWorker(lambda$getReplicationRunnable$6):329 - Records read: 2000 (2 MB)
2022-09-07 22:23:55 INFO i.a.w.g.DefaultReplicationWorker(lambda$getReplicationRunnable$6):329 - Records read: 3000 (3 MB)
2022-09-07 22:23:56 INFO i.a.w.g.DefaultReplicationWorker(lambda$getReplicationRunnable$6):329 - Records read: 4000 (4 MB)
2022-09-07 22:23:56 INFO i.a.w.g.DefaultReplicationWorker(lambda$getReplicationRunnable$6):329 - Records read: 5000 (5 MB)
2022-09-07 22:23:56 INFO i.a.w.g.DefaultReplicationWorker(lambda$getReplicationRunnable$6):329 - Records read: 6000 (7 MB)
2022-09-07 22:23:56 destination > 2022-09-07 22:23:56 INFO i.a.i.d.r.InMemoryRecordBufferingStrategy(lambda$flushAll$1):86 - Flushing RISKEVENTS_FLAT: 6518 records (31 MB)
2022-09-07 22:23:56 destination > 2022-09-07 22:23:56 INFO i.a.i.d.e.ElasticsearchAirbyteMessageConsumerFactory(lambda$recordWriterFunction$3):77 - writing 6518 records in bulk operation
2022-09-07 22:23:57 destination > 2022-09-07 22:23:57 ERROR i.a.i.b.FailureTrackingAirbyteMessageConsumer(accept):52 - Exception while accepting message
2022-09-07 22:23:57 destination > org.elasticsearch.client.ResponseException: method [POST], host [https://search-airbyte-es-poc-4dn6yfexyvp2m2wmuynjzd7nsi.us-west-2.es.amazonaws.com:443], URI [/_bulk?refresh=true], status line [HTTP/1.1 429 Too Many Requests]
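The destination log above shows all 6,518 records being flushed in a single `_bulk` request, and OpenSearch answers 429 when its bulk write queue fills up. Airbyte doesn't expose a setting for this today, but as an illustration of the usual client-side remedy, here is a minimal sketch of splitting a batch into smaller chunks and retrying a chunk with exponential backoff on 429. The `send_bulk` callable, chunk size, and delays are hypothetical stand-ins for this example, not part of the Airbyte connector:

```python
import time

def bulk_with_backoff(records, send_bulk, chunk_size=500,
                      max_retries=5, base_delay=1.0):
    """Send records in chunks; retry a chunk with exponential
    backoff whenever send_bulk reports HTTP 429 (throttled)."""
    for start in range(0, len(records), chunk_size):
        chunk = records[start:start + chunk_size]
        for attempt in range(max_retries + 1):
            status = send_bulk(chunk)  # hypothetical: returns an HTTP status code
            if status != 429:
                break  # accepted (or a non-throttling error to handle elsewhere)
            if attempt == max_retries:
                raise RuntimeError("still throttled after retries")
            time.sleep(base_delay * (2 ** attempt))  # back off: 1s, 2s, 4s, ...
```

With the 6,518-record batch from the log and `chunk_size=500`, this would issue 14 smaller `_bulk` calls instead of one large one, giving the OpenSearch bulk queue room to drain between requests.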

Hi @satar, ElasticSearch via AWS OpenSearch isn’t a stable connector at this point, unfortunately. I’ve looked through our issues and forum posts, and it looks like some other users have run into problems with this setup:

https://discuss.airbyte.io/t/destination-elasticsearch-with-aws-opensearch-failed-to-ping-elasticsearch/1380/4

https://github.com/airbytehq/airbyte/issues/4577#issuecomment-930960891

Could you open a GitHub issue for a new OpenSearch connector?

Thanks Nataly, I saw the ping issue when I searched the forums. I thought that one was addressed, since I’ve been able to get the sync to work successfully on a small amount of data. It was initially failing for me with the same ping issue until I changed the AWS OpenSearch domain to use ElasticSearch engine version 7.10 instead of the latest OpenSearch release. My suspicion is that the failure with a larger dataset comes from throttling of how many requests are made to ElasticSearch, and I was curious whether this is something that can be configured.

Gotcha! This can’t be configured currently, unfortunately. But definitely feel free to make a feature request on GitHub!