Step: The issue is happening during a sync of a little over 7K rows; the same sync works with 60 rows
Description: I suspect this may be a configuration issue, either in Airbyte (to throttle the requests) or on the AWS side (to open a bigger pipe), but I'm not sure where. Anyway, I am trying to sync a table from Snowflake to an Elasticsearch instance via AWS OpenSearch. The table initially had over a million rows, and I kept downsizing it until the sync succeeded at 60 rows. The breaking point is somewhere between 60 rows, where it finally succeeded, and 7K, where it still fails. The following is a snippet from the logs around the error:
2022-09-07 22:23:52 source > 2022-09-07 22:23:52 INFO i.a.i.s.r.AbstractRelationalDbSource(queryTableFullRefresh):35 - Queueing query for table: RISKEVENTS_FLAT
2022-09-07 22:23:52 source > 2022-09-07 22:23:52 INFO i.a.d.j.s.AdaptiveStreamingQueryConfig(initialize):38 - Set initial fetch size: 10 rows
2022-09-07 22:23:53 source > 2022-09-07 22:23:53 INFO i.a.d.j.s.TwoStageSizeEstimator(getTargetBufferByteSize):72 - Max memory limit: 6262095872, JDBC buffer size: 1073741824
2022-09-07 22:23:54 INFO i.a.w.g.DefaultReplicationWorker(lambda$getReplicationRunnable$6):329 - Records read: 1000 (1 MB)
2022-09-07 22:23:55 INFO i.a.w.g.DefaultReplicationWorker(lambda$getReplicationRunnable$6):329 - Records read: 2000 (2 MB)
2022-09-07 22:23:55 INFO i.a.w.g.DefaultReplicationWorker(lambda$getReplicationRunnable$6):329 - Records read: 3000 (3 MB)
2022-09-07 22:23:56 INFO i.a.w.g.DefaultReplicationWorker(lambda$getReplicationRunnable$6):329 - Records read: 4000 (4 MB)
2022-09-07 22:23:56 INFO i.a.w.g.DefaultReplicationWorker(lambda$getReplicationRunnable$6):329 - Records read: 5000 (5 MB)
2022-09-07 22:23:56 INFO i.a.w.g.DefaultReplicationWorker(lambda$getReplicationRunnable$6):329 - Records read: 6000 (7 MB)
2022-09-07 22:23:56 destination > 2022-09-07 22:23:56 INFO i.a.i.d.r.InMemoryRecordBufferingStrategy(lambda$flushAll$1):86 - Flushing RISKEVENTS_FLAT: 6518 records (31 MB)
2022-09-07 22:23:56 destination > 2022-09-07 22:23:56 INFO i.a.i.d.e.ElasticsearchAirbyteMessageConsumerFactory(lambda$recordWriterFunction$3):77 - writing 6518 records in bulk operation
2022-09-07 22:23:57 destination > 2022-09-07 22:23:57 ERROR i.a.i.b.FailureTrackingAirbyteMessageConsumer(accept):52 - Exception while accepting message
2022-09-07 22:23:57 destination > org.elasticsearch.client.ResponseException: method [POST], host [https://search-airbyte-es-poc-4dn6yfexyvp2m2wmuynjzd7nsi.us-west-2.es.amazonaws.com:443], URI [/_bulk?refresh=true], status line [HTTP/1.1 429 Too Many Requests]
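For reference, a 429 from the _bulk endpoint generally means the OpenSearch domain is shedding load because its bulk queue is full, and the usual client-side mitigation is to retry with exponential backoff. Below is only a minimal sketch of that idea using the Elasticsearch low-level REST client; the host, index, and document are placeholders rather than values from this setup, and I don't know whether Airbyte's connector exposes anything like it:

```java
import org.apache.http.HttpHost;
import org.apache.http.entity.ContentType;
import org.apache.http.nio.entity.NStringEntity;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.Response;
import org.elasticsearch.client.ResponseException;
import org.elasticsearch.client.RestClient;

public class BulkWithBackoff {
    public static void main(String[] args) throws Exception {
        // Placeholder endpoint; substitute the real OpenSearch domain.
        RestClient client = RestClient.builder(
                new HttpHost("search-example.us-west-2.es.amazonaws.com", 443, "https")).build();

        // _bulk expects NDJSON: one action line, then one source line per document.
        String payload = "{\"index\":{\"_index\":\"riskevents_flat\"}}\n"
                + "{\"field\":\"value\"}\n";
        Request request = new Request("POST", "/_bulk?refresh=true");
        request.setEntity(new NStringEntity(payload, ContentType.create("application/x-ndjson")));

        long backoffMs = 1_000;
        for (int attempt = 1; attempt <= 5; attempt++) {
            try {
                Response response = client.performRequest(request);
                System.out.println("Bulk accepted: " + response.getStatusLine());
                break;
            } catch (ResponseException e) {
                // 429 = cluster rejecting load; wait and retry instead of failing the sync.
                if (e.getResponse().getStatusLine().getStatusCode() == 429 && attempt < 5) {
                    Thread.sleep(backoffMs);
                    backoffMs *= 2; // exponential backoff: 1s, 2s, 4s, 8s
                } else {
                    throw e;
                }
            }
        }
        client.close();
    }
}
```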
Hi @satar, Elasticsearch via AWS OpenSearch isn't a stable connection at this point, unfortunately. I've looked through our issues and forum posts, and it looks like some other users have run into problems with this setup:
Thanks Nataly, I saw the ping issue when I searched the forums. I thought that one was addressed, since I've been able to get the sync to work successfully on a small amount of data. It was initially failing for me with the same ping issue until I changed the AWS OpenSearch domain to use Elasticsearch engine version 7.10 instead of the latest OpenSearch release. My suspicion is that the failure with a larger dataset comes from throttling of how many requests are made to Elasticsearch, and I was curious whether this is something that can be configured.
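If a throttle like that were configurable, the other half of the fix would be bounding the size of each bulk request, since the failing flush sent all 6,518 records in a single call. As a sketch only, with hypothetical helper names rather than real Airbyte settings, chunking a flush would look something like this:

```java
import java.util.Collections;
import java.util.List;

public class ChunkedFlush {
    // Hypothetical helper: split one large flush into bounded bulk requests
    // so no single _bulk call exceeds what the domain will accept.
    static <T> void flushInChunks(List<T> records, int chunkSize) {
        for (int start = 0; start < records.size(); start += chunkSize) {
            List<T> chunk = records.subList(start, Math.min(start + chunkSize, records.size()));
            sendBulk(chunk); // stand-in for one _bulk request per chunk
        }
    }

    static <T> void sendBulk(List<T> chunk) {
        System.out.println("bulk of " + chunk.size() + " records");
    }

    public static void main(String[] args) {
        // e.g. the 6,518-record flush from the logs, sent 500 records at a time
        flushInChunks(Collections.nCopies(6518, new Object()), 500);
    }
}
```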