Step: The issue is happening during a sync of a little over 7K rows; the same sync works with 60 rows
Description: I suspect this may be a configuration issue, either in Airbyte (to throttle the requests) or on the AWS side (to open a bigger pipe), but I'm not sure where. Anyway, I am trying to sync a table from Snowflake to an Elasticsearch instance via AWS OpenSearch. The table initially had over a million rows, and I kept downsizing it until the sync succeeded at 60 rows. The breaking point is somewhere between 60 rows, where it finally succeeded, and 7K, where it still fails. The following is a snippet from the logs around the error:
2022-09-07 22:23:52 source > 2022-09-07 22:23:52 INFO i.a.i.s.r.AbstractRelationalDbSource(queryTableFullRefresh):35 - Queueing query for table: RISKEVENTS_FLAT
2022-09-07 22:23:52 source > 2022-09-07 22:23:52 INFO i.a.d.j.s.AdaptiveStreamingQueryConfig(initialize):38 - Set initial fetch size: 10 rows
2022-09-07 22:23:53 source > 2022-09-07 22:23:53 INFO i.a.d.j.s.TwoStageSizeEstimator(getTargetBufferByteSize):72 - Max memory limit: 6262095872, JDBC buffer size: 1073741824
2022-09-07 22:23:54 INFO i.a.w.g.DefaultReplicationWorker(lambda$getReplicationRunnable$6):329 - Records read: 1000 (1 MB)
2022-09-07 22:23:55 INFO i.a.w.g.DefaultReplicationWorker(lambda$getReplicationRunnable$6):329 - Records read: 2000 (2 MB)
2022-09-07 22:23:55 INFO i.a.w.g.DefaultReplicationWorker(lambda$getReplicationRunnable$6):329 - Records read: 3000 (3 MB)
2022-09-07 22:23:56 INFO i.a.w.g.DefaultReplicationWorker(lambda$getReplicationRunnable$6):329 - Records read: 4000 (4 MB)
2022-09-07 22:23:56 INFO i.a.w.g.DefaultReplicationWorker(lambda$getReplicationRunnable$6):329 - Records read: 5000 (5 MB)
2022-09-07 22:23:56 INFO i.a.w.g.DefaultReplicationWorker(lambda$getReplicationRunnable$6):329 - Records read: 6000 (7 MB)
2022-09-07 22:23:56 destination > 2022-09-07 22:23:56 INFO i.a.i.d.r.InMemoryRecordBufferingStrategy(lambda$flushAll$1):86 - Flushing RISKEVENTS_FLAT: 6518 records (31 MB)
2022-09-07 22:23:56 destination > 2022-09-07 22:23:56 INFO i.a.i.d.e.ElasticsearchAirbyteMessageConsumerFactory(lambda$recordWriterFunction$3):77 - writing 6518 records in bulk operation
2022-09-07 22:23:57 destination > 2022-09-07 22:23:57 ERROR i.a.i.b.FailureTrackingAirbyteMessageConsumer(accept):52 - Exception while accepting message
2022-09-07 22:23:57 destination > org.elasticsearch.client.ResponseException: method [POST], host [https://search-airbyte-es-poc-4dn6yfexyvp2m2wmuynjzd7nsi.us-west-2.es.amazonaws.com:443], URI [/_bulk?refresh=true], status line [HTTP/1.1 429 Too Many Requests]
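For reference, a 429 from the _bulk endpoint generally means the OpenSearch domain is shedding load because its bulk queue is full, and the usual client-side mitigation is to retry with exponential backoff. Below is only a minimal sketch of that idea using the Elasticsearch low-level REST client; the host, index, and document are placeholders rather than values from this setup, and I don't know whether Airbyte's connector exposes anything like it:

```java
import org.apache.http.HttpHost;
import org.apache.http.entity.ContentType;
import org.apache.http.nio.entity.NStringEntity;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.Response;
import org.elasticsearch.client.ResponseException;
import org.elasticsearch.client.RestClient;

public class BulkWithBackoff {
    public static void main(String[] args) throws Exception {
        // Placeholder endpoint; substitute the real OpenSearch domain.
        RestClient client = RestClient.builder(
                new HttpHost("search-example.us-west-2.es.amazonaws.com", 443, "https")).build();

        // _bulk expects NDJSON: one action line, then one source line per document.
        String payload = "{\"index\":{\"_index\":\"riskevents_flat\"}}\n"
                + "{\"field\":\"value\"}\n";
        Request request = new Request("POST", "/_bulk?refresh=true");
        request.setEntity(new NStringEntity(payload, ContentType.create("application/x-ndjson")));

        long backoffMs = 1_000;
        for (int attempt = 1; attempt <= 5; attempt++) {
            try {
                Response response = client.performRequest(request);
                System.out.println("Bulk accepted: " + response.getStatusLine());
                break;
            } catch (ResponseException e) {
                // 429 = cluster rejecting load; wait and retry instead of failing the sync.
                if (e.getResponse().getStatusLine().getStatusCode() == 429 && attempt < 5) {
                    Thread.sleep(backoffMs);
                    backoffMs *= 2; // exponential backoff: 1s, 2s, 4s, 8s
                } else {
                    throw e;
                }
            }
        }
        client.close();
    }
}
```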
Hi @satar, Elasticsearch via AWS OpenSearch isn't a stable connection at this point, unfortunately. I've looked through our issues and forum posts, and it looks like some other users have run into problems with this setup:
Thanks Nataly, I saw the ping issue when I searched the forums. I thought that one was addressed, since I've been able to get the sync to work successfully on a small amount of data. It was initially failing for me with the same ping issue until I changed the AWS OpenSearch domain to use Elasticsearch engine version 7.10 instead of the latest OpenSearch release. My suspicion is that the failure with a larger dataset comes from throttling of how many requests are made to Elasticsearch, and I was curious whether this is something that can be configured.
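If a throttle like that were configurable, the other half of the fix would be bounding the size of each bulk request, since the failing flush sent all 6,518 records in a single call. As a sketch only, with hypothetical helper names rather than real Airbyte settings, chunking a flush would look something like this:

```java
import java.util.Collections;
import java.util.List;

public class ChunkedFlush {
    // Hypothetical helper: split one large flush into bounded bulk requests
    // so no single _bulk call exceeds what the domain will accept.
    static <T> void flushInChunks(List<T> records, int chunkSize) {
        for (int start = 0; start < records.size(); start += chunkSize) {
            List<T> chunk = records.subList(start, Math.min(start + chunkSize, records.size()));
            sendBulk(chunk); // stand-in for one _bulk request per chunk
        }
    }

    static <T> void sendBulk(List<T> chunk) {
        System.out.println("bulk of " + chunk.size() + " records");
    }

    public static void main(String[] args) {
        // e.g. the 6,518-record flush from the logs, sent 500 records at a time
        flushInChunks(Collections.nCopies(6518, new Object()), 500);
    }
}
```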