Iterable fails on initial sync, even after greatly reducing the total count & size of streams

  • Is this your first time deploying Airbyte?: No
  • OS Version / Instance: Ubuntu
  • Memory / Disk: 64GB disk, t3.2xlarge, 32 GiB of Memory, 8 vCPUs, 64-bit platform
  • Deployment: Docker
  • Airbyte Version: 0.40.10
  • Source name/version: Iterable
  • Destination name/version: Snowflake
  • Step: Initial sync
  • Description: Iterable fails on initial sync, even after greatly reducing the total count & size of streams

I’ve tried several times but have never been able to complete the initial sync of Iterable data, on either the SaaS or the self-hosted version of Airbyte. The problem appears to be related to rate limiting by Iterable, with Airbyte eventually giving up after too many rejected requests.
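
For context on how a bounded retry ends up failing the whole sync, here is a generic sketch of exponential backoff with a retry cap. It is not Airbyte's actual implementation (the traceback further down shows the CDK going through the backoff package); the names call_with_backoff, TransientError, max_tries, and base_delay are hypothetical.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Stand-in for a retryable response such as HTTP 429 or 5xx."""


def call_with_backoff(
    func: Callable[[], T],
    max_tries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, doubling the wait after each transient failure.

    Once max_tries attempts have failed, the last error is re-raised,
    which is the point at which a sync would be marked as failed.
    """
    if max_tries < 1:
        raise ValueError("max_tries must be at least 1")
    for attempt in range(1, max_tries + 1):
        try:
            return func()
        except TransientError:
            if attempt == max_tries:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")


def always_failing() -> None:
    # Simulates an endpoint that keeps rejecting the request.
    raise TransientError("500 GenericError")


# Record the delays instead of really sleeping, so the demo runs instantly.
delays: List[float] = []
gave_up = False
try:
    call_with_backoff(always_failing, max_tries=4, sleep=delays.append)
except TransientError:
    gave_up = True
    print("gave up after waiting", delays)
```

If the endpoint never recovers within the retry budget, no amount of waiting helps, which would match the behaviour I am seeing.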

I have turned off all but 11 (out of 44) streams, set most of the active streams to full refresh with overwrite, and set the time window to just 5 days, all to no avail.

It seems that the larger streams, e.g. user and list-user, are the main culprits. I’ve attached the log from the most recent failure (on self-hosted, Docker-based Airbyte).

Here is a snippet of the latest failure from the logs:

2022-09-30 06:05:54 INFO i.a.w.g.DefaultReplicationWorker(run):301 - failures: [ {
  "failureOrigin" : "source",
  "failureType" : "system_error",
  "internalMessage" : "Request URL: https://api.iterable.com/api/lists/getUsers?listId=1659364, Response Code: 500, Response Text: {\"msg\":\"An error occurred. Please try again later. If problem persists, please contact your CSM\",\"code\":\"GenericError\",\"params\":null}",
  "externalMessage" : "Something went wrong in the connector. See the logs for more details.",
  "metadata" : {
    "attemptNumber" : 0,
    "jobId" : 2,
    "from_trace_message" : true,
    "connector_command" : "read"
  },
  "stacktrace" : "Traceback (most recent call last):\n  File \"/airbyte/integration_code/main.py\", line 13, in <module>\n    launch(source, sys.argv[1:])\n  File \"/usr/local/lib/python3.9/site-packages/airbyte_cdk/entrypoint.py\", line 123, in launch\n    for message in source_entrypoint.run(parsed_args):\n  File \"/usr/local/lib/python3.9/site-packages/airbyte_cdk/entrypoint.py\", line 114, in run\n    for message in generator:\n  File \"/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/abstract_source.py\", line 128, in read\n    raise e\n  File \"/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/abstract_source.py\", line 114, in read\n    yield from self._read_stream(\n  File \"/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/abstract_source.py\", line 179, in _read_stream\n    for record in record_iterator:\n  File \"/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/abstract_source.py\", line 277, in _read_full_refresh\n    for record in records:\n  File \"/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/streams/http/http.py\", line 421, in read_records\n    response = self._send_request(request, request_kwargs)\n  File \"/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/streams/http/http.py\", line 339, in _send_request\n    return backoff_handler(user_backoff_handler)(request, request_kwargs)\n  File \"/usr/local/lib/python3.9/site-packages/backoff/_sync.py\", line 105, in retry\n    ret = target(*args, **kwargs)\n  File \"/usr/local/lib/python3.9/site-packages/backoff/_sync.py\", line 105, in retry\n    ret = target(*args, **kwargs)\n  File \"/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/streams/http/http.py\", line 297, in _send\n    raise UserDefinedBackoffException(backoff=custom_backoff_time, request=request, response=response)\nairbyte_cdk.sources.streams.http.exceptions.UserDefinedBackoffException: Request URL: https://api.iterable.com/api/lists/getUsers?listId=1659364, Response Code: 500, Response Text: {\"msg\":\"An error occurred. Please try again later. If problem persists, please contact your CSM\",\"code\":\"GenericError\",\"params\":null}\n",
  "timestamp" : 1664517894149
}
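
To see at a glance which request failed and with what status, the internalMessage field can be picked apart. This is only a minimal sketch, not something Airbyte provides: the helper name parse_internal_message is hypothetical, and the pattern assumes the message keeps the "Request URL: ..., Response Code: ..., Response Text: ..." layout shown in the snippet above.

```python
import json
import re
from urllib.parse import parse_qs, urlparse

# Layout assumed from the log snippet above; other connectors may differ.
PATTERN = re.compile(
    r"Request URL: (?P<url>\S+), Response Code: (?P<code>\d{3}), "
    r"Response Text: (?P<body>.*)",
    re.DOTALL,
)


def parse_internal_message(message: str) -> dict:
    """Extract the URL, HTTP status, listId, and API error code."""
    match = PATTERN.search(message)
    if match is None:
        raise ValueError("unexpected internalMessage layout")
    url = match.group("url")
    query = parse_qs(urlparse(url).query)
    try:
        body = json.loads(match.group("body"))
    except json.JSONDecodeError:
        body = None  # response text was not JSON
    return {
        "url": url,
        "status": int(match.group("code")),
        "list_id": query.get("listId", [None])[0],
        "error_code": body.get("code") if isinstance(body, dict) else None,
    }


# The internalMessage value from the failure above, after JSON unescaping.
message = (
    "Request URL: https://api.iterable.com/api/lists/getUsers?listId=1659364, "
    "Response Code: 500, Response Text: "
    '{"msg":"An error occurred. Please try again later. If problem persists, '
    'please contact your CSM","code":"GenericError","params":null}'
)
info = parse_internal_message(message)
print(info["status"], info["list_id"], info["error_code"])
```

In this snippet the status is 500 rather than 429, so the failing call may be a server-side error for that particular list rather than a rate-limit rejection.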

c48fb6ce_7a0b_45f8_a009_c62667781496_logs_2_txt.txt (2.7 MB)

I’ve created a GitHub issue to request an improvement around this: https://github.com/airbytehq/airbyte/issues/17654

It sounds like you have taken many steps to try and get around this, but I’m going to see if there is anything else that can be done. Maybe restricting by start_date?

It also looks like end_date was added in v0.40.11: https://github.com/airbytehq/airbyte/pull/17573
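
If both start_date and end_date are available to you, one thing worth trying is backfilling in small windows rather than one large range. Below is a rough sketch of generating such windows. The helper date_windows is hypothetical, and how (or whether) each window maps onto the connector's configuration fields is an assumption you would need to check against the connector's spec.

```python
from datetime import date, timedelta
from typing import Iterator, Tuple


def date_windows(start: date, end: date, days: int) -> Iterator[Tuple[date, date]]:
    """Yield consecutive (window_start, window_end) pairs covering start..end.

    Each window is at most `days` long; the last one is cut off at `end`.
    """
    if days <= 0:
        raise ValueError("days must be positive")
    cursor = start
    while cursor < end:
        window_end = min(cursor + timedelta(days=days), end)
        yield cursor, window_end
        cursor = window_end


# Example range and window size chosen only for illustration.
windows = list(date_windows(date(2022, 9, 1), date(2022, 9, 30), 10))
for window_start, window_end in windows:
    print(window_start.isoformat(), "->", window_end.isoformat())
```

Each pair could then be used as the start_date and end_date for one smaller sync, so a failure only costs you that window instead of the whole backfill.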