CDK Yandex Metrica Multiple step API

Hello fellow developers.
I was recently tasked with building a data pipeline. The requirement is to load data from Postgres and the Yandex Metrica API into a Snowflake/Redshift destination.

Few ETL services offer Yandex Metrica as a source, and those that do are quite expensive, so I thought I’d give Airbyte a shot and develop the connector myself. I soon ran into problems, as the Yandex API isn’t very straightforward to use and I’m new to connector development.

The process to retrieve the data from the Yandex Metrica API is as follows:

  1. First the consumer makes a request to evaluate if the report can be generated.

  2. After the evaluation returns true, we create the request to generate the report.

  3. Next we check the status of the report. When the status is “processed”, the report is ready to download. This can take from 30 seconds to a few minutes, so we need to check the status every 20 seconds or so. Once the report is ready, we extract the parts portion of the response.

Here’s an example of this response:

{
    "log_request": {
        "request_id": 28102748,
        "counter_id": 00000001,
        "source": "visits",
        "date1": "2022-07-03",
        "date2": "2022-07-12",
        "fields": [
            "ym:s:visitID"
        ],
        "status": "processed",
        "size": 25711,
        "parts": [
            {
                "part_number": 0,
                "size": 15711
            },
            {
                "part_number": 1,
                "size": 10000
            }
        ],
        "attribution": "LASTSIGN"
    }
}
  4. Now that we have all the parts of the report, we make a download request for each part.

  5. When we are done downloading everything, we should clear the log request from the Yandex servers.
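
To make the flow concrete, here is a rough Python sketch of the five steps as I understand them. It is not a working connector: `call` is a hypothetical transport function that I made up (it would wrap the real HTTP client and authentication), and the endpoint paths and the `log_request_evaluation` / `possible` field names are my assumptions from reading the Logs API docs, so please verify them before relying on this.

```python
import time
from typing import Any, Callable, Dict, List, Optional

# Hypothetical transport: call(method, path, params) returns parsed JSON
# (a dict) for most steps and the raw report text for the download step.
Call = Callable[[str, str, Optional[Dict[str, Any]]], Any]


def fetch_log_report(
    call: Call,
    counter_id: str,
    params: Dict[str, Any],
    poll_seconds: float = 20.0,
    max_polls: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Run the evaluate / create / poll / download / clean sequence once."""
    # Assumed path prefix for the Logs API; check it against the docs.
    base = f"/management/v1/counter/{counter_id}"

    # Step 1: ask whether the report can be generated at all.
    evaluation = call("GET", f"{base}/logrequests/evaluate", params)
    if not evaluation["log_request_evaluation"]["possible"]:
        raise RuntimeError("Report cannot be generated for these parameters")

    # Step 2: create the log request and remember its id.
    created = call("POST", f"{base}/logrequests", params)
    request_id = created["log_request"]["request_id"]

    # Step 3: poll until the status is "processed" (or give up).
    for _ in range(max_polls):
        status_response = call("GET", f"{base}/logrequest/{request_id}", None)
        log_request = status_response["log_request"]
        if log_request["status"] == "processed":
            break
        sleep(poll_seconds)
    else:
        raise TimeoutError(f"Log request {request_id} was not processed in time")

    # Step 4: download every part listed in the "parts" array.
    parts = [
        call(
            "GET",
            f"{base}/logrequest/{request_id}/part/{part['part_number']}/download",
            None,
        )
        for part in log_request["parts"]
    ]

    # Step 5: clear the prepared report from the Yandex servers.
    call("POST", f"{base}/logrequest/{request_id}/clean", None)
    return parts
```

Passing `call` and `sleep` in as arguments keeps the sequence testable without network access or real waiting; in a connector, this logic would presumably be split across the stream's request and parsing methods.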

Here are the API docs. We will use the Logs API:

So my question is: how would I go about implementing this API as a source connector? Should I create a stream for each step? How would I nest those streams? And how would I implement an incremental stream?

Hello and welcome to the community, @ernestasg! Please look at nested streams and caching, I think this is what you’re looking for:
https://docs.airbyte.com/connector-development/cdk-python/http-streams/#nested-streams--caching
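
For the incremental part, one possible approach (just a sketch, not taken from the Airbyte docs) is to cut the sync period into date windows and create one log request per window, using the `date1` / `date2` parameters that appear in the response above. The helper name `date_slices` and the 7-day default are made up for illustration; in the Python CDK, dicts like these could be yielded from a stream's `stream_slices` method, with the start date taken from the saved state.

```python
from datetime import date, timedelta
from typing import Dict, Iterator


def date_slices(start: date, end: date, window_days: int = 7) -> Iterator[Dict[str, str]]:
    """Yield consecutive, non-overlapping date windows covering start..end inclusive."""
    if window_days < 1:
        raise ValueError("window_days must be at least 1")
    cursor = start
    while cursor <= end:
        # Each window spans at most window_days days and never runs past `end`.
        window_end = min(cursor + timedelta(days=window_days - 1), end)
        yield {"date1": cursor.isoformat(), "date2": window_end.isoformat()}
        cursor = window_end + timedelta(days=1)
```

On the next sync, `start` would be the day after the last `date2` that was synced successfully, so earlier windows are not requested again.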

Hey! Take a look at Adventum’s implementation of the Yandex Metrika Logs API source connector: airbyte/airbyte-integrations/connectors/source-yandex-metrika at master · adventum/airbyte · GitHub
It could be useful for you, even if you want to implement your own version of this source.