Developing custom connector with incremental pull using low code SDK

Summary

Exploring the use of low code SDK for developing a custom connector in Airbyte to support an API endpoint with incremental pull based on date/time filtering.


Question

Hi! New to Airbyte connector development. It looks like Airbyte supports my source (HTTP/API) but does not have the endpoint I want, so I need to make my own. Is the low code SDK suitable? We checked all the boxes; I just have one question on incremental pull. The API supports a filter by date/time in the request body, as below.

"filter": {
"fromDateTime": "2024-01-16T00:00:00-05:00",
"toDateTime": "2024-01-17T00:00:00-05:00"
},```

---

This topic has been created from a Slack thread to give it more visibility.
It will be on Read-Only mode here. [Click here](https://airbytehq.slack.com/archives/C027KKE4BCZ/p1705615042958559) if you want to access the original thread.

<sub>
["airbyte", "connector-builder", "low-code-sdk", "custom-connector", "incremental-pull", "api-endpoint", "date-time-filtering"]
</sub>

Yes, the Airbyte Connector Development Kit (CDK) is suitable for your needs. It lets you build an HTTP API source connector and supports incremental syncs, which is what you need for your date/time filter.

To configure incremental sync for a stream in the connector builder, you specify three things: which record field holds the "last changed" / "updated at" timestamp, the initial time range to fetch records for, and how to request records from a given time range. In your case, you can use the "Inject start time / end time into outgoing HTTP request" option in the incremental sync form to inject the start and end time into the request body.

If the API has special requirements that can't be handled this way, disable injection in the incremental sync form and use the generic parameter section at the bottom of the stream configuration form. There you can freely configure query parameters, headers, and properties of the JSON body using Jinja expressions and the available variables. The same variables can also be used in the URL path.

For example, the `stream_interval` variable can populate the JSON request body: `{"filter": {"fromDateTime": "{{ stream_interval.start_time }}", "toDateTime": "{{ stream_interval.end_time }}"}}`. This restricts each request to the right time window. For more information, see the Airbyte documentation on incremental sync and the YouTube tutorial on how to build ETL sources.

Oh, so I need the CDK, not the low code connector?

You can use the low-code framework to achieve that. Take a look at the Zendesk Sunshine implementation: it uses the stream interval to build a complex JSON request body. If your API expects query parameters instead, you can use `request_parameters` in place of `request_body_json`.

Thank you! Can you point me to where the query parameters are on GitHub? I assume it'd be somewhere here, https://github.com/airbytehq/airbyte/tree/ae343436797360305c8df1d2b228fefcf92b6bd9/airbyte-integrations/connectors/source-zendesk-sunshine/source_zendesk_sunshine, but I do not know what to search for.

https://github.com/airbytehq/airbyte/blob/ae343436797360305c8df1d2b228fefcf92b6bd9/airbyte-integrations/connectors/source-zendesk-sunshine/source_zendesk_sunshine/manifest.yaml#L91-L97

You can change it to something like:

```yaml
      type: SimpleRetriever
      requester:
        $ref: "#/definitions/requester"
        path: objects/query
        http_method: POST
        request_body_json:
          filter:
            fromDateTime: "{{ stream_interval.start_time }}"
            toDateTime: "{{ stream_interval.end_time }}"
```

This looks great! In practice, we need to iterate through the timestamps. Say I want yesterday's data: how would I achieve that, i.e. pass a date parameter to the configuration file?

Yes, `stream_interval` takes its start and end time from the incremental sync window, so each request retrieves the data for one window.
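To make the window idea concrete, here is a minimal sketch of an `incremental_sync` block for a low-code manifest, assuming one-day windows. The cursor field name `updatedDateTime`, the `start_date` config key, and the datetime format are assumptions for illustration, not values from this thread; check them against your API before using them.

```yaml
# Sketch only: field names and formats below are assumptions.
incremental_sync:
  type: DatetimeBasedCursor
  # Hypothetical record field that holds the "last changed" timestamp.
  cursor_field: "updatedDateTime"
  # Assumed format. Note that %z renders offsets as -0500, not -05:00,
  # so verify what the API accepts.
  datetime_format: "%Y-%m-%dT%H:%M:%S%z"
  # Smallest time unit of the format; keeps consecutive windows from overlapping.
  cursor_granularity: "PT1S"
  # Split the range between start and end into one-day windows.
  step: "P1D"
  start_datetime:
    type: MinMaxDatetime
    # Used for the first sync; later syncs resume from the saved cursor state.
    datetime: "{{ config['start_date'] }}"
    datetime_format: "%Y-%m-%dT%H:%M:%S%z"
  end_datetime:
    type: MinMaxDatetime
    # Stop at the time of the sync run.
    datetime: "{{ now_utc().strftime('%Y-%m-%dT%H:%M:%S%z') }}"
    datetime_format: "%Y-%m-%dT%H:%M:%S%z"
```

With a setup along these lines, each window's bounds are what `stream_interval.start_time` and `stream_interval.end_time` resolve to in the request body. If the sync runs daily, the saved state should mean only the newest window (roughly yesterday's data) is requested, without passing a date by hand.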

I think I can set the values with testing params. Would it use an environment variable in a deployed environment?

No, it uses values from the config, and you can create a config with any values for tests.
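As a rough illustration of where config values come from, a low-code manifest declares the options it accepts in its `spec`, and the values entered when the source is set up are then available in templates as `{{ config['...'] }}`. The option name `start_date` below is hypothetical.

```yaml
# Sketch only: `start_date` is a hypothetical option name.
spec:
  type: Spec
  connection_specification:
    $schema: http://json-schema.org/draft-07/schema#
    title: Example source spec
    type: object
    required:
      - start_date
    properties:
      start_date:
        type: string
        title: Start date
        description: "Earliest timestamp to sync from, e.g. 2024-01-16T00:00:00-05:00"
        format: date-time
```

For local testing, a config file containing a `start_date` value of your choice plays the same role as the values entered for a real source.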

Sorry, I meant in the deployed (e.g. Helm) environment: are those values populated by Helm config files? Pod env vars?