Implementing custom pagination for new connector development

Summary

The user needs help implementing custom pagination for a new connector so that it stops reading the API when a custom condition is met. Currently the API responds only once. The user wants to replicate the logic of an earlier Python script, which checked the size of each saved response file and stopped the loop once the file was small.


Question

Hello All,

Need help with new connector development:
I’m trying to implement custom pagination with a custom condition for when to stop reading the API.
Right now the API responds only once. Previously, in Python code, I checked the file size of each response and stopped the loop based on that
(code in thread).



This topic has been created from a Slack thread to give it more visibility.

["custom-pagination", "new-connector-development", "custom-condition", "api", "python-code", "file-size"]
```python
import json
import os
import time

import requests

# Set up the headers for the POST request
headers = {
    'Content-Type': 'application/json'
}

token = os.environ.get("MOBSTATION_TOKEN")

# Directory where the call history JSON files will be saved
save_directory = "\\Data\\call_history"
if not os.path.exists(save_directory):
    os.makedirs(save_directory)

# Delete existing files in the directory
for root, dirs, files in os.walk(save_directory):
    for file in files:
        os.remove(os.path.join(root, file))

# Function to save response to a file
def save_response_to_file(response, filename):
    with open(filename, 'w') as f:
        json.dump(response, f)

# URL for the POST request
url = 'https://mob-station:8000/api/call_history'

# Loop to send up to 100,000 POST requests
for i in range(100000):
    offset = i * 100
    body = {
        "token": token,
        "offset": offset
    }
    response = requests.post(url, headers=headers, json=body).json()
    filename = os.path.join(save_directory, f"call_history{i}.json")
    save_response_to_file(response, filename)

    # Sleep for 1 second between requests
    time.sleep(1)

    # Break the loop if the file size is less than 2KB
    if os.path.getsize(filename) < 2048:
        break
```
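
For reference, the stop rule in the script above can be separated from the file handling, which makes it easier to see what a connector's stop condition would need to express. This is only a sketch: `fetch_page`, `is_last_page`, and `paginate` are hypothetical names, and `fetch_page` stands in for the POST request. It assumes that the serialized response size is an acceptable proxy for the saved file size.

```python
import json
from typing import Any, Callable, Iterator

PAGE_STEP = 100        # offset increment, as in the script above
MIN_PAGE_BYTES = 2048  # same 2 KB threshold, applied to the serialized response


def is_last_page(response: Any, min_bytes: int = MIN_PAGE_BYTES) -> bool:
    """Return True when the JSON-serialized response is smaller than min_bytes."""
    return len(json.dumps(response).encode("utf-8")) < min_bytes


def paginate(fetch_page: Callable[[int], Any], max_pages: int = 100000) -> Iterator[Any]:
    """Yield pages by increasing offset until a small page is seen.

    fetch_page is a hypothetical callable that takes an offset and returns
    the decoded JSON response. The small page itself is still yielded.
    """
    for i in range(max_pages):
        response = fetch_page(i * PAGE_STEP)
        yield response
        if is_last_page(response):
            break
```

With the original request, `fetch_page` could be something like `lambda offset: requests.post(url, headers=headers, json={"token": token, "offset": offset}).json()`.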

I’m assuming calls are ordered by callStarted in descending order, and you want to paginate all the way until you get a call from 2023-01-01, correct?

Two naive thoughts:

  1. The way you access the cursor value: response.body.data.[].callStarted. Should it be data[0].callStarted, to get callStarted of the first call on the current page? You can use last_record in interpolation instead (see YAML Reference | Airbyte Documentation).
  2. I am not sure you can compare callStarted (a string) to 2023-01-01 and get what you want. I don’t know whether it compares string values or casts them to dates, so I’d recommend casting both to dates first and then comparing the dates. For example, you can use the timestamp macro in the Jinja expression (see YAML Reference | Airbyte Documentation).
    If your paginator performs only one request and then stops, that’s because the quit condition is met, which means your comparison is truthy. It clearly shouldn’t be.
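
To illustrate the second point in plain Python rather than in the connector's YAML, here is a minimal sketch of a stop check that parses both sides to dates before comparing. It rests on several assumptions: callStarted is an ISO 8601 string, each page is a list of records sorted newest first, and the cutoff is 2023-01-01 UTC. `parse_call_started` and `should_stop` are hypothetical helper names.

```python
from datetime import datetime, timezone
from typing import Any, Mapping, Sequence

# Assumed cutoff: stop once calls older than 2023-01-01 (UTC) are reached.
CUTOFF = datetime(2023, 1, 1, tzinfo=timezone.utc)


def parse_call_started(value: str) -> datetime:
    """Parse an ISO 8601 string; treat values without an offset as UTC."""
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def should_stop(records: Sequence[Mapping[str, Any]], cutoff: datetime = CUTOFF) -> bool:
    """Stop when the page is empty or its last (oldest) call predates the cutoff."""
    if not records:
        return True
    return parse_call_started(records[-1]["callStarted"]) < cutoff
```

A page whose oldest call is from February 2023 keeps pagination going, while a page that ends with a call from December 2022 stops it. If the API uses a different timestamp format, the parsing step would need to change accordingly.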

Note to us:
• It would be cool to be able to “debug” what values are interpolated, and what happens in those expressions — I wonder what that would take to make.
Does that help?

Thank you for the suggestion. I’ll try to configure it.