Bug Report: Cursor Pagination Documentation vs Implementation

Summary

User reports a discrepancy between documented and actual behavior of cursor pagination in API calls, specifically related to URL path handling and request formation.


Question

Hi — I’d like to report a bug, either in the documentation or in the implementation.

The <https://arc.net/l/quote/gvpraegr|Cursor Pagination> documentation states (emphasis mine):

For cursor pagination, if path is selected as the Inject into option, then the entire request URL for the subsequent request will be replaced by the cursor value.
(_author’s note:_ if this is the intended behavior, this injection method should be called URL, not Path)

In practice, this isn’t the case. Take a look at the attached photo.

Here are the relevant variables:
• API Base URL: https://coda.io/apis/v1
• Stream URL Path: /docs
• nextPageLink: https://coda.io/apis/v1/docs/wgFd-3K0OL/pages?pageToken=<token>
• Requested URL: https://coda.io/docs/wgFd-3K0OL/pages?pageToken=<token>
As you can see, the nextPageLink includes /apis/v1, but the requested URL does not.

I suspect (with no evidence and very little conviction) that your algorithm is trying to do some sort of path replacement, with the ultimate effect of removing the API Base URL’s path component from the nextPageLink.
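To make the suspicion concrete, here is a minimal sketch of one mechanism that would produce exactly the requested URL above. It is a guess, not taken from the Airbyte source: if the base URL were stripped from the nextPageLink and the remainder joined back onto the base URL with standard URL-joining rules, the remainder's leading slash would make it an absolute path, which discards /apis/v1. The helper name `rejoin_after_stripping_base` is hypothetical, and `TOKEN` stands in for the real token.

```python
from urllib.parse import urljoin

API_BASE_URL = "https://coda.io/apis/v1"
NEXT_PAGE_LINK = "https://coda.io/apis/v1/docs/wgFd-3K0OL/pages?pageToken=TOKEN"


def rejoin_after_stripping_base(base_url: str, next_page_link: str) -> str:
    """Hypothetical mechanism: strip the base URL, then re-join the remainder."""
    # The remainder keeps its leading slash:
    # "/docs/wgFd-3K0OL/pages?pageToken=TOKEN"
    remainder = next_page_link.removeprefix(base_url)
    # urljoin treats a reference starting with "/" as an absolute path,
    # so the base URL's own path ("/apis/v1") is dropped.
    return urljoin(base_url, remainder)


documented = NEXT_PAGE_LINK  # what the docs say should be requested
observed = rejoin_after_stripping_base(API_BASE_URL, NEXT_PAGE_LINK)

print("documented:", documented)
print("observed:  ", observed)
```

If this guess is right, the result matches the requested URL reported above, with /apis/v1 missing. It only shows that ordinary URL joining can explain the symptom; the actual cause may be different.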



This topic has been created from a Slack thread to give it more visibility.

['cursor-pagination', 'documentation-bug', 'api-url', 'path-replacement', 'nextPageLink']

Out of curiosity, what part of your URL is in the global config (API Base URL) and what part is in the stream-level config?

That may be a workaround for now (moving the prefix out), but I’d agree that this is unexpected behavior the Airbyte team should look into. I’m just thinking maybe we can define the problem a little better and get a GitHub issue filed.

I got it working by extracting the token and re-appending it as a query string, but I agree this definitely seems like a bug and should be addressed either in code or in docs.
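For anyone who needs the same workaround, the idea can be sketched outside the Builder roughly as follows. This is an illustration of the approach, not the connector configuration itself: the function names are hypothetical, and it assumes the token always arrives in a `pageToken` query parameter as in the nextPageLink above.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit


def extract_page_token(next_page_link: str, param: str = "pageToken") -> Optional[str]:
    """Pull the pagination token out of the nextPageLink's query string."""
    query = urlsplit(next_page_link).query
    values = parse_qs(query).get(param)
    return values[0] if values else None


def build_next_url(base_url: str, path: str, token: str, param: str = "pageToken") -> str:
    """Re-append the token as a query string on a URL we build ourselves."""
    url = f"{base_url.rstrip('/')}/{path.lstrip('/')}"
    return f"{url}?{urlencode({param: token})}"


link = "https://coda.io/apis/v1/docs/wgFd-3K0OL/pages?pageToken=abc123"
token = extract_page_token(link)
if token is not None:
    print(build_next_url("https://coda.io/apis/v1", "/docs/wgFd-3K0OL/pages", token))
```

Because the request URL is rebuilt from the base URL and path, the /apis/v1 prefix survives regardless of how the cursor value would otherwise be injected.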

To answer your questions:
• The API Base URL in the global configuration is: <https://coda.io/apis/v1>
• The stream configuration adds /docs to the base URL.
I’ve updated my original post to reflect the stream path.

Let me know if I didn’t interpret your question correctly!

hm, definitely a weird one. I’ve been fighting this feature this week while trying to refactor the broken Pardot connector, but my issue is that when it says “replaced” it’s definitely merging . . . as other parameters still get appended. And Pardot’s v5 API doesn’t allow those parameters when a nextPageToken is passed :upside_down_face:

Took a lot of handling to get that right, but I agree that it should be called URL (or maybe there should be separate options for URL and Path, since different APIs provide one or the other) . . . and there should also be a separate option for whether to merge other parameters or not, since Pardot isn’t the only API that expects only the token on paging requests.
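To spell out the difference between “replace” and “merge”, here is a small sketch of what such an option could look like. It is purely illustrative: `next_request_url` and its `merge` flag are hypothetical and do not correspond to an existing Airbyte setting, and the example URL and `limit` parameter are made up.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def next_request_url(next_page_url: str, extra_params: dict[str, str], merge: bool) -> str:
    """Build the URL for the next page from the cursor value.

    merge=False: use the cursor URL exactly as the API returned it.
    merge=True:  also append the stream's other request parameters,
                 without overriding anything already in the cursor URL.
    """
    if not merge:
        return next_page_url
    parts = urlsplit(next_page_url)
    params = dict(parse_qsl(parts.query))
    for key, value in extra_params.items():
        params.setdefault(key, value)
    return urlunsplit(parts._replace(query=urlencode(params)))


cursor_url = "https://api.example.com/v5/objects?nextPageToken=abc"
stream_params = {"limit": "100"}

print(next_request_url(cursor_url, stream_params, merge=False))
print(next_request_url(cursor_url, stream_params, merge=True))
```

With `merge=False` the request carries only the token, which is what an API like the one described above expects; with `merge=True` the extra parameters ride along on every page.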

I’m very new to connectors (like 3 hours under my belt), and am only working on this because the existing Coda connector is broken. I agree it’s a bit frustrating.

While I have you, do you have any idea of whether it’s possible to store some data between runs?

Coda returns a “nextSyncToken” which isn’t a timestamp, but a random string that encodes a given data state.

For incremental runs, I would want to pass that token so that Coda only returns new data since that sync token. Notably, the sync token is not a timestamp — it’s a hash.

So ideally I’d be able to include that as a query string, but I need to store it between runs.
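Roughly, the flow I have in mind looks like the sketch below. It is plain Python, not Builder or CDK code, and several names are assumptions: the state is kept in a local JSON file, the request parameter is assumed to be called `syncToken`, the response is assumed to carry records under `items`, and `fetch` is a stand-in for whatever actually calls the API.

```python
import json
from pathlib import Path
from typing import Callable, Optional


def load_sync_token(state_file: Path) -> Optional[str]:
    """Return the token saved by the previous run, if any."""
    if not state_file.exists():
        return None
    return json.loads(state_file.read_text()).get("nextSyncToken")


def save_sync_token(state_file: Path, token: str) -> None:
    """Persist the opaque token so the next run can resume from it."""
    state_file.write_text(json.dumps({"nextSyncToken": token}))


def run_incremental_sync(state_file: Path, fetch: Callable[[dict], dict]) -> list:
    """One sync run: send the stored token, then store the new one."""
    params = {}
    token = load_sync_token(state_file)
    if token is not None:
        params["syncToken"] = token  # assumed query parameter name
    response = fetch(params)  # hypothetical API call returning parsed JSON
    new_token = response.get("nextSyncToken")
    if new_token:
        save_sync_token(state_file, new_token)
    return response.get("items", [])  # assumed response field
```

The first run sends no token and gets everything; each later run sends the token saved by the previous one. The token is treated as an opaque string throughout, since it cannot be compared or ordered like a timestamp.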

Well you’re doing great for 3 hours! I don’t think there’s a way yet in Builder/Low-Code that allows you to inject custom state values (it might be possible through a custom component, but there are limits on that), but it can be done in the CDK.

I do know other APIs that have historically worked this way, so there’s definitely a use case there. I would think some way of storing custom state records would be the most flexible way to handle this (since some APIs may combine it with date-based cursors as well).

<@U069EMNRPA4> will probably sanity check me in case I’m lying to you. :joy:

Thanks Justin — appreciate the help!

If it helps, <@U069EMNRPA4>, here’s the <https://arc.net/l/quote/jhytxkdn|specific spot> in the Coda API describing how they provide the non-timestamp sync token for incremental updates.

Hmmmm, I definitely have seen the framework navigate to next page token (i.e. replace full url instead). Suspicious.

<@U069EMNRPA4> what’s the correct course of action here? Should I open up an issue in GitHub? Or just let this evaporate into the ether?