- Is this your first time deploying Airbyte?: Yes
- OS Version / Instance: Debian
- Memory / Disk: 16GB/200GB
- Deployment: Docker on GCP VM
- Airbyte Version: 0.42.0
- Source name/version: airbyte/source-zuora 0.1.3
- Destination name/version: airbyte/destination-bigquery 1.2.18
- Step: During sync
Description:
- Airbyte uses the Zuora Data Query API.
- The Data Query API has a 10 million row limit for input.
- Input rows are filtered by WHERE in SQL.
- If filtering on UpdatedDate using a string cast with TIMESTAMP and the string has 6 decimals for fractions of a second, that filter is seemingly ignored, making you hit that 10m row limit in a big table.
- Version 0.1.3 of source-zuora outputs a string with 6 decimals.
Example:
-- Working query:
select count(*)
from payment where
updateddate >= TIMESTAMP '2023-01-02 00:00:00.000 +00:00' and
updateddate <= TIMESTAMP '2023-01-07 00:00:00.000 +00:00'
-- Failing query:
select count(*)
from payment where
updateddate >= TIMESTAMP '2023-01-02 00:00:00.000000 +00:00' and
updateddate <= TIMESTAMP '2023-01-07 00:00:00.000000 +00:00'
The last query hits the 10 million row limit: Query failed (#): Input Rows for payment exceeded limit (10000000)
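To illustrate the difference between the two queries, here is a minimal Python sketch of building the UpdatedDate filter with 3 fractional digits instead of 6. The helper names `to_zuora_timestamp` and `updated_date_filter` are hypothetical, treating naive datetimes as UTC is an assumption, and this is not necessarily how the connector or the PR does it.

```python
from datetime import datetime, timezone


def to_zuora_timestamp(dt: datetime) -> str:
    """Format dt as 'YYYY-MM-DD HH:MM:SS.mmm +00:00' (milliseconds, UTC).

    Hypothetical helper for illustration only.
    """
    if dt.tzinfo is None:
        # Assumption: naive datetimes are already in UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    dt = dt.astimezone(timezone.utc)
    # %f always yields 6 digits; keep only the first 3 (milliseconds).
    millis = dt.strftime("%f")[:3]
    return f"{dt.strftime('%Y-%m-%d %H:%M:%S')}.{millis} +00:00"


def updated_date_filter(start: datetime, end: datetime) -> str:
    """Build the WHERE clause body for an UpdatedDate window."""
    return (
        f"updateddate >= TIMESTAMP '{to_zuora_timestamp(start)}' and "
        f"updateddate <= TIMESTAMP '{to_zuora_timestamp(end)}'"
    )


if __name__ == "__main__":
    start = datetime(2023, 1, 2, tzinfo=timezone.utc)
    end = datetime(2023, 1, 7, tzinfo=timezone.utc)
    print("select count(*) from payment where " + updated_date_filter(start, end))
```

Note that this truncates rather than rounds the sub-millisecond part, so a window boundary can move by just under a millisecond.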
I have fixed this issue in this PR.
If I publish that connector and use it as a custom connector in Airbyte I get several other issues though, like:
- OAuth no longer working (workaround here),
- Airbyte running duplicate queries (two identical queries, using the same UpdatedDate window) and then spamming at least a few hundred DESCRIBE <table> queries. I haven’t let it run to see if it finishes.
I only sync one table at a time, so it first queries the table twice and then spams DESCRIBE queries for that same table.
Does this sound familiar to anyone? Have I missed something obvious when deploying the forked custom connector?