- Is this your first time deploying Airbyte?: Yes
- OS Version / Instance: Ubuntu
- Memory / Disk: 8 GB / 100 GB
- Deployment: Docker
- Airbyte Version: 0.40.15
- Source name/version: MongoDB
- Destination name/version: BigQuery
- Step: The issue is happening during sync, creating the connection or a new source?
Description:
This is my first topic here, so apologies for any mistakes. I am trying to introduce Airbyte to my company. For the POC, I need to show that Airbyte can fetch only part of the data from the source on the first run, rather than loading the entire source database during the initial sync. I am using the incremental-append sync mode. Can anyone suggest how to do that?
For instance, assume I have a 200 GB production table. I do not want to fetch all of it for the POC; I would like to load only the most recent month of data and continue incrementally from there. How can I achieve that?
What I tried:
I used the Airbyte API to set the connection state to a specific date and then ran a sync. The state was saved correctly, but when I triggered the sync from the UI, it started from scratch and fetched all the data.
As usual, Airbyte behaves correctly in incremental-append mode after the initial full fetch, but I do not want that full fetch for my POC.
Request payload sent to the endpoint http://0.0.0.0:8000/api/v1/state/create_or_update:
{
    "connectionId": CONNECTION_ID,
    "connectionState": {
        "stateType": "legacy",
        "connectionId": CONNECTION_ID,
        "state": {
            "cdc": False,
            "streams": [
                {
                    "cursor": "2022-11-27T08:51:45.914Z",
                    "stream_name": "partial_some_table",
                    "cursor_field": ["updatedAt"],
                    "stream_namespace": "some_stream"
                }
            ]
        },
        "streamState": [],
        "globalState": {}
    }
}
Response:
{
    'stateType': 'legacy',
    'connectionId': CONNECTION_ID,
    'state': {
        'cdc': False,
        'streams': [
            {
                'cursor': '2022-11-27T08:51:45.914Z',
                'stream_name': 'partial_some_table',
                'cursor_field': ['updatedAt'],
                'stream_namespace': 'some_stream'
            }
        ]
    }
}
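For anyone who wants to reproduce this, the request can be sketched in Python roughly as follows. This is only an illustration under assumptions: the helper names (cursor_days_ago, build_state_payload, post_state) are hypothetical, the payload shape is copied from the request shown in this post, and the endpoint URL is the one from my local Docker deployment. It shows how the request is built and sent; it does not change the behavior described in this post.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone

# Endpoint as used in this post; adjust host and port for your deployment.
STATE_URL = "http://0.0.0.0:8000/api/v1/state/create_or_update"


def cursor_days_ago(days, now=None):
    """Return a UTC timestamp `days` before `now`, formatted like
    2022-11-27T08:51:45.914Z. `now` must be timezone-aware if given."""
    now = now or datetime.now(timezone.utc)
    moment = (now - timedelta(days=days)).astimezone(timezone.utc)
    millis = moment.microsecond // 1000
    return moment.strftime("%Y-%m-%dT%H:%M:%S.") + f"{millis:03d}Z"


def build_state_payload(connection_id, namespace, stream, cursor_field, cursor):
    """Build the legacy-state payload in the same shape as the request above."""
    return {
        "connectionId": connection_id,
        "connectionState": {
            "stateType": "legacy",
            "connectionId": connection_id,
            "state": {
                "cdc": False,
                "streams": [
                    {
                        "cursor": cursor,
                        "stream_name": stream,
                        "cursor_field": [cursor_field],
                        "stream_namespace": namespace,
                    }
                ],
            },
            "streamState": [],
            "globalState": {},
        },
    }


def post_state(payload, url=STATE_URL):
    """POST the payload as JSON and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

A typical call would be post_state(build_state_payload(CONNECTION_ID, "some_stream", "partial_some_table", "updatedAt", cursor_days_ago(30))), where CONNECTION_ID is the connection's ID and 30 days approximates "the most recent month".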
Thanks in advance for any suggestions.