Issue with Incremental Read in HubSpot Connector

Summary

The user is facing an issue with running an incremental read in the HubSpot connector, as it continues to perform a full read instead. They are seeking assistance in understanding why this is happening.


Question

Hi :wave:
I'm trying to run a “simple” incremental read with source-hubspot, but it does not seem to work properly and still runs a FULL read.
Do you know why?
Thanks :pray:




["incremental-read", "source-hubspot", "full-read", "issue"]

I did a quick run on my local machine and it works as expected.
```
poetry run source-hubspot read --config secrets/config.json --catalog integration_tests/my-catalog.json --state sample_files/my-state.json

{"type":"LOG","log":{"level":"INFO","message":"The following scopes were granted: ['content', 'automation', 'business-intelligence', 'oauth', 'forms', 'files', 'integration-sync', 'tickets', 'e-commerce', 'sales-email-read', 'forms-uploaded-files', 'crm.lists.read', 'crm.objects.contacts.read', 'crm.import', 'files.ui_hidden.read', 'crm.schemas.custom.read', 'crm.objects.custom.read', 'crm.schemas.contacts.read', 'crm.objects.feedback_submissions.read', 'crm.objects.companies.read', 'crm.objects.deals.read', 'crm.schemas.companies.read', 'crm.schemas.deals.read', 'crm.objects.owners.read', 'crm.objects.goals.read']"}}
{"type":"LOG","log":{"level":"INFO","message":"The following streams are unavailable: []"}}
{"type":"LOG","log":{"level":"INFO","message":"The following streams are partially available: [], add the following scopes to download all available data: set()"}}
{"type":"LOG","log":{"level":"INFO","message":"Marking stream companies as STARTED"}}
{"type":"TRACE","trace":{"type":"STREAM_STATUS","emitted_at":1723050335727.4338,"stream_status":{"stream_descriptor":{"name":"companies","namespace":null},"status":"STARTED"}}}
{"type":"LOG","log":{"level":"INFO","message":"Setting state of SourceHubspot stream to {'updatedAt': '2020-01-01T00:00:00.000000Z'}"}}
{"type":"LOG","log":{"level":"INFO","message":"Syncing stream: companies "}}
{"type":"LOG","log":{"level":"INFO","message":"Reading contacts associations of company"}}
{"type":"LOG","log":{"level":"INFO","message":"Marking stream companies as RUNNING"}}
```
See the line:
```
{"type":"LOG","log":{"level":"INFO","message":"Setting state of SourceHubspot stream to {'updatedAt': '2020-01-01T00:00:00.000000Z'}"}}
```

Did you use exactly the same values in your files as the ones I provided?

I can see the Company stream in your logs, but I only want to run the Engagements_notes one.

I then tried with Engagements Notes and it worked as expected :stuck_out_tongue:

Your state file must be:

```
[
  {
    "type": "STREAM",
    "stream": {
      "stream_descriptor": { "name": "engagements_notes" },
      "stream_state": { "updatedAt": "2200-01-01T00:00:00.000000Z" }
    }
  }
]
```
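The shape of the state entry above can be checked before running the connector. This is only a minimal sketch: `check_state_entries` is a hypothetical helper written for this thread, not part of source-hubspot or the Airbyte CDK, and it only tests for the fields visible in the example above.

```python
import json


def check_state_entries(state):
    """Return a list of problems found in a parsed state file (empty if none)."""
    if not isinstance(state, list):
        return ["state should be a JSON array of per-stream entries"]
    problems = []
    for i, entry in enumerate(state):
        if not isinstance(entry, dict):
            problems.append(f"entry {i}: expected a JSON object")
            continue
        # The example in this thread carries "type": "STREAM" on each entry.
        if entry.get("type") != "STREAM":
            problems.append(f"entry {i}: 'type' is not 'STREAM'")
        stream = entry.get("stream")
        if not isinstance(stream, dict):
            problems.append(f"entry {i}: missing 'stream' object")
            continue
        descriptor = stream.get("stream_descriptor")
        if not isinstance(descriptor, dict) or not descriptor.get("name"):
            problems.append(f"entry {i}: missing stream_descriptor.name")
        if not isinstance(stream.get("stream_state"), dict):
            problems.append(f"entry {i}: missing 'stream_state' object")
    return problems


# Same content as the example state shown in the thread.
example_state = json.loads("""
[
  {
    "type": "STREAM",
    "stream": {
      "stream_descriptor": { "name": "engagements_notes" },
      "stream_state": { "updatedAt": "2200-01-01T00:00:00.000000Z" }
    }
  }
]
""")

print(check_state_entries(example_state))  # prints []
```

In practice the same function could be fed `json.load(open("sample_files/my-state.json"))`. An empty list only means the file matches the shape shown here, not that the connector will accept it.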

I copy-pasted your file and still have the same issue… (apart from line breaks, it was the same as mine)

My sync gets stuck after these two messages.
If I enable --debug, I can see it makes many API calls, which means it is running in FULL sync mode.

Found it! :exploding_head:

In secrets/config.json, if I set start_date to:
• 2023-09-05T00:00:00Z or later, it does not work
• 2023-09-04T00:00:00Z or earlier, it works!

```
{
  "start_date": "2023-09-04T00:00:00Z",
  "credentials": {
    "credentials_title": "Private App Credentials",
    "access_token": "my-token"
  }
}
```
Any explanation?! :open_mouth:
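A threshold like the one above can be located with a binary search over candidate start_date values rather than by trial and error. The sketch below is an illustration only: `works` is a hypothetical callback that would, in real use, write the candidate date into the config, run the connector, and report whether the read was incremental. Here it is replaced by a stand-in that simply mirrors the dates reported in this thread, and the search assumes there is a single switch point between the two bounds.

```python
from datetime import date, timedelta


def last_working_date(lo, hi, works):
    """Binary-search the latest date in [lo, hi) for which works(date) is True.

    Assumes works(lo) is True, works(hi) is False, and that the behavior
    switches exactly once between the two dates.
    """
    while (hi - lo).days > 1:
        mid = lo + timedelta(days=(hi - lo).days // 2)
        if works(mid):
            lo = mid  # still working: the switch point is later
        else:
            hi = mid  # already failing: the switch point is earlier
    return lo


def fake_works(candidate):
    # Stand-in for "run the connector with this start_date and check the result".
    # It reproduces the threshold reported in the thread; it does not call HubSpot.
    return candidate <= date(2023, 9, 4)


boundary = last_working_date(date(2023, 1, 1), date(2024, 8, 1), fake_works)
print(boundary.isoformat() + "T00:00:00Z")  # prints 2023-09-04T00:00:00Z
```

Knowing the exact boundary does not explain the behavior by itself, but it narrows down which date comparison in the connector is worth inspecting.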

my-catalog.json:

```
{
  "streams": [
    {
      "stream": {
        "name": "engagements_notes",
        "json_schema": {},
        "supported_sync_modes": ["full_refresh", "incremental"],
        "source_defined_cursor": true,
        "default_cursor_field": ["updatedAt"]
      },
      "sync_mode": "incremental",
      "cursor_field": ["updatedAt"],
      "destination_sync_mode": "append"
    }
  ]
}
```

my-state.json:

```
[
  {
    "stream": {
      "stream_descriptor": {
        "name": "engagements_notes"
      },
      "stream_state": {
        "updatedAt": "2024-08-01T00:00:00Z"
      }
    }
  }
]
```
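One quick sanity check on a catalog and state pair like the two files above is to confirm that they refer to the same stream and cursor field. The following is a rough sketch under that assumption: `cross_check` is a hypothetical helper, not something provided by Airbyte, and it only handles single-element cursor paths such as `["updatedAt"]`.

```python
def cross_check(catalog, state):
    """Return mismatches between a configured catalog and a per-stream state list."""
    configured = {s["stream"]["name"]: s for s in catalog.get("streams", [])}
    issues = []
    for entry in state:
        stream = entry.get("stream", {})
        name = stream.get("stream_descriptor", {}).get("name")
        config = configured.get(name)
        if config is None:
            issues.append(f"{name}: present in state but not in catalog")
            continue
        if config.get("sync_mode") != "incremental":
            issues.append(f"{name}: sync_mode is not 'incremental'")
        cursor_path = config.get("cursor_field") or []
        stream_state = stream.get("stream_state", {})
        # Only single-element cursor paths are checked in this sketch.
        if len(cursor_path) == 1 and cursor_path[0] not in stream_state:
            issues.append(f"{name}: cursor field {cursor_path[0]!r} missing from stream_state")
    return issues


# The two files from the thread, written as Python literals.
catalog = {
    "streams": [
        {
            "stream": {
                "name": "engagements_notes",
                "json_schema": {},
                "supported_sync_modes": ["full_refresh", "incremental"],
                "source_defined_cursor": True,
                "default_cursor_field": ["updatedAt"],
            },
            "sync_mode": "incremental",
            "cursor_field": ["updatedAt"],
            "destination_sync_mode": "append",
        }
    ]
}
state = [
    {
        "stream": {
            "stream_descriptor": {"name": "engagements_notes"},
            "stream_state": {"updatedAt": "2024-08-01T00:00:00Z"},
        }
    }
]

print(cross_check(catalog, state))  # prints []
```

For the files posted here the check comes back empty, which is consistent with the problem lying elsewhere (the start_date value) rather than in a catalog and state mismatch.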

Command used:
poetry run source-hubspot read --config secrets/config.json --catalog integration_tests/my-catalog.json --state sample_files/my-state.json

I did not manage to debug it :confused:

Proposed PR: https://github.com/airbytehq/airbyte/pull/43381

(I still don't understand why the start_date caused this behavior, though.)