Issues with HubSpot web analytics stream

Summary

The HubSpot connector runs into several problems when syncing web analytics data to BigQuery: attempts that do not stop, query errors, and an unusually high log file count.


Question

Is anyone else having issues with the web analytics stream from HubSpot?
Hello, I am using a connection with HubSpot (image 1.9.0) as the source and BigQuery (2.0.6) as the destination, and I want to get web analytics data based on contacts. I have tried several times to sync the data and ran into these problems:
• Even when the current attempt stops and the sync moves on to the next one (for no apparent reason), the current attempt does not actually stop: both read pods keep running, and both attempts emit logs and state. The state differs from one attempt to the other; the second attempt's state contains less data than the first's (the source uses composite state).
• On reaching the 3rd attempt, the sync sometimes stops with: Query error: Transaction is aborted due to concurrent update against table
• I also tested how the sync behaves when I cancel it from the UI. The sync was on its second attempt while the first was still running; after the cancel, only the second attempt stopped, and the first never stopped.
• I also changed SYNC_JOB_MAX_TIMEOUT_DAYS to 10 days to let the sync run as long as possible, and switched to OAuth to get around the API limits.
• The number of log files is substantially higher than for any other HubSpot sync: approx. 2000 files versus fewer than 100 (this sync gets only web analytics). We have trouble accessing the logs from the UI, so we migrated the logs from MinIO to GCS for this task to make them easier to “view” from the backend.
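One way to confirm the overlapping attempts described in the bullets above is to collect each attempt's start and end time (from the pod or job logs) and check whether their running intervals intersect. The sketch below is only an illustration: the Attempt record and the sample timestamps are made up for this example and are not Airbyte types or real data.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass
class Attempt:
    """Hypothetical record of one sync attempt (not an Airbyte type)."""
    number: int
    started_at: datetime
    ended_at: Optional[datetime]  # None means the pod is still running


def find_overlaps(attempts: List[Attempt], now: datetime) -> List[Tuple[int, int]]:
    """Return pairs of attempt numbers whose running intervals overlap."""
    ordered = sorted(attempts, key=lambda a: a.started_at)
    overlaps = []
    for i, earlier in enumerate(ordered):
        # Treat a still-running attempt as running until `now`.
        earlier_end = earlier.ended_at or now
        for later in ordered[i + 1:]:
            if later.started_at < earlier_end:
                overlaps.append((earlier.number, later.number))
    return overlaps


# Made-up timestamps: attempt 1 never ends, attempt 2 was cancelled.
attempts = [
    Attempt(1, datetime(2023, 1, 1, 10, 0), None),
    Attempt(2, datetime(2023, 1, 1, 12, 0), datetime(2023, 1, 1, 13, 0)),
    Attempt(3, datetime(2023, 1, 1, 14, 0), None),
]
print(find_overlaps(attempts, now=datetime(2023, 1, 1, 15, 0)))
```

Any pair returned here means two attempts were reading at the same time, which would be consistent with both of them emitting logs and state.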
I should mention that our contacts stream (the parent stream) has 1 million records, and fetching events for that many contacts makes the source behave strangely; events are handled with composite state. Do you have any suggestions on how to resolve this, or some test cases to detect why it is happening?
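As a possible starting point for such a test case, the state emitted by two attempts could be compared partition by partition. This is a rough sketch under assumptions: it takes the composite state to be reducible to a mapping of partition id (for example a contact id) to a cursor value, and it assumes cursors are ISO-8601 strings in one format so that string comparison matches time order. The real state layout of the connector may differ.

```python
from typing import Dict, List


def compare_partition_states(
    first: Dict[str, str], second: Dict[str, str]
) -> Dict[str, List[str]]:
    """Compare two hypothetical {partition_id: cursor_value} snapshots."""
    # Partitions the first attempt had checkpointed but the second has not.
    missing = sorted(k for k in first if k not in second)
    # Partitions whose cursor in the second attempt is behind the first.
    regressed = sorted(
        k for k in first if k in second and second[k] < first[k]
    )
    return {"missing_in_second": missing, "regressed_in_second": regressed}


# Made-up snapshots for illustration only.
first_attempt = {
    "contact-1": "2023-01-05T00:00:00Z",
    "contact-2": "2023-01-03T00:00:00Z",
    "contact-3": "2023-01-04T00:00:00Z",
}
second_attempt = {
    "contact-1": "2023-01-05T00:00:00Z",
    "contact-2": "2023-01-01T00:00:00Z",
}
print(compare_partition_states(first_attempt, second_attempt))
```

If the second attempt is missing partitions or has older cursors, that would suggest it did not resume from the state the first attempt had already emitted.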



This topic has been created from a Slack thread to give it more visibility.

["hubspot", "web-analytics", "bigquery", "sync", "query-error", "log-files", "composite-state"]