HubSpot company IDs are duplicated in the array

  • Is this your first time deploying Airbyte?: Yes
  • OS Version / Instance: GCE
  • Memory / Disk: 4Gb / 300Mb
  • Deployment: Docker
  • Airbyte Version: 0.39.20-alpha
  • Source name/version: HubSpot (0.1.75)
  • Destination name/version: BigQuery (1.1.11)
  • Step: The issue happens when Airbyte collects raw data from the HubSpot source.
  • Description:

The _airbyte_raw_deals table shows the following:
[image]

But when I check the deal through the HubSpot API at https://api.hubapi.com/deals/v1/deal/9138379099?hapikey=[my_api_key], it returns the following, and each list contains only one item (no duplicates).

{…, "dealId":9138379099,"isDeleted":false,"associations":{"associatedVids":[136953],"associatedCompanyIds":[7379827584]…}

Why do the companies get duplicated in _airbyte_raw_deals, and how can I fix this problem?

Your support would be much appreciated.

Hello @yasuyama, _airbyte_raw_<table_name> tables can contain duplicate records. You have to use the deduped history sync mode to generate a final table with deduplicated rows.

If this is a nested stream it cannot be deduplicated, but you could use a custom transformation to dedupe it. If that’s the case, take a look at this discussion:
https://github.com/airbytehq/airbyte/issues/9465
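
As a rough illustration of the custom-transformation idea, here is a minimal Python sketch that removes repeated IDs from the association lists of a single deal record. The field names come from the API response quoted in the question; the helper names and the sample record with a repeated company ID are made up for illustration, and this is not how Airbyte itself handles nested streams.

```python
import json


def dedupe_preserving_order(values):
    """Drop repeated values from a list, keeping the first occurrence of each."""
    # dict keys are unique and keep insertion order (Python 3.7+).
    return list(dict.fromkeys(values))


def dedupe_associations(record):
    """Return a copy of a deal record whose association lists have no repeats.

    Hypothetical helper: assumes the record has the shape shown in the
    question, i.e. an "associations" object whose values are lists of IDs.
    """
    cleaned = dict(record)
    associations = record.get("associations")
    if not isinstance(associations, dict):
        # Nothing to clean; return the shallow copy unchanged.
        return cleaned
    cleaned["associations"] = {
        key: dedupe_preserving_order(value) if isinstance(value, list) else value
        for key, value in associations.items()
    }
    return cleaned


# Made-up sample: the company ID from the question, repeated twice.
raw = (
    '{"dealId": 9138379099, "isDeleted": false, "associations": '
    '{"associatedVids": [136953], '
    '"associatedCompanyIds": [7379827584, 7379827584]}}'
)
deal = dedupe_associations(json.loads(raw))
print(deal["associations"]["associatedCompanyIds"])
```

The same logic could be expressed in SQL or dbt inside a custom transformation; the Python version is only meant to show the intended result for one record.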