Summary
The user reports that a sync in full refresh overwrite mode works correctly on the first run but behaves like an incremental sync on the second, leaving the target table empty or nearly empty. The issue appears to be related to a state being defined for the objects by the time of the second synchronization.
Question
Airbyte version: 0.50.52
Source: HubSpot v4.1.1 (with additions)
Destination: Postgres v0.4.0
Greetings, everyone!
I'm new to Airbyte, and I've been asked to maintain the company's Airbyte code. Our technology leader had been maintaining Airbyte on his own and made some changes to the source_hubspot code (HubSpot is our CRM). The idea was to create a sort of filter for customized objects, bringing in only those that contain certain properties (`"operator": "HAS_PROPERTY"`):
```
entity = "contact"
last_modified_field = "lastmodifieddate"
associations = ["contacts", "companies"]
primary_key = "id"
scopes = {"crm.objects.contacts.read"}

def __init__(
    self,
    include_archived_only: bool = False,
    **kwargs,
):
    super().__init__(**kwargs)
    self._state = pendulum.parse(kwargs['start_date'])

# beginning of the code change
def search(
    self, url: str, data: Mapping[str, Any], params: MutableMapping[str, Any] = None
) -> Tuple[Union[Mapping[str, Any], List[Mapping[str, Any]]], requests.Response]:
    filter_groups = [
        {
            "filters": [data["filters"][0],
                        {"value": "0", "propertyName": "num_associated_deals", "operator": "GT"}]
        },
        {
            "filters": [data["filters"][0],
                        {"propertyName": "servico_financeiro_contratado", "operator": "HAS_PROPERTY"}]
        },
        {
            "filters": [data["filters"][0],
                        {"propertyName": "industry", "operator": "HAS_PROPERTY"}]
        },
    ]
    del data['filters']
    data['filterGroups'] = filter_groups
    return super().search(url, data, params)
```
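For anyone reading along, the payload rewrite done by the overridden `search` can be isolated into a small pure function. This is only a sketch: `build_filter_groups` and `EXTRA_FILTERS` are hypothetical names, and the sample payload (a single `lastmodifieddate` filter plus a `limit`) is an assumption about what the connector sends, not something taken from its code. In HubSpot's CRM search API, filters inside one group are combined with AND and the groups themselves with OR, so each group below reads as "matches the base filter and one extra condition".

```python
from copy import deepcopy
from typing import Any, Dict, List, Mapping

# Extra conditions taken from the post; each one becomes its own filter group.
EXTRA_FILTERS: List[Dict[str, str]] = [
    {"value": "0", "propertyName": "num_associated_deals", "operator": "GT"},
    {"propertyName": "servico_financeiro_contratado", "operator": "HAS_PROPERTY"},
    {"propertyName": "industry", "operator": "HAS_PROPERTY"},
]


def build_filter_groups(data: Mapping[str, Any]) -> Dict[str, Any]:
    """Return a copy of the search payload with `filters` replaced by `filterGroups`.

    Hypothetical helper: unlike the override in the post, it does not
    mutate the payload passed in by the caller.
    """
    payload = deepcopy(dict(data))
    base_filter = payload.pop("filters")[0]  # first filter of the original payload
    payload["filterGroups"] = [
        {"filters": [deepcopy(base_filter), deepcopy(extra)]}
        for extra in EXTRA_FILTERS
    ]
    return payload


# Assumed shape of the payload before the rewrite (values are illustrative).
original = {
    "filters": [
        {"value": 1721851004000, "propertyName": "lastmodifieddate", "operator": "GTE"}
    ],
    "limit": 100,
}
rewritten = build_filter_groups(original)
```

Keeping the rewrite in a function that returns a new dict makes it easy to check in isolation that the base filter is repeated in every group and that the rest of the payload is left alone.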
However, now any synchronization in full refresh overwrite mode works the first time, but the second time it returns only the new data (as if it were incremental) and rewrites the target table, leaving it empty or with almost no data.
You can see that in the first synchronization the objects have no state defined, but in the second they have a state defined for the date and time of the last synchronization.
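The symptom described above can be reproduced in a toy model. The sketch below is not Airbyte code: `read_stream`, `overwrite`, and the three sample records are all made up for illustration. It only shows what happens when a read is bounded by a saved cursor while the destination still replaces the whole table.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional

# Three illustrative records; ids and timestamps are invented.
SOURCE_RECORDS: List[Dict[str, Any]] = [
    {"id": 1, "updatedAt": datetime(2024, 7, 1, tzinfo=timezone.utc)},
    {"id": 2, "updatedAt": datetime(2024, 7, 10, tzinfo=timezone.utc)},
    {"id": 3, "updatedAt": datetime(2024, 7, 25, tzinfo=timezone.utc)},
]


def read_stream(cursor: Optional[datetime]) -> List[Dict[str, Any]]:
    """Return every record when there is no cursor, else only records at or after it."""
    if cursor is None:
        return list(SOURCE_RECORDS)
    return [r for r in SOURCE_RECORDS if r["updatedAt"] >= cursor]


def overwrite(records: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Model an overwrite: the table ends up holding exactly what this sync read."""
    return list(records)


# First sync: no state yet, so everything is read and written.
first_table = overwrite(read_stream(cursor=None))

# Second sync: a cursor from the previous run bounds the read,
# but the destination still replaces the whole table.
saved_cursor = datetime(2024, 7, 24, 20, 23, 44, tzinfo=timezone.utc)
second_table = overwrite(read_stream(cursor=saved_cursor))

print(len(first_table), len(second_table))  # 3 1
```

In this model the second run keeps only the record modified after the saved cursor, which matches the "almost no data" result in the target table.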
Connection state:
```
[
  {
    "streamDescriptor": {
      "name": "grupoeconomico"
    },
    "streamState": {
      "updatedAt": "2024-07-24T20:23:44.267878Z"
    }
  },
  {
    "streamDescriptor": {
      "name": "apolices"
    },
    "streamState": {
      "updatedAt": "2024-07-24T20:23:44.267595Z"
    }
  },
  {
    "streamDescriptor": {
      "name": "objetivos"
    },
    "streamState": {
      "updatedAt": "2024-07-24T20:23:44.268111Z"
    }
  },
  {
    "streamDescriptor": {
      "name": "eventos"
    },
    "streamState": {
      "updatedAt": "2024-07-24T20:23:44.267254Z"
    }
  },
  {
    "streamDescriptor": {
      "name": "carteira"
    },
    "streamState": {
      "updatedAt": "2024-07-24T20:23:44.268364Z"
    }
  },
  {
    "streamDescriptor": {
      "name": "contacts"
    },
    "streamState": {
      "updatedAt": "2024-07-24T19:16:35.257960Z"
    }
  }
]
```
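To compare the state before and after a sync, a small script can list which streams carry a cursor. This is a minimal sketch that assumes the state is available as a JSON string in the shape shown above; `streams_with_cursor` is a hypothetical helper, and the `no_state_yet` stream in the sample is invented to show a stream without a cursor.

```python
import json
from datetime import datetime
from typing import Dict


def streams_with_cursor(state_json: str, cursor_key: str = "updatedAt") -> Dict[str, datetime]:
    """Map stream name to parsed cursor for every stream whose state holds `cursor_key`."""
    result: Dict[str, datetime] = {}
    for entry in json.loads(state_json):
        name = entry["streamDescriptor"]["name"]
        value = (entry.get("streamState") or {}).get(cursor_key)
        if value:
            # Older Python versions reject a trailing "Z" in fromisoformat(), so normalise it.
            result[name] = datetime.fromisoformat(value.replace("Z", "+00:00"))
    return result


# Two entries copied from the state above, plus one invented stream with empty state.
sample_state = """[
  {"streamDescriptor": {"name": "contacts"},
   "streamState": {"updatedAt": "2024-07-24T19:16:35.257960Z"}},
  {"streamDescriptor": {"name": "eventos"},
   "streamState": {"updatedAt": "2024-07-24T20:23:44.267254Z"}},
  {"streamDescriptor": {"name": "no_state_yet"}, "streamState": {}}
]"""

cursors = streams_with_cursor(sample_state)
for name, cursor in sorted(cursors.items()):
    print(name, cursor.isoformat())
```

Running it against the state captured after each sync gives a quick view of which streams have picked up an `updatedAt` cursor.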
---
This topic has been created from a Slack thread to give it more visibility.
It will be on Read-Only mode here. [Click here](https://airbytehq.slack.com/archives/C021JANJ6TY/p1721923308247279) if you want
to access the original thread.