Checksum Invalid Error in Custom Python Source

Summary

A user gets "checksum is invalid" errors for a custom Python source in Airbyte: the record count reported in the source's state message does not match the number of records the platform actually tracked and loaded. The user has verified the record counts and the primary key definition but cannot resolve the issue, even after rewriting the code.


Question

Hi all! Hope all is well.

I’m getting "checksum is invalid" messages for a custom Python source. I get both:

```
platform > Source state message checksum is invalid: state record count 282378.0 does not equal tracked record count 5388.0 for stream
```

and

```
platform > Destination state message checksum is invalid: state source record count 282378.0 does not equal state destination record count 5388.0. Please note that the destination count matches the platform count for stream
```
I know (because I’m logging record counts) that the `read_records()` method should be yielding all rows:

```
2024-11-22 15:36:45 source > 282378 records parsed.
2024-11-22 15:37:01 source > 282378 records remained.
```
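When the source's own count and the platform's tracked count disagree like this, one way to narrow it down is to check how many of the records yielded by `read_records()` actually serialize to standards-compliant JSON. The sketch below is a hypothetical diagnostic: the helper names `is_strict_json` and `count_records` are illustrative and not part of the Airbyte CDK. It relies only on Python's `json.dumps(..., allow_nan=False)`, which raises `ValueError` for `NaN`/`Infinity` instead of emitting them.

```python
import json
from typing import Any, Iterable, Mapping, Tuple


def is_strict_json(record: Mapping[str, Any]) -> bool:
    """Return True if the record serializes to standards-compliant JSON."""
    try:
        # allow_nan=False makes json.dumps raise ValueError on NaN/Infinity
        # instead of writing the non-standard tokens NaN / Infinity.
        json.dumps(record, allow_nan=False)
    except ValueError:
        return False
    return True


def count_records(records: Iterable[Mapping[str, Any]]) -> Tuple[int, int]:
    """Count (total, strictly serializable) records, e.g. those yielded by read_records()."""
    total = valid = 0
    for record in records:
        total += 1
        if is_strict_json(record):
            valid += 1
    return total, valid


if __name__ == "__main__":
    # Hypothetical sample rows; the second holds a float NaN, which is how
    # pandas represents missing values in numeric columns.
    rows = [
        {"id": 1, "value": 3.5},
        {"id": 2, "value": float("nan")},
        {"id": 3, "value": None},
    ]
    print(count_records(rows))  # (3, 2)
    print(json.dumps(rows[1]))  # {"id": 2, "value": NaN}  <- not valid JSON
```

If the second number is lower than the first, some records contain values that a strict JSON parser would reject.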
When I isolate the `API` class I have built to work with the CDK and run it locally, I get a complete output, but somehow when integrating with the Airbyte Protocol, something breaks.

Lastly, I know that the primary keys are correctly defined for the stream (the source only has one stream), because I'm using every possible combination of dimensions as the key.

What is more frustrating is that I have another Python source that works correctly and that I used as a template to build this problematic one. I have basically rewritten all the code twice and could not find the issue. Any input would be greatly appreciated!!


---

This topic has been created from a Slack thread to give it more visibility.
It will be in Read-Only mode here. [Click here](https://airbytehq.slack.com/archives/C027KKE4BCZ/p1732292122413649) if you want to access the original thread.

[Join the conversation on Slack](https://slack.airbyte.com)

<sub>
['checksum-invalid', 'custom-python-source', 'airbyte-protocol', 'record-counts', 'primary-keys']
</sub>

To anyone reading this in the future: some rows contained NaN values that were not being parsed correctly. I solved it by replacing the NaN values with None.
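A minimal sketch of that fix follows. It assumes the rows are plain dicts, for example produced from a pandas DataFrame, where missing numeric values appear as float NaN. By default, Python's `json.dumps` writes NaN as the bare token `NaN`, which is not valid JSON, so a strict parser further down the pipeline may drop or reject such records. That is offered as a plausible explanation, not something confirmed in the thread. The helper name `replace_nan_with_none` is hypothetical.

```python
import json
import math
from typing import Any


def replace_nan_with_none(value: Any) -> Any:
    """Recursively replace float NaN with None so it serializes as JSON null.

    numpy.float64 subclasses float, so it is caught too; other NumPy scalar
    types (e.g. float32) are not handled by this sketch. Tuples come back as lists.
    """
    if isinstance(value, float) and math.isnan(value):
        return None
    if isinstance(value, dict):
        return {key: replace_nan_with_none(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [replace_nan_with_none(item) for item in value]
    return value


# Hypothetical usage inside a stream's read_records():
#     for row in rows:
#         yield replace_nan_with_none(row)

if __name__ == "__main__":
    row = {"id": 2, "value": float("nan"), "tags": [1.0, float("nan")]}
    clean = replace_nan_with_none(row)
    # allow_nan=False would raise if any NaN survived the cleanup.
    print(json.dumps(clean, allow_nan=False))  # {"id": 2, "value": null, "tags": [1.0, null]}
```

Cleaning each record right before it is yielded keeps the fix in one place, regardless of how many upstream transformations produced the NaN.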