Marketo connector error TypeError: '>' not supported between instances of 'str' and 'NoneType'

  • Is this your first time deploying Airbyte?: No
  • OS Version / Instance: docker on EC2
  • Memory / Disk: you can use something like 4Gb / 1 Tb
  • Deployment: Docker
  • Airbyte Version: v0.40.11
  • Source name/version: source-marketo
  • Destination name/version: destination-redshift
  • Step: The issue is happening during sync
  • Description: The Marketo sync fails with the following error:
destination > Starting a new buffer for stream marketo_leads (current state: 0 bytes in 1 buffers)
source > Encountered an exception while reading stream leads
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/abstract_source.py", line 113, in read
    yield from self._read_stream(
  File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/abstract_source.py", line 182, in _read_stream
    for record in record_iterator:
  File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/abstract_source.py", line 246, in _read_incremental
    stream_state = stream_instance.get_updated_state(stream_state, record_data)
  File "/airbyte/integration_code/source_marketo/source.py", line 99, in get_updated_state
    self.cursor_field: max(
TypeError: '>' not supported between instances of 'str' and 'NoneType'

This started happening after an update of the Marketo connector. The connection syncs only the leads, campaigns and programs streams from Marketo.
At the end of the sync, only the campaigns stream has a saved connection state.

27960fb9_f0c0_414c_bda4_792667d7d031_logs_1222_txt.txt (186.1 KB)

Hey @CBarbault, sorry to hear you’re having this issue! Let me look into it. My guess is that something changed in the new version and introduced breaking changes for your data. As a first step, could you try creating a new test connector and running a small trial sync?

Hey, when I run a small sync (just a few days), the sync succeeds and the state is saved correctly.
But when I run a full sync of my data (2 years), my sync worker fails.

(I’m already using a newly created connector.)

EDIT: I also tried syncing 1 year and 1 month of data, without success.
EDIT 2: When running on my local machine (Airbyte v0.40.25 on Docker), I got the following error message:

Additional Failure Information: message='java.lang.IllegalStateException: Job ran during migration from Legacy State to Per Stream State. 
One of the streams that did not have state is: io.airbyte.protocol.models.StreamDescriptor@13dbea3d[name=leads,namespace=<null>,additionalProperties={}]. 
Job must be retried in order to properly store state.', type='java.lang.RuntimeException', nonRetryable=false

I’m still getting this error. Do you have any idea what could be causing it, @natalyjazzviolin?

Hey Cyprien! I have completed a successful sync and was not able to replicate this issue. Are you using normalization?

Yes, I’m using "Normalized tabular data". The amount of data I’m trying to sync is quite large (e.g. 400,000+ leads), so I’m using the incremental dedup sync mode.

I’m thinking there must be something in your older data that is causing a type error: the connector is trying to compare a string with a null value, and that’s what raises the exception. Do you see any _airbyte_raw tables? If you can, look in the leads one for a null value. Then we can take it from there!
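For reference, a minimal sketch of a state update that skips null cursor values instead of passing them to max(). It assumes cursor values are ISO-8601 strings in one consistent format (so string order matches time order). The function name `updated_state` and the `updatedAt` default are hypothetical; this is not the actual code in `source_marketo/source.py`.

```python
from typing import Any, Dict, Mapping


def updated_state(
    current_state: Mapping[str, Any],
    latest_record: Mapping[str, Any],
    cursor_field: str = "updatedAt",  # hypothetical default cursor field
) -> Dict[str, Any]:
    """Return the new stream state without ever comparing against None.

    Assumes cursor values are ISO-8601 strings in one consistent format,
    so lexicographic order matches chronological order.
    """
    candidates = [
        value
        for value in (current_state.get(cursor_field), latest_record.get(cursor_field))
        if value is not None
    ]
    if not candidates:
        # Neither the state nor the record has a cursor value: keep the old state.
        return dict(current_state)
    return {**current_state, cursor_field: max(candidates)}


# The unguarded comparison raises the same kind of TypeError as the traceback above.
try:
    max(None, "2022-06-01T00:00:00Z")
except TypeError as exc:
    print(exc)
```

Skipping None keeps the sync running, but it hides the bad record rather than explaining it, so it is a safety net, not a root-cause fix.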

I do have those tables, but it appears that _airbyte_raw_marketo_programs is empty.

And does that data exist for you in Marketo?

We do have data in Marketo (e.g. more than 400k leads).

I’ve tried running the connection in full refresh overwrite mode, but this time it failed with the error message: Additional Failure Information: invalid literal for int() with base 10: 'Asia/Bangkok'

Ah! So it looks like more type errors. I think either something needs to be corrected in the connector code, or something is being set incorrectly in Marketo. Could you tell me which field this ‘Asia/Bangkok’ datapoint comes from? You ran only the leads stream, right? We need to pinpoint where this is happening. I’m looking through the leads stream schema and see a few integer fields.
https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-marketo/source_marketo/schemas/leads.json
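One way to pinpoint which field a value like 'Asia/Bangkok' ends up in is to cast each CSV value according to its declared schema type and name the field whenever a cast fails. A minimal sketch: the type map is a hypothetical excerpt, not the actual contents of leads.json, and `cast_record` / `FieldCastError` are illustrative names, not part of the connector.

```python
from typing import Any, Dict, Mapping, Optional

# Hypothetical excerpt of declared types; the real leads.json schema differs.
SCHEMA_TYPES: Dict[str, str] = {
    "id": "integer",
    "email": "string",
    "personTimeZone": "string",
}


class FieldCastError(ValueError):
    """Raised when a CSV value cannot be cast to its declared schema type."""


def cast_record(
    raw: Mapping[str, Optional[str]],
    types: Mapping[str, str] = SCHEMA_TYPES,
) -> Dict[str, Any]:
    """Cast CSV string values by declared type, naming the field on failure."""
    record: Dict[str, Any] = {}
    for field, value in raw.items():
        declared = types.get(field, "string")  # unknown fields stay strings
        if value is None or value == "":
            record[field] = None
        elif declared == "integer":
            try:
                record[field] = int(value)
            except ValueError as exc:
                # Re-raise with the field name so the log shows where it broke.
                raise FieldCastError(f"field {field!r}: {exc}") from exc
        else:
            record[field] = value
    return record
```

With this, a failing record reports something like `field 'id': invalid literal for int() with base 10: 'Asia/Bangkok'` instead of the bare int() message.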

Hello, I wish you a happy new year,

I was not able to find which field is causing the error, but here are the logs from a failing sync:
27960fb9_f0c0_414c_bda4_792667d7d031_logs_1334_txt.txt (46.3 KB)

To narrow down the problem, this sync targets only the “lead” stream, starting from 2023-12-20.

Edit: Looking at our data, “Asia/Bangkok” refers to a persontimezone value from Marketo.

Hey again.

I pulled the Airbyte repository code to investigate the Marketo connector and found the following:

The lead that breaks the pipeline has a name containing the following letters: ễ Đ ứ c (Vietnamese characters, I guess).
All of its fields are in the wrong order (e.g. billingCity contains the email, id contains the timezone…).

I guess this has something to do with character encoding not being handled correctly.
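A small sketch of one plausible mechanism consistent with these symptoms. In UTF-8, "ễ" is the bytes E1 BB 85; decoded as ISO-8859-1, the byte 0x85 becomes U+0085 (NEL), which Python's `str.splitlines` (the splitting that requests' `iter_lines` applies to decoded text) treats as a line break. The record is cut in two and every later field shifts. The three-column CSV is hypothetical; the real leads export has many more columns.

```python
import csv
from typing import Dict, List, Optional

# Hypothetical three-column export; the real Marketo leads export is wider.
CSV_TEXT = "id,lastName,updatedAt\n42,Nguyễn,2022-06-01T00:00:00Z\n"


def parse(body: bytes, encoding: str) -> List[Dict[str, Optional[str]]]:
    """Decode the body, split it with str.splitlines, then parse it as CSV."""
    lines = body.decode(encoding).splitlines()
    return list(csv.DictReader(lines))


body = CSV_TEXT.encode("utf-8")  # "ễ" becomes the bytes E1 BB 85
good = parse(body, "utf-8")
# As ISO-8859-1, byte 0x85 decodes to U+0085 (NEL); splitlines breaks on it.
bad = parse(body, "iso-8859-1")

print(good)  # one row, every value in its own column
print(bad)   # two rows: id "n" in the second, and updatedAt None in the first
```

The first fragment is short, and `csv.DictReader` fills its missing columns with None, which may also explain the original `'>' not supported between instances of 'str' and 'NoneType'` error on the cursor field.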

@natalyjazzviolin
I was able to fix the issue by using the unicodecsv library instead of the csv one and removing decode_unicode=True.

Edit: I found an even simpler fix: setting response.encoding = 'utf-8'.
In fact, response.encoding was set to the default value of ISO-8859-1, as requests wasn’t able to detect the UTF-8 encoding.
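A rough, stdlib-only sketch of why forcing the encoding helps when the export is streamed. It assumes the connector reads the export with `iter_lines(decode_unicode=True)`, as the fix description suggests. The helpers below loosely imitate that call (incremental decoding plus `str.splitlines`); they are illustrative, not requests' or the connector's actual code. Passing `encoding="utf-8"` plays the role of `response.encoding = 'utf-8'`.

```python
import codecs
import csv
from typing import Dict, Iterable, Iterator, List, Optional


def decode_chunks(chunks: Iterable[bytes], encoding: str) -> Iterator[str]:
    """Incrementally decode byte chunks; a multibyte UTF-8 character split
    across two chunks is buffered by the decoder rather than mangled."""
    decoder = codecs.getincrementaldecoder(encoding)(errors="replace")
    for chunk in chunks:
        text = decoder.decode(chunk)
        if text:
            yield text
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail


def iter_lines(chunks: Iterable[bytes], encoding: str = "utf-8") -> Iterator[str]:
    """Join decoded chunks into lines with str.splitlines, in the style of
    requests' Response.iter_lines(decode_unicode=True)."""
    pending: Optional[str] = None
    for text in decode_chunks(chunks, encoding):
        if pending is not None:
            text = pending + text
        lines = text.splitlines()
        # If the text does not end on a line break, hold back the last piece.
        if lines and lines[-1] and lines[-1][-1] == text[-1]:
            pending = lines.pop()
        else:
            pending = None
        yield from lines
    if pending is not None:
        yield pending


def read_export_rows(chunks: Iterable[bytes], encoding: str = "utf-8") -> List[Dict[str, str]]:
    """Parse a streamed CSV export with an explicitly chosen encoding."""
    return list(csv.DictReader(iter_lines(chunks, encoding)))


# Hypothetical body, streamed in 5-byte chunks so "ễ" straddles two chunks.
BODY = "id,lastName,updatedAt\n42,Nguyễn,2022-06-01T00:00:00Z\n".encode("utf-8")
CHUNKS = [BODY[i:i + 5] for i in range(0, len(BODY), 5)]

rows_utf8 = read_export_rows(CHUNKS)                 # like response.encoding = 'utf-8'
rows_latin1 = read_export_rows(CHUNKS, "iso-8859-1")  # like the ISO-8859-1 default
```

Setting the encoding explicitly is the smaller change; the unicodecsv route works for a similar reason, since it avoids splitting text that was decoded with the wrong codec.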

That is wonderful to hear, thank you so much for the update!


Thanks for your assistance.

Link to the PR: 🐛 Source Marketo: fix encoding error for Lead sync, by CyprienBarbault (airbytehq/airbyte Pull Request #20973 on GitHub)

Hey @natalyjazzviolin, I’m still waiting for a review on the PR. You’re in the list of reviewers; could you take a look? That would be super nice of you!