Invalid timezone offset in source and destination when syncing streams

  • Is this your first time deploying Airbyte?: Yes
  • OS Version / Instance: AWS Linux
  • Memory / Disk: 32 GB / 100 GB
  • Deployment: Docker or Kubernetes? No
  • Airbyte Version: v0.40.28
  • Source name/version: Jira & Salesforce
  • Destination name/version: Redshift
  • Step: During sync
    Description:

When loading data from Jira and Salesforce into Redshift, I’m seeing lots of datetime errors describing an invalid timezone offset:

Salesforce

source > Syncing stream: Account 
ERROR c.n.s.DateTimeValidator(tryParse):82 - Invalid date-time: Invalid timezone offset: +0000

Jira

destination > Preparing tmp table in destination started for stream jira__issues. schema: source_jira, tmp table name: _airbyte_tmp_ldd_jira__issues
ERROR c.n.s.DateTimeValidator(tryParse):82 - Invalid date-time: Invalid timezone offset: -0800
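
From what I can tell (I have not verified this against the Airbyte code), the date-time format check appears to follow RFC 3339, which writes offsets with a colon (+00:00), whereas Salesforce and Jira emit the compact form (+0000 / -0800). Here is a rough Python sketch of the mismatch I mean; the helper and the sample value are only illustrative, not anything Airbyte actually runs:

import re
from datetime import datetime

# Trailing offsets like +0000 or -0800 (no colon between hours and minutes).
COMPACT_OFFSET = re.compile(r"([+-]\d{2})(\d{2})$")

def normalize_offset(ts: str) -> str:
    # Rewrite a trailing +HHMM offset as +HH:MM so strict RFC 3339
    # validators will accept the value.
    return COMPACT_OFFSET.sub(r"\1:\2", ts)

raw = "2022-11-01T08:15:30.000+0000"      # hypothetical value with the compact offset
print(normalize_offset(raw))              # 2022-11-01T08:15:30.000+00:00

# The value itself is a perfectly good timestamp; %z accepts both offset forms,
# so the data is not malformed, only formatted differently than the validator expects.
print(datetime.strptime(raw, "%Y-%m-%dT%H:%M:%S.%f%z"))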

Questions:

  • What happens to this data? Is it thrown out?
  • Is there a workaround possible on our end, when using the “Normalized tabular data” transformation?
  • Will we need to “transform this data” manually instead?
  • Should I submit a GitHub issue against airbytehq/airbyte? Are there any guidelines I should observe?

Thanks for any help!

Hello there! You are receiving this message because none of your fellow community members has stepped in to respond to your topic post. (If you are a community member and you are reading this response, feel free to jump in if you have the answer!) As a result, the Community Assistance Team has been made aware of this topic and will be investigating and responding as quickly as possible.
Some important considerations that will help you get your issue solved faster:

  • It is best to use our topic creation template; if you haven’t yet, we recommend posting a follow-up with the requested information. With that information the team will be able to search for similar issues with connectors and the platform and troubleshoot your specific question or problem more quickly.
  • Make sure to upload the complete log file; a common investigation roadblock is that sometimes the error for the issue happens well before the problem is surfaced to the user, and so having the tail of the log is less useful than having the whole log to scan through.
  • Be as descriptive and specific as possible; when investigating it is extremely valuable to know what steps were taken to encounter the issue, what version of connector / platform / Java / Python / docker / k8s was used, etc. The more context supplied, the quicker the investigation can start on your topic and the faster we can drive towards an answer.
  • We in the Community Assistance Team are glad you’ve made yourself part of our community, and we’ll do our best to answer your questions and resolve the problems as quickly as possible. Expect to hear from a specific team member as soon as possible.

Thank you for your time and attention.
Best,
The Community Assistance Team

I am also stuck on this same problem.
Is there any update on any solution?

My environment is as follows:

  • Deployment: Docker or Kubernetes? Yes
  • Airbyte Version: v0.40.26
  • Source name/version: Salesforce/1.0.30 & TikTok Marketing/2.0.1
  • Destination name/version: BigQuery/1.2.13
  • Step: During sync

This problem has been occurring since September 2022, and I assume it is a bug introduced in an update.
The errors happen in the normalization step, which is very costly to work around because you have to build your own JSON parsing process and dbt models.
(The _airbyte_raw_*** tables contain the correct data.)
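
In the meantime, the manual parsing I have in mind looks roughly like this (illustrative Python only; the raw-row shape mirrors the _airbyte_raw_*** tables, but the field names are just examples):

import json
import re
from datetime import datetime

# Offsets written as +HHMM instead of the +HH:MM that strict parsers expect.
COMPACT_OFFSET = re.compile(r"([+-]\d{2})(\d{2})$")

def parse_raw_record(raw_row, timestamp_fields):
    # raw_row mimics a row from an _airbyte_raw_*** table, where the
    # _airbyte_data column holds the record as a JSON string.
    record = json.loads(raw_row["_airbyte_data"])
    for field in timestamp_fields:
        value = record.get(field)
        if isinstance(value, str) and COMPACT_OFFSET.search(value):
            # Insert the colon into the offset, then parse to a real datetime.
            record[field] = datetime.fromisoformat(COMPACT_OFFSET.sub(r"\1:\2", value))
    return record

# Hypothetical Salesforce Account row, just to show the shape.
row = {"_airbyte_data": json.dumps(
    {"Id": "001xx0000000001AAA", "LastModifiedDate": "2022-09-15T10:20:30.000+0000"})}
print(parse_raw_record(row, ["LastModifiedDate"]))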

Another problem is that it is no longer possible to use the Deduped History sync mode.
Is there any movement to try to fix these?
If not, I would like to attempt to fix them myself. Can you describe the bug in detail?

@marcosmarxm
@tuliren
You have both taken several actions related to this issue. Do you have any ideas for a solution?

Hello joseph, it’s been a while without an update from us. Are you still having problems or did you find a solution?

I am seeing a similar issue which I’m gathering is related:

2023-03-14 21:07:02 ERROR c.n.s.DateTimeValidator(tryParse):82 - Invalid date-time: Invalid timezone offset: -0400
2023-03-14 21:07:02 ERROR c.n.s.DateTimeValidator(tryParse):82 - Invalid date-time: Invalid timezone offset: -0400

I’m using 0.41.0 with Jira connector 0.3.4. This does not appear to be during the normalization dbt code - it’s throwing these errors during the initial sync. I can’t tell what the actual timezone string is from the error message.

If I run the following:

select _airbyte_data:created from _airbyte_raw_issues limit 100;

I see strings like:

2023-03-06T23:48:44.823-0500

But I don’t know enough about the workflow to know whether those are causing the error (I get a dozen or two of these error messages, but have 29k+ issues, all of which have a created timestamp like the one above). For our own dbt Jira date processing (required for history and embedded date strings), I use the following Snowflake conversion:

{% macro jira_timestamp(value) %}
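    {# Parses Jira strings like 2023-03-06T23:48:44.823-0500; TZHTZM matches the offset without a colon #}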
    to_timestamp({{ value }}, 'YYYY-MM-DD"T"HH24:MI:SS.FF3TZHTZM')
{% endmacro %}

I can’t tell what the implications of this are: are we missing records in the raw Airbyte data? In the processed data? Are fields getting converted to null? Which fields?

@eschrock I am running into similar issues, and some modified records are not getting synced. I’m wondering whether the timestamp mismatch is related to the missed updates.

Were you able to make this error go away?

No, I wasn’t able to resolve it. I never fully root-caused the issue or its implications. For us it hasn’t had a significant effect, but if it’s causing modified records not to be synced, that could cause subtle errors down the road. I will put it back on our list to investigate.

I applied some schema changes and performed a full reset of some streams (e.g., the Salesforce Account object). Now the issue is pretty massive: only about 50% of the records have synced. There are no other errors in the log (other than the schema validation date-time mismatch, which is more of a warning). The Salesforce connector needs some work, or at least better logging. I can’t think of a troubleshooting method. The missing records are the more recent ones, from the last 12 months.