Summary
Issue with timestamp timezone conversion in Airbyte deployed on Kubernetes when syncing data from Postgres to Snowflake or Google Sheets. Timezone of timestamps is altered during synchronization, affecting data reliability.
Question
Hello everyone,
I’m encountering an issue with Airbyte deployed on Kubernetes, syncing data from Postgres to Snowflake (and also tested with Google Sheets as an additional destination). My columns in Postgres are of the type “timestamp with timezone”. However, during synchronization, the timezone of the timestamps is altered.
For example:
• In Postgres, I have a timestamp: “2018-10-03 03:05:01.142 +0200”
• In Snowflake, it appears as: “2018-10-03 01:05:01.142 +0000”
This issue also occurs with Google Sheets as a destination. My Kubernetes cluster is set to UTC time, but I am puzzled as to why this would affect the conversion of timezones in this manner, since Airbyte is supposed to preserve the original timezone of the data. This conversion is particularly problematic with regard to daylight saving time changes, making time-sensitive data less reliable.
Has anyone experienced this problem or have any suggestions on how I might resolve it? Any help would be greatly appreciated.
Thank you in advance for your support!
This topic has been created from a Slack thread to give it more visibility.
It will be on Read-Only mode here. Click here if you want
to access the original thread.
Join the conversation on Slack
["timestamp-timezone-conversion", "airbyte", "kubernetes", "postgres", "snowflake", "google-sheets", "timezone", "data-reliability"]
This is likely the correct default behavior. Both of these values represent the same timestamp, and because the offsets are numeric (not named timezones), neither is DST aware to start with—so therefore no less reliable than storing the value with the offset.
With Snowflake specifically, the default Timezone (and Datetime) types are aliases of TIMEZONE_NTZ
. More on that <Date & time data types | Snowflake Documentation the docs>. This means that it’s effectively treated as wallclock time (zone-unaware), unless you manually configure sessions to treat it as TIMEZONE_TZ
/TIMESTAMP WITH TZ
which will apply the time zone. So conversion to UTC here is appropriate, or the values would be wrong upon retrieval.
I’m not sure if changing your settings will change this behavior as it relates to Airbyte’s syncs (although you could try, or go read the source code). But if you need to know the original offset, I would personally recommend storing it separately (e.g. extracting it with a Transformation and storing it in its own column as something like tz_offset
). This then allows you to handle it in application-specific ways regardless of the other implementation details.
Time zones are a fragile thing, especially when working with offsets. I would personally recommend storing named zones for any data where it’s valuable to allow for the appropriate DST-adjusted data maths when needed.