CDC Errors on PostgreSQL Source using wal2json

  • Is this your first time deploying Airbyte?: No
  • OS Version / Instance: Ubuntu
  • Memory / Disk: 204 GB / 1 TB
  • Deployment: Kubernetes
  • Airbyte Version: 0.39.28-alpha (error persisted on 0.39.21)
  • Source name/version: Postgres (0.4.28) and Postgres (0.4.26)
  • Destination name/version: BigQuery (1.1.11) and BigQuery (1.1.9)
  • Step: source (not entirely sure)
  • Description:

CDC stopped working after two weeks without testing. The initial full load works, but on the next sync, when there are CDC records to collect, the following error appears:

2022-06-28 14:43:20 source > 2022-06-28 14:43:20 ERROR i.d.p.ErrorHandler(setProducerThrowable):35 - Producer failure
2022-06-28 14:43:20 source > org.postgresql.util.PSQLException: ERROR: out of memory
2022-06-28 14:43:20 source > Detail: Cannot enlarge string buffer containing 1073741806 bytes by 23 more bytes.
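
The number in the error (1073741806, roughly 1 GB) matches PostgreSQL's hard limit for a single string buffer allocation. With wal2json format-version 1, an entire transaction is serialized into one JSON document, so one large pending transaction in the slot can hit that limit even when the change that triggered the sync is a single row. As a sketch (slot names and output depend on your environment), you can check how much WAL each replication slot is retaining:

```sql
-- Show how far behind each replication slot is. A large retained_wal here
-- suggests wal2json may have to serialize a very large transaction into a
-- single buffer. Requires PostgreSQL 10+ (pg_current_wal_lsn).
SELECT slot_name,
       active,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots;
```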

A similar error is reported in "When I use Debezium and wal2json it causes a problem" · Issue #183 · eulerto/wal2json · GitHub.

Whenever I try an update/insert of just one row, this error appears, so an actual memory shortage seems unlikely.

Maybe a solution would be for Airbyte to request the data in format-version 2?

Version 1:
SELECT data FROM pg_logical_slot_peek_changes('berna_slot', NULL, NULL, 'pretty-print', '1', 'add-tables', 'public.nova_tabela', 'format-version', '1');


{
  "change": [
    {
      "kind": "insert",
      "schema": "public",
      "table": "nova_tabela",
      "columnnames": ["id", "varchar", "bigint", "numeric", "decimal", "date"],
      "columntypes": ["integer", "character varying", "bigint", "numeric", "numeric", "date"],
      "columnvalues": [3, "mais um teste do berna2", null, null, null, null]
    }
  ]
}

Version 2:

SELECT data FROM pg_logical_slot_peek_changes('berna_slot', NULL, NULL, 'pretty-print', '1', 'add-tables', 'public.nova_tabela', 'format-version', '2');

{"action":"I","schema":"public","table":"nova_tabela","columns":[
  {"name":"id","type":"integer","value":3},
  {"name":"varchar","type":"character varying","value":"mais um teste do berna2"},
  {"name":"bigint","type":"bigint","value":null},
  {"name":"numeric","type":"numeric","value":null},
  {"name":"decimal","type":"numeric","value":null},
  {"name":"date","type":"date","value":null}
]}
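
Because format-version 2 emits one JSON document per change rather than one per transaction, each row returned from the slot stays small, which is why it would sidestep the 1 GB buffer limit. As a sketch using the same berna_slot from the examples above, the third argument of pg_logical_slot_peek_changes can also cap how many changes the peek materializes (note that under format-version 1 a whole transaction still counts as a single change, so this cap does not bound its serialized size):

```sql
-- Peek at only the next 10 changes. With format-version 2 each change is
-- its own small JSON document, so no single row approaches the buffer limit.
SELECT data
FROM pg_logical_slot_peek_changes(
       'berna_slot', NULL, 10,   -- upto_nchanges = 10
       'add-tables', 'public.nova_tabela',
       'format-version', '2');
```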

Anthony, did you change the resources provided to the source/destination in the Kubernetes deployment? Did this issue start after an upgrade, or is it a new sync?
