Sync reads 0 records (with no errors) when ingesting a CSV (Files source) into a Postgres destination

  • Is this your first time deploying Airbyte?: No
  • OS Version / Instance: Ubuntu 18.04
  • Memory / Disk: 64GiB / 512MB
  • Deployment: Docker (Nomad)
  • Airbyte Version: 0.39.38-alpha
  • Source name/version: Files
  • Destination name/version: Postgres
  • Step: During Sync
  • Description:

I’m trying to ingest some publicly available aircraft data. It’s an 83MB CSV file.

In the logs I can see the source emitting many records, but the sync then reports that 0 records were read, and no error is logged. Here’s a relevant extract of the logs:

# (Emits thousands of lines of records like the following.)
2022-08-02 19:24:34 source > {"type": "RECORD", "record": {"stream": "aircraft", "data": {"operatorcallsign": null, "operatoriata": null, "linenumber": null, "categoryDescription": null, "built": null, "serialnumber": "016", "testreg": null, "typecode": "Z37T", "adsb": false, "manufacturericao": "MORAVAN", "notes": null, "icao24": "49c94a", "operator": null, "registration": "OK-RJJ", "operatoricao": null, "registered": null, "engines": null, "owner": "Private", "acars": false, "manufacturername": "Moravan", "model": "Zlin Agro-Turbo Z-37T", "reguntil": null, "status": NaN, "modes": false, "seatconfiguration": NaN, "firstflightdate": null, "icaoaircrafttype": "L1T"}, "emitted_at": 1659468274000}}
2022-08-02 19:24:34 source > {"type": "RECORD", "record": {"stream": "aircraft", "data": {"operatorcallsign": null, "operatoriata": null, "linenumber": null, "categoryDescription": null, "built": "1946-01-01", "serialnumber": "12-1428", "testreg": null, "typecode": null, "adsb": false, "manufacturericao": null, "notes": null, "icao24": "a2fffb", "operator": null, "registration": "N2925M", "operatoricao": null, "registered": null, "engines": "LYCOMING 0-235 SERIES", "owner": "Bedgar Dean", "acars": false, "manufacturername": "Piper", "model": "PA-12", "reguntil": "2025-02-28", "status": NaN, "modes": false, "seatconfiguration": NaN, "firstflightdate": null, "icaoaircrafttype": null}, "emitted_at": 1659468274000}}
2022-08-02 19:24:35 INFO i.a.w.g.DefaultReplicationWorker(lambda$getReplicationRunnable$6):328 - Source has no more messages, closing connection.
2022-08-02 19:24:35 INFO i.a.w.g.DefaultReplicationWorker(lambda$getReplicationRunnable$6):336 - Total records read: 0 (0 bytes)

No errors in stderr either. I’m unsure how to troubleshoot this.
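
One thing that stands out in the extract above is the bare NaN token in fields such as "status" and "seatconfiguration". NaN is not part of the JSON standard, so a strict parser would reject those lines. Whether that is the reason the records were dropped here is only an assumption on my part and is not confirmed by the logs. The following is a minimal Python sketch of the serialization behaviour; the record contents and the helper names (reject_constant, is_strict_json) are made up for illustration:

```python
import json
import math

# A cut-down record resembling the ones in the log extract.
record = {"icao24": "49c94a", "status": float("nan")}

# Python's json module writes a bare NaN token by default,
# which is not valid according to the JSON standard.
line = json.dumps(record)


def reject_constant(name):
    # json.loads calls this hook for NaN, Infinity and -Infinity.
    raise ValueError(f"non-standard JSON constant: {name}")


def is_strict_json(text):
    """Return True if text parses without any non-standard constants."""
    try:
        json.loads(text, parse_constant=reject_constant)
        return True
    except ValueError:
        return False


# Replacing NaN with None makes the record serialize as null instead.
cleaned = {
    key: (None if isinstance(value, float) and math.isnan(value) else value)
    for key, value in record.items()
}

print(line)
print(is_strict_json(line))
print(is_strict_json(json.dumps(cleaned)))
```

If the receiving side parses each line strictly, a record like the first one could be discarded without the source itself ever raising an error.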

As a follow-up, I was able to get rows synced successfully by specifying a Reader Options parameter that treats all columns as strings.

{"dtype": "str"}

Note: I’m specifically using str here rather than string, because string caused pandas to raise errors when it attempted to cast the columns to StringArray.
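
For anyone who wants to see the effect of this option outside Airbyte, here is a rough sketch using pandas directly. It assumes the connector forwards the Reader Options JSON to pandas.read_csv as keyword arguments, which I have not verified, and the CSV content is a made-up sample rather than the real aircraft file:

```python
import io
import json

import pandas as pd

# Made-up sample: the "status" column is empty in every row.
csv_text = "icao24,serialnumber,status\n49c94a,016,\na2fffb,017,\n"

# Default type inference: the all-empty column becomes float64 (NaN),
# and "016" is parsed as the integer 16.
default_df = pd.read_csv(io.StringIO(csv_text))

# Assumption: the Reader Options JSON is passed through as keyword arguments.
reader_options = json.loads('{"dtype": "str"}')
as_text_df = pd.read_csv(io.StringIO(csv_text), **reader_options)

print(default_df.dtypes)
print(as_text_df.dtypes)
print(as_text_df["serialnumber"].tolist())
```

With the default inference, the empty column is a float column full of NaN values, which matches the NaN fields visible in the log extract. With dtype set to str, no column is inferred as float, and values such as 016 keep their leading zero.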