Source Local File: Sync fails (no records, but no errors) when ingesting a CSV (Files source)

  • Is this your first time deploying Airbyte?: No
  • OS Version / Instance: Ubuntu 18.04
  • Memory / Disk: 64GiB / 512MB
  • Deployment: Docker (Nomad)
  • Airbyte Version: 0.39.38-alpha
  • Source name/version: Files
  • Destination name/version: Postgres
  • Step: During Sync
  • Description:

I’m trying to ingest some publicly available aircraft data. It’s an 83MB CSV file.

The logs show the source successfully emitting many records, but the sync then finishes without logging any error and reports that 0 records were read. Here’s a relevant extract of the logs:

# (Emits thousands of lines of records like the following.)
2022-08-02 19:24:34 source > {"type": "RECORD", "record": {"stream": "aircraft", "data": {"operatorcallsign": null, "operatoriata": null, "linenumber": null, "categoryDescription": null, "built": null, "serialnumber": "016", "testreg": null, "typecode": "Z37T", "adsb": false, "manufacturericao": "MORAVAN", "notes": null, "icao24": "49c94a", "operator": null, "registration": "OK-RJJ", "operatoricao": null, "registered": null, "engines": null, "owner": "Private", "acars": false, "manufacturername": "Moravan", "model": "Zlin Agro-Turbo Z-37T", "reguntil": null, "status": NaN, "modes": false, "seatconfiguration": NaN, "firstflightdate": null, "icaoaircrafttype": "L1T"}, "emitted_at": 1659468274000}}
2022-08-02 19:24:34 source > {"type": "RECORD", "record": {"stream": "aircraft", "data": {"operatorcallsign": null, "operatoriata": null, "linenumber": null, "categoryDescription": null, "built": "1946-01-01", "serialnumber": "12-1428", "testreg": null, "typecode": null, "adsb": false, "manufacturericao": null, "notes": null, "icao24": "a2fffb", "operator": null, "registration": "N2925M", "operatoricao": null, "registered": null, "engines": "LYCOMING 0-235 SERIES", "owner": "Bedgar Dean", "acars": false, "manufacturername": "Piper", "model": "PA-12", "reguntil": "2025-02-28", "status": NaN, "modes": false, "seatconfiguration": NaN, "firstflightdate": null, "icaoaircrafttype": null}, "emitted_at": 1659468274000}}
2022-08-02 19:24:35 INFO i.a.w.g.DefaultReplicationWorker(lambda$getReplicationRunnable$6):328 - Source has no more messages, closing connection.
2022-08-02 19:24:35 INFO i.a.w.g.DefaultReplicationWorker(lambda$getReplicationRunnable$6):336 - Total records read: 0 (0 bytes)

No errors in stderr either. I’m unsure how to troubleshoot this.

As a follow-up, I was able to get rows synced successfully by specifying a Reader Options parameter to treat all columns as strings.

{"dtype": "str"}

Note: I specifically used str rather than string as the value here, because the latter caused pandas to raise errors as it attempted to cast to StringArray.
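
For anyone wanting to see the effect of that option outside Airbyte, here is a minimal sketch. It assumes the Reader Options JSON is passed to pandas.read_csv as keyword arguments, which is my understanding of the Files source but is not confirmed in this thread. The two-row sample and the read_sample helper are made up for illustration; only the column names come from the log extract above.

```python
import io

import pandas as pd

# Hypothetical sample: column names are taken from the log extract,
# the rows themselves are invented for this illustration.
SAMPLE_CSV = (
    "icao24,serialnumber,seatconfiguration\n"
    "49c94a,016,\n"
    "a2fffb,007,\n"
)


def read_sample(reader_options: dict) -> pd.DataFrame:
    """Read the sample CSV, forwarding reader options to pandas.read_csv."""
    return pd.read_csv(io.StringIO(SAMPLE_CSV), **reader_options)


# Default behaviour: pandas infers a dtype per column, so "016" is parsed
# as a number and the all-empty column becomes a float column of NaN.
inferred = read_sample({})

# With the option from this post: values are kept as text exactly as they
# appear in the file. Empty cells are still reported as missing.
as_text = read_sample({"dtype": "str"})

print(inferred.dtypes)
print(as_text.dtypes)
print(as_text["serialnumber"].tolist())
```

With the default inference the leading zeros in serialnumber are lost, while with {"dtype": "str"} they are preserved. This only illustrates what the option changes in pandas; it does not by itself explain why the sync reported 0 records.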

Hi @hozn, thanks for your post and welcome to the community. I’m happy to hear that you were able to sync successfully and ingest the data from the CSV file. I believe pandas throws an error if you try to convert a string containing numbers into a StringArray, so maybe that was the case here. Regardless, thanks for following up on your post with a solution!