S3 - Issue reading CSV when only header row present

Hi team, I am currently using Airbyte to read CSV files from an S3 bucket.
I am getting an error if the file only has headers.
Is there a configuration workaround that we can use for skipping files with headers only?

Traceback (most recent call last):
  File "/airbyte/integration_code/main.py", line 13, in <module>
    launch(source, sys.argv[1:])
  File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/entrypoint.py", line 129, in launch
    for message in source_entrypoint.run(parsed_args):
  File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/entrypoint.py", line 120, in run
    for message in generator:
  File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/abstract_source.py", line 123, in read
    raise e
  File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/abstract_source.py", line 114, in read
    yield from self._read_stream(
  File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/abstract_source.py", line 159, in _read_stream
    for record in record_iterator:
  File "/usr/local/lib/python3.9/site-packages/airbyte_cdk/sources/abstract_source.py", line 248, in _read_full_refresh
    for record in records:
  File "/airbyte/integration_code/source_s3/source_files_abstract/stream.py", line 452, in read_records
    yield from super().read_records(sync_mode, cursor_field, stream_slice, stream_state)
  File "/airbyte/integration_code/source_s3/source_files_abstract/stream.py", line 346, in read_records
    file_reader = self.fileformatparser_class(self._format, self._get_master_schema())
  File "/airbyte/integration_code/source_s3/source_files_abstract/stream.py", line 235, in _get_master_schema
    raise RuntimeError(
RuntimeError: Detected mismatched datatype on column 'Count', in file 'test/Report.csv'. Should be 'integer', but found 'string'.

Hey, the error doesn’t seem to be saying that the file is empty. Rather, it says a column has a mismatched data type, right? Am I missing something?

Hey Harshith,

The file is supposed to have a header followed by rows of data.

When the data syncs for the first time, the file has rows after the header. Based on my observations so far, I am guessing that the data in those rows determines the data type of each column.

When the file has only a header row, I am seeing the error above.
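To illustrate the guess above, here is a minimal sketch of value-based type inference. It is not the connector's code: `infer_column_types` is a hypothetical helper, and the rule that a column with no values falls back to 'string' is an assumption made only to show how a header-only file could end up disagreeing with a schema inferred earlier as 'integer'.

```python
import csv
import io
from typing import Dict, List


def _is_int(value: str) -> bool:
    """Return True if the text parses as an integer."""
    try:
        int(value)
        return True
    except ValueError:
        return False


def infer_column_types(csv_text: str) -> Dict[str, str]:
    """Toy inference: 'integer' if every value parses as int, else 'string'.

    Assumption: a column with no values at all is reported as 'string'.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    values: Dict[str, List[str]] = {name: [] for name in header}
    for row in reader:
        if not row:  # csv.reader yields [] for blank lines
            continue
        for name, value in zip(header, row):
            values[name].append(value)
    return {
        name: "integer" if col and all(_is_int(v) for v in col) else "string"
        for name, col in values.items()
    }


with_rows = "id,count\n1,1\n2,1\n"
header_only = "id,count\n"

print(infer_column_types(with_rows))
print(infer_column_types(header_only))
```

Under these assumptions, the same 'count' column is seen as 'integer' when rows are present and as 'string' when only the header is present, which would match the wording of the error.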

I am not able to upload the CSV files here, so I am pasting the mock data in plain text below.

File for which the replication is working:
id,first_name,last_name,email,gender,ip_address,count
1,John,Doe,johndoe@hubpages.com,Male,127.0.0.1,1
2,Jane,Doe,janedoe@walmart.com,Female,127.0.0.1,1

File for which the replication is failing:
id,first_name,last_name,email,gender,ip_address,count

PS:

  • The file we are dealing with is system-generated, and it contains only the header row when there are no rows of data.
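As a possible stopgap outside Airbyte, header-only files could be detected before they reach the bucket prefix that the source reads. The sketch below is not an Airbyte setting: `has_data_rows` is a hypothetical helper, and it assumes the first line of the file is always the header.

```python
import csv
import io


def has_data_rows(csv_text: str) -> bool:
    """Return True if the CSV has at least one non-blank row after the header.

    Assumption: the first line is always the header row.
    """
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # discard the header, tolerating a fully empty file
    # Treat rows that are empty or contain only whitespace cells as "no data".
    return any(any(cell.strip() for cell in row) for row in reader)


working = (
    "id,first_name,last_name,email,gender,ip_address,count\n"
    "1,John,Doe,johndoe@hubpages.com,Male,127.0.0.1,1\n"
)
failing = "id,first_name,last_name,email,gender,ip_address,count\n\n"

print(has_data_rows(working))
print(has_data_rows(failing))
```

A job that generates or uploads the reports could call a check like this and skip the upload when it returns False, so the sync never sees a file without data rows.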

Hey, I was able to reproduce it and have created an issue for this. Please give a +1 on the issue https://github.com/airbytehq/airbyte/issues/13171 and follow it there.