Summary
The user hits a parsing error (RecordParseError) when updating an S3 source from version 3.1.0 to 4.7.6. According to the error message, the cause is either a mismatch between the configured file type and the actual file type, or a file or record that cannot be parsed.
Question
Hi All,
I’m trying to update a simple S3 source from v3.1.0 to v4.7.6. It is used to sync ndjson files from S3 to BQ.
Unfortunately, I have an issue with creating and testing the source itself: I get the following parsing error.
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/airbyte_cdk/sources/file_based/availability_strategy/default_file_based_availability_strategy.py", line 95, in _check_parse_record
    record = next(iter(parser.parse_records(stream.config, file, self.stream_reader, logger, discovered_schema=None)))
  File "/usr/local/lib/python3.10/site-packages/airbyte_cdk/sources/file_based/file_types/jsonl_parser.py", line 65, in parse_records
    yield from self._parse_jsonl_entries(file, stream_reader, logger)
  File "/usr/local/lib/python3.10/site-packages/airbyte_cdk/sources/file_based/file_types/jsonl_parser.py", line 122, in _parse_jsonl_entries
    raise RecordParseError(FileBasedSourceError.ERROR_PARSING_RECORD, filename=file.uri, lineno=line)
airbyte_cdk.sources.file_based.exceptions.RecordParseError: Error parsing record. This could be due to a mismatch between the config's file type and the actual file type, or because the file or record is not parseable.
The configuration for v3.1.0 was file format: jsonl, with these optional fields: allow newlines in values: true, unexpected field behavior: infer, block size: 540k.
I can also select jsonl in v4.7.6, but there are no optional fields there.
I’m not sure what else I should check or do differently.
Thanks,
Kornel
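Since the error says a record may not be parseable, one way to narrow this down is to download one of the affected files and check whether every physical line is a complete JSON document on its own. The snippet below is a minimal sketch of that check using only the Python standard library. The function name find_unparseable_lines and the sample data are made up for illustration; it does not reproduce what the connector's parser does internally. If the old "allow newlines in values" option was needed because records span several lines, those lines should show up in the result.

```python
import json
from typing import Iterable, List, Tuple


def find_unparseable_lines(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, error_message) for each non-blank line that is not valid JSON.

    Hypothetical helper for local debugging; not part of Airbyte or its CDK.
    """
    problems: List[Tuple[int, str]] = []
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # blank lines are skipped, not reported
        try:
            json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((lineno, exc.msg))
    return problems


# Illustrative sample: the second record is split across two physical lines.
sample = [
    '{"id": 1}',
    '{"id": 2, "note": "broken',
    'across lines"}',
]
for lineno, message in find_unparseable_lines(sample):
    print(f"line {lineno}: {message}")
```

To run it against a real file, pass an open file handle instead of the sample list, for example find_unparseable_lines(open("sample.ndjson", encoding="utf-8")), where sample.ndjson is a placeholder for a local copy of one of the S3 objects.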
This topic has been created from a Slack thread to give it more visibility.