Plan to build parsers for CSV responses

Summary

Airbyte users are inquiring about the possibility of building parsers for CSV responses.


Question

Hello again.
Is there any plan to build a parser for CSV responses?



This topic has been created from a Slack thread to give it more visibility.
It is read-only here; the original thread remains on Slack.



Yes, there are plans to build a parser for CSV file types. According to a GitHub issue in Airbyte's repository, the plan is to create a parser for CSV files as part of the basic implementation of certain methods; this parser will be used to create AirbyteRecordMessages for streams consisting of CSV files. Note, however, that Airbyte's low-code framework is currently designed primarily to handle JSON responses from REST APIs. If the API response is a CSV file, you can handle it by writing a custom component that overrides the extractor. This approach requires some coding, so it is not entirely low-code. For more details, you can refer to this discussion on Airbyte's GitHub.
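The custom-extractor approach could look roughly like the sketch below. It assumes the component only has to provide an `extract_records(response)` method, mirroring the low-code CDK's `RecordExtractor` interface. In a real connector you would subclass that base class and reference it from the manifest (for example through a `CustomRecordExtractor` entry with a `class_name`); verify those names against your CDK version. The class name `CsvRecordExtractor` is hypothetical, and any constructor fields the CDK passes in are omitted.

```python
import csv
import io
from dataclasses import dataclass
from typing import Any, Dict, Iterator


@dataclass
class CsvRecordExtractor:
    """Hypothetical custom extractor that reads records from a CSV response body.

    Assumption: in a real low-code connector this would subclass the CDK's
    RecordExtractor; it is standalone here so it runs without airbyte_cdk.
    """

    delimiter: str = ","

    def extract_records(self, response: Any) -> Iterator[Dict[str, str]]:
        # `response` only needs a `.text` attribute, as requests.Response has.
        # newline="" leaves \r\n and quoted line breaks to the csv module.
        buffer = io.StringIO(response.text, newline="")
        # The first CSV line is the header; every later line becomes one record.
        for row in csv.DictReader(buffer, delimiter=self.delimiter):
            yield dict(row)
```

Every value comes back as a string, so numeric columns still need casting if the stream schema expects numbers.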

We’re considering this very seriously for the next few months.

Can you give me an example of how you'd use that? What API are you trying to work with?

Hello. Happy to hear that this is being considered.

Here is one example of such an API: https://services.docs.unity.com/statistics/

There is no option to choose the output format; it only returns CSV. Otherwise the API is pretty simple: it uses HTTP Basic authentication, and query parameters define the structure of the report.
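An API of that shape (HTTP Basic authentication plus query parameters describing the report) can be requested with nothing but the standard library. This is a sketch only: the base URL and the parameter names `start` and `fields` used below are placeholders, not the real Unity Statistics API contract, and the request is built but not sent.

```python
import base64
from typing import Mapping
from urllib.parse import urlencode
from urllib.request import Request


def build_report_request(
    base_url: str, key_id: str, secret_key: str, params: Mapping[str, str]
) -> Request:
    """Build (but do not send) a GET request with Basic auth and query parameters.

    base_url and the keys in params are placeholders for illustration.
    """
    # HTTP Basic auth: base64("user:password") in the Authorization header.
    token = base64.b64encode(f"{key_id}:{secret_key}".encode("utf-8")).decode("ascii")
    # The report's structure is described entirely by the query string.
    url = f"{base_url}?{urlencode(params)}"
    return Request(url, headers={"Authorization": f"Basic {token}"}, method="GET")
```

With the `requests` library the equivalent is `requests.get(base_url, params=params, auth=(key_id, secret_key))`; the CSV body then comes back in `response.text`.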

Because we work with multiple marketing and advertising partners, we have to deal with different types of APIs, and sometimes this is the case.

If some streams are regular REST JSON but one or a few are CSV, perhaps we could bridge the gap. If you start building the connector in the Builder and send me a link to it, I might be able to give you an example of a custom component for CSV parsing. Or maybe we can ship it within a month.
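One way to picture "bridging the gap" when most streams return JSON and a few return CSV is a helper that picks a parser from the response's `Content-Type` header. This is an illustrative assumption, not something described in the thread; the function name `decode_records` is hypothetical, and real APIs sometimes label CSV as `text/plain` or `application/octet-stream`, so a per-stream setting may be more reliable than the header.

```python
import csv
import io
import json
from typing import Any, Dict, List


def decode_records(body: str, content_type: str) -> List[Dict[str, Any]]:
    """Hypothetical helper: parse a response body as CSV or JSON by media type."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "text/csv":
        # CSV: the header line supplies the keys; all values stay strings.
        return [dict(row) for row in csv.DictReader(io.StringIO(body, newline=""))]
    if media_type == "application/json":
        data = json.loads(body)
        # A top-level object is treated as a single record.
        return data if isinstance(data, list) else [data]
    raise ValueError(f"Unsupported content type: {content_type!r}")
```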

Are you using Airbyte for other syncs already?

We are using Airbyte for other syncs (APIs and relational databases).

I already have the code for this connector, with the parser implemented in the parse_response method, and it is as simple as this:

```python
import csv
from typing import Any, Iterable, Mapping, Optional

import requests


# Method of the connector's Python CDK HttpStream subclass (class definition omitted).
def parse_response(
    self,
    response: requests.Response,
    stream_state: Mapping[str, Any],
    stream_slice: Optional[Mapping[str, Any]] = None,
    next_page_token: Optional[Mapping[str, Any]] = None,
) -> Iterable[Mapping[str, Any]]:
    # The API only returns CSV: read each row into a dict keyed by the header line
    # and keep the columns that match the stream's schema.
    csv_rows = csv.DictReader(response.text.splitlines())

    for row in csv_rows:
        yield {
            "timestamp": row["timestamp"],
            "country": row["country"],
            "spend": row["spend"],
        }
```
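`csv.DictReader` returns every field as a string, so the snippet above emits `spend` as text. If the stream's JSON schema declares `spend` as a number (an assumption; the schema is not shown in the thread), a variant of the same loop could cast it and map empty cells to null. The function name `typed_rows` is hypothetical.

```python
import csv
from typing import Any, Dict, Iterable, Iterator


def typed_rows(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Hypothetical variant of the parse loop that casts `spend` to float.

    Assumes the schema types `spend` as a number and that an empty cell means null.
    """
    for row in csv.DictReader(lines):
        spend = row["spend"]
        yield {
            "timestamp": row["timestamp"],
            "country": row["country"],
            # Missing (None) or empty ("") cells become None instead of failing float().
            "spend": float(spend) if spend not in (None, "") else None,
        }
```

Inside `parse_response` this would be called as `yield from typed_rows(response.text.splitlines())`.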

Two things then :wink:

  1. Thank you for the snippet! I can't promise we'll generalize it and put it into the low-code CDK so you can do this in the Builder very soon, but we are looking into it.
  2. Would you be open to contributing the connector back into our catalog in GitHub? Basically, I’d imagine you already have a low-code connector with that custom component, right? I’d love to help you get from that to making a PR and getting this to be available to others.
    a. If we go that route, then when the low-code CDK feature for CSV is available, I'll ping you, and we'll update the connector to use it.

We implemented the connector using the Python CDK instead of low-code. It was not clear to us how to implement this custom component to replace the DPathExtractor.

But I’d love to contribute and share this connector with the community.

<@U069EMNRPA4> do you have any updates about the implementation of the CSV parser?

I can give you an example of a custom component, yeah. Sorry about the delay!

One sec, I’ll ask the team — I know we’ve written quite a few.

As for the low-code thing for CSVs — not right now, not a priority for Q1, but we’ll get to it too.

One example of a hybrid connector:
• You can make most of the connector low-code but write the one stream that needs CSV parsing with the Python CDK.
• Here's an example: 🚨🚨 Source Sendgrid: migrate to low code by bleonard (Pull Request #35776 in airbytehq/airbyte on GitHub).
• Take a look at how the Contacts stream is implemented in streams.py (airbyte-integrations/connectors/source-sendgrid/source_sendgrid/streams.py at commit a2b5d13cc10197d766348103aad71b9b8c43c3f1 in airbytehq/airbyte).
Disclaimer: that is NOT a custom component per se; it is a Python CDK stream embedded in a low-code connector. We call this a “hybrid connector”.

That works, but having a fully low-code connector with a custom component is preferable. We’ll post an example once one comes up :wink:

Hello! I’m looking forward to this CSV parser implementation too.

<@U069EMNRPA4> hello! Could you please share an example of a connector with a custom extractor?