Response streaming in Python CDK custom connector

I have an API endpoint that returns a very large CSV file in a single response.

Is it safe to return "stream=True" from the request_kwargs method and then iterate over the response rows in the parse_response method? Will the response connection be closed automatically?

I am referring to the iter_lines method of the requests.Response object.
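For reference, here is a minimal sketch of what those two overrides could look like. This is not the official CDK code: in a real connector the class would subclass the Python CDK's HttpStream and implement its other abstract methods, and the exact signatures may differ by CDK version.

```python
import csv

# Sketch only; a real connector would declare: class LargeCsvStream(HttpStream)
class LargeCsvStream:
    def request_kwargs(self, stream_state=None, stream_slice=None, next_page_token=None):
        # These kwargs are forwarded to requests when the HTTP request is
        # sent; stream=True defers downloading the body until it is iterated.
        return {"stream": True}

    def parse_response(self, response, **kwargs):
        # iter_lines yields the body one line at a time instead of buffering
        # the whole multi-GB file in memory; decode_unicode=True yields str
        # lines that the csv module can parse directly.
        yield from csv.DictReader(response.iter_lines(decode_unicode=True))
```

Each yielded dict becomes one record, so memory use stays roughly constant regardless of file size.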

Hey, can you help us understand how big the file can be? The Amazon source also has a similar implementation, and it performs well even though the file is quite large, so I would suggest you try that approach.

Here I am talking about 3-4 GB files. I don't think it is a good idea to hold all of this data in memory at once: we noticed a few MemoryError exceptions in our old non-CDK connector and decided to rewrite the download logic to stream with requests.get(url, stream=True).
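On the connection question: requests releases the connection back to the pool once the body is fully consumed, but if iteration stops early (or an exception is raised) the connection stays open unless you call close() or use the response as a context manager. A hedged sketch of the pattern, where stream_csv is a hypothetical helper name and the session is any requests.Session-like object:

```python
import csv

def stream_csv(url, session):
    # Using the response as a context manager guarantees the connection is
    # closed even if the caller stops iterating early or an error occurs.
    with session.get(url, stream=True) as response:
        response.raise_for_status()
        # Yield parsed rows lazily; nothing beyond the current line (plus
        # transport buffers) is held in memory at any one time.
        yield from csv.DictReader(response.iter_lines(decode_unicode=True))
```

This keeps the memory footprint flat even for multi-gigabyte files, which should avoid the MemoryError you saw.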

Can you please clarify which implementation you are referring to?

That's a big file, so yes, I think streaming is the way to go. This is the file where you can find the Amazon implementation, which also downloads a CSV: https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-amazon-seller-partner/source_amazon_seller_partner/streams.py

Also, what is the source you are talking about?
