I have an API endpoint that returns a very large CSV file in a single response.
Is it safe to pass stream=True via the request_kwargs method and iterate over the response rows in the parse_response method? Will the response connection be closed automatically?
I am referring to the iter_lines method of the requests.Response object.
Hey, can you help us understand how big the file can be? Also, the Amazon source has a similar implementation, and it works well for us even with fairly large files, so I would suggest you try it.
I am talking about 3-4 GB files. I don’t think it is a good idea to hold all of this data in memory at once: we’ve seen a few MemoryError exceptions in our old non-CDK connector and decided to rewrite the download logic to stream the response using requests.get(url, stream=True).
Can you please clarify which implementation you are referring to?
That’s a big file. Then yes, I think streaming is the way to go. You can find the Amazon implementation, which also downloads CSV, in this file: https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-amazon-seller-partner/source_amazon_seller_partner/streams.py
Also, which source are you talking about?
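On the question of whether the connection is closed automatically: according to the requests documentation, with stream=True the connection is only released back to the pool once all the data has been consumed or Response.close() is called. Using the response as a context manager is one way to make sure it gets closed. Below is a minimal sketch with plain requests, not the CDK or the Amazon connector's code. The helper names iter_csv_rows and stream_csv are hypothetical, and the sketch assumes the CSV is UTF-8 encoded and that no quoted field contains an embedded line break (iter_lines splits on line boundaries). How this maps onto request_kwargs and parse_response in a CDK stream is not shown here.

```python
import csv
from typing import Dict, Iterable, Iterator


def iter_csv_rows(raw_lines: Iterable[bytes], encoding: str = "utf-8") -> Iterator[Dict[str, str]]:
    """Lazily parse CSV rows from an iterable of undecoded lines.

    Hypothetical helper: the first line is treated as the header.
    """
    # Decode one line at a time; nothing is buffered beyond the current line.
    decoded = (line.decode(encoding) for line in raw_lines)
    # csv.DictReader pulls lines on demand, so rows are produced one by one.
    yield from csv.DictReader(decoded)


def stream_csv(url: str) -> Iterator[Dict[str, str]]:
    """Hypothetical helper: download a CSV with stream=True and yield rows."""
    # Third-party dependency, imported here so iter_csv_rows stays stdlib-only.
    import requests

    # The context manager closes the response when the block exits,
    # including when the consumer stops iterating early and the
    # generator is closed.
    with requests.get(url, stream=True, timeout=60) as response:
        response.raise_for_status()
        # iter_lines() yields bytes without reading the whole body first.
        yield from iter_csv_rows(response.iter_lines())
```

A caller would loop over stream_csv(url) and process each row as it arrives, so memory use stays roughly proportional to a single row rather than the whole 3-4 GB file.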