Debugging large data issue in Airbyte

Summary

Airbyte error related to handling large data, causing the CDK command test_read to return no records.


Question

Can anyone help debug this? I know it is because Airbyte is not able to handle large data.

Error: The CDK command test_read completed properly but no records nor trace were found. Logs were: 0.



This topic has been created from a Slack thread to give it more visibility. It is in read-only mode here; the original thread is on Slack.


["debugging", "large-data-issue", "airbyte", "cdk-command", "test-read"]

Based on the information provided, it seems like you're encountering an issue where Airbyte fails when it receives a large response from the API. This typically occurs when the API response reaches a certain size threshold, in this case around 25 MB; above that threshold, it throws errors and fails to return the response properly.

There are a couple of potential solutions to this issue:

1. Detect when the response is above the size threshold that can be handled, and if so raise a clear error message to the user which recommends splitting up the data into smaller chunks through pagination or partitioning. On your side, this would involve modifying your data extraction process to limit the size of the data it retrieves at once (see the sketch after the quoted issue below).
2. Adjust the server / CDK process to be able to handle larger requests (e.g. give it more memory or refactor to be more performant). This could involve increasing the memory allocated to the Airbyte server or optimizing the way it processes data.

Unfortunately, without more specific information about your setup and the exact error message you're seeing, it's difficult to provide a more precise answer. I would recommend checking the Airbyte documentation or the Airbyte GitHub issues for more information.

Here is the relevant extract for your reference:

> Source: https://github.com/airbytehq/airbyte/issues/34814
> Content:
> # Github Issue (closed): Handle large responses gracefully
> Original post Part 1
> What
> Currently, the connector builder server fails in a very unclear way if it receives too large of a response from the API.
> This is what it looks like when it fails this way:
> From my testing, it looks like this happens when the API response gets to the ~25 MB threshold. Below that, it properly returns the response, but above that, it throws these errors in the deployment logs, which result in the error shown above:
> Acceptance Criteria
> We should handle this type of issue better, doing one of the following:
> - Detect when the response is above the size threshold we can handle, and if so raise a clear error message to the user which recommends splitting up the data into smaller chunks through pagination or partitioning
> - Adjust the server / CDK process to be able to handle larger requests (e.g. give it more memory or refactor to be more performant)
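If the failure comes from a single oversized API response, the usual connector-side fix is to paginate so that each page stays well under that ~25 MB limit. Below is a minimal sketch of a paginated stream built on the Python CDK's HttpStream base class; the base URL, endpoint, parameter names (limit, cursor, next_cursor), and page size are hypothetical placeholders, so adapt them to the API you are actually reading from.

```python
# Hypothetical example: paginating an API in a Python CDK connector so that
# each response stays small. The endpoint, parameter names, and page size
# are assumptions; adjust them to match your API.
from typing import Any, Iterable, Mapping, MutableMapping, Optional

import requests
from airbyte_cdk.sources.streams.http import HttpStream


class PaginatedRecords(HttpStream):
    url_base = "https://api.example.com/v1/"  # placeholder base URL
    primary_key = "id"
    page_size = 100  # keep pages small so no single response grows too large

    def path(self, **kwargs) -> str:
        return "records"  # placeholder endpoint

    def request_params(
        self,
        stream_state: Optional[Mapping[str, Any]] = None,
        stream_slice: Optional[Mapping[str, Any]] = None,
        next_page_token: Optional[Mapping[str, Any]] = None,
    ) -> MutableMapping[str, Any]:
        # Always request a bounded page; add the cursor from the previous page if present.
        params: MutableMapping[str, Any] = {"limit": self.page_size}
        if next_page_token:
            params.update(next_page_token)
        return params

    def next_page_token(self, response: requests.Response) -> Optional[Mapping[str, Any]]:
        # Assumes the API returns an opaque cursor; stop paginating when it is absent.
        cursor = response.json().get("next_cursor")
        return {"cursor": cursor} if cursor else None

    def parse_response(self, response: requests.Response, **kwargs) -> Iterable[Mapping[str, Any]]:
        # Assumes records are returned under a "data" key.
        yield from response.json().get("data", [])
```

Keeping the page size small bounds the size of any single response, which avoids the ~25 MB failure mode described in the issue above; the same idea applies in the low-code builder by configuring a paginator with a small page size.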