Summary
How to track the current status of a long-running job and monitor the speed at which rows are processed
Question
How do we know the current status of a long-running job, such as the speed at which rows are processed?
This topic has been created from a Slack thread to give it more visibility.
["monitoring-progress", "long-running-job", "processing-speed", "job-status"]
Based on the information provided in the extracts, there isn’t a direct mention of a specific feature or method within Airbyte to monitor the current status of long-running jobs in real-time, specifically regarding the speed at which rows are processed. However, we can infer some practices that might help in understanding the performance of sync jobs:

1. Logging and Monitoring: Airbyte provides logs for each sync job, which can be useful for understanding the performance and any issues that might be occurring. For instance, in one of the extracts, there’s a mention of normalization logs showing the number of records processed and the time taken (Source). This can give you an idea of the throughput of your job at different stages.
2. Performance Benchmarks: Another extract discusses the performance of syncing a large table and mentions throughput metrics like the time taken to load a 6GB table and the corresponding data transfer speed (Source). While this doesn’t provide a real-time monitoring solution, analyzing logs post-sync can help establish performance benchmarks.
3. Connector-Specific Parameters: Some connectors might offer configuration options that indirectly affect the visibility of job progress. For example, the fetchSize parameter for JDBC-based connectors influences how many rows are fetched in each batch, which could impact and reflect on the sync speed (Source).

For real-time monitoring or more detailed insights into the speed at which rows are processed during a sync, you might need to rely on external tools or custom logging solutions that can parse Airbyte logs and provide more granular metrics. Additionally, exploring the Airbyte API or the Airbyte GitHub repository for any new features or community contributions related to monitoring could be beneficial.

For the most accurate and up-to-date information, I recommend consulting the official Airbyte documentation or reaching out to the Airbyte community on their discussion forum or GitHub repository.
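As a rough illustration of the "custom logging solution" idea above, here is a minimal Python sketch that estimates average rows per second from a sync log. It assumes, purely for illustration, that the log contains timestamped lines carrying a cumulative counter in the form `Records read: <N>`; that line shape, the `LINE_RE` pattern, and the helper names `parse_progress` and `rows_per_second` are hypothetical and would need to be adapted to whatever your Airbyte version actually writes.

```python
import re
from datetime import datetime

# ASSUMPTION: progress lines look like
#   "2024-01-01 10:00:00 ... Records read: 1000"
# This is a hypothetical shape; adjust the pattern to your real log lines.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}).*?Records read: (?P<count>\d+)"
)


def parse_progress(lines):
    """Return (timestamp, cumulative_record_count) pairs for matching lines."""
    points = []
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S")
            points.append((ts, int(m.group("count"))))
    return points


def rows_per_second(points):
    """Average throughput between the first and last progress line."""
    if len(points) < 2:
        return None  # not enough data points to compute a rate
    (t0, c0), (t1, c1) = points[0], points[-1]
    elapsed = (t1 - t0).total_seconds()
    if elapsed <= 0:
        return None
    return (c1 - c0) / elapsed


# Made-up sample lines, only to show how the helpers are called.
sample = [
    "2024-01-01 10:00:00 INFO Records read: 1000",
    "2024-01-01 10:00:05 INFO some unrelated line",
    "2024-01-01 10:00:10 INFO Records read: 6000",
]
print(rows_per_second(parse_progress(sample)))
```

To use it on a real job, you would read the sync log file line by line and pass the lines to `parse_progress`. Running it periodically on a log that is still being written gives an approximate in-flight rate rather than a post-sync one.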
I don’t think we have that in the UI today, but showing clear progress indication (progress bars) is a project on our roadmap.
Can I extract it from the logs? I am a little overwhelmed by the logs folder in the workspace.
I see this in sync stats logs
But I am not sure what each attribute means. We are at a point where we want to compare performance between different tools or libraries. I am able to get it for Meltano by looking at the log and the number of records processed, and I would like to do the same for Airbyte.