Slow extraction time with Slack Connector for large data volumes

Summary

The user is experiencing slow extraction times with the Slack Connector when retrieving data from Public channels. By comparison, Fivetran completed the same extraction faster. They want to know whether this performance is expected and whether there are ways to improve it.


Question

Hi Team, I am trying to use the Slack Connector to get all Slack data from Public channels, and a complete extraction of 5k records is taking me 14 hours. Is this expected? I tried the same thing with Fivetran, which completed it in 2 hours. Can someone help me understand whether this is expected behaviour, or whether you see different load times in your instance with more data?



This topic has been created from a Slack thread to give it more visibility.
It will be in read-only mode here.


["slack-connector", "data-extraction", "public-channels", "performance", "fivetran"]

I understand that you're experiencing a delay in data extraction using the Slack Connector. This could be due to a variety of factors, including your specific settings, the amount of data being processed, or potential issues with the connector itself. Here are a few steps you can take to troubleshoot this issue:

1. Check your settings: Ensure that you've correctly set up the Slack Connector in Airbyte. This includes authenticating your account, setting your start_date and lookback_window, and configuring your join_channels and channel_filter settings. You can refer to the setup guide for more details.
2. Review your data: The amount of data being processed affects how long the extraction takes. If you're pulling a large amount of data, it will naturally take longer.
3. Check for known issues: There might be known issues with the Slack Connector that are causing the delay. For example, a GitHub issue reported that only some thread content was retrieved; problems in how threads are read could also affect extraction speed.
4. Share your sync logs: If you're still experiencing issues, share your sync logs so the cause of the delay can be identified. A maintainer suggested this in a similar discussion.

If you've checked all of these and are still experiencing issues, it might be best to reach out to the Airbyte team directly for further assistance.
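The settings from step 1 can be sketched as a plain Python dict. This is a minimal illustration, assuming the key names match the settings quoted above; the values, the `credentials` shape, and the helper functions are hypothetical, so check the connector's own spec for the authoritative schema. It also assumes, as the connector documentation describes, that an empty channel_filter means "no filter", which is worth checking when a sync reads far more channels (and threads) than expected.

```python
# Hypothetical Slack source config; key names follow the settings listed above,
# all values are placeholders.
slack_config = {
    "start_date": "2024-01-01T00:00:00Z",  # only read messages after this date
    "lookback_window": 7,  # assumed unit: days of threads re-read per incremental sync
    "join_channels": True,  # let the bot join public channels it is not a member of
    "channel_filter": ["general", "support"],  # restrict the sync to these channels
    "credentials": {  # hypothetical shape, placeholder token
        "option_title": "API Token Credentials",
        "api_token": "xoxb-placeholder-token",
    },
}

CHECKLIST_KEYS = ("start_date", "lookback_window", "join_channels", "channel_filter")


def missing_settings(config: dict) -> list[str]:
    """Return the checklist settings that are absent from the config."""
    return [key for key in CHECKLIST_KEYS if key not in config]


def syncs_all_channels(config: dict) -> bool:
    """True if channel_filter is empty or absent (assumed to mean 'no filter')."""
    return not config.get("channel_filter")
```

Narrowing channel_filter to the channels you actually need is the cheapest way to shrink the amount of data (and the number of threads) a sync has to read.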

Haha, I think we talked on Reddit, haven’t we? <@U067F0ZASG7>

We're looking into this. What we see in the logs is that our Slack connector makes a "stream", i.e. at least one separate HTTP request, per message thread. This wouldn't be a problem if syncing threads were opt-in, but the connector syncs threads by default.
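To see why a per-thread request pattern dominates sync time, here is a rough request count for one channel. It is a minimal sketch, assuming paginated history reads plus exactly one extra request per thread (replies fitting in a single page); `estimate_requests` and the page size are hypothetical illustration values, not the connector's actual code or Slack's defaults.

```python
import math


def estimate_requests(reply_counts: list[int], page_size: int) -> int:
    """Estimate HTTP requests for one channel under an N+1 read pattern.

    reply_counts: replies per top-level message (0 means the message is not a thread).
    page_size: messages returned per history request (hypothetical value).
    Assumes each thread's replies fit in a single request.
    """
    # The "1" part: paginated reads of the channel history (at least one request).
    history_requests = max(1, math.ceil(len(reply_counts) / page_size))
    # The "N" part: one extra request for every message that starts a thread.
    thread_requests = sum(1 for n in reply_counts if n > 0)
    return history_requests + thread_requests
```

For example, `estimate_requests([0] * 700 + [3] * 300, 200)` gives 5 history requests plus 300 thread requests, 305 in total: the thread reads, not the message volume itself, drive the request count.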

It would be interesting to check if Fivetran and Airbyte sync the same streams / same data by default. If Fivetran optimizes for faster initial syncs but does not sync the threads, for example, that would explain the discrepancy.

Ultimately, we're both bound by the same Slack API rate limits.
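Because both tools hit the same limits, the request count translates directly into a floor on sync time. A minimal sketch, assuming every request counts against one shared per-minute limit; `min_sync_hours` is a hypothetical helper, and the rate used below is an illustrative number, not Slack's actual limit for any particular method (Slack assigns limits per method).

```python
def min_sync_hours(total_requests: int, requests_per_minute: float) -> float:
    """Lower bound on sync time when all requests share a single rate limit.

    Assumes one shared per-minute limit and ignores response latency and retries.
    """
    if requests_per_minute <= 0:
        raise ValueError("requests_per_minute must be positive")
    minutes = total_requests / requests_per_minute
    return minutes / 60
```

With 30,000 requests at a hypothetical 50 requests per minute, `min_sync_hours(30000, 50)` returns 10.0 hours, whichever tool issues the requests. The only ways to go faster are fewer requests (for example, skipping threads) or spreading work across separately limited methods.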

<@U069EMNRPA4> - I can confirm that threads are causing this issue. The blocker now is that my team wants threads, and we generate a volume of 300K messages a day, so I am not sure how this solution will work. I would like your help with one more thing.

These are the out-of-the-box (OOB) connectors we will be using in Airbyte:
Slack
Zoom
JIRA Cloud
Asana
Okta

Are there any known issues with these?
I worked on Okta and saw that the current connector only works for OKta.com domains and won't accept OktaPreview.com or custom domains.

This will help me clarify the limitations in the recommendations I have to provide.

Re: Slack: if the threads sync fast enough in incremental mode after the initial sync, I expect the sync to stay healthy. To fix the underlying performance problem, though, we need to look into batching thread reads instead of spawning N+1 requests (one for the channel's messages plus one per thread).
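Incremental syncs can stay healthy because each run only needs to revisit recent threads rather than all of them. A minimal sketch of that idea, assuming a lookback window in days (like the lookback_window setting) and Slack-style epoch-second timestamp strings; `threads_to_refresh` is hypothetical and illustrates the principle, not the connector's actual implementation.

```python
from datetime import datetime, timedelta, timezone


def threads_to_refresh(parent_ts: list[str], last_sync: datetime, lookback_days: int) -> list[str]:
    """Pick thread parent messages to re-read on an incremental sync.

    parent_ts: Slack-style "seconds.fraction" timestamps of thread parents.
    last_sync: timezone-aware time of the previous sync (the cursor).
    Threads started before (last_sync - lookback_days) are skipped, so replies
    added to those older threads are assumed to be missed.
    """
    window_start = (last_sync - timedelta(days=lookback_days)).timestamp()
    return [ts for ts in parent_ts if float(ts) >= window_start]
```

The trade-off is explicit in the window size: a longer lookback catches more late replies but re-reads more threads, each costing a request.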

The other connectors should generally be fine. If issues do come up, they're usually visible on GitHub, tagged with that connector's name. Do check that they already provide all the streams that you need: the platforms may have added new data streams that we have not built into the connector yet, but adding a stream should be straightforward.

Okta custom domain support: I have not looked into the Okta connector myself yet, but hypothetically, we could put that on the roadmap and add a config setting that lets the connector accept a custom domain.
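A config setting like that might look roughly as follows. This is a minimal sketch, assuming the connector currently builds its host by appending `.okta.com` to a subdomain, which would match the behaviour described above; the parameter names `domain` and `custom_domain` and the function `okta_base_url` are hypothetical and not taken from the Okta connector's code.

```python
from typing import Optional


def okta_base_url(domain: str, custom_domain: Optional[str] = None) -> str:
    """Build the Okta base URL from a subdomain or an optional custom host.

    domain: Okta subdomain, e.g. "acme" for acme.okta.com.
    custom_domain: hypothetical setting holding a full host such as
    "acme.oktapreview.com" or "login.example.com"; takes precedence when set.
    """
    if custom_domain:
        # Normalise input like "https://login.example.com/" to a bare host.
        host = custom_domain.strip().removeprefix("https://").rstrip("/")
    else:
        host = f"{domain}.okta.com"
    return f"https://{host}/"
```

Keeping the existing subdomain field as the default path means current configurations would keep working, while preview and custom domains become opt-in.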

Thanks, <@U069EMNRPA4>, for the quick reply. I will explore further by setting up this connector and checking whether anything is a blocker.