Telemetry - Security questions

Hi folks!

Not sure if this is the right place for security questions, but I’m a security engineer in an organization thinking about using Airbyte. I have a few questions about the telemetry that’s being sent back to Airbyte:

  • What ports are sending out telemetry and to where? If we cut off that access will this break the Airbyte instance? Is it a common customer use case to deny all access except what’s needed for current connectors, or does that break behavior?
  • Are we getting responses back from Airbyte that influence container behavior in any way?
  • What does the “Anonymize my usage data” toggle do in the screenshot? If it’s off, does it mean no telemetry is being sent, or does it mean that it’ll be sent regardless without anonymization?
    • Is there an option to fully disable telemetry?

In case Airbyte is breached I’m worried about its impact, including reconnaissance about customer systems.

Related issue: https://github.com/airbytehq/airbyte/issues/9428

  • Is there a way to restrict the types of telemetry that goes back (i.e. don’t report connector job status, etc.)?

Thanks!

Hello @hrm, thank you for this interesting question!
Before answering your question let me remind you of the purpose of telemetry for Airbyte.
We use telemetry to build reporting dashboards that measure the usage of Airbyte and its connectors. It allows us to easily identify failing connectors and prioritize our development efforts on them. For example, a fix for a heavily used connector that often fails will be highly prioritized. The telemetry does not send the data you sync, only metadata about the jobs you run.

Is there an option to fully disable telemetry?

Yes, please check this post: Can I disable analytics in Airbyte?

Airbyte uses Segment to send telemetry data. The Segment client we are using sends analytics data to Segment’s servers: https://api.segment.io (on port 443, since this is HTTPS).
If you deny all egress traffic on this port, you basically prevent Airbyte from connecting to the internet…
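
To illustrate the difference between blocking port 443 outright and denying a single destination host, here is a minimal sketch of an egress check. It is not Airbyte code and does not configure any firewall: the `egress_allowed` function and the connector hostnames are hypothetical, and only `api.segment.io` and port 443 come from the answer above.

```python
from urllib.parse import urlparse

# Telemetry endpoint named in the answer above.
SEGMENT_HOST = "api.segment.io"

# Hypothetical hosts that your connectors need to reach (assumption for illustration).
CONNECTOR_HOSTS = {"warehouse.example.com", "api.example-saas.com"}

DEFAULT_PORTS = {"https": 443, "http": 80}


def egress_allowed(url, allowed_hosts, blocked_ports=frozenset()):
    """Return True if an outbound request to `url` would pass the policy.

    The policy is: reject any blocked port, then allow only allowlisted hosts.
    """
    parsed = urlparse(url)
    port = parsed.port or DEFAULT_PORTS.get(parsed.scheme)
    if port in blocked_ports:
        return False
    return parsed.hostname in allowed_hosts


if __name__ == "__main__":
    # Host-based allowlist: connectors pass, Segment telemetry is denied.
    print(egress_allowed("https://warehouse.example.com", CONNECTOR_HOSTS))
    print(egress_allowed("https://" + SEGMENT_HOST, CONNECTOR_HOSTS))
    # Port-based block on 443: every HTTPS destination is denied, connectors included.
    print(egress_allowed("https://warehouse.example.com", CONNECTOR_HOSTS, blocked_ports={443}))
```

Under these assumptions, a host-based allowlist denies only the telemetry endpoint, while a blanket block on port 443 also cuts off any connector that talks HTTPS.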

What does the “Anonymize my usage data” toggle do in the screenshot? If it’s off, does it mean no telemetry is being sent, or does it mean that it’ll be sent regardless without anonymization?

When you first connect to Airbyte we ask you to fill in an email address. Anonymizing my usage data means that the telemetry will be sent without your email address. In other words, we won’t be able to match your email address to the telemetry activity we receive.
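
As a rough illustration of what that means, the sketch below drops an email field from an event before it would be sent. This is not Airbyte's actual implementation: the `anonymize` function and the event field names are hypothetical and only show the idea of sending the same metadata without the identifying address.

```python
def anonymize(event):
    """Return a copy of the event without the email address (hypothetical field name)."""
    return {key: value for key, value in event.items() if key != "email"}


if __name__ == "__main__":
    # Hypothetical telemetry event; the real payload shape is not described in this thread.
    event = {"email": "user@example.com", "connector": "source-example", "status": "failed"}
    print(anonymize(event))
```

The job metadata is still sent in both cases; only the link to the email address is removed.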

Is there a way to restrict the types of telemetry that goes back (i.e. don’t report connector job status, etc.)?

Not at the moment, it’s all in or all out.

I hope I answered your questions, feel free to ask for more details if needed.

Thanks @alafanechere for the responses!

Knowing that we can disable telemetry without impacting Airbyte, and also what mechanism it uses to send it out, is useful. Fully understand that telemetry has a legitimate use case and is helpful for development feedback; for us it’s a matter of balancing risk.

For example, if the airbyte/webapp image is compromised (a bad commit sneaks into Airbyte, or there’s a supply chain attack in a dependency) in such a way that the included Segment client is exploited to exfiltrate data to another account, that’s something we can watch out for.

Sure, but to mitigate this kind of risk, the Airbyte team remains the maintainer of the repo and performs the reviews and merges.