Summary
The user is looking to extract event/email data from SendGrid using the Airbyte connector. They are seeking confirmation whether the existing connector supports this data or whether a custom connector needs to be built.
Question
Hey hey team,
I am looking to get events/email data out of SendGrid. I can see a Connector exists, but none of the streams seem to contain any of that granular data.
Just looking for confirmation that that is correct and want to ask if anyone has managed to build a custom connector to do this instead?
This topic has been created from a Slack thread to give it more visibility.
<@U06DQ4RHPUJ> We ended up going the webhook route because our client didn’t want to pay for the add-on.
Interestingly, the schema listed here doesn't show anything about `offset` or a pagination object in the response like you see on other endpoints: https://www.twilio.com/docs/sendgrid/api-reference/email-activity/filter-all-messages
Not sure if that's a docs omission, or if it actually doesn't paginate at all and `limit` is just for when you only want a subset of the data. You could also try NOT passing `limit` and see if pagination fields show up in the response
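If it helps, here's a minimal way to probe for that. This is only a sketch: the candidate field names are guesses based on SendGrid's *other* endpoints (which use a `_metadata` object with `next`/`prev` links), not anything the Email Activity docs promise.

```python
def find_pagination_fields(payload: dict) -> list[str]:
    """Return any top-level keys that look like pagination metadata.

    SendGrid's other endpoints use a `_metadata` object with
    next/prev links; the Email Activity docs don't show one,
    so this just probes for the usual suspects.
    """
    candidates = ("_metadata", "next", "prev", "offset", "page")
    return [k for k in payload if k in candidates]

# Sample shaped like the documented Email Activity response,
# which only shows a top-level `messages` array:
documented_response = {"messages": [{"msg_id": "abc123"}]}
print(find_pagination_fields(documented_response))  # -> []
```

To probe live, you'd `GET https://api.sendgrid.com/v3/messages` once with and once without `limit`, and run the helper over each JSON body to see if anything pagination-shaped appears.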
You could also check with their support, since this IS a paid add-on . . . I wonder if the docs are just poor since it isn't used nearly as much. But to me, if you can't paginate those results, there isn't much of a reason for the add-on (unless they're expecting everyone to always filter by both `msg_id` and `to_email` or something)
From what I can see so far, I just don't think pagination is supported, and I think the `offset` parameter just isn't doing anything. You have to specify a `limit` (not adding it causes the call to fail), so I think my best bet is to filter down to a timeframe small enough that I can capture all the activity based on `last_event_time` without hitting the 1,000-object cap of the `limit`
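That windowing idea can be sketched roughly like this. Assumptions: the `last_event_time BETWEEN TIMESTAMP ... AND TIMESTAMP ...` query form follows the documented examples, and the 6-hour window size is just something you'd tune until each slice stays under the 1,000-row cap.

```python
from datetime import datetime, timedelta

def time_windows(start: datetime, end: datetime, hours: int = 6):
    """Yield (window_start, window_end) pairs covering [start, end)."""
    cur = start
    step = timedelta(hours=hours)
    while cur < end:
        yield cur, min(cur + step, end)
        cur += step

def activity_query(ws: datetime, we: datetime) -> str:
    """Build the Email Activity `query` parameter for one window."""
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return (f'last_event_time BETWEEN TIMESTAMP "{ws.strftime(fmt)}" '
            f'AND TIMESTAMP "{we.strftime(fmt)}"')

# One day sliced into 6-hour windows -> 4 requests:
for ws, we in time_windows(datetime(2024, 1, 1), datetime(2024, 1, 2)):
    print(activity_query(ws, we))
```

If a single window still returns 1,000 rows, that's your signal to shrink `hours` (or recurse on that window) rather than trust `offset`.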
Out of curiosity, how does your setup with the webhooks look? I’ve never really worked with them so don’t know what tools/infra they require
<@U06DQ4RHPUJ> I went down this road not long ago, and found that they don't have any API coverage for event-stream data by default. The [Stats endpoints](https://www.twilio.com/docs/sendgrid/api-reference/stats/retrieve-global-email-statistics) only aggregate by date or other properties (so not subscriber-level), and [Get All Single Sends](https://www.twilio.com/docs/sendgrid/api-reference/single-sends/get-all-single-sends) only has the aggregates for the send, also with no option for subscriber-level data.
There is an Additional Email Activity History add-on that enables access to the [Email Activity Feed API](https://www.twilio.com/docs/sendgrid/api-reference/email-activity). I believe you pay for this for each additional 30 days you want, up to a 13-month max retention. This is effectively message-level data, and you can apply queries to it (e.g. call it per-`to_email` or per-`marketing_campaign_id`, as well as filter down to messages with certain events like `clicks > 0` or `opens > 0`). So you could easily expand Airbyte's connector to handle that as a child stream of the marketing campaigns, but it will require an additional subscription. And note that IIRC it only starts collecting events from the time you add it, not historically.
If you don’t want to have to pay and don’t care about historical data, you can get this data by subscribing to their [Event Webhook](https://www.twilio.com/docs/sendgrid/for-developers/tracking-events/getting-started-event-webhook), which will send event data in near-real-time. Just keep in mind that Airbyte isn’t designed as a webhook integration platform, so you’d need to work that part into another part of your pipeline. But you could let Airbyte pull your marketing campaigns to marry that event data up with (if you’re using the marketing features).
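For reference, the receiving side of the webhook route can be fairly small. This sketch only parses and filters the POST body; the field names (`event`, `email`, `timestamp`) follow the Event Webhook docs, but verify the exact schema for the events you enable.

```python
import json

def parse_events(body: bytes) -> list[dict]:
    """SendGrid's Event Webhook POSTs a JSON array of event objects."""
    events = json.loads(body)
    if not isinstance(events, list):
        raise ValueError("expected a JSON array of events")
    return events

def clicks_and_opens(events: list[dict]) -> list[dict]:
    """Keep only the engagement events this use case cares about."""
    return [e for e in events if e.get("event") in ("click", "open")]

# A payload shaped like what the webhook delivers:
sample = json.dumps([
    {"email": "a@example.com", "event": "delivered", "timestamp": 1700000000},
    {"email": "a@example.com", "event": "open", "timestamp": 1700000100},
]).encode()
print(clicks_and_opens(parse_events(sample)))  # keeps only the "open" event
```

In practice you'd put this behind a small HTTPS endpoint (or a managed function) that writes the events somewhere Airbyte or your warehouse can read.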
We have a client using SendGrid as a marketing email platform, and I have to say it’s one of the most bizarre systems I’ve ever seen as such. It’s basically an odd sketch of what a transactional email platform thinks a marketing email system looks like, without having ever used one day-to-day before.
That is an insanely detailed response to my question and I really, REALLY appreciate it!
My client does have the Extended Email Activity History add-on so it does seem we have access to the raw data we’ll need
Nice. Honestly it should be a pretty easy add, but depends a bit on your use case. Are you using this for their marketing emails, or purely for transactional emails?
We’re hoping to do this to do some user segmentation and send them different emails based on how they respond:
• Send everyone an email, monitor their responses (by ingesting the data with Airbyte)
• Somebody didn’t click on it within 3 days? Tag them as such and send them a separate email reminding them to do so, etc.
So really I’m hoping to use this to track user behaviour over time so I can group them into cohorts and send them more emails
Yeah, that makes sense.
My initial thought looking at their docs for the Email Activity API is that you can probably create an incremental stream using the query based on `last_event_time` . . . but if you need granular engagement (e.g. whether they clicked within a certain period from the send) you’d probably need to use append mode (not deduping by Message ID), otherwise I think you’d only know the latest activity date.
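To illustrate the append-vs-dedupe point, here's a generic sketch (not Airbyte's actual sync logic; `msg_id`/`last_event_time` follow the documented response fields):

```python
def advance_cursor(records, cursor=None):
    """Incremental sync: keep records newer than the cursor,
    then move the cursor to the max last_event_time seen."""
    fresh = [r for r in records
             if cursor is None or r["last_event_time"] > cursor]
    new_cursor = max((r["last_event_time"] for r in fresh), default=cursor)
    return fresh, new_cursor

def dedupe_by_msg_id(records):
    """Dedupe mode: one row per msg_id, keeping only the latest
    event time -- the intermediate engagement history is lost."""
    latest = {}
    for r in records:
        cur = latest.get(r["msg_id"])
        if cur is None or r["last_event_time"] > cur["last_event_time"]:
            latest[r["msg_id"]] = r
    return list(latest.values())

# Two syncs see the same message at two event times:
syncs = [
    [{"msg_id": "m1", "last_event_time": "2024-01-01T00:00:00Z"}],
    [{"msg_id": "m1", "last_event_time": "2024-01-02T00:00:00Z"}],
]
appended, cursor = [], None
for page in syncs:
    fresh, cursor = advance_cursor(page, cursor)
    appended.extend(fresh)
print(len(appended), len(dedupe_by_msg_id(appended)))  # append keeps 2, dedupe keeps 1
```

With append you retain both rows and can reconstruct when each event happened; deduped, you'd only see the message's latest state.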
Another thing I’m not sure of is that the example schema they show doesn’t return the marketing email ID, so it may be hard to line them up unless the `msg_id` field matches (or contains) this in some form. If that becomes a problem, you could pull this as a child stream of marketing single sends and inject the marketing email ID from the parent stream. That will just split it across more API pulls. And you might have to do the same with marketing automations to get the full data.
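A rough sketch of that parent/child idea (the `marketing_campaign_id` filter and the injected `single_send_id` field are illustrative assumptions, worth checking against the actual query dictionary):

```python
def child_query(parent: dict) -> str:
    """Build an Email Activity query scoped to one single send.
    Assumes messages can be filtered by marketing_campaign_id."""
    return 'marketing_campaign_id = "{}"'.format(parent["id"])

def inject_parent_id(parent: dict, records: list[dict]) -> list[dict]:
    """Stamp the parent single-send id onto each child record so the
    marketing email can be joined downstream."""
    return [{**r, "single_send_id": parent["id"]} for r in records]

parent = {"id": "ss-42", "name": "January newsletter"}
msgs = [{"msg_id": "m1"}, {"msg_id": "m2"}]
stamped = inject_parent_id(parent, msgs)
print([m["single_send_id"] for m in stamped])  # -> ['ss-42', 'ss-42']
```

The cost is one Email Activity pull per single send (and per automation, if you go that route), which is the "more API pulls" trade-off mentioned above.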
Would love to hear how it works out!
Thank you so much, this is very useful.
I have the exact same idea: incremental append using `last_event_time`.
From the data I pulled, it doesn’t seem my client leverages campaigns at all so far, so I might just ignore that for now, but switch to pulling campaigns -> passing the id as a parent stream to messages if they start using them
I know I’ve already used a lot of your time but did you get very far with the pagination? It’s behaving pretty oddly.
It seems that the data that gets returned in every partition is the same, even though the request is updating the offset value each time
Returns the exact same data as this: