Suppose I want to build an analytics tool for Marketo users, using Airbyte to pull their data into a data warehouse. If I don’t yet have a customer using Marketo, I am stuck since I have no idea what the data would look like. What is needed is a “sandbox” Marketo account. Is there a place where I can find such sandbox accounts for various sources?
You can try reaching out to Marketo to see if they can provide you a sandbox account. Airbyte has an integration account, but it is used for tests and development, so unfortunately we can’t share it.
Requesting a sandbox Marketo account is a really tedious process.
I had another idea. The thing I really want is the set of dbt transformations that do the basic normalization after an ingest from (say) Marketo. The various JSON files under `source_marketo/schemas` contain all the “hard work” of specifying the details of the data elements returned by an API call. My theory is that I can run `test_normalization.py` in `integration_tests`, pointing it at the Marketo JSON schemas, and that this will produce the needed dbt transformations.
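Concretely, the invocation I have in mind is something like the following. This is untested, the paths are my assumptions about the repo layout, and the part I’m unsure about is how to point the test resources at the Marketo schemas:

```bash
# Untested sketch: run the base-normalization integration tests directly.
cd airbyte-integrations/bases/base-normalization
pip install -e .    # plus pytest, if it isn't already installed
pytest integration_tests/test_normalization.py
# (open question: how to point the test resources at source_marketo/schemas)
```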
The dbt transformations would of course have no data to actually run on yet, but I would have an idea of the resulting tables, and I could prepare the downstream dbt models for my analytics tool. This way I can be “ready” for a Marketo customer, i.e. as soon as I connect Airbyte to their Marketo, the downstream processes will already be in place.
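For concreteness, the kind of downstream model I mean would look roughly like this. The file name, query, and `leads` model name are all made up; `_airbyte_emitted_at` is one of the metadata columns normalization adds:

```bash
# Hypothetical downstream dbt model built on a base-normalization output table.
mkdir -p models/marts
cat > models/marts/leads_per_day.sql <<'EOF'
-- count synced Marketo leads per day, on top of the normalized "leads" model
select
    date(_airbyte_emitted_at) as synced_date,
    count(*) as leads_synced
from {{ ref('leads') }}
group by 1
EOF
```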
I may be totally wrong about generating the dbt models this way, though; let me know your thoughts.
I see. I can run a sync and send one example of Marketo output to you.
Thanks @marcosmarxm … I was actually able to create a `catalog.json` for Marketo under `base-normalization` and produce the dbt base-norm models.
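For anyone else trying this, roughly what I did was wrap one of the connector’s checked-in stream schemas in a configured catalog. The stream name and schema path below are from the Marketo connector and would need adapting for another source:

```bash
# Rough sketch: build a ConfiguredAirbyteCatalog from a schema that ships with
# the source connector, so no credentials or live sync are needed.
SCHEMA=airbyte-integrations/connectors/source-marketo/source_marketo/schemas/leads.json
cat > catalog.json <<EOF
{
  "streams": [
    {
      "stream": {
        "name": "leads",
        "json_schema": $(cat "$SCHEMA"),
        "supported_sync_modes": ["full_refresh"]
      },
      "sync_mode": "full_refresh",
      "destination_sync_mode": "overwrite"
    }
  ]
}
EOF
```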
But Marketo is just one example, and from the POV of someone trying to build a general analytics tool that can work for a variety of MarTech platforms, it would be nice to have a way to produce the “base-normalization” dbt models (such as `airbyte_ctes` and `airbyte_incremental`) without having any credentials for the source.
I do have HubSpot credentials for a test account, and looking closely at the docker logs during a sync, it looks like the key is this log line:
```bash
transform-config --config destination_config.json \
  --integration-type bigquery --out /data/12/0/normalize
```
This depends on some configs and JSON files generated by earlier Docker commands; I don’t know if there is an easy way to identify those.
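My best guess at replaying those two steps by hand is below. It’s untested: the image tag is a placeholder for whatever matches your Airbyte version, I’m overriding the entrypoint to call the two console scripts directly, and the `transform-catalog` flags are taken from the base-normalization package, so they should be verified against your image:

```bash
NORM_IMAGE=airbyte/normalization:0.1.77   # assumption: use the tag matching your Airbyte version

# Step 1: turn the destination config into a dbt profiles.yml.
# destination_config.json can hold well-formed but fake BigQuery settings,
# since nothing connects to the warehouse at this stage.
docker run --rm -v "$(pwd)":/data --entrypoint transform-config "$NORM_IMAGE" \
  --config /data/destination_config.json --integration-type bigquery --out /data/normalize

# Step 2: turn the configured catalog into dbt models
# (airbyte_ctes, airbyte_incremental, ...) without executing them.
docker run --rm -v "$(pwd)":/data --entrypoint transform-catalog "$NORM_IMAGE" \
  --integration-type bigquery --profile-config-dir /data/normalize \
  --catalog /data/catalog.json --out /data/normalize/models/generated \
  --json-column _airbyte_data
```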
So to summarize, my basic question is: is there a sequence of commands I can use to generate the base-normalization dbt models for a source X, without any credentials for X?
I know this may not be your target use case, but this ability would expand your potential use scenarios from “end-users with data they want to pull from sources X, Y, Z” to “meta-level app-builders who want to build analytics tools for end-users who have data in X, Y, Z”.
I don’t know if that made sense; any pointers are appreciated.
Today, the manual process you followed is the only way to generate the models.
I created the issue [Generate normalization models without running the sync or having credentials (airbytehq/airbyte#12047)](https://github.com/airbytehq/airbyte/issues/12047) to implement this feature in the future.