Sandbox accounts with sample data?

Suppose I want to build an analytics tool for Marketo users, using Airbyte to pull their data into a data warehouse. If I don’t yet have a customer using Marketo, I am stuck since I have no idea what the data would look like. What is needed is a “sandbox” Marketo account. Is there a place where I can find such sandbox accounts for various sources?

You can try reaching out to Marketo to see if they can provide you with a sandbox account. Airbyte has an integration account, but it is used for tests and development; unfortunately, we can’t share it.

Requesting a sandbox Marketo account is a really tedious process.
I had another idea – the thing I really want is the set of dbt transformations that do the basic normalization after an ingest from (say) Marketo. The various JSON files under source_marketo/schemas contain all the “hard work” of specifying the details of the data elements that are returned from an API call. My theory is that I can run the test_normalization.py in integration_tests, pointing it at the Marketo JSON schemas, and that this will produce the needed dbt transformations.

The dbt transformations would of course have no data yet to actually run on, but I would have an idea of the resulting tables, and I could prepare the downstream dbt transformation models for my analytics tool. This way, I can be “ready” for a Marketo customer, i.e. as soon as I connect Airbyte to their Marketo, I will have the downstream processes ready.

I may be totally wrong about generating the dbt models in this way though; let me know your thoughts :slight_smile:

I see. I can run one example and share the Marketo output with you.

Thanks @marcosmarxm … I was actually able to create a catalog.json for Marketo under base-normalization and produce the dbt base-normalization models.
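For anyone else trying this, here is roughly the shape of the file I ended up with. This is a sketch, not the exact file: the `streams` layout follows my reading of the Airbyte protocol’s ConfiguredAirbyteCatalog, and the `leads` stream with its two-field schema is just a placeholder — the real `json_schema` values come from the files under source_marketo/schemas.

```shell
# Hand-built configured catalog (shape assumed from the Airbyte protocol's
# ConfiguredAirbyteCatalog). The "leads" stream and its tiny schema are
# placeholders; paste in the real schemas from source_marketo/schemas.
cat > catalog.json <<'EOF'
{
  "streams": [
    {
      "stream": {
        "name": "leads",
        "json_schema": {
          "type": "object",
          "properties": {
            "id": { "type": "integer" },
            "email": { "type": "string" }
          }
        },
        "supported_sync_modes": ["full_refresh"]
      },
      "sync_mode": "full_refresh",
      "destination_sync_mode": "overwrite"
    }
  ]
}
EOF
```

With one entry like this per schema file, base-normalization has everything it needs to know about the source’s tables without ever talking to the source.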

But Marketo is just one example, and from the POV of someone trying to build a general analytics tool that can work for a variety of MarTech platforms, it would be nice to have a way to produce the “base-normalization” dbt models (such as airbyte_ctes and airbyte_incremental) without having any credentials for the source.

I do have HubSpot credentials for a test account, and looking closely at the Docker logs during a sync, the key step appears to be this command:

transform-config --config destination_config.json \
  --integration-type bigquery --out /data/12/0/normalize

This depends on some configs and JSON files generated by earlier Docker commands; I don’t know if there is an easy way to identify those.
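To make the question concrete, here is the sequence I imagine, pieced together from the sync logs and a skim of the base-normalization code. Everything in it is an assumption on my part — the image name, the `--entrypoint` overrides, and the transform-catalog flags — not a verified recipe:

```shell
# Speculative sketch, not verified end to end: image name, entrypoint
# overrides, and flags are my guesses from the sync logs and the
# base-normalization source.

# 1. Turn a destination config (here: BigQuery) into a dbt profiles.yml.
docker run --rm -v "$(pwd)":/data --entrypoint transform-config \
  airbyte/normalization \
  --config /data/destination_config.json \
  --integration-type bigquery --out /data/normalize

# 2. Turn a hand-built catalog.json into the generated dbt models
#    (airbyte_ctes, airbyte_incremental, ...) -- no source credentials needed.
docker run --rm -v "$(pwd)":/data --entrypoint transform-catalog \
  airbyte/normalization \
  --integration-type bigquery \
  --profile-config-dir /data/normalize \
  --catalog /data/catalog.json \
  --out /data/normalize/models/generated \
  --json-column _airbyte_data
```

If something like this works, step 2 is the only part that actually matters for my use case, since it only needs the schemas, not the data.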

So to summarize, my basic question is: is there a sequence of commands I can use to generate the base-normalization dbt models for a source X, without any credentials for X?
I know this may not be your target use case, but this ability would expand your potential use scenarios from “end-users with data they want to pull from sources X, Y, Z”, to “meta-level app-builders who want to build analytics tools for end-users who have data in X, Y, Z”.

Don’t know if that made sense, any pointers appreciated.

Today, the only way is the manual process you used to generate the models.
I created the issue Generate normalization models without running the sync or having credentials · Issue #12047 · airbytehq/airbyte · GitHub to track implementing this feature in the future.
