Currently one has to run `docker-compose up`, then go to the UI and configure a source/destination to do a sync.
Is there a convenient place where I can find a docker-free (“native”) series of commands to do the same? In other words, instructions to set up a local Python env, install dependencies, and then either a sequence of terminal commands or Python invocations to accomplish the same thing. I suppose I could look at the Dockerfile… but I suspect it may not be as simple as that.
Thanks
Hey @arbitrer,
If you are specifically interested in building a local environment to work on the normalization, you can check this README. It explains how to set up the Python virtualenv and run the tests. Feel free to share a bit more about what you are trying to achieve so that I can give more practical examples.
Thanks @alafanechere, yes my main interest is in obtaining the dbt normalization models for a source that I may not necessarily have credentials for. And yes, I was actually able to run the tests under `base-normalization`, and even was able to manually create a `catalog.json` for Marketo (for which I don’t have credentials), and added a script to splice in the specific schemas of each type (e.g. `campaign`, `lead`) into the `streams` list in the catalog (which has empty `json_schema: {}` stubs).
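For reference, the splice script is roughly along these lines (a simplified sketch: the schema directory layout, stream matching by name, and the example paths are illustrative, not how every connector is laid out):

```python
import json
from pathlib import Path

# Rough sketch of the splice script: fill each stream's empty `json_schema: {}`
# stub in catalog.json from a per-stream schema file. Assumes schemas sit in a
# single directory named after the stream (e.g. schemas/campaign.json), which,
# as noted below, is not how every connector is organized.
def splice_schemas(catalog_path: str, schemas_dir: str) -> None:
    catalog = json.loads(Path(catalog_path).read_text())
    for stream in catalog["streams"]:
        schema_file = Path(schemas_dir) / f"{stream['name']}.json"
        if schema_file.exists():
            stream["json_schema"] = json.loads(schema_file.read_text())
    Path(catalog_path).write_text(json.dumps(catalog, indent=2))

# e.g. splice_schemas("catalog.json", "source_marketo/schemas")  # paths illustrative
```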
I tried to generalize my script to other sources, but saw that they were all organized a bit differently (e.g. some have `schemas/blah.json` and others have `schemas.json`, etc.). In any case, I thought it would be good to know the overall end-to-end docker-free sequence of commands to go from source to destination and final normalization. If I know that, then I can peek into the stage where a `catalog.json` is created and hack together a script that directly produces this catalog and simply produces the normalization models without having to depend on any source credentials.
The nice thing about having the normalization models for sources X, Y, Z is that I can write downstream dbt models to transform these into a final unified table-structure on which my analytics tool can run. Then I can go to customers that use X, Y, Z and say we are ready to do analytics for them. I am aware that one needs actual data from these sources to get the fully normalized tables, but I think that is something that can be done once a customer actually connects their X, Y, Z tools.
Hi @arbitrer,
Unfortunately there’s no single end-to-end Python process for this use case. Airbyte’s protocol leverages Docker containerization and is not language-specific, which is why we also have Java source connectors.
The standard way of obtaining a catalog is running the `discover` command on a source connector, e.g.:
docker run airbyte/my-source:dev discover --config path_to_secret.json
As you can see from the command I shared above, the `discover` command requires secrets, hence access to data. This is because some connectors have dynamically discovered schemas (and not hard-coded JSON schemas).
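If you do have credentials, a rough sketch of capturing the discovered catalog looks like this. The `discover` output is a stream of JSON messages, and the one with `type: "CATALOG"` carries the catalog; the image name, mount path, and config file name below are illustrative, not a supported API:

```python
import json
import os
import subprocess

# Sketch: run a source connector's `discover` command and keep the catalog from
# the JSON messages it prints on stdout. Image name and paths are illustrative.
def discover_catalog(image: str, config_path: str = "secrets/config.json") -> dict:
    result = subprocess.run(
        [
            "docker", "run", "--rm",
            "-v", f"{os.getcwd()}:/data",  # make the local config visible inside the container
            image, "discover", "--config", f"/data/{config_path}",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    for line in result.stdout.splitlines():
        try:
            message = json.loads(line)
        except json.JSONDecodeError:
            continue  # connectors may also print plain log lines
        if message.get("type") == "CATALOG":
            return message["catalog"]
    raise RuntimeError("no CATALOG message found in discover output")

# e.g. json.dump(discover_catalog("airbyte/source-marketo:dev"), open("catalog.json", "w"), indent=2)
```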
TL;DR: the standard way of getting a source catalog is running the `discover` command on a source connector. To get it you need access to the source, because some connectors dynamically generate their schemas. This is why normalization models are generated at sync time and not before the sync run.