Custom Transformation with dbt for API Connector to Postgres Sync

Summary

Guidance on starting custom transformation with dbt for API connector syncing to Postgres.


Question

I created a custom connector for an API, and the sync to Postgres is running as expected. I want to do custom transformations with dbt. I am confused about how and where to start. I need some guidance.



This topic has been created from a Slack thread to give it more visibility.
It will be in read-only mode here; the original thread can be accessed on Slack.


["custom-transformation", "dbt", "api-connector", "postgres-sync"]

Sure, I can guide you on how to use dbt for custom transformations in Airbyte.

1. Understand dbt transformations: Airbyte integrates natively with dbt for post-sync transformations. This is useful if you would like to trigger dbt models after a sync successfully completes. You can learn more about this in the Airbyte documentation.

2. Export the dbt normalization project outside Airbyte: If you have dbt installed locally on your machine, you can view, edit, version, customize, and run the dbt models in your project outside Airbyte syncs. Here is an example of how to do it:

```bash
#!/usr/bin/env bash
dbt deps --profiles-dir=$NORMALIZE_DIR --project-dir=$NORMALIZE_DIR
dbt run --profiles-dir=$NORMALIZE_DIR --project-dir=$NORMALIZE_DIR --full-refresh
```

You can find more details here.

3. Add custom dbt transformations to Airbyte: To add the transformations back to Airbyte, you need to create a new git repo and push your modified code there. After your code is stored in a remote repository, you can add transformation steps on the settings page for the connection in Airbyte. You can find more details here.

4. Set up the dbt Cloud integration in Airbyte Cloud: If you are using Airbyte Cloud, you can set up the dbt Cloud integration to create and run dbt transformations during syncs. This allows you to transform raw data into a format that is suitable for analysis and reporting. You can find more details here.

Remember, if you make changes to the connection, for example adding a new table or column, the normalization step for the added entity does not exist in your custom code and thus will not run. Similarly, if you remove a table and do not change the custom normalization code, the dbt process will fail. Be sure to repeat the "Extract the generated dbt code from Airbyte" step to get the Airbyte-generated code for your new tables.

I hope this helps! Let me know if you have any other questions.
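To run the exported project locally with the commands in step 2, dbt also needs a profiles.yml it can find via --profiles-dir. The snippet below is a minimal sketch of such a profile for a Postgres destination; the profile name normalize and every connection value here are placeholder assumptions, not values Airbyte mandates, so match them to whatever your dbt_project.yml and database actually use.

```yaml
# Minimal sketch of a profiles.yml for a Postgres target (illustrative only).
# The top-level key must match the `profile:` entry in your dbt_project.yml.
normalize:
  target: dev
  outputs:
    dev:
      type: postgres
      host: localhost                                 # placeholder
      port: 5432
      user: airbyte_user                              # placeholder
      password: "{{ env_var('POSTGRES_PASSWORD') }}"  # read from an environment variable
      dbname: analytics                               # placeholder
      schema: public
      threads: 4
```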

<@U04SE163WC8> I created a dbt project for Postgres inside the custom connector folder. I created a model called mart_employees.sql. I pushed the code to GitHub. I created a custom dbt transformation in the Airbyte connection transformation settings. When I run a sync now, the logs say it is recognizing the dbt project, but I am getting the following error:

```
2024-01-04 08:45:21 dbt > Running from /data/76/1/transform/git_repo
2024-01-04 08:45:21 dbt > detected no config file for ssh, assuming ssh is off.
2024-01-04 08:45:21 dbt > Running: dbt run --profiles-dir=/data/76/1/transform --project-dir=/data/76/1/transform/git_repo
2024-01-04 08:45:26 dbt > 08:45:26 Encountered an error:
2024-01-04 08:45:26 dbt > Runtime Error
2024-01-04 08:45:26 dbt > fatal: Invalid --project-dir flag. Not a dbt project. Missing dbt_project.yml file
```

How do I fix this?

The error message you're seeing indicates that dbt is unable to find the dbt_project.yml file in the specified directory. This file is crucial for dbt to recognize and run your project.

Based on the information provided in the extracts, you might need to adjust the entrypoint arguments for the dbt CLI that runs the project in your Airbyte connection settings. As per the forum post, you should set the entrypoint arguments as follows:

```
run --select models/mart_employees.sql --project-dir=my-dbt-project-name
```

Replace my-dbt-project-name with the actual name of your dbt project. Also, ensure that your dbt_project.yml file is located in the root directory of your dbt project. If it's not, dbt won't be able to find it and you'll encounter the error you're seeing.

If you're still encountering issues, you might want to validate your dbt project settings as described in the Airbyte documentation. This can help you identify any potential issues with your dbt project configuration.
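For reference, a minimal dbt_project.yml looks roughly like the sketch below, and it must sit in the exact directory that --project-dir points to. The dbt_learn name and profile used here are illustrative assumptions, not names Airbyte requires.

```yaml
# Minimal dbt_project.yml sketch (illustrative; names are assumptions).
# This file must live at the root of the directory passed via --project-dir.
name: dbt_learn
version: "1.0.0"
config-version: 2

profile: dbt_learn        # must match a top-level key in profiles.yml

model-paths: ["models"]   # dbt 1.x key; older dbt versions use source-paths

models:
  dbt_learn:
    +materialized: view
```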

<@U04SE163WC8> So I put the following command in the custom transformation settings in the Airbyte UI: "run --select models/mart/mart_employee.sql --project-dir=dbt_learn"

I am still getting the error: "fatal: Invalid --project-dir flag. Not a dbt project. Missing dbt_project.yml file"

A message from kapa.ai

The error message indicates that dbt is unable to find the dbt_project.yml file in the specified directory. This file is crucial for dbt to recognize and run your project.

Based on the information provided in the extracts, it seems like the --project-dir argument might not be pointing to the correct directory. The --project-dir should point to the root directory of your dbt project, where the dbt_project.yml file is located. If your dbt_project.yml file is located in the dbt_learn directory, then your command seems correct. However, if dbt_learn is not the root directory of your dbt project, you'll need to adjust the --project-dir argument to point to the correct directory.

Also, ensure that your dbt_project.yml file is correctly formatted and does not contain any errors. You can refer to the dbt documentation for more information on how to structure this file.

If you're still encountering issues, you might want to validate your dbt project settings as described in the Airbyte documentation. This can help you identify any potential issues with your dbt project configuration.

<@U04SE163WC8> The dbt run command executes successfully when I run it in VS Code from inside the dbt_learn directory. The root directory for dbt is dbt_learn, and dbt_project.yml is in that same directory.

If your dbt_project.yml file is indeed in the dbt_learn directory and the dbt run command executes successfully in your local environment, it seems like there might be an issue with how Airbyte is trying to access your dbt project.

One possible issue could be the way the project directory is being referenced in the Airbyte environment. In the forum post, a user encountered a similar issue and the support agent suggested editing the dbt_project.yml file to specify the paths for the target, log, and modules directories. Here's the suggested configuration:

```yaml
target-path: "../build"        # directory which will store compiled SQL files
log-path: "../logs"            # directory which will store dbt logs
modules-path: "../dbt_modules" # directory which will store external dbt dependencies
```

You could try adjusting these paths in your dbt_project.yml file to see if it resolves the issue. If the issue persists, it might be helpful to reach out to the Airbyte community on their forum or Slack for further assistance.
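Tying this back to the log above: Airbyte clones the repository into the job's working directory (in that run, /data/76/1/transform/git_repo) and passes that clone as --project-dir, so dbt_project.yml has to sit at the repository root, or the entrypoint arguments have to point at the subfolder that contains it. Below is a sketch of how the suggested overrides might look inside a full dbt_project.yml; everything apart from the three path overrides is an illustrative assumption.

```yaml
# Illustrative dbt_project.yml at the repository root (names are assumptions).
name: dbt_learn
version: "1.0.0"
config-version: 2
profile: dbt_learn

# Suggested overrides: "../" resolves one level above the cloned repo
# (/data/76/1/transform in the log above), keeping build artifacts out of the checkout.
target-path: "../build"        # compiled SQL files
log-path: "../logs"            # dbt logs
modules-path: "../dbt_modules" # external dbt dependencies (packages-install-path in dbt 1.x)
```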

<@U04SE163WC8> Do the dbt project name and profile name need to be a specific name, i.e. "normalize", for the dbt custom transformation to run on Airbyte?

Sorry <@U06B02N744F>, I am experiencing especially high traffic right now, please try again. :technologist:

<@U04SE163WC8> can you check my previous question?

I’m sorry, but as an AI, I don’t have the ability to access previous conversations or questions. Could you please repeat or provide more context to your question? I’m here to help with any questions related to Airbyte based on the information provided in the extracts.

<@U04SE163WC8> Does Airbyte require the dbt project to have the name "normalize"? Also, is it a requirement that the profile be named "normalize"?

No, it's not a requirement for the dbt project or the profile to be named "normalize". The name of your dbt project is defined in the dbt_project.yml file and can be any valid name that reflects your organization's name or the intended use of the models (source). The profile name is also configurable and should match the profile name specified in your dbt_project.yml file. If you change the profile name in both dbt_project.yml and profiles.yml, it should work fine with Airbyte (source).

However, if you prefer to use your own profiles.yml stored in the git repository or in the Docker image, you can specify an override with --profiles-dir=<path-to-my-profiles-yml> in the dbt CLI arguments (source).

Remember, if you specify a custom profile file, you are responsible for securely managing the credentials. Currently, Airbyte doesn't have a way to manage and pass secrets, and it's recommended you let Airbyte pass this to dbt (source).
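To make the naming point concrete, here is a hedged sketch of a project wired to an arbitrary profile name rather than "normalize". The names dbt_learn and my_custom_profile and the connection details are invented for illustration; the --profiles-dir override noted in the comment is only needed if you ship your own profiles.yml instead of letting Airbyte pass the credentials.

```yaml
# dbt_project.yml (excerpt) -- project and profile names are arbitrary
name: dbt_learn
profile: my_custom_profile   # does not have to be "normalize"; must match profiles.yml

---
# profiles.yml (excerpt) -- only needed if you manage profiles yourself;
# point dbt at it with --profiles-dir=<path-to-my-profiles-yml> in the CLI arguments
my_custom_profile:
  target: prod
  outputs:
    prod:
      type: postgres
      host: your-postgres-host   # placeholder connection details
      port: 5432
      user: your-user
      password: "{{ env_var('POSTGRES_PASSWORD') }}"
      dbname: your-db
      schema: public
      threads: 4
```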