Custom Transformation failure when dbt entry-point contains CMD line args

  • Is this your first time deploying Airbyte?: Yes
  • OS Version / Instance: Ubuntu
  • Memory / Disk: 16Gb / 500 Gb
  • Deployment: Docker
  • Airbyte Version: 0.36.3-alpha
  • Source name/version: MongoDB/0.1.13
  • Destination name/version: Postgres/0.3.18
  • Step: The issue is happening after sync and during transformation.
  • Description: I have a connection with
    sync_mode: incremental deduped history,
    sync frequency: 6 hours.
    Transformation Type: Raw Data JSON + custom DBT transformation.
    Entry point to custom transformation, e.g.: run --profiles-dir . --vars "{'dbname', 'postgres'}"
    Whenever there are command-line args after run in the DBT CLI entry point, the sync always fails with the error “Could not find profile named ‘normalize’.” However, if I remove the command-line args and just put run, the sync and the custom transformation work fine.

PS: The DBT transformations also work fine when run outside Airbyte with command-line args.

DBT docker image used is: fishtownanalytics/dbt:1.0.0

What I’m trying to achieve is to have Airbyte create the raw_* tables (with the Raw data (JSON) normalization option) and then apply a custom DBT transformation that relies on these raw_* tables and transforms them into the desired form.

Hello @abhilash_m,
You can learn more about how Airbyte processes command-line args in this entrypoint script.

To my understanding, if you set a --profiles-dir option Airbyte will look for a normalize profile.

Could you also try without the --profiles-dir option? It might be redundant.

Hey @alafanechere thanks for the reply. I’ll go through the entry point script.

This is my assumption about how Airbyte works; please correct me if I’m wrong.
First, it applies its own dbt transformation (when Raw Data: JSON is selected), which we don’t have much control over (how it is invoked, with what params, etc.).

Then the custom transformation, if any, is executed. In my case, I’m using --profiles-dir . and the other vars for the custom dbt project, not for the default dbt transformation that Airbyte applies. So Airbyte shouldn’t need to search for a ‘normalize’ profile in the custom DBT project, right?

Not exactly: raw data replication does not use DBT.
A transformation step with DBT only happens if you select “Normalized tabular data” and/or declare a custom transformation. I would advise selecting Raw data in Normalization and adding your custom transformation. This will only run your DBT project.

Maybe the error comes from having selected both Normalized tabular data and a custom transformation?

Ha got it! So there is no DBT transformation on selecting RAW data: JSON, makes sense.

I’m actually selecting Raw Data in the normalization tab, together with my custom DBT transformation, but I still see the error that the ‘normalize’ profile was not found.
Could it be because the profile name is hardcoded to ‘normalize’ in the entrypoint?

Yes, it could but it should not.

Quoting one of our engineers:

Custom transformations use the normalization image to generate profiles.yml from the destination’s config.json files. The normalization script names its profile normalize. I don’t think that a user who provides their own profiles.yml has to name the profile normalize.

Well, in that case, it looks like a bug to me, because I have my own profiles.yml file in the custom dbt project, something like this:

config:
  partial_parse: true
  printer_width: 120
  send_anonymous_usage_stats: false
  use_colors: true
abcdef:
  outputs:
    DEV:
      dbname: "{{ var('DBT_ENV_SECRET_DATABASE') }}"
      host: "{{ var('DBT_ENV_SECRET_HOST') }}"
      pass: "{{ var('DBT_ENV_SECRET_PASSWORD') }}"
      port: "{{ var('DBT_ENV_SECRET_PORT') | as_number }}"
      schema: "{{ var('DBT_ENV_SECRET_SCHEMA') }}"
      threads: 8
      type: postgres
      user: "{{ var('DBT_ENV_SECRET_USER') }}"
    QA:
      dbname: "{{ var('DBT_ENV_SECRET_DATABASE') }}"
      host: "{{ var('DBT_ENV_SECRET_HOST') }}"
      pass: "{{ var('DBT_ENV_SECRET_PASSWORD') }}"
      port: "{{ var('DBT_ENV_SECRET_PORT') | as_number }}"
      schema: "{{ var('DBT_ENV_SECRET_SCHEMA') }}"
      threads: 8
      type: postgres
      user: "{{ var('DBT_ENV_SECRET_USER') }}"
  target: QA

Also, in dbt_project.yml, the profile setting points to the same profile name as above:

 # Name your project! Project names should contain only lowercase characters
 # and underscores. A good package name should reflect your organization's
 # name or the intended use of these models
 name: 'test_dbt'
 version: '1.0.0'
 config-version: 2
 
 # This setting configures which "profile" dbt uses for this project.
 profile: 'abcdef'
.
.
.

After changing the profile name to ‘normalize’ in both dbt_project.yml and profiles.yml, it seems to work fine with Airbyte.
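
For reference, the workaround described above might look like the following. This is a minimal sketch that only renames the profile; everything else is assumed to stay as in the files posted earlier, and the connection fields are abbreviated here.

```yaml
# profiles.yml (excerpt): top-level profile key renamed from 'abcdef' to 'normalize'
normalize:
  outputs:
    QA:
      type: postgres
      threads: 8
      # dbname, host, user, pass, port, schema: unchanged from the original profile
  target: QA
```

```yaml
# dbt_project.yml (excerpt): must reference the same profile name as profiles.yml
name: 'test_dbt'
version: '1.0.0'
config-version: 2
profile: 'normalize'
```

Whether the profile really has to be called normalize is exactly what is in question in this thread, so treat this as a workaround rather than documented behavior.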

Thanks for checking this out @abhilash_m . Could you please open an issue on our GitHub repository?
