Fields being dropped in connector builder despite adding them to declared schema

Summary

Fields added to declared schema in connector builder are still being dropped, causing them not to appear in the detected schema. Any idea what could be causing this issue?


Question

In the connector builder, I have a set of fields that are being dropped.

  1. They don't make it into the detected schema.
  2. Even after I add them to the declared schema, they are still being dropped.

Any idea what is causing this?


This topic has been created from a Slack thread to give it more visibility.
It is in read-only mode here.



Hmm, I don't know enough to give you a confident answer yet. However, based on the information provided, there may be a discrepancy between your declared schema and the data your source actually returns.

In Airbyte, the "Declared schema" defines the shape of the records via a JSON Schema definition: which fields and nested fields occur in the records, whether they are always present or sometimes missing, and what types they have. Airbyte uses this information for several purposes, such as column selection, recreating the data structure with the right columns in the destination, and detecting schema changes.

If the declared schema deviates from the detected schema, the "Detected schema" tab in the testing panel highlights the differences. Differences are not necessarily a problem that needs fixing: in some cases the set of records currently loaded in the testing panel doesn't cover all possible cases, so the detected schema is too strict. However, if the declared schema is incompatible with the detected schema based on the test records, it is very likely that syncs will run into errors.

You can edit the declared schema manually in the UI. If you disable the "Automatically import declared schema" switch, it will no longer be overwritten when you trigger test reads. For more detailed information, refer to the Airbyte documentation on Declared Schema.
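To illustrate why a field can be missing from the detected schema, here is a simplified sketch of schema inference from a handful of loaded records. It is not Airbyte's actual inference code; `json_type` and `infer_schema` are hypothetical names. A field that never appears in the loaded records cannot appear in the result, and a field seen only as null is typed as `"null"` alone, which is one way a detected schema ends up too strict.

```python
# Hypothetical, simplified sketch of schema inference from sample records.
# This is NOT Airbyte's implementation; it only illustrates that a detected
# schema can describe only the fields that appear in the loaded records.
from typing import Any


def json_type(value: Any) -> str:
    """Map a Python value parsed from JSON to a JSON Schema type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool subclasses int
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "object"


def infer_schema(records: list[dict[str, Any]]) -> dict[str, Any]:
    """Union the top-level keys of all records and collect the observed types."""
    properties: dict[str, set[str]] = {}
    for record in records:
        for key, value in record.items():
            properties.setdefault(key, set()).add(json_type(value))
    return {
        "type": "object",
        "properties": {
            key: {"type": sorted(types)} for key, types in properties.items()
        },
    }


sample = [
    {"id": 1, "device_name": "a", "note": None},
    {"id": 2, "device_name": "b", "note": "hello"},
]
schema = infer_schema(sample)
print(schema["properties"]["note"])         # {'type': ['null', 'string']}
print("local_id" in schema["properties"])   # False: never seen in the sample
```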

In the response above, does the issue seem to be Airbyte's column selection?

> even after adding them in the declared schema, they are still being dropped.

Can you elaborate on where you're seeing them get dropped? My assumption is that they don't make it into the detected schema because those fields are not in the subset of records returned during the builder test requests.

Do you mean that this is indeed the case, and you expect them to show up as null in the test requests once you've added them to the declared schema?

Hi Ella. Thanks for jumping in to help me here.

As seen in the first screenshot, I see local_id in the Responses tab when running the test.

As seen in the second screenshot, local_id is not available in the Records tab of the testing panel.

Schema detection doesn't pick it up, and adding it manually doesn't help.

And it's not in the detected schema, I assume? What did you try as the type in the declared schema?

Correct, and schema detection is not picking up local_id.

I tried manually adding it to the declared schema as both number and string:

```
  "type": [
    "null",
    "number"
  ]
},
```
It didn't help.
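As far as the JSON Schema `type` keyword is concerned, a list such as `["null", "number"]` accepts a null value, so the declared type on its own should not reject `local_id: null`. The stdlib sketch below makes that concrete by checking only the `type` keyword; `TYPE_CHECKS` and `matches_type` are illustrative names, and this is not how Airbyte validates records.

```python
# Minimal stdlib check of the JSON Schema "type" keyword when it is a string
# or a list, e.g. ["null", "number"] as tried for local_id. Only "type" is
# handled; a real validator (such as the jsonschema package) does far more.
from typing import Any, Union

# JSON Schema type name -> predicate on a Python value parsed from JSON.
TYPE_CHECKS = {
    "null": lambda v: v is None,
    "boolean": lambda v: isinstance(v, bool),
    # bool is a subclass of int in Python, so exclude it explicitly.
    # (JSON Schema also treats 1.0 as an integer; that case is ignored here.)
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "string": lambda v: isinstance(v, str),
    "array": lambda v: isinstance(v, list),
    "object": lambda v: isinstance(v, dict),
}


def matches_type(value: Any, declared: Union[str, list[str]]) -> bool:
    """Return True if value satisfies at least one of the declared types."""
    types = [declared] if isinstance(declared, str) else declared
    return any(TYPE_CHECKS[t](value) for t in types)


local_id_schema = {"type": ["null", "number"]}
print(matches_type(None, local_id_schema["type"]))   # True
print(matches_type(42, local_id_schema["type"]))     # True
print(matches_type("42", local_id_schema["type"]))   # False
```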

Hm. Adding null to the list is exactly what I wanted to check/suggest, so I'm not sure what's going on off the top of my head.

I’ll see if I can get any insight

Thanks, Ella. We are doing a proof of concept (POC) to see if we can replace our manual Python scripts.

Just to check: local_id is on the same object level as all the other fields that are showing up? I can't tell from the response screenshot, since it only shows that one field.

Slash, can you share the configuration of your recordSelector?

Correct. local_id is at the same level as the rest.

Here are all the fields missing from Records that exist in the Response:

```
is_running__release
local_id
provisioning_progress
download_progress
custom_longitude
custom_latitude
logs_channel
(...and more)
```
Here is the `Response` body:

```
{
  "status": 200,
  "body": {
    "d": [
      {
        "id": 111111,
        "belongs_to__application": {
          "__id": 11111
        },
        "belongs_to__user": null,
        "actor": 11111,
        "should_be_running__release": null,
        "device_name": "DO NOT UNPLUG! Flashing FW...(111111)",
        "is_of__device_type": {
          "__id": 58
        },
        "uuid": "111111111111111111",
        "is_running__release": {
          "__id": 11111
        },
        "note": null,
        "local_id": null,
        "os_version": "xxxxxxxxOS 2021.10.2",
        "os_variant": "dev",
        "supervisor_version": "12.10.3",
        "should_be_managed_by__supervisor_release": null,
        "should_be_operated_by__release": {
          "__id": 111111
        },
        "is_managed_by__service_instance": {
          "__id": 111111
        },
        "provisioning_progress": null,
        "provisioning_state": "",
        "download_progress": null,
        "is_web_accessible": false,
        "longitude": "-1",
        "latitude": "1",
        "location": "xxxxxxxx, xxxxx",
        "custom_longitude": null,
        "custom_latitude": null,
        "is_locked_until__date": null,
        "is_accessible_by_support_until__date": null,
        "created_at": "2022-03-11T03:54:33.037Z",
        "modified_at": "2023-10-06T19:41:52.008Z",
        "is_undervolted": false,
        "logs_channel": null,
        "vpn_address": null
      }
    ]
  }
}
```
And here is the `Records` output:

```
[
  {
    "id": 111111,
    "belongs_to__application": {
      "__id": 111111
    },
    "actor": 9491920,
    "device_name": "DO NOT UNPLUG! Flashing FW...(111111)",
    "is_of__device_type": {
      "__id": 58
    },
    "uuid": "111111111111111111",
    "is_running__release": {
      "__id": 111111
    },
    "os_variant": "dev",
    "supervisor_version": "12.10.3",
    "should_be_operated_by__release": {
      "__id": 111111
    },
    "is_managed_by__service_instance": {
      "__id": 111111
    },
    "is_web_accessible": false,
    "longitude": "-1",
    "latitude": "1",
    "location": "xxxxxx",
    "created_at": "2022-03-11T03:54:33.037Z",
    "modified_at": "2023-10-06T19:41:52.008Z",
    "is_undervolted": false
  }
]
```
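A quick way to build a list like the one above is to diff the keys of a raw response item against the extracted record and print the raw value of each missing key. The sketch below uses a hypothetical helper, `dropped_fields`, on abbreviated copies of the two payloads; it only compares top-level keys.

```python
# Hypothetical helper: list fields present in a raw response item but absent
# from the corresponding extracted record, together with their raw values.
from typing import Any


def dropped_fields(response_item: dict[str, Any],
                   record: dict[str, Any]) -> dict[str, Any]:
    """Return {field: raw value} for top-level keys missing from the record."""
    return {k: v for k, v in response_item.items() if k not in record}


# Abbreviated copies of the anonymized Response item and Records entry above.
response_item = {
    "id": 111111,
    "note": None,
    "local_id": None,
    "os_version": "xxxxxxxxOS 2021.10.2",
    "provisioning_state": "",
    "is_web_accessible": False,
    "logs_channel": None,
}
record = {"id": 111111, "is_web_accessible": False}

missing = dropped_fields(response_item, record)
for name, value in missing.items():
    print(f"{name}: {value!r}")
```

Applied to the full payloads posted above (assuming both snippets show the same device), a comparison like this lists mostly fields whose response value is null, but also `os_version` and `provisioning_state` (an empty string), which may be a useful clue when narrowing down where the fields are removed.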

Another thought: is it possible to source the entire JSON as a record rather than trying to parse it?

Can you share your record selector?

Is this it?

```
type: RecordSelector
extractor:
  type: DpathExtractor
  field_path:
    - d
```
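The record selector above extracts the list under `d`. As a rough illustration of that step (a hypothetical emulation, not Airbyte's code; `extract_records` is an illustrative name), a plain path lookup like this keeps every key, including keys whose value is null, so if the real extractor behaves the same way, the fields would have to be removed at a later stage. With an empty path the whole body comes back as a single record, which relates to the "entire JSON as a record" question; my understanding is that Airbyte's `DpathExtractor` treats an empty `field_path` similarly, but treat that as an assumption and check the docs.

```python
# Simplified, hypothetical emulation of a dpath-style record extractor.
# Assumption: it approximates how a DpathExtractor with field_path ["d"]
# treats the response body; it is not Airbyte's code. An empty path yields
# the whole body as a single record.
from typing import Any, Iterator


def extract_records(body: Any, field_path: list[str]) -> Iterator[dict[str, Any]]:
    """Walk field_path into body, then yield each list item or the single object."""
    extracted = body
    for key in field_path:
        extracted = extracted.get(key) if isinstance(extracted, dict) else None
    if isinstance(extracted, list):
        yield from extracted
    elif extracted:
        yield extracted


body = {"d": [{"id": 111111, "local_id": None, "note": None}]}
records = list(extract_records(body, ["d"]))
print(records)  # [{'id': 111111, 'local_id': None, 'note': None}]
print(list(extract_records(body, [])))  # [{'d': [{'id': 111111, 'local_id': None, 'note': None}]}]
```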

Ella, would it be possible to jump on a 15-minute call to review this?

Sorry, I can't do that, and honestly I'm not the person best equipped to help you with this! But I'm still asking around for ideas 🙂

No problem, Ella. And thank you for trying to help.