After recent update json schema validation fails

I have created a source connector based on the source-github connector.
Specifically, I am reusing its schema files. Everything worked fine until I updated to a version above 0.38 (or so). I know this works in 0.37.1.
The error I get is as follows:

Additional Failure Information: java.lang.RuntimeException: com.networknt.schema.JsonSchemaException: #/properties/user/$ref: Reference user.json cannot be resolved

The primary schema file used by the stream is comments.json (I’ve stripped out a few irrelevant fields to keep this post short) and is as follows:

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "repository": {
      "type": ["string"]
    },
    "organization": {
      "type": ["null", "string"]
    },
    "id": {
      "type": ["null", "integer"]
    },
    "node_id": {
      "type": ["null", "string"]
    },
    "user": {
      "$ref": "user.json"
    }
  }
}

user.json sits in the shared/ subdirectory and looks like this (I’ve stripped some fields out to make it easier to read):

{
        "type": ["null", "object"],
        "properties": {
                "repository": {
                        "type": ["string"]
                },
                "organization": {
                        "type": ["null", "string"]
                },
                "login": {
                        "type": ["null", "string"]
                },
                "id": {
                        "type": ["null", "integer"]
                },
                "node_id": {
                        "type": ["null", "string"]
                },
                "avatar_url": {
                        "type": ["null", "string"]
                },
                "gravatar_id": {
                        "type": ["null", "string"]
                },
                "url": {
                        "type": ["null", "string"]
                },
                "html_url": {
                        "type": ["null", "string"]
                }
        }
}

From the official JSON Schema site:
https://json-schema.org/understanding-json-schema/structuring.html#ref

It talks about the usage of $id. Your GitHub schema files do not have $id entries. Maybe some magic is happening in the background?

In short, I keep getting that json schema validation error and cannot run my connections.
Can someone help? It seems like a big issue.

Thank you

Hey, could you share the exact Airbyte version? If it’s not the latest, could you update to the latest and try again?

Just switched to 0.39.20. Now I get a different error. On my connection, I refresh the source schema (i.e. on the Replication tab), then click Save. It prompts me to reset, and I get the message below in red:
Something went wrong.
Internal Server Error.
Looking at the logs of the temporal Docker image, I get this:

{"level":"info","ts":"2022-06-16T07:35:43.736Z","msg":"query directly through matching on sticky timed out, attempting to query on non-sticky","service":"history","shard-id":2,"address":"172.23.0.3:7234","shard-item":"0xc0004ad180","component":"history-engine","wf-namespace":"default","wf-id":"connection_manager_73d8e40b-6db0-428f-8ece-d24dbe949ab8","wf-run-id":"1046b6a3-1d0e-41bb-9206-59964c6ab482","wf-query-type":"getJobInformation","wf-task-queue-name":"","wf-next-event-id":3,"logging-call-at":"historyEngine.go:923"}
{"level":"error","ts":"2022-06-16T07:35:43.741Z","msg":"query directly though matching on non-sticky failed","service":"history","shard-id":2,"address":"172.23.0.3:7234","shard-item":"0xc0004ad180","component":"history-engine","wf-namespace":"default","wf-id":"connection_manager_73d8e40b-6db0-428f-8ece-d24dbe949ab8","wf-run-id":"1046b6a3-1d0e-41bb-9206-59964c6ab482","wf-query-type":"getJobInformation","error":"java.lang.IllegalArgumentException: Unknown query type: getJobInformation, knownTypes=[]\n\tat io.temporal.internal.sync.QueryDispatcher.handleQuery(QueryDispatcher.java:79)\n\tat io.temporal.internal.sync.SyncWorkflowContext.handleQuery(SyncWorkflowContext.java:276)\n\tat io.temporal.internal.sync.WorkflowExecuteRunnable.handleQuery(WorkflowExecuteRunnable.java:121)\n\tat io.temporal.internal.sync.SyncWorkflow.query(SyncWorkflow.java:187)\n\tat io.temporal.internal.replay.ReplayWorkflowExecutor.query(ReplayWorkflowExecutor.java:136)\n\tat io.temporal.internal.replay.ReplayWorkflowRunTaskHandler.handleQueryWorkflowTask(ReplayWorkflowRunTaskHandler.java:244)\n\tat io.temporal.internal.replay.ReplayWorkflowTaskHandler.handleWorkflowTaskWithQuery(ReplayWorkflowTaskHandler.java:117)\n\tat io.temporal.internal.replay.ReplayWorkflowTaskHandler.handleWorkflowTask(ReplayWorkflowTaskHandler.java:97)\n\tat io.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.handle(WorkflowWorker.java:241)\n\tat io.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.handle(WorkflowWorker.java:199)\n\tat io.temporal.internal.worker.PollTaskExecutor.lambda$process$0(PollTaskExecutor.java:93)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)\n\tat 
java.base/java.lang.Thread.run(Thread.java:833)\n","logging-call-at":"historyEngine.go:941","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Error\n\t/temporal/common/log/zap_logger.go:142\ngo.temporal.io/server/service/history.(*historyEngineImpl).queryDirectlyThroughMatching\n\t/temporal/service/history/historyEngine.go:941\ngo.temporal.io/server/service/history.(*historyEngineImpl).QueryWorkflow\n\t/temporal/service/history/historyEngine.go:834\ngo.temporal.io/server/service/history.(*Handler).QueryWorkflow.func1\n\t/temporal/service/history/handler.go:968\ngo.temporal.io/server/common/backoff.RetryContext\n\t/temporal/common/backoff/retry.go:125\ngo.temporal.io/server/service/history.(*Handler).QueryWorkflow\n\t/temporal/service/history/handler.go:966\ngo.temporal.io/server/api/historyservice/v1._HistoryService_QueryWorkflow_Handler.func1\n\t/temporal/api/historyservice/v1/service.pb.go:1401\ngo.temporal.io/server/common/rpc/interceptor.(*RateLimitInterceptor).Intercept\n\t/temporal/common/rpc/interceptor/rate_limit.go:83\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.41.0/server.go:1113\ngo.temporal.io/server/common/rpc/interceptor.(*TelemetryInterceptor).Intercept\n\t/temporal/common/rpc/interceptor/telemetry.go:108\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.41.0/server.go:1116\ngo.temporal.io/server/common/metrics.NewServerMetricsTrailerPropagatorInterceptor.func1\n\t/temporal/common/metrics/grpc.go:113\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.41.0/server.go:1116\ngo.temporal.io/server/common/metrics.NewServerMetricsContextInjectorInterceptor.func1\n\t/temporal/common/metrics/grpc.go:66\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.41.0/server.go:1116\ngo.temporal.io/server/common/rpc.ServiceErrorInterceptor\n\t/temporal/common/rpc/grpc.go:131\n
google.golang.org/grpc.chainUnaryInterceptors.func1.1\n\t/go/pkg/mod/google.golang.org/grpc@v1.41.0/server.go:1116\ngoogle.golang.org/grpc.chainUnaryInterceptors.func1\n\t/go/pkg/mod/google.golang.org/grpc@v1.41.0/server.go:1118\ngo.temporal.io/server/api/historyservice/v1._HistoryService_QueryWorkflow_Handler\n\t/temporal/api/historyservice/v1/service.pb.go:1403\ngoogle.golang.org/grpc.(*Server).processUnaryRPC\n\t/go/pkg/mod/google.golang.org/grpc@v1.41.0/server.go:1279\ngoogle.golang.org/grpc.(*Server).handleStream\n\t/go/pkg/mod/google.golang.org/grpc@v1.41.0/server.go:1608\ngoogle.golang.org/grpc.(*Server).serveStreams.func1.2\n\t/go/pkg/mod/google.golang.org/grpc@v1.41.0/server.go:923"}

Got another error.
So I’ve switched back to 0.39.18.
The workflow error above does not occur there.
For testing, I have removed all of the $ref entries in the schemas and replaced them with the actual referenced content.
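
That manual replacement can also be done in code before the schema is handed to the catalog. The following is a minimal sketch, assuming every file reference is a plain file name (such as user.json) inside a single shared directory and that the shared files do not reference each other in a cycle. The names inline_refs and shared_dir are hypothetical and not part of Airbyte.

```python
import json
from pathlib import Path
from typing import Any


def inline_refs(node: Any, shared_dir: Path) -> Any:
    """Return a copy of `node` with file-based "$ref" entries replaced
    by the content of the referenced schema file.

    Assumption: references are file names relative to `shared_dir`.
    Local references (starting with "#") are left untouched.
    """
    if isinstance(node, dict):
        ref = node.get("$ref")
        if isinstance(ref, str) and not ref.startswith("#"):
            # Load the referenced file, then resolve references nested in it.
            referenced = json.loads((shared_dir / ref).read_text())
            return inline_refs(referenced, shared_dir)
        return {key: inline_refs(value, shared_dir) for key, value in node.items()}
    if isinstance(node, list):
        return [inline_refs(item, shared_dir) for item in node]
    return node
```

The resulting dictionary contains no file references, so whatever validates the schema later does not need access to the shared/ directory.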

Now Airbyte is complaining about the id field in the schema files:

2022-06-16 07:59:12 - Additional Failure Information: java.lang.RuntimeException: com.networknt.schema.JsonSchemaException: java.lang.UnsupportedOperationException: No suitable validator for id

Sorry, another bug. Since we are talking about issues, here is one small thing that changed from previous releases: when you have the sync log open for a currently running job, it does not refresh most of the time. You have to click to fold it and click again to unfold it to see updates.

Hey, could you create a new post for the other error?

I am not sure which error you meant :slight_smile: but I have created a separate post for the ID error.
I am surprised nobody else is seeing this problem. I am using the default JSON schemas that come with source-github, but in my own connector.

Thanks for your help.

Is the source error still the same? Could you share the code of this custom connector so that I can check it?

I have been digging in the logs of the worker and server images. It seems the problem is related to the new JSON validator that was introduced. Judging from the JSON Schema definitions, the root has to have an $id, which is supposed to be an absolute URI. Does that get added by Airbyte somehow? Also, the "$ref cannot be resolved" error suggests that /shared/file.json does not have an $id either.
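
One way to check whether a schema still carries file references before it reaches the validator is to walk it and list every $ref that does not start with #. The helper below is a hypothetical sketch (find_external_refs is not an Airbyte function). It only reports references and does not resolve them.

```python
from typing import Any, Iterator, Tuple


def find_external_refs(node: Any, path: str = "#") -> Iterator[Tuple[str, str]]:
    """Yield (location, target) for every "$ref" pointing outside the document.

    `location` is written in the same "#/a/b/$ref" style that the
    validator uses in its error message.
    """
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}/{key}"
            if key == "$ref" and isinstance(value, str) and not value.startswith("#"):
                yield child, value
            else:
                yield from find_external_refs(value, child)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_external_refs(item, f"{path}/{index}")
```

For the comments.json schema shown earlier, this yields the single pair ("#/properties/user/$ref", "user.json"), which matches the location reported in the error.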

My connector is not doing anything weird. I just pass the schema file to it. Did something change when this new json validator was added?

Am I missing anything?

I am using the plain Python Source connector (not the REST API one) generated by the template generator.
In discover, I append my streams and set the json_schema field to the JSON object loaded from the schema file.

The failure occurs after the data has been read and is about to be written to the destination connector, which is Local JSON.

Got it.

I don’t see any changes in source-github that add $id, so I think the internals should be taking care of the $id. Could you try these:

  1. Change the destination and see if that leads somewhere.
  2. Also, could you share the logs of the sync when you run it with Airbyte?

So I know that your source-github connector is the HTTP API type. Mine is a plain Python source. In discover I create a list of streams. Each stream has a json_schema parameter where I put the JSON object read from the schema file.

I build the list of streams like this (json_schema is the JSON object read from the schema file; the rest is read from my custom JSON config object):

streams.append(
    AirbyteStream(
        name=stream_name,
        json_schema=json_schema,
        supported_sync_modes=stream_list[stream_name]["sync_modes"],
        source_defined_cursor=False,
        default_cursor_field=stream_list[stream_name]["cursor_field"],
    )
)

At the end of the function I return:

return AirbyteCatalog(streams=streams)

Then in read, once I have read the data, I emit it to the destination as follows (where one is a JSON object with the data, and stream is just the name of the stream):

record = AirbyteRecordMessage(stream=stream, data=one, emitted_at=int(datetime.now().timestamp()) * 1000)
yield AirbyteMessage(type=Type.RECORD, record=record)
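
For reference, the message this produces on the wire is a single JSON line with type RECORD. The sketch below builds the same shape with the standard library only, so it can be inspected without the Airbyte classes; record_message is a hypothetical helper, not an Airbyte function. It also multiplies before truncating, so emitted_at keeps millisecond precision rather than whole seconds.

```python
import json
import time
from typing import Any, Dict


def record_message(stream: str, data: Dict[str, Any]) -> str:
    """Serialize one RECORD message as a JSON line (plain-dict sketch)."""
    message = {
        "type": "RECORD",
        "record": {
            "stream": stream,
            "data": data,
            # Milliseconds since the epoch; multiply first, then truncate.
            "emitted_at": int(time.time() * 1000),
        },
    }
    return json.dumps(message)
```

Printing the result of record_message("comments", one) for a sample record makes it easy to compare the emitted data against the stream's schema by hand.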

The error below occurs after the data has been read and, I presume, is being sent to the destination connector. I am using Local JSON. All of this still works in 0.37.1. I think these errors started to happen after you introduced the new JSON validator.

--== 2022-06-21 20:8:52 ==-- 7 comments - Page 1 downloaded


2022-06-21 20:08:52 ERROR c.n.s.JsonMetaSchema(newValidator):345 - Error:
java.lang.reflect.InvocationTargetException: null
        at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:?]
        at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:77) ~[?:?]
        at jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:?]
        at java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:499) ~[?:?]
        at java.lang.reflect.Constructor.newInstance(Constructor.java:480) ~[?:?]
        at com.networknt.schema.ValidatorTypeCode.newValidator(ValidatorTypeCode.java:130) ~[json-schema-validator-1.0.42.jar:?]
        at com.networknt.schema.JsonMetaSchema.newValidator(JsonMetaSchema.java:342) ~[json-schema-validator-1.0.42.jar:?]
        at com.networknt.schema.ValidationContext.newValidator(ValidationContext.java:53) ~[json-schema-validator-1.0.42.jar:?]
        at com.networknt.schema.JsonSchema.read(JsonSchema.java:198) ~[json-schema-validator-1.0.42.jar:?]
        at com.networknt.schema.JsonSchema.initialize(JsonSchema.java:76) ~[json-schema-validator-1.0.42.jar:?]
        at com.networknt.schema.PropertiesValidator.<init>(PropertiesValidator.java:36) ~[json-schema-validator-1.0.42.jar:?]
        at jdk.internal.reflect.GeneratedConstructorAccessor34.newInstance(Unknown Source) ~[?:?]
        at jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:?]
        at java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:499) ~[?:?]
        at java.lang.reflect.Constructor.newInstance(Constructor.java:480) ~[?:?]
        at com.networknt.schema.ValidatorTypeCode.newValidator(ValidatorTypeCode.java:130) ~[json-schema-validator-1.0.42.jar:?]
        at com.networknt.schema.JsonMetaSchema.newValidator(JsonMetaSchema.java:342) ~[json-schema-validator-1.0.42.jar:?]
        at com.networknt.schema.ValidationContext.newValidator(ValidationContext.java:53) ~[json-schema-validator-1.0.42.jar:?]
        at com.networknt.schema.JsonSchema.read(JsonSchema.java:198) ~[json-schema-validator-1.0.42.jar:?]
        at com.networknt.schema.JsonSchema.initialize(JsonSchema.java:76) ~[json-schema-validator-1.0.42.jar:?]
        at com.networknt.schema.JsonSchemaFactory.newJsonSchema(JsonSchemaFactory.java:254) ~[json-schema-validator-1.0.42.jar:?]
        at com.networknt.schema.JsonSchemaFactory.getSchema(JsonSchemaFactory.java:362) ~[json-schema-validator-1.0.42.jar:?]
        at io.airbyte.validation.json.JsonSchemaValidator.validateInternal(JsonSchemaValidator.java:63) ~[io.airbyte-airbyte-json-validation-0.39.21-alpha.jar:?]
        at io.airbyte.validation.json.JsonSchemaValidator.ensure(JsonSchemaValidator.java:78) ~[io.airbyte-airbyte-json-validation-0.39.21-alpha.jar:?]
        at io.airbyte.workers.RecordSchemaValidator.validateSchema(RecordSchemaValidator.java:54) ~[io.airbyte-airbyte-workers-0.39.21-alpha.jar:?]
        at io.airbyte.workers.general.DefaultReplicationWorker.validateSchema(DefaultReplicationWorker.java:383) ~[io.airbyte-airbyte-workers-0.39.21-alpha.jar:?]
        at io.airbyte.workers.general.DefaultReplicationWorker.lambda$getReplicationRunnable$6(DefaultReplicationWorker.java:312) ~[io.airbyte-airbyte-workers-0.39.21-alpha.jar:?]
        at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1804) [?:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
        at java.lang.Thread.run(Thread.java:833) [?:?]
Caused by: com.networknt.schema.JsonSchemaException: #/properties/user/$ref: Reference user.json cannot be resolved
        at com.networknt.schema.RefValidator.<init>(RefValidator.java:43) ~[json-schema-validator-1.0.42.jar:?]
        ... 31 more
2022-06-21 20:08:52 ERROR c.n.s.JsonMetaSchema(newValidator):345 - Error:
java.lang.reflect.InvocationTargetException: null
        at jdk.internal.reflect.GeneratedConstructorAccessor34.newInstance(Unknown Source) ~[?:?]
        at jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:?]
        at java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:499) ~[?:?]
        at java.lang.reflect.Constructor.newInstance(Constructor.java:480) ~[?:?]
        at com.networknt.schema.ValidatorTypeCode.newValidator(ValidatorTypeCode.java:130) ~[json-schema-validator-1.0.42.jar:?]
        at com.networknt.schema.JsonMetaSchema.newValidator(JsonMetaSchema.java:342) ~[json-schema-validator-1.0.42.jar:?]
        at com.networknt.schema.ValidationContext.newValidator(ValidationContext.java:53) ~[json-schema-validator-1.0.42.jar:?]
        at com.networknt.schema.JsonSchema.read(JsonSchema.java:198) ~[json-schema-validator-1.0.42.jar:?]
        at com.networknt.schema.JsonSchema.initialize(JsonSchema.java:76) ~[json-schema-validator-1.0.42.jar:?]
        at com.networknt.schema.JsonSchemaFactory.newJsonSchema(JsonSchemaFactory.java:254) ~[json-schema-validator-1.0.42.jar:?]
        at com.networknt.schema.JsonSchemaFactory.getSchema(JsonSchemaFactory.java:362) ~[json-schema-validator-1.0.42.jar:?]
        at io.airbyte.validation.json.JsonSchemaValidator.validateInternal(JsonSchemaValidator.java:63) ~[io.airbyte-airbyte-json-validation-0.39.21-alpha.jar:?]
        at io.airbyte.validation.json.JsonSchemaValidator.ensure(JsonSchemaValidator.java:78) ~[io.airbyte-airbyte-json-validation-0.39.21-alpha.jar:?]
        at io.airbyte.workers.RecordSchemaValidator.validateSchema(RecordSchemaValidator.java:54) ~[io.airbyte-airbyte-workers-0.39.21-alpha.jar:?]
        at io.airbyte.workers.general.DefaultReplicationWorker.validateSchema(DefaultReplicationWorker.java:383) ~[io.airbyte-airbyte-workers-0.39.21-alpha.jar:?]
        at io.airbyte.workers.general.DefaultReplicationWorker.lambda$getReplicationRunnable$6(DefaultReplicationWorker.java:312) ~[io.airbyte-airbyte-workers-0.39.21-alpha.jar:?]
        at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1804) [?:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
        at java.lang.Thread.run(Thread.java:833) [?:?]
Caused by: com.networknt.schema.JsonSchemaException: #/properties/user/$ref: Reference user.json cannot be resolved
        at com.networknt.schema.RefValidator.<init>(RefValidator.java:43) ~[json-schema-validator-1.0.42.jar:?]
        at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:?]
        at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:77) ~[?:?]
        at jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:?]
        at java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:499) ~[?:?]
        at java.lang.reflect.Constructor.newInstance(Constructor.java:480) ~[?:?]
        at com.networknt.schema.ValidatorTypeCode.newValidator(ValidatorTypeCode.java:130) ~[json-schema-validator-1.0.42.jar:?]
        at com.networknt.schema.JsonMetaSchema.newValidator(JsonMetaSchema.java:342) ~[json-schema-validator-1.0.42.jar:?]
        at com.networknt.schema.ValidationContext.newValidator(ValidationContext.java:53) ~[json-schema-validator-1.0.42.jar:?]
        at com.networknt.schema.JsonSchema.read(JsonSchema.java:198) ~[json-schema-validator-1.0.42.jar:?]
        at com.networknt.schema.JsonSchema.initialize(JsonSchema.java:76) ~[json-schema-validator-1.0.42.jar:?]
        at com.networknt.schema.PropertiesValidator.<init>(PropertiesValidator.java:36) ~[json-schema-validator-1.0.42.jar:?]
        ... 20 more
2022-06-21 20:08:53 destination > 2022-06-21 20:08:53 INFO i.a.i.b.FailureTrackingAirbyteMessageConsumer(close):63 - Airbyte message consumer: succeeded.
2022-06-21 20:08:53 destination > 2022-06-21 20:08:53 INFO i.a.i.d.l.LocalJsonDestination$JsonConsumer(close):174 - finalizing consumer.
2022-06-21 20:08:53 destination > 2022-06-21 20:08:53 INFO i.a.i.d.l.LocalJsonDestination$JsonConsumer(close):190 - File output: /local/test_data2/_airbyte_raw_comments.jsonl
2022-06-21 20:08:53 destination > 2022-06-21 20:08:53 INFO i.a.i.b.IntegrationRunner(runInternal):153 - Completed integration: io.airbyte.integrations.destination.local_json.LocalJsonDestination
2022-06-21 20:08:53 ERROR i.a.w.g.DefaultReplicationWorker(run):180 - Sync worker failed.
java.util.concurrent.ExecutionException: java.lang.RuntimeException: com.networknt.schema.JsonSchemaException: #/properties/user/$ref: Reference user.json cannot be resolved
        at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:396) ~[?:?]
        at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2073) ~[?:?]
        at io.airbyte.workers.general.DefaultReplicationWorker.run(DefaultReplicationWorker.java:173) ~[io.airbyte-airbyte-workers-0.39.21-alpha.jar:?]
        at io.airbyte.workers.general.DefaultReplicationWorker.run(DefaultReplicationWorker.java:65) ~[io.airbyte-airbyte-workers-0.39.21-alpha.jar:?]
        at io.airbyte.workers.temporal.TemporalAttemptExecution.lambda$getWorkerThread$2(TemporalAttemptExecution.java:158) ~[io.airbyte-airbyte-workers-0.39.21-alpha.jar:?]
        at java.lang.Thread.run(Thread.java:833) [?:?]
Caused by: java.lang.RuntimeException: com.networknt.schema.JsonSchemaException: #/properties/user/$ref: Reference user.json cannot be resolved
        at io.airbyte.workers.general.DefaultReplicationWorker.lambda$getReplicationRunnable$6(DefaultReplicationWorker.java:362) ~[io.airbyte-airbyte-workers-0.39.21-alpha.jar:?]
        at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1804) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
        ... 1 more
Caused by: com.networknt.schema.JsonSchemaException: #/properties/user/$ref: Reference user.json cannot be resolved
        at com.networknt.schema.RefValidator.<init>(RefValidator.java:43) ~[json-schema-validator-1.0.42.jar:?]
        at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:?]
        at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:77) ~[?:?]
        at jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:?]
        at java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:499) ~[?:?]
        at java.lang.reflect.Constructor.newInstance(Constructor.java:480) ~[?:?]
        at com.networknt.schema.ValidatorTypeCode.newValidator(ValidatorTypeCode.java:130) ~[json-schema-validator-1.0.42.jar:?]
        at com.networknt.schema.JsonMetaSchema.newValidator(JsonMetaSchema.java:342) ~[json-schema-validator-1.0.42.jar:?]
        at com.networknt.schema.ValidationContext.newValidator(ValidationContext.java:53) ~[json-schema-validator-1.0.42.jar:?]
        at com.networknt.schema.JsonSchema.read(JsonSchema.java:198) ~[json-schema-validator-1.0.42.jar:?]
        at com.networknt.schema.JsonSchema.initialize(JsonSchema.java:76) ~[json-schema-validator-1.0.42.jar:?]
        at com.networknt.schema.PropertiesValidator.<init>(PropertiesValidator.java:36) ~[json-schema-validator-1.0.42.jar:?]
        at jdk.internal.reflect.GeneratedConstructorAccessor34.newInstance(Unknown Source) ~[?:?]
        at jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:?]
        at java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:499) ~[?:?]
        at java.lang.reflect.Constructor.newInstance(Constructor.java:480) ~[?:?]
        at com.networknt.schema.ValidatorTypeCode.newValidator(ValidatorTypeCode.java:130) ~[json-schema-validator-1.0.42.jar:?]
        at com.networknt.schema.JsonMetaSchema.newValidator(JsonMetaSchema.java:342) ~[json-schema-validator-1.0.42.jar:?]
        at com.networknt.schema.ValidationContext.newValidator(ValidationContext.java:53) ~[json-schema-validator-1.0.42.jar:?]
        at com.networknt.schema.JsonSchema.read(JsonSchema.java:198) ~[json-schema-validator-1.0.42.jar:?]
        at com.networknt.schema.JsonSchema.initialize(JsonSchema.java:76) ~[json-schema-validator-1.0.42.jar:?]
        at com.networknt.schema.JsonSchemaFactory.newJsonSchema(JsonSchemaFactory.java:254) ~[json-schema-validator-1.0.42.jar:?]
        at com.networknt.schema.JsonSchemaFactory.getSchema(JsonSchemaFactory.java:362) ~[json-schema-validator-1.0.42.jar:?]
        at io.airbyte.validation.json.JsonSchemaValidator.validateInternal(JsonSchemaValidator.java:63) ~[io.airbyte-airbyte-json-validation-0.39.21-alpha.jar:?]
        at io.airbyte.validation.json.JsonSchemaValidator.ensure(JsonSchemaValidator.java:78) ~[io.airbyte-airbyte-json-validation-0.39.21-alpha.jar:?]
        at io.airbyte.workers.RecordSchemaValidator.validateSchema(RecordSchemaValidator.java:54) ~[io.airbyte-airbyte-workers-0.39.21-alpha.jar:?]
        at io.airbyte.workers.general.DefaultReplicationWorker.validateSchema(DefaultReplicationWorker.java:383) ~[io.airbyte-airbyte-workers-0.39.21-alpha.jar:?]
        at io.airbyte.workers.general.DefaultReplicationWorker.lambda$getReplicationRunnable$6(DefaultReplicationWorker.java:312) ~[io.airbyte-airbyte-workers-0.39.21-alpha.jar:?]
        at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1804) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
        ... 1 more
2022-06-21 20:08:53 INFO i.a.w.g.DefaultReplicationWorker(run):239 - sync summary: io.airbyte.config.ReplicationAttemptSummary@2587c4f5[status=failed,recordsSynced=0,bytesSynced=0,startTime=1655842126856,endTime=1655842133211,totalStats=io.airbyte.config.SyncStats@6a5a3a74[recordsEmitted=0,bytesEmitted=0,stateMessagesEmitted=0,recordsCommitted=0],streamStats=[]]
2022-06-21 20:08:53 INFO i.a.w.g.DefaultReplicationWorker(run):268 - Source did not output any state messages
2022-06-21 20:08:53 WARN i.a.w.g.DefaultReplicationWorker(run):276 - State capture: No new state, falling back on input state: io.airbyte.config.State@7ff799fe[state={}]
2022-06-21 20:08:53 INFO i.a.w.t.TemporalAttemptExecution(get):134 - Stopping cancellation check scheduling...
2022-06-21 20:08:53 INFO i.a.w.t.s.ReplicationActivityImpl(lambda$replicate$3):157 - sync summary: io.airbyte.config.StandardSyncOutput@b44e5b8[standardSyncSummary=io.airbyte.config.StandardSyncSummary@4e466ab4[status=failed,recordsSynced=0,bytesSynced=0,startTime=1655842126856,endTime=1655842133211,totalStats=io.airbyte.config.SyncStats@6a5a3a74[recordsEmitted=0,bytesEmitted=0,stateMessagesEmitted=0,recordsCommitted=0],streamStats=[]],normalizationSummary=<null>,state=io.airbyte.config.State@7ff799fe[state={}],outputCatalog=io.airbyte.protocol.models.ConfiguredAirbyteCatalog@2be9677a[streams=[io.airbyte.protocol.models.ConfiguredAirbyteStream@7f420590[stream=io.airbyte.protocol.models.AirbyteStream@530756e5[name=comments,jsonSchema={"type":"object","$schema":"https://json-schema.org/draft/2020-12/schema","properties":{"id":{"type":["null","integer"]},"url":{"type":["null","string"]},"body":{"type":["null","string"]},"user":{"$ref":"user.json"},"node_id":{"type":["null","string"]},"user_id":{"type":["null","integer"]},"html_url":{"type":["null","string"]},"issue_url":{"type":["null","string"]},"created_at":{"type":["null","string"],"format":"date-time"},"repository":{"type":["string"]},"updated_at":{"type":["null","string"],"format":"date-time"},"author_association":{"type":["null","string"]}},"additionalProperties":false},supportedSyncModes=[full_refresh, incremental],sourceDefinedCursor=false,defaultCursorField=[updated_at],sourceDefinedPrimaryKey=[],namespace=<null>,additionalProperties={}],syncMode=full_refresh,cursorField=[updated_at],destinationSyncMode=overwrite,primaryKey=[],additionalProperties={}]],additionalProperties={}],failures=[io.airbyte.config.FailureReason@2e6995fb[failureOrigin=replication,failureType=<null>,internalMessage=java.lang.RuntimeException: com.networknt.schema.JsonSchemaException: #/properties/user/$ref: Reference user.json cannot be resolved,externalMessage=Something went wrong during 
replication,metadata=io.airbyte.config.Metadata@4edbe5fb[additionalProperties={attemptNumber=0, jobId=51}],stacktrace=java.util.concurrent.CompletionException: java.lang.RuntimeException: com.networknt.schema.JsonSchemaException: #/properties/user/$ref: Reference user.json cannot be resolved
        at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:315)
        at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:320)
        at java.base/java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1807)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: java.lang.RuntimeException: com.networknt.schema.JsonSchemaException: #/properties/user/$ref: Reference user.json cannot be resolved
        at io.airbyte.workers.general.DefaultReplicationWorker.lambda$getReplicationRunnable$6(DefaultReplicationWorker.java:362)
        at java.base/java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1804)
        ... 3 more
Caused by: com.networknt.schema.JsonSchemaException: #/properties/user/$ref: Reference user.json cannot be resolved
        at com.networknt.schema.RefValidator.<init>(RefValidator.java:43)
        at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:77)
        at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:499)
        at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:480)
        at com.networknt.schema.ValidatorTypeCode.newValidator(ValidatorTypeCode.java:130)
        at com.networknt.schema.JsonMetaSchema.newValidator(JsonMetaSchema.java:342)
        at com.networknt.schema.ValidationContext.newValidator(ValidationContext.java:53)
        at com.networknt.schema.JsonSchema.read(JsonSchema.java:198)
        at com.networknt.schema.JsonSchema.initialize(JsonSchema.java:76)
        at com.networknt.schema.PropertiesValidator.<init>(PropertiesValidator.java:36)
        at jdk.internal.reflect.GeneratedConstructorAccessor34.newInstance(Unknown Source)
        at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:499)
        at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:480)
        at com.networknt.schema.ValidatorTypeCode.newValidator(ValidatorTypeCode.java:130)
        at com.networknt.schema.JsonMetaSchema.newValidator(JsonMetaSchema.java:342)
        at com.networknt.schema.ValidationContext.newValidator(ValidationContext.java:53)
        at com.networknt.schema.JsonSchema.read(JsonSchema.java:198)
        at com.networknt.schema.JsonSchema.initialize(JsonSchema.java:76)
        at com.networknt.schema.JsonSchemaFactory.newJsonSchema(JsonSchemaFactory.java:254)
        at com.networknt.schema.JsonSchemaFactory.getSchema(JsonSchemaFactory.java:362)
        at io.airbyte.validation.json.JsonSchemaValidator.validateInternal(JsonSchemaValidator.java:63)
        at io.airbyte.validation.json.JsonSchemaValidator.ensure(JsonSchemaValidator.java:78)
        at io.airbyte.workers.RecordSchemaValidator.validateSchema(RecordSchemaValidator.java:54)
        at io.airbyte.workers.general.DefaultReplicationWorker.validateSchema(DefaultReplicationWorker.java:383)
        at io.airbyte.workers.general.DefaultReplicationWorker.lambda$getReplicationRunnable$6(DefaultReplicationWorker.java:312)
        ... 4 more
,retryable=<null>,timestamp=1655842132854]]]
2022-06-21 20:08:53 INFO i.a.w.t.TemporalUtils(withBackgroundHeartbeat):237 - Stopping temporal heartbeating...
2022-06-21 20:08:53 INFO i.a.c.p.ConfigRepository(updateConnectionState):775 - Updating connection f8169f47-4869-4c24-915b-ec405056714a state: io.airbyte.config.State@2a274f33[state={}]
2022-06-21 20:08:53 INFO i.a.c.f.EnvVariableFeatureFlags(autoDisablesFailingConnections):14 - Auto Disable Failing Connections: false

Just a thought: Airbyte’s GitHub source connector is an HTTP API source, while mine is a generic Python source connector. Could that be why the GitHub source connector validates its schemas without issues and mine does not?

Yes, that's possible, since we could be handling some things internally.

I can comment better once I get my hands on the code.

Sure. I have simplified the task: I created a plain Python source connector.
I did not touch the check function. I copied all of the schema files from the source-github connector, and I am testing against just one: comments.json.
In discover, I just load that JSON file into a JSON object and pass it to AirbyteStream (just like the default code does).
In read, I just add a static example record to an AirbyteMessage (just like the default code does).

That’s it. I send it to the Local JSON destination connector, and it fails with the original error about $ref user.json not being resolved.

Here is the code of source.py and comments.json (copied from source_github)

import json
import os
from datetime import datetime
from typing import Any, Dict, Generator

from airbyte_cdk.logger import AirbyteLogger
from airbyte_cdk.models import (
    AirbyteCatalog,
    AirbyteConnectionStatus,
    AirbyteMessage,
    AirbyteRecordMessage,
    AirbyteStream,
    ConfiguredAirbyteCatalog,
    Status,
    Type,
)
from airbyte_cdk.sources import Source

# Absolute path to the connector package.
main_path = "/airbyte/integration_code/source_github_mine/"


class SourceGithubMine(Source):
    def check(self, logger: AirbyteLogger, config: json) -> AirbyteConnectionStatus:
        # No real connectivity check yet; always report success.
        try:
            return AirbyteConnectionStatus(status=Status.SUCCEEDED)
        except Exception as e:
            return AirbyteConnectionStatus(status=Status.FAILED, message=f"An exception occurred: {str(e)}")

    def discover(self, logger: AirbyteLogger, config: json) -> AirbyteCatalog:
        streams = []

        stream_name = "comments"  # Example
        # json.load returns the schema as-is, so "$ref": "user.json" stays unresolved.
        with open(os.path.join(main_path, "schemas", "comments.json")) as f:
            json_schema = json.load(f)

        streams.append(AirbyteStream(name=stream_name, json_schema=json_schema))
        return AirbyteCatalog(streams=streams)

    def read(
        self, logger: AirbyteLogger, config: json, catalog: ConfiguredAirbyteCatalog, state: Dict[str, Any]
    ) -> Generator[AirbyteMessage, None, None]:
        stream_name = "comments"  # Example
        data = {"url":"https://api.github.com/repos/curl/curl/issues/comments/785098704","html_url":"https://github.com/curl/curl/pull/6654#issuecomment-785098704","issue_url":"https://api.github.com/repos/curl/curl/issues/6654","id":785098704,"node_id":"MDEyOklzc3VlQ29tbWVudDc4NTA5ODcwNA==","user":{"login":"ghost","id":10137,"node_id":"MDQ6VXNlcjEwMTM3","avatar_url":"https://avatars.githubusercontent.com/u/10137?v=4","gravatar_id":"","url":"https://api.github.com/users/ghost","html_url":"https://github.com/ghost","followers_url":"https://api.github.com/users/ghost/followers","following_url":"https://api.github.com/users/ghost/following{/other_user}","gists_url":"https://api.github.com/users/ghost/gists{/gist_id}","starred_url":"https://api.github.com/users/ghost/starred{/owner}{/repo}","subscriptions_url":"https://api.github.com/users/ghost/subscriptions","organizations_url":"https://api.github.com/users/ghost/orgs","repos_url":"https://api.github.com/users/ghost/repos","events_url":"https://api.github.com/users/ghost/events{/privacy}","received_events_url":"https://api.github.com/users/ghost/received_events","type":"User","site_admin":False},"created_at":"2021-02-24T14:05:29Z","updated_at":"2021-04-19T09:16:36Z","author_association":"NONE","body":"<img src=\"https://www.deepcode.ai/icons/green_check.svg\" width= \"50px\" align= \"left\"/> Congratulations :tada:. DeepCode [analyzed](https://www.deepcode.ai/app/gh/curl/curl/56a037cc0ad1b2a770d0c08d3d09dee1ce600f0f/curl/curl/bfde4230450e7756e42a43f866879037e4bba340/pr/_/%2F/code/?utm_source=gh_review&c=0&w=0&i=0&) your code in 2.831 seconds and we found no issues. Enjoy a moment of no bugs :sunny:.\n\n#### 👉 View analysis in [**DeepCode’s Dashboard**](https://www.deepcode.ai/app/gh/curl/curl/56a037cc0ad1b2a770d0c08d3d09dee1ce600f0f/curl/curl/bfde4230450e7756e42a43f866879037e4bba340/pr/_/%2F/code/?utm_source=gh_review&c=0&w=0&i=0&) | [_Configure the bot_](https://www.deepcode.ai/app/gh/?ownerconfig=curl)\n","reactions":{"url":"https://api.github.com/repos/curl/curl/issues/comments/785098704/reactions","total_count":0,"+1":0,"-1":0,"laugh":0,"hooray":0,"confused":0,"heart":0,"rocket":0,"eyes":0},"performed_via_github_app":None,"repository":"curl/curl"}

        yield AirbyteMessage(
            type=Type.RECORD,
            record=AirbyteRecordMessage(stream=stream_name, data=data, emitted_at=int(datetime.now().timestamp()) * 1000),
        )

comments.json:

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "repository": {
      "type": ["string"]
    },
    "id": {
      "type": ["null", "integer"]
    },
    "node_id": {
      "type": ["null", "string"]
    },
    "user": {
      "$ref": "user.json"
    },
    "url": {
      "type": ["null", "string"]
    },
    "html_url": {
      "type": ["null", "string"]
    },
    "body": {
      "type": ["null", "string"]
    },
    "user_id": {
      "type": ["null", "integer"]
    },
    "created_at": {
      "type": ["null", "string"],
      "format": "date-time"
    },
    "updated_at": {
      "type": ["null", "string"],
      "format": "date-time"
    },
    "issue_url": {
      "type": ["null", "string"]
    },
    "author_association": {
      "type": ["null", "string"]
    }
  }
}

As per Airbyte’s documentation, user.json sits in the “shared” subfolder of schemas.

So the only unusual thing I might be doing is loading comments.json from disk with a full path. I am not sure whether I can use a relative path, or what the best practice is.
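
On the path question, one common approach is to build the schema path from the location of the module itself rather than hard-coding the container path. The sketch below is only an illustration: the helper name schema_path is hypothetical and not part of airbyte_cdk, and it assumes the schemas folder sits next to source.py.

```python
from pathlib import Path


def schema_path(module_file: str, stream_name: str) -> Path:
    """Return <package dir>/schemas/<stream_name>.json for the module at module_file."""
    # resolve() makes the path absolute, so it does not depend on the working directory.
    package_dir = Path(module_file).resolve().parent
    return package_dir / "schemas" / f"{stream_name}.json"
```

Inside source.py this would be called as schema_path(__file__, "comments"). Note that it only changes how the file is located; it does nothing about the unresolved $ref.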

Thank you.

I assume something is wrong with
airbyte-cdk/python/airbyte_cdk/sources/utils/schema_helpers.py

The HTTP Python source connector does not reference the JSON schema files anywhere in its own code. I assume it loads them automatically somehow.
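
If the difference really is that the plain Python source passes "$ref": "user.json" through untouched, one possible workaround is to inline the shared files before handing the schema to AirbyteStream. The following is a minimal sketch under stated assumptions, not the CDK's own implementation: the names inline_refs and load_schema are hypothetical, every $ref is assumed to be a plain file name inside schemas/shared/, and JSON pointers and cyclic references are not handled.

```python
import json
from pathlib import Path
from typing import Any


def inline_refs(node: Any, shared_dir: Path) -> Any:
    """Return a copy of node with each file-based "$ref" replaced by the referenced schema."""
    if isinstance(node, dict):
        ref = node.get("$ref")
        if isinstance(ref, str) and not ref.startswith("#"):
            # Assumption: the reference is a file name relative to shared_dir.
            with open(shared_dir / ref) as f:
                # Resolve references nested inside the shared file as well.
                return inline_refs(json.load(f), shared_dir)
        return {key: inline_refs(value, shared_dir) for key, value in node.items()}
    if isinstance(node, list):
        return [inline_refs(item, shared_dir) for item in node]
    return node


def load_schema(schemas_dir: Path, name: str) -> dict:
    """Load schemas/<name>.json and inline references found in schemas/shared/."""
    with open(schemas_dir / f"{name}.json") as f:
        raw = json.load(f)
    return inline_refs(raw, schemas_dir / "shared")
```

In discover, the result of load_schema(Path(main_path) / "schemas", "comments") could then be passed as json_schema, so the catalog sent to the platform no longer contains a file reference that the validator has to resolve.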

I have created an issue for this so that the team can look at it: https://github.com/airbytehq/airbyte/issues/14289 Feel free to continue the discussion there; you can also add any further information if needed.