Source Harvest throws 403 Client Error

  • Is this your first time deploying Airbyte: Yes
  • OS Version / Instance: Ubuntu 18.04, new DigitalOcean deploy
  • Memory / Disk: 2 GB / 1 TB SSD
  • Deployment: Docker
  • Airbyte Version: 0.30.25-alpha
  • Source name/version: Harvest 0.1.8
  • Destination name/version: MySQL 0.1.18
  • Step: Setting new connection, source / On sync
  • Description: I’m trying to sync for the first time and the process doesn’t finish.
2022-04-05 17:40:58 INFO () DefaultReplicationWorker(lambda$getReplicationRunnable$2):190 - Replication thread started.

2022-04-05 17:40:59 INFO () DefaultAirbyteStreamFactory(internalLog):98 - Starting syncing SourceHarvest

2022-04-05 17:40:59 ERROR () LineGobbler(voidCall):82 - SLF4J: Class path contains multiple SLF4J bindings.

2022-04-05 17:41:01 INFO () DefaultAirbyteStreamFactory(lambda$create$0):62 - 2022-04-05 17:41:01 INFO i.a.i.d.m.MySQLDestination(main):125 - starting destination: class io.airbyte.integrations.destination.mysql.MySQLDestination

2022-04-05 17:41:08 INFO () DefaultReplicationWorker(run):121 - Source thread complete.

2022-04-05 17:41:08 INFO () DefaultReplicationWorker(run):122 - Waiting for destination thread to join.

2022-04-05 17:41:23 INFO () DefaultReplicationWorker(run):124 - Destination thread complete.

2022-04-05 17:41:23 ERROR () DefaultReplicationWorker(run):128 - Sync worker failed.

io.airbyte.workers.WorkerException: Source process exit with code 1. This warning is normal if the job was cancelled.

logs-11-2.txt (43.8 KB)

Can you deploy with more memory, @bstrokevin?

Hi. I’ve changed to 4 GB of RAM. Is 4 GB enough?
I also updated Airbyte to 0.35.65-alpha.
I’m now getting a different error. Please see the attachment.

I use personal-token auth for the Harvest connector. I reset the data and ran the sync, and it failed.

2022-04-06 16:24:00 INFO i.a.w.t.TemporalUtils(withBackgroundHeartbeat):235 - Stopping temporal heartbeating…

2022-04-06 16:24:00 INFO i.a.c.p.ConfigRepository(updateConnectionState):731 - Updating connection b9d54989-9024-4513-a213-c7c6ad9d5268 state: io.airbyte.config.State@40a537b9[state={}]

2022-04-06 16:24:00 INFO i.a.v.j.JsonSchemaValidator(test):56 - JSON schema validation failed.

errors: $.client_id: is missing but it is required, $.client_secret: is missing but it is required, $.refresh_token: is missing but it is required, $.api_token: is not defined in the schema and the schema does not allow additional properties, $.auth_type: must be a constant value Client, $.auth_type: does not have a value in the enumeration [Client]

I’m not sure why the error log says it needs $.client_id and $.client_secret when I’m using personal auth.
logs-19.txt (97.1 KB)
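
A possible reading of that validator message, offered as an assumption rather than something confirmed in this thread: the config is being compared against the OAuth-style credentials branch, so a token-style config produces one complaint per difference. The sketch below reconstructs that branch only from the messages in the log. `CLIENT_BRANCH`, `check_against_branch`, and the `"Token"` value are hypothetical names for illustration, not the connector's actual spec.

```python
# Hypothetical reconstruction of the OAuth ("Client") credentials branch,
# inferred only from the validator messages quoted in the log above.
CLIENT_BRANCH = {
    "required": ["client_id", "client_secret", "refresh_token"],
    "allowed": {"auth_type", "client_id", "client_secret", "refresh_token"},
    "auth_type": "Client",
}


def check_against_branch(credentials: dict, branch: dict) -> list[str]:
    """Return the reasons why `credentials` does not match `branch`."""
    errors = []
    # Every required key that is absent produces one message.
    for key in branch["required"]:
        if key not in credentials:
            errors.append(f"$.{key}: is missing but it is required")
    # Keys outside the branch are rejected (no additional properties).
    for key in credentials:
        if key not in branch["allowed"]:
            errors.append(f"$.{key}: is not defined in the schema")
    # The discriminator must equal the branch's constant.
    if credentials.get("auth_type") != branch["auth_type"]:
        errors.append(f"$.auth_type: must be a constant value {branch['auth_type']}")
    return errors


# A personal-access-token config; the "Token" value is an assumption.
token_credentials = {"auth_type": "Token", "api_token": "<redacted>"}

for line in check_against_branch(token_credentials, CLIENT_BRANCH):
    print(line)
```

Run against a token-style config, this prints the same kinds of complaints as the log: three missing OAuth fields, an unexpected `api_token`, and an `auth_type` mismatch.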


@marcosmarxm sorry forgot to tag you on the previous message

The error from your logs is:

2022-04-06 16:23:40 source > 403 Client Error: Forbidden for url: https://api.harvestapp.com/v2/estimate_item_categories?per_page=50&updated_since=2017-01-25T00%3A00%3A00%2B00%3A00

It looks like the API token or user doesn’t have permission to pull data from that endpoint.
https://developers.greenhouse.io/harvest.html#errors

Could you curl the URL manually, check the response message, and contact Harvest?
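
For long log files, a small script can pull out every request that was rejected with a 403 and decode its query string. This is a minimal sketch that assumes the log wording matches the line quoted above. `FORBIDDEN_RE` and `forbidden_requests` are illustrative names, not part of Airbyte.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Matches the wording of the 403 line quoted from the sync log.
FORBIDDEN_RE = re.compile(r"403 Client Error: Forbidden for url: (\S+)")


def forbidden_requests(log_text: str) -> list[tuple[str, dict]]:
    """Return (endpoint path, decoded query parameters) for each 403 in the log."""
    results = []
    for url in FORBIDDEN_RE.findall(log_text):
        parts = urlsplit(url)
        # parse_qs returns lists of values; keep the first value of each key.
        params = {key: values[0] for key, values in parse_qs(parts.query).items()}
        results.append((parts.path, params))
    return results


sample = (
    "2022-04-06 16:23:40 source > 403 Client Error: Forbidden for url: "
    "https://api.harvestapp.com/v2/estimate_item_categories"
    "?per_page=50&updated_since=2017-01-25T00%3A00%3A00%2B00%3A00"
)
print(forbidden_requests(sample))
```

The decoded `updated_since` value is `2017-01-25T00:00:00+00:00`, which makes it easier to reproduce the same request by hand.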

HTTP/2 200

server: nginx

date: Wed, 06 Apr 2022 19:02:35 GMT

content-type: application/json; charset=utf-8

x-frame-options: SAMEORIGIN

x-xss-protection: 1; mode=block

x-content-type-options: nosniff

x-download-options: noopen

x-permitted-cross-domain-policies: none

referrer-policy: strict-origin-when-cross-origin

cache-control: private, no-store

x-app-server: harvestapp-7d5dc745d5-vfbzn

x-robots-tag: noindex, nofollow

content-security-policy: report-uri https://cspreports.harvestapp.com/csp_reports; default-src *; img-src * data:; font-src data: too many links… removed

etag: W/"d7fa815625e9cd88cf395c2b372a735d"

x-request-id: c447605ed5b7e2d7a1889d7b8e6c0703

x-runtime: 0.037718

strict-transport-security: max-age=31536000; includeSubDomains

strict-transport-security: max-age=31536000; includeSubDomains

vary: Origin

via: 1.1 google

alt-svc: h3=":443"; ma=2592000,h3-29=":443"; ma=2592000,h3-Q050=":443"; ma=2592000,h3-Q046=":443"; ma=2592000,h3-Q043=":443"; ma=2592000,quic=":443"; ma=2592000; v="46,43"

@bstrokevin you should do something like this:

curl "https://api.harvestapp.com/v2/estimate_item_categories?per_page=50&updated_since=2017-01-25T00%3A00%3A00%2B00%3A00" \
  -H "Authorization: Bearer $ACCESS_TOKEN" \
  -H "Harvest-Account-Id: $ACCOUNT_ID" \
  -H "User-Agent: MyApp (yourname@example.com)"

The response:

{"error":"invalid_token","error_description":"The access token provided is expired, revoked, malformed or invalid for other reasons."}

Please replace the values with your own access token and account ID to get the correct response.
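
The same request can be prepared from Python using only the standard library. The sketch below builds the request without sending it, reusing the three headers from the curl call. `build_harvest_request` is a hypothetical helper, and the token and account ID are placeholders.

```python
import urllib.request
from urllib.parse import urlencode

BASE_URL = "https://api.harvestapp.com/v2"


def build_harvest_request(endpoint, params, access_token, account_id):
    """Build (but do not send) a GET request with the headers used in the curl call."""
    # urlencode percent-encodes the colons in the timestamp.
    url = f"{BASE_URL}/{endpoint}?{urlencode(params)}"
    headers = {
        "Authorization": f"Bearer {access_token}",
        "Harvest-Account-Id": str(account_id),
        "User-Agent": "MyApp (yourname@example.com)",
    }
    return urllib.request.Request(url, headers=headers, method="GET")


request = build_harvest_request(
    "estimate_item_categories",
    {"per_page": 50, "updated_since": "2017-01-25T00:00:00Z"},
    access_token="YOUR_ACCESS_TOKEN",  # placeholder, replace with your own
    account_id="YOUR_ACCOUNT_ID",  # placeholder, replace with your own
)
print(request.full_url)
# To send it, call urllib.request.urlopen(request); a 401 or 403 response
# is raised as urllib.error.HTTPError.
```

Building the request separately from sending it makes it possible to inspect the exact URL and headers before any token is transmitted.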

curl "https://api.harvestapp.com/v2/estimate_item_categories?per_page=50&updated_since=2017-01-25T00:00:00Z" \
  -H "Authorization: Bearer 2211609.pt.bFLrkpd06wabpNjm6EICdLAmPuiIPYbWnz9BuGqvKvvBVbP824ORA81WZ2cMqJazaLaloNTCTES7tFk3x6iYpg" \
  -H "Harvest-Account-Id: 1254528" \
  -H "User-Agent: MyApp (kevin@bstro.com)"

{"message":"Not authorized!"}

It seems the token is not authorized. I don’t see any additional options when I create the personal access token…

https://api.harvestapp.com/api/v2/users/me.json

gets

{"id":3179693,"first_name":"Kevin","last_name":"Chan","email":"kevin@bstro.com","telephone":"","timezone":"Pacific Time (US & Canada)","weekly_capacity":144000,"has_access_to_all_future_projects":false,"is_contractor":false,"is_admin":true,"is_project_manager":false,"can_see_rates":true,"can_create_projects":true,"can_create_invoices":true,"can_close_account":false,"is_active":true,"calendar_integration_enabled":false,"calendar_integration_source":null,"created_at":"2020-03-11T20:52:48Z","updated_at":"2022-02-14T18:16:11Z","default_hourly_rate":null,"cost_rate":null,"roles":["Dev Team","Leadership Team","Ops Team"],"avatar_url":""}

You need to check with Harvest why your API token doesn’t have that access.

OK, I will check with Harvest.

If I want to use OAuth2, the settings page asks for a refresh token.

When I try to set it up, it asks for a redirect URL so I can get the tokens. But what is the redirect URL? My Airbyte is hosted at http://137.184.165.36:8000/

After talking to Harvest support, they told me we didn’t have the “Estimate” module turned on, so the API can’t access the estimate data.

It seems there are a few modules we don’t have turned on. So I checked my “Replication” tab and stopped syncing the tables that are used by the disabled modules. I am turning off additional streams as I run into each error.
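
Rather than discovering disabled modules one failed sync at a time, the endpoints could be probed up front and split into those that respond and those that return 403. This is only a sketch: `split_streams_by_access` is a hypothetical helper, and `fake_status` stands in for a real HTTP call so the example runs offline. The statuses it returns are illustrative, not measured.

```python
from typing import Callable, Iterable


def split_streams_by_access(
    endpoints: Iterable[str], get_status: Callable[[str], int]
) -> tuple[list[str], list[str]]:
    """Split endpoints into (allowed, forbidden) based on the HTTP status each returns."""
    allowed, forbidden = [], []
    for endpoint in endpoints:
        status = get_status(endpoint)
        # Only a 403 is treated as "module not enabled / no permission".
        (forbidden if status == 403 else allowed).append(endpoint)
    return allowed, forbidden


# Stand-in for a real HTTP call; the statuses here are illustrative only.
def fake_status(endpoint: str) -> int:
    return 403 if endpoint.startswith("estimate") else 200


allowed, forbidden = split_streams_by_access(
    ["clients", "projects", "estimate_item_categories"], fake_status
)
print("keep:", allowed)
print("disable:", forbidden)
```

In real use, `get_status` would issue an authenticated request per endpoint, and the `forbidden` list would show which streams to switch off in the “Replication” tab.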

This is good to know as I’m sure it will help other users.

Thanks for sharing this @bstrokevin

@marcosmarxm
I’m almost giving up…

After disabling sync on some tables, the sync says “Succeeded”, but no records were created. Not even the tables. I’ve removed the connection and added it again.

62.56 MB | 60,861 emitted records | 60,861 committed records | 5m 53s | Sync

2022-04-07 17:39:04 INFO i.a.c.p.ConfigRepository(updateConnectionState):731 - Updating connection 930cecf9-2cfd-4bae-b742-a49a91175814 state: io.airbyte.config.State@4dc724d2[state={}]

2022-04-07 17:39:04 INFO i.a.v.j.JsonSchemaValidator(test):56 - JSON schema validation failed.

errors: $.client_id: is missing but it is required, $.client_secret: is missing but it is required, $.refresh_token: is missing but it is required, $.api_token: is not defined in the schema and the schema does not allow additional properties, $.auth_type: must be a constant value Client, $.auth_type: does not have a value in the enumeration [Client]


logs-49.txt (57.9 KB)

Is it working now, @bstrokevin?
From the logs, the sync finished and was able to transfer the data.
What namespace are you using?

Hi @marcosmarxm, it’s not working. No tables were created with data, only Airbyte tmp tables.
Destination Namespace → Mirror source structure.


From your logs, the sync was successful and transferred data.


2022-04-07 17:39:04 INFO i.a.w.DefaultReplicationWorker(run):228 - sync summary: io.airbyte.config.ReplicationAttemptSummary@243153c4[status=completed,recordsSynced=60861,bytesSynced=65603006,startTime=1649352791406,endTime=1649353144258,totalStats=io.airbyte.config.SyncStats@675a9839[recordsEmitted=60861,bytesEmitted=65603006,stateMessagesEmitted=0,recordsCommitted=60861],streamStats=[io.airbyte.config.StreamSyncStats@5699dabd[streamName=clients,stats=io.airbyte.config.SyncStats@2bc403aa[recordsEmitted=49,bytesEmitted=10425,stateMessagesEmitted=<null>,recordsCommitted=49]], io.airbyte.config.StreamSyncStats@5c5da65[streamName=projects,stats=io.airbyte.config.SyncStats@36ed01b4[recordsEmitted=235,bytesEmitted=163931,stateMessagesEmitted=<null>,recordsCommitted=235]], io.airbyte.config.StreamSyncStats@2e3979e6[streamName=project_assignments,stats=io.airbyte.config.SyncStats@25ebe6d8[recordsEmitted=371,bytesEmitted=696967,stateMessagesEmitted=<null>,recordsCommitted=371]], io.airbyte.config.StreamSyncStats@e5616f2[streamName=project_budget,stats=io.airbyte.config.SyncStats@50e8562d[recordsEmitted=185,bytesEmitted=46633,stateMessagesEmitted=<null>,recordsCommitted=185]], io.airbyte.config.StreamSyncStats@357be39f[streamName=roles,stats=io.airbyte.config.SyncStats@4f5c1ece[recordsEmitted=9,bytesEmitted=1370,stateMessagesEmitted=<null>,recordsCommitted=9]], io.airbyte.config.StreamSyncStats@71036051[streamName=company,stats=io.airbyte.config.SyncStats@63480962[recordsEmitted=1,bytesEmitted=475,stateMessagesEmitted=<null>,recordsCommitted=1]], io.airbyte.config.StreamSyncStats@9c5798f[streamName=time_entries,stats=io.airbyte.config.SyncStats@2640ac17[recordsEmitted=58595,bytesEmitted=64263204,stateMessagesEmitted=<null>,recordsCommitted=58595]], io.airbyte.config.StreamSyncStats@4c2b5166[streamName=task_assignments,stats=io.airbyte.config.SyncStats@6f0b5298[recordsEmitted=1317,bytesEmitted=378317,stateMessagesEmitted=<null>,recordsCommitted=1317]], io.airbyte.config.StreamSyncStats@4da047f7[streamName=tasks,stats=io.airbyte.config.SyncStats@6e237a40[recordsEmitted=60,bytesEmitted=12726,stateMessagesEmitted=<null>,recordsCommitted=60]], io.airbyte.config.StreamSyncStats@7815a683[streamName=users,stats=io.airbyte.config.SyncStats@4d3fa48b[recordsEmitted=39,bytesEmitted=28958,stateMessagesEmitted=<null>,recordsCommitted=39]]]]

Can you check the schema or find a schema called harvest?
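
The sync summary line is hard to read as one string. A short script can turn it into per-stream counts, which shows which tables should exist in the destination. This is a minimal sketch that assumes the summary keeps the `streamName=...,recordsEmitted=...,recordsCommitted=...` layout seen above. `STREAM_RE` and `per_stream_counts` are illustrative names.

```python
import re

# One match per StreamSyncStats entry; totalStats has no streamName and is skipped.
STREAM_RE = re.compile(
    r"streamName=(?P<name>[^,\]]+),stats=[^\[]*\["
    r"recordsEmitted=(?P<emitted>\d+),bytesEmitted=\d+,"
    r"stateMessagesEmitted=[^,]*,recordsCommitted=(?P<committed>\d+)\]"
)


def per_stream_counts(summary: str) -> dict[str, tuple[int, int]]:
    """Map each stream name to (recordsEmitted, recordsCommitted)."""
    return {
        m["name"]: (int(m["emitted"]), int(m["committed"]))
        for m in STREAM_RE.finditer(summary)
    }


# Two entries copied from the summary line in the log.
excerpt = (
    "io.airbyte.config.StreamSyncStats@5699dabd[streamName=clients,"
    "stats=io.airbyte.config.SyncStats@2bc403aa[recordsEmitted=49,"
    "bytesEmitted=10425,stateMessagesEmitted=<null>,recordsCommitted=49]], "
    "io.airbyte.config.StreamSyncStats@357be39f[streamName=roles,"
    "stats=io.airbyte.config.SyncStats@4f5c1ece[recordsEmitted=9,"
    "bytesEmitted=1370,stateMessagesEmitted=<null>,recordsCommitted=9]]"
)
print(per_stream_counts(excerpt))
```

Applied to the full summary line, the ten per-stream counts add up to the 60,861 records reported in the totals, so every emitted record was reported as committed by the destination.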

@marcosmarxm
The database is bstroharvest. I don’t see any tables apart from _airbyte*.
I resynced. Same result…

Should I delete the database and connectors and start again?

The end of the log says:
2022-04-08 16:27:42 INFO i.a.v.j.JsonSchemaValidator(test):56 - JSON schema validation failed.

What does that mean?

logs-52.txt (55.2 KB)


FYI, I’m also getting the same error with a MySQL source to Snowflake (CDC incremental + dedupe; after a reset, the first sync fails with this error).
EDIT: this works after resetting (possibly with SCD existing and not requiring a full refresh).

@bstrokevin sorry for the long delay on this topic. Are you still having problems with this connection?