Worker process of Airbyte > v0.40.9 fails to start on custom S3 config

+1

I’m getting the exact same error with a very similar deployment on AWS EKS, using custom S3 logging. This was working with our previous version of Airbyte, v0.38.3-alpha, and started failing after the upgrade to v0.40.14. A resolution that does not involve reverting back to MINIO would be much appreciated here.

Hey there! I previously created an issue requesting better documentation on these config options; please add a thumbs up and comment with any other info you’d like to add: https://github.com/airbytehq/airbyte/issues/17649

Are you deploying with helm or kustomize?

I am seeing here that S3 should be an acceptable config option for WORKER_STATE_STORAGE_TYPE:
https://github.com/airbytehq/airbyte/blob/master/airbyte-workers/src/main/java/io/airbyte/workers/config/CloudStorageBeanFactory.java#L84

Created https://github.com/airbytehq/airbyte/issues/18016 to track this issue

Hello, I see that there have been some updates.

Please check to make sure you have these envs filled out (example in Helm): https://github.com/airbytehq/airbyte/blob/master/charts/airbyte/values.yaml#L23

state:
  ## state.storage.type Determines which state storage will be utilized. One of "MINIO", "S3" or "GCS"
  storage:
    type: "S3"

logs:
  ## logs.accessKey.password Logs Access Key
  ## logs.accessKey.existingSecret
  ## logs.accessKey.existingSecretKey
  accessKey:
    password: ""
    existingSecret: ""
    existingSecretKey: ""
  ## logs.secretKey.password Logs Secret Key
  ## logs.secretKey.existingSecret
  ## logs.secretKey.existingSecretKey
  secretKey:
    password: ""
    existingSecret: ""
    existingSecretKey: ""
  ## logs.storage.type Determines which log storage will be utilized. One of "MINIO", "S3" or "GCS"
  ##                   Used in conjunction with logs.minio.*, logs.s3.* or logs.gcs.*
  storage:
    type: "S3"

  ## logs.minio.enabled Switch to enable or disable the Minio helm chart
  minio:
    enabled: false

  ## logs.externalMinio.enabled Switch to enable or disable an external Minio instance
  ## logs.externalMinio.host External Minio Host
  ## logs.externalMinio.port External Minio Port
  ## logs.externalMinio.endpoint Fully qualified hostname for s3-compatible storage
  externalMinio:
    enabled: false
    host: localhost
    port: 9000

  ## logs.s3.enabled Switch to enable or disable custom S3 Log location
  ## logs.s3.bucket Bucket name where logs should be stored
  ## logs.s3.bucketRegion Region of the bucket (must be empty if using minio)
  s3:
    enabled: false
    bucket: airbyte-dev-logs
    bucketRegion: ""
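
For anyone applying these values with Helm, here is a minimal sketch; the release name, namespace, and chart repo URL below are assumptions on my part, not something confirmed in this thread:

helm repo add airbyte https://airbytehq.github.io/helm-charts
helm repo update
# "airbyte" release and namespace are placeholders; values.yaml holds the state/logs settings above
helm upgrade --install airbyte airbyte/airbyte -n airbyte --create-namespace -f values.yaml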

Are you deploying with helm or kustomize?

We are deploying it with kustomize - I provided a link above to the Airbyte documentation, which discusses the environment variables in the context of the kustomize config files. Unfortunately, the example with Helm variables does not apply to us. Could you provide an example with kustomize configs here: airbyte/kube/overlays/stable at master · airbytehq/airbyte · GitHub? Thank you!

I’m in the same situation and would like to know how to get this working with kustomize.

Hey all!

I was having the same issue. I’m using helm and am not super familiar with kustomize, but hopefully this helps. I had to set a couple more values in my values.yaml file to get it to work.

global:
  # ...
  logs:
    accessKey:
      password: <access_key_id>
      # Downstream charts don't use the secret created by the password above, so we need to pass in the secret info ourselves
      existingSecret: <helm_release_name>-airbyte-secrets
      existingSecretKey: AWS_ACCESS_KEY_ID
    secretKey:
      password: <secret_access_key>
      # Downstream charts don't use the secret created by the password above, so we need to pass in the secret info ourselves
      existingSecret: <helm_release_name>-airbyte-secrets
      existingSecretKey: AWS_SECRET_ACCESS_KEY

Dug into the code and found out that the airbyte-worker and airbyte-server deployment.yaml templates only set the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables if existingSecret and existingSecretKey are set, or if minio or externalMinio is enabled. Nothing gets set if I’m just passing in the password myself.

For your situation, I assume the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY aren’t being set properly on the worker/server for some reason. Hope that helps!
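
If it helps with debugging, here is a quick sketch of how to check whether those env vars actually landed on the rendered deployments (the deployment name may carry your Helm release prefix, so adjust as needed):

# Look for the AWS credential env entries on the worker deployment
kubectl get deployment airbyte-worker -o yaml | grep -B2 -A5 "AWS_ACCESS_KEY_ID"
kubectl get deployment airbyte-worker -o yaml | grep -B2 -A5 "AWS_SECRET_ACCESS_KEY"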

We’re also attempting to upgrade to 0.40.22 with kustomize and run into the exact same problem with the worker as stated here. We’ve been using S3 for logging instead of Minio as well.

What should be the course of action here? Stay stuck on a version from before the WORKER_* vars were introduced, like 0.40.6? @sh4sh any clue?

Hello Oleg Gusak, it’s been a while without an update from us. Are you still having problems or did you find a solution?

I am also stuck on this same problem.
Is there any update on any solution?

This is still an issue when using kustomize overlays on version 0.40.26. Oddly, the Helm chart works correctly for this (it has other things that are broken, which is why I’m trying kustomize), so there’s probably a workaround.

I have confirmed a workaround to get this fixed in version 0.40.26. In order to configure S3 logs correctly using the kustomization overlays, you need to follow the instructions found here as well as set WORKER_LOGS_STORAGE_TYPE=S3. Note that WORKER_STATE_STORAGE_TYPE needs to remain unchanged.
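
For reference, here is a rough sketch of how the overlay’s env and secrets files end up as the airbyte-env ConfigMap and airbyte-secrets Secret that the deployments reference; the exact generator layout in the stable overlay may differ, so treat this as an assumption:

# kustomization.yaml (sketch)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

generatorOptions:
  disableNameSuffixHash: true  # keep stable names so references like "airbyte-env" resolve

configMapGenerator:
  - name: airbyte-env
    envs:
      - .env        # S3_LOG_BUCKET, S3_LOG_BUCKET_REGION, WORKER_LOGS_STORAGE_TYPE=S3, ...
secretGenerator:
  - name: airbyte-secrets
    envs:
      - .secrets    # AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY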

We are using kustomize and our Airbyte version is 0.40.23. The issue we are seeing is that we failed to set a custom S3 bucket as the state storage bucket. The workaround right now is to turn Minio back on just for state information.

I put up a fix earlier based on the limited knowledge I have.
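
In case it helps others, a sketch of what that fallback looks like in the kustomize .env overlay; the bucket name and endpoint below are placeholders, not confirmed defaults:

WORKER_LOGS_STORAGE_TYPE=S3
WORKER_STATE_STORAGE_TYPE=MINIO
STATE_STORAGE_MINIO_BUCKET_NAME=<your-minio-state-bucket>
STATE_STORAGE_MINIO_ENDPOINT=<http://your-minio-endpoint:9000>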

Hi - I’m still having some trouble with this and wondered if you could confirm your setup.

Env overlay

S3_LOG_BUCKET=<your_s3_bucket_to_write_logs_in>
S3_LOG_BUCKET_REGION=<your_s3_bucket_region>
# Set this to empty.
S3_MINIO_ENDPOINT=
# Set this to empty.
S3_PATH_STYLE_ACCESS=
WORKER_LOGS_STORAGE_TYPE=S3
# leave as is; for me, it defaults to MINIO
# WORKER_STATE_STORAGE_TYPE=

Secrets overlay

AWS_ACCESS_KEY_ID=<your_aws_access_key_id>
AWS_SECRET_ACCESS_KEY=<your_aws_secret_access_key>

And that’s it? I’ve tried this on v0.40.28 and v0.40.26 but I’m still getting the same issue as the original post.

Thanks @rcheatham-q - your suggestion to set vars as

WORKER_LOGS_STORAGE_TYPE=S3
WORKER_STATE_STORAGE_TYPE=MINIO

worked for us too.

Has anyone figured this out for GCS logs?
I’m not convinced that I should put Minio-related values in if I only have GCS logs activated.

Yes; we encountered a similar problem with GCS.

These configuration changes solved the issue for us (note that we are using the k8s manifests directly, not the helm chart):

  1. In .env, the env var GCS_LOG_BUCKET needs to be set to the log bucket, and an additional variable called STATE_STORAGE_GCS_BUCKET_NAME needs to be set to the state storage bucket. As far as I can tell, STATE_STORAGE_GCS_BUCKET_NAME isn’t documented, but you can see that it is part of the GCS configuration block for the workers: airbyte/application.yml at 7676af5f5fb53542ebaff18a415f9c89db417055 · airbytehq/airbyte · GitHub. The Minio/S3 variables are mostly nulled out for us, so the config variables for logs and storage largely look like so:
# S3/Minio Log Configuration
S3_LOG_BUCKET=
S3_LOG_BUCKET_REGION=
S3_MINIO_ENDPOINT=
S3_PATH_STYLE_ACCESS=

# GCS Log Configuration
GCS_LOG_BUCKET=<log bucket>
STATE_STORAGE_GCS_BUCKET_NAME=<state bucket>

# State Storage Configuration
STATE_STORAGE_MINIO_BUCKET_NAME=
STATE_STORAGE_MINIO_ENDPOINT=

# Cloud Storage Configuration
WORKER_LOGS_STORAGE_TYPE=gcs
WORKER_STATE_STORAGE_TYPE=gcs
  2. Secondly, the manifests for the workers need to be modified to actually pass the GCS state bucket variables, as they currently do not. In the airbyte-worker deployment (airbyte/worker.yaml at master · airbytehq/airbyte · GitHub), we added the following vars (note that GOOGLE_APPLICATION_CREDENTIALS is reused here, but it is probably better to have separate SA credentials for writing state); a kustomize patch sketch for the same change is included after the snippet:
- name: STATE_STORAGE_GCS_BUCKET_NAME
  valueFrom:
    configMapKeyRef:
      name: airbyte-env
      key: STATE_STORAGE_GCS_BUCKET_NAME
- name: STATE_STORAGE_GCS_APPLICATION_CREDENTIALS
  valueFrom:
    secretKeyRef:
      name: airbyte-secrets
      key: GOOGLE_APPLICATION_CREDENTIALS
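
If you prefer not to edit the upstream worker.yaml directly, the same change can likely be applied from your overlay as a strategic-merge patch; here is a sketch (the container name is an assumption and must match the one in the upstream manifest):

# patch-worker-gcs-state.yaml (sketch) -- reference it from kustomization.yaml
# under patchesStrategicMerge
apiVersion: apps/v1
kind: Deployment
metadata:
  name: airbyte-worker
spec:
  template:
    spec:
      containers:
        - name: airbyte-worker-container   # assumed container name; check the upstream worker.yaml
          env:
            - name: STATE_STORAGE_GCS_BUCKET_NAME
              valueFrom:
                configMapKeyRef:
                  name: airbyte-env
                  key: STATE_STORAGE_GCS_BUCKET_NAME
            - name: STATE_STORAGE_GCS_APPLICATION_CREDENTIALS
              valueFrom:
                secretKeyRef:
                  name: airbyte-secrets
                  key: GOOGLE_APPLICATION_CREDENTIALS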

Hope this helps.

Thanks a lot! It works!

Hey, which version of Airbyte are you working with?
Have you tested 0.40.30+ and had issues with GCS logging?
cf. Cloud Storage Configs are null for GCS logs storage type - #2 by marcosmarxm

Currently on 0.40.30, though the version is somewhat irrelevant in my case - I use the manifests defined here: airbyte/kube at master · airbytehq/airbyte · GitHub, with the modification to worker.yaml from my post above. I never had an issue with GCS logging, but rather an issue with workers writing state to GCS, because the state bucket and creds are not passed in the worker deployment config.

As far as I can tell, as of the latest commit on master, the manifests still have that issue and require the worker modification posted above for the deployment to function on GCP.