- Is this your first time deploying Airbyte?: No
- OS Version / Instance: COS on GCP
- Memory / Disk: you can use something like 4Gb / 1 Tb
- Deployment: Docker
- Airbyte Version: 0.36
- Source name/version: Salesforce 1.0.11
- Destination name/version:
- Step: The issue is happening during sync.
Hi, I’m trying to hide PII data from the source and would like to confirm a few things.
I understand Airbyte doesn’t provide a feature to select columns yet, and I have posted a question about this before.
This time the connection syncs from Salesforce (not a database) to GCS, so the table view solution does not apply. Unfortunately, for some reason there are PII columns, such as name, that cannot be hidden in Salesforce (unless there is a way to do that?).
I tried the hacky solution provided in the GitHub issue, which is to edit the connection catalog directly to remove the PII columns. It seems to work, since I don’t find them in the files on GCS.
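For illustration, the catalog edit amounts to something like the following sketch. It assumes the stored catalog follows the Airbyte protocol's configured-catalog layout (`streams[].stream.json_schema.properties`); the `drop_fields` helper, the `Contact` stream, and the field names are hypothetical, and writing the edited catalog back to Airbyte is not shown.

```python
import copy


def drop_fields(catalog, stream_name, fields_to_drop):
    """Return a copy of `catalog` with top-level fields removed from one stream.

    Assumes the configured-catalog layout
    streams[].stream.json_schema.properties; nested fields are not handled.
    """
    pruned = copy.deepcopy(catalog)  # leave the caller's catalog untouched
    for configured in pruned.get("streams", []):
        stream = configured.get("stream", {})
        if stream.get("name") != stream_name:
            continue
        properties = stream.get("json_schema", {}).get("properties", {})
        for field in fields_to_drop:
            properties.pop(field, None)  # ignore fields that are already absent
    return pruned


# Hypothetical catalog with a single Salesforce stream.
example_catalog = {
    "streams": [
        {
            "stream": {
                "name": "Contact",
                "json_schema": {
                    "type": "object",
                    "properties": {
                        "Id": {"type": "string"},
                        "Name": {"type": "string"},
                        "Email": {"type": "string"},
                    },
                },
            },
            "sync_mode": "full_refresh",
            "destination_sync_mode": "overwrite",
        }
    ]
}

pruned_catalog = drop_fields(example_catalog, "Contact", ["Name", "Email"])
```

Note that this only changes what the catalog declares. Whether the source still reads the dropped fields is exactly the open question here.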
Then this comment got me worried again:
> I tried that solution and realised that the unwanted columns are still being synced in the raw table (they are removed in the normalised version). Is there a way to tackle that problem?
So my questions are:
- If the connection catalog does not include a column of a stream, does that mean data from that column won’t be extracted from the source?
- Or does it depend on the type of connection? For example, mine syncs to GCS, so it doesn’t have the option of normalization, but could it be different if the destination is BigQuery, as mentioned in the GitHub comment?
- Where is the buffered data saved? In the log I see lines like `Records read: 50000 (120 MB)`, `Flushing all 1 current buffers (180 MB in total)`, and `Finished writing data to ee4b2c65-5193-49ef-a5a5-c6233e04eef110743XXXXXXXX.csv`. Is the buffered data saved on the host that runs Airbyte? I ask because I want to inspect the file `ee4b2c65-5193-49ef-a5a5-c6233e04eef110743XXXXXXXX.csv` to make sure that no PII columns are extracted. Do you know where I can find this file, and does it make sense to check it?
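If the file (or its copy on GCS) can be obtained, the check I have in mind is roughly the sketch below. It assumes the CSV either has one column per field, or keeps each record as a JSON object in an `_airbyte_data` column; the `find_pii_columns` helper and the field names are hypothetical.

```python
import csv
import io
import json


def find_pii_columns(csv_text, pii_fields):
    """Return the sorted PII field names that appear in the CSV content.

    Checks the header row, and, if an `_airbyte_data` column exists
    (assumed to hold one JSON object per row), the top-level keys of
    each record as well. Matching is case-insensitive.
    """
    pii = {name.lower() for name in pii_fields}
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    found = {column for column in header if column.lower() in pii}

    if "_airbyte_data" in header:
        for row in reader:
            try:
                record = json.loads(row["_airbyte_data"])
            except (TypeError, ValueError):
                continue  # skip rows that are short or not valid JSON
            if isinstance(record, dict):
                found.update(key for key in record if key.lower() in pii)
    return sorted(found)
```

For a local file, the function could be called as `find_pii_columns(open(path, newline="").read(), ["Name", "Email"])`; an empty list would mean none of the listed fields were found at the top level.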