We’re in the process of setting up incremental deduplicated streams from an Oracle DB to BigQuery. Where possible, we let the source systems handle GDPR deletions.
If a GDPR deletion is done by deleting the row in the source, an incremental sync never sees that change, so the destination table would keep showing data that should have been deleted.
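To make the first scenario concrete, here is a minimal sketch of one possible workaround: periodically compare the primary keys on both sides and treat keys that exist only in the destination as hard deletes. The function name and the sample keys are hypothetical, and in practice the key lists would come from queries against Oracle and BigQuery rather than in-memory lists.

```python
def find_hard_deleted_keys(source_keys, destination_keys):
    """Return keys that are still in the destination but gone from the source."""
    return sorted(set(destination_keys) - set(source_keys))


# Hypothetical primary keys; row 3 was hard-deleted in the source.
source_keys = [1, 2, 4]
destination_keys = [1, 2, 3, 4]

stale_keys = find_hard_deleted_keys(source_keys, destination_keys)
print(stale_keys)  # keys to remove from the destination, raw and scd tables
```

This only needs the key column from the source, so it should be much cheaper than a full refresh, but it is still an extra full scan of the keys on each run.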
If, on the other hand, the rows are kept but cleared of sensitive data, we would receive the emptied rows. That would take care of the destination table, but the old values would still be present in both the raw and the scd tables.
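For the second scenario, a rough sketch of the cleanup we have in mind: if the latest version of a row has all of its sensitive fields emptied, drop every older version of that row from the history. The field names, the `id` key and the `updated_at` cursor are assumptions for illustration only; the real version would be a scheduled DELETE against the raw and scd tables.

```python
SENSITIVE_FIELDS = ("name", "email")  # hypothetical sensitive columns


def is_scrubbed(row):
    """True if every sensitive field in the row is empty."""
    return all(row.get(field) in (None, "") for field in SENSITIVE_FIELDS)


def purge_history(history_rows, key="id", cursor="updated_at"):
    """Drop older versions of rows whose latest version has been scrubbed."""
    latest = {}
    for row in history_rows:
        k = row[key]
        if k not in latest or row[cursor] > latest[k][cursor]:
            latest[k] = row
    scrubbed_keys = {k for k, row in latest.items() if is_scrubbed(row)}
    # Keep all versions of untouched keys, and only the scrubbed latest
    # version of keys that were emptied in the source.
    return [
        row for row in history_rows
        if row[key] not in scrubbed_keys or row is latest[row[key]]
    ]


history = [
    {"id": 1, "name": "Ann", "email": "ann@example.com", "updated_at": 1},
    {"id": 1, "name": None, "email": None, "updated_at": 2},
    {"id": 2, "name": "Bob", "email": "bob@example.com", "updated_at": 1},
]
cleaned = purge_history(history)
```

With the sample data above, the first version of row 1 is dropped, while its emptied version and the untouched row 2 are kept.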
Are there any good ways of handling either of these scenarios?
To start with, we’ll only sync non-sensitive streams incrementally. Hopefully we can keep doing full refreshes of the sensitive tables.