Enhancements to Google Drive Connector for S3 Destination

slack-user-airbyte · November 24, 2024, 6:12am

Summary

Enhancements requested for the Google Drive connector include adding file_id and file_url to the metadata and enabling raw file extraction. The poster asks whether modifying the generic airbyte_cdk[file-based] would affect other connectors, and whether the connector can be configured to store raw files directly to S3.

Question

Hi,

We’re using the Google Drive source with an S3 destination and need to implement some critical enhancements to the connector. Here’s what we’re looking to achieve:

• Add file_id and file_url to Metadata:

Currently, the connector fetches details like file content, file name, and file path, but it doesn’t include file_id or file_url.
file_name is currently used as the primary key, but we’d like to switch to file_id as it’s more robust.
We’ve identified changes needed in the airbyte_cdk[file-based] to include these additional fields.
Question: Will modifying the generic airbyte_cdk[file-based] to support these fields affect other connectors relying on it?
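
To make the first request concrete, here is a minimal sketch of one way the extra fields could be added without changing the shared base. All class and function names here (BaseRemoteFile, DriveRemoteFile, from_drive_item) are hypothetical stand-ins, not the actual airbyte_cdk classes. The input is assumed to be a Google Drive API v3 file resource with id, name, modifiedTime and, optionally, webViewLink. The new fields are optional and live in a Drive-specific subclass, so code that only knows the base class keeps working.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class BaseRemoteFile:
    """Hypothetical stand-in for a generic descriptor shared by file-based connectors."""
    uri: str
    last_modified: datetime


@dataclass
class DriveRemoteFile(BaseRemoteFile):
    """Drive-specific subclass; the generic base class is left unchanged."""
    file_id: Optional[str] = None   # stable identifier, candidate primary key
    file_url: Optional[str] = None  # link to open the file in Drive


def from_drive_item(item: dict) -> DriveRemoteFile:
    """Build a descriptor from a Drive API v3 file resource (assumed input shape)."""
    file_id = item["id"]
    # Fall back to the conventional viewer URL when webViewLink was not requested.
    file_url = item.get("webViewLink") or f"https://drive.google.com/file/d/{file_id}/view"
    return DriveRemoteFile(
        uri=item["name"],
        last_modified=datetime.fromisoformat(item["modifiedTime"].replace("Z", "+00:00")),
        file_id=file_id,
        file_url=file_url,
    )


# Two files with the same name stay distinguishable once file_id is the key.
first = from_drive_item({"id": "abc", "name": "report.pdf", "modifiedTime": "2024-11-24T06:12:00.000Z"})
second = from_drive_item({"id": "xyz", "name": "report.pdf", "modifiedTime": "2024-11-24T06:12:00.000Z"})
records_by_id = {f.file_id: f for f in (first, second)}
```

Keying on file_name would collapse the two records above into one, which is the robustness problem described in this request.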

• Enable Raw File Extraction:
The current setup uses the Unstructured platform for text extraction, but I’d like to be able to:

Extract tables from Excel files.
Extract both tables and images from PDFs.
Store raw files (binary data) from Google Drive directly to S3 without any processing.
Question: Is there a way to configure the connector to extract and store raw files instead of processed text?
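
For the raw-file case, the behaviour being asked for amounts to a byte-for-byte stream copy with no parsing step in between. The sketch below illustrates that idea using only the standard library; it is not how the connector currently works. The names raw_object_key and copy_raw, and the prefix/file_id/file_name key layout, are assumptions made for illustration. In a real pipeline the source stream would come from a Drive download and the destination would be an S3 upload (for example via boto3's upload_fileobj) rather than in-memory buffers.

```python
import hashlib
import io
from typing import BinaryIO


def raw_object_key(prefix: str, file_id: str, file_name: str) -> str:
    """Build a destination key; including file_id avoids collisions between same-named files."""
    return "/".join(part for part in (prefix.strip("/"), file_id, file_name) if part)


def copy_raw(src: BinaryIO, dst: BinaryIO, chunk_size: int = 1024 * 1024) -> str:
    """Copy bytes unchanged from src to dst in chunks and return the SHA-256 of what was written."""
    digest = hashlib.sha256()
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        digest.update(chunk)
    return digest.hexdigest()


# In-memory stand-ins for the Drive download stream and the S3 upload target.
payload = b"%PDF-1.7 binary content \x00\x01\x02"
source = io.BytesIO(payload)
target = io.BytesIO()
checksum = copy_raw(source, target, chunk_size=8)
key = raw_object_key("raw/", "abc123", "report.pdf")
```

Comparing the returned checksum with one computed on the source side is a simple way to confirm the file reached the destination unprocessed.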

This topic has been created from a Slack thread to give it more visibility.

Tags: google-drive-connector, s3-destination, file-id, file-url, raw-file-extraction, airbyte-cdk, unstructured-platform, text-extraction, excel-tables, pdf-tables, raw-files
