Since this community is dedicated to the Airbyte ETL platform, I’m not sure you’ll get a lot of feedback here as it’s not product-related. You may want to ask something like this in one of the more Kubernetes-focused communities or forums.
With that said, I think you're approaching it wrong. When working on a task-oriented workflow, you likely don't want the cost overhead of idle infrastructure, and the workers likely don't need many shared components.
But also, and probably more importantly: solve the first problems first and know them inside and out:

- How are you getting and queueing these jobs?
- How will you ensure that a job is only processed once?
- How will you know how much concurrency to allow?
- What happens with the results of the queries?
- How quickly do they need to run? Can they be batch workloads? Are there realtime/streaming workloads?
- What are the cost constraints for the business on this system? Is cost more important than performance? What about reliability?

(These are all just hypothetical questions, but they're things you need to think through, and things someone would need an idea of to make robust recommendations.)
These are pretty well-defined issue areas in data, so I’d do some research before jumping to solutioneering.
If something like Google Cloud Run or Cloud Functions meets your requirements, it would likely be more cost-effective because it can scale to zero and you don't need to manage any of the infrastructure. My rule of thumb is to decide what problem I'm solving, then keep the lowest infrastructure footprint possible. So favor the simplest solution. In this case, if you can get by with the limitations of Cloud Functions (which most can), use that. If you need things like custom Docker images or more in the way of persistent disk, use Cloud Run (technically these are in the same family now; the difference is how much customization you get, with the tradeoff of cost and complexity). If you need even more control but want to automate scaling, look at something like GKE Autopilot. Eliminate the management of whatever you can.
You may find that all you need is a Pub/Sub topic and a Cloud Function that automatically grabs each new job; set your concurrency limits and call it a day without deploying or managing any infrastructure.
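As a rough sketch of that pattern (assuming a 1st-gen Python Cloud Function with a Pub/Sub trigger; the job fields like `query_id` and `sql` are hypothetical, and you'd swap in your actual warehouse client):

```python
import base64
import json


def run_query_job(event, context=None):
    """Background Cloud Function invoked by a Pub/Sub trigger.

    Pub/Sub delivers the message payload base64-encoded in event["data"],
    so the handler decodes it before doing any work.
    """
    job = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    # Hypothetical job schema: {"query_id": ..., "sql": ...}.
    # In a real function you'd hand job["sql"] to BigQuery (or whatever
    # warehouse holds your data) and write the results somewhere durable.
    return f"processed job {job['query_id']}"
```

Note that concurrency is capped at deploy time (e.g. via max instances), not in the handler, and Pub/Sub delivery is at-least-once, so the handler should be idempotent in case a message is redelivered.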
But there are many types of task queues, there are often dependencies between jobs, and there are different criteria you'll need to weigh when choosing your tooling. You need to get those baselines together before anyone can really help you effectively, because they can only understand your problem as well as you do.
The right solution also depends on where your data is hosted. If it’s Google Cloud Platform (e.g. BigQuery), my recommendations above stand. If the data is S3/Redshift, it may look more like AWS Lambda. If it’s not in the cloud, things get a bit trickier because Cloud-based IaaS and PaaS offerings will likely reduce your costs, but may have undesirable trade-offs in terms of performance because of the latency between those platforms and your data.
I love Kubernetes, but managing k8s is not for the faint of heart, and if you're not already pretty deeply knowledgeable it can be a lot of work to maintain and secure (especially when there's PII and compliance requirements involved). That shouldn't necessarily discourage you, but make sure you count the cost of your time.
I've literally watched at least a dozen engineers burn months of their time creating fragile systems that effectively reproduced things like Pub/Sub queues, or building a query task runner when their cloud data warehouse already had one built in.
Work smarter, not harder.