ML Platform Engineer (Evergreen)
About kaiko
Delivering high-quality cancer care is complex; specialists form a view of each patient's condition by reasoning across different data - CT scans, genomics context, treatment history and clinical notes.
Current AI systems are powerful within individual domains but fall short when reasoning across data types or domains. kaiko.w, our AI assistant for oncology, aims to equip every clinician with a full understanding of their patients, helping them to reason across data as they assess each case.
We’re building this in close collaboration with the Netherlands Cancer Institute (NKI) and a growing network of hospitals and research centers. We’ve raised significant long-term funding and have nearly doubled our team over the past year. We’re now 80+ people representing 25 nationalities, based across our offices in Zurich and Amsterdam.
About the role
We’re continuously looking for exceptional ML platform engineers who push the limits of the compute fabric powering frontier multimodal AI. Our stack spans GPU-dense clusters, hybrid cloud/on-prem systems, and high-throughput pipelines for state-of-the-art model training and deployment.
This is not a fixed role, but an opportunity to define and evolve the backbone of our AI infrastructure.
You will be based in either the Netherlands or Switzerland, with the expectation of spending at least 50% of your time at the office.
Some areas of responsibility
- You’ll architect, scale, and evolve the infrastructure that makes ML development fast, reliable, and observable - from IaC to CI/CD to container orchestration.
- You’ll advance our compute orchestration layer - scaling GPU and CPU workloads across heterogeneous clusters using Kubernetes and modern scheduling strategies.
- You’ll design hybrid and multi-cloud strategies that balance performance, compliance, and cost - enabling elastic scaling for ML research and production.
- You’ll build automation and validation pipelines to ensure every training environment - from NVIDIA firmware to Python packages - is version-aligned, reproducible, and optimized for stability and performance.
We’re building the backbone that powers the next generation of multimodal models in oncology and beyond.
If you thrive on large-scale distributed systems, elegant automation, and high-impact infrastructure - we’d love to hear from you, anytime.
About you
You bring deep expertise in one or more of these domains and curiosity for the rest:
- Compute orchestration: Kubernetes, SLURM, Helm, Terraform, and container ecosystems.
- Automation & deployment: CI/CD, ArgoCD, GitHub Actions, or comparable tools.
- Systems programming: Python, Go, Rust, or similar languages.
- Observability: Prometheus, Grafana, Loki, or related stacks.
- ML infrastructure: Ray, MLflow, NVIDIA GPU training stack (CUDA, cuDNN, InfiniBand).
- Data orchestration: Airflow, Dagster, Prefect, Flyte.
- Distributed storage systems: VAST, Weka, Hammerspace, or object stores like S3.
You’ll thrive here if you:
- Love designing for scale, reproducibility, and scientific velocity.
- Think in systems, not tickets.
- Enjoy the tension between “move fast” and “build right”.
- Are comfortable bridging infrastructure, ML engineering, and distributed systems.
We are excited to gather a broad range of perspectives in our team, as we believe it will help us build better products to support a broader set of people. If you’re excited about us but don’t fit every single qualification, we still encourage you to apply: we’ve had incredible team members join us who didn’t check every box!
Why kaiko 
At kaiko, we believe the best ideas come from collaboration, ownership and ambition. We’ve built a team of international experts where your work has direct impact. Here’s what we value: 
- Ownership: You’ll have the autonomy to set your own goals, make critical decisions, and see the direct impact of your work.
- Collaboration: You’ll approach disagreement with curiosity, build on common ground, and create solutions together.
- Ambition: You’ll be surrounded by people who set high standards for themselves and others, who see obstacles as opportunities, and who are relentless in their work to create better outcomes for patients.
  
In addition, we offer: 
- An attractive and competitive salary, a good pension plan and 25 vacation days per year.
- Great offsites and team events to strengthen the team and celebrate successes together.
- A EUR 1000 learning and development budget to help you grow.
- Autonomy to do your work the way that works best for you, whether you have a kid or prefer early mornings.
- An annual commuting subsidy.
- Department: ML Ops
- Role: Data Engineering
- Locations: Zürich (Puls 5), Amsterdam
- Remote status: Hybrid
