Senior ML Platform Engineer
About kaiko
In cancer care, treatment decisions can take many days, but patients don’t have that time. One of the reasons for delays? Cancer patients' data is scattered across many places: doctors’ notes, medical imagery, genomics data. At kaiko, we are developing AI foundational models to bring this data together and integrate it into clinical workflows, enabling doctors to make faster, more effective treatment decisions.
We also collaborate closely with the leading Dutch cancer research institute (NKI) on multiple AI research projects and a joint clinical validation initiative. In 2025, we plan on expanding our partnerships to even more hospitals.
We raised significant long-term funding and have offices in Zurich and Amsterdam. Over the past year, our team has nearly doubled in size, now comprising 70+ people from 25 countries. Our multidisciplinary team brings expertise in LLM and foundational model development, data science, product management, compliance, growth, and operations.
About the role
We are seeking a highly skilled Senior ML Platform Engineer with a passion for building scalable ML platforms and ensuring a high-availability experience that empowers our AI research team in their daily work. You'll play a vital role in making our ambitious AI healthcare solutions a practical reality. This role will be based in either the Netherlands or Switzerland.
Your responsibilities
- Design and build kaiko’s multi-tenant machine learning platform, including our large-scale distributed training systems;
- Create robust distributed training and inference solutions for maximum computational efficiency;
- Implement and maintain workflows and tools (CI/CD, containerization, orchestration, monitoring, logging and alerting systems) for our large training runs;
- Collaborate with AI/ML researchers to develop and implement solutions that enable safe and reproducible model-training experiments;
- Ensure compliance with security best practices and industry standards.
Why kaiko
At kaiko, we believe the best ideas come from collaboration, ownership and ambition. We’ve built a team of international experts where your work has direct impact. Here’s what we value:
- We act like owners: You’ll have the autonomy to set your own goals, make critical decisions, and see the direct impact of your work.
- We thrive on collaboration: You’ll have to approach disagreement with curiosity, build on common ground and create solutions together.
- We work with ambitious people: You’ll be surrounded by people who set high standards for themselves and others, who see obstacles as opportunities, and who are relentless in their work to create better outcomes for patients.
In addition, we offer:
- An attractive and competitive salary, a good pension plan and 25 vacation days per year.
- Great offsites and team events to strengthen the team and celebrate successes together.
- A EUR 1000 learning and development budget to help you grow.
- Autonomy to do your work the way that works best for you, whether you have children or prefer early mornings.
- An annual commuting subsidy.
About you
- 3+ years of experience building production ML platforms and systems;
- Experience building and optimizing latency and throughput of machine learning systems and GPU workloads;
- Hands-on experience with distributed training frameworks (e.g. Ray, Dask, PyTorch Lightning);
- Experience with at least one cloud platform (e.g. AWS, Azure or Google Cloud);
- Strong coding skills in at least one programming language (e.g. Python, Scala, Java, C++);
- Excellent problem-solving and communication skills;
- Self-motivated and able to work well in a fast-paced startup environment.
Nice to have:
- Track record of successfully scaling ML platforms;
- Fundamentals of modern Deep Learning;
- Experience with CI/CD tools (e.g. GitLab CI/CD, GitHub Actions or CircleCI), containerization (e.g. Docker) and orchestration tools (e.g. Kubernetes, Helm, Kustomize);
- Knowledge of monitoring, logging, alerting and observability tools (e.g. Prometheus, Grafana, ELK Stack or Datadog);
- Familiarity with infrastructure-as-code tools (e.g. Terraform, CloudFormation or Pulumi);
- Understanding of networking, security, and system administration concepts;
- Experience with high-performance computing (HPC) systems and workload managers (e.g. Slurm).
This Senior ML Platform Engineer position is a full-time role. It is important that applicants are resident in the Netherlands or Switzerland, hold a valid work permit and preferably live within commuting distance of our offices in Amsterdam or Zürich. Given the nature of kaiko’s business and the fact that it deals with sensitive data, a Certificate of Conduct will be required upon finalizing the employment contract.
We are excited to gather a broad range of perspectives in our team, as we believe it will help us build better products to support a broader set of people. If you’re excited about us but don’t fit every single qualification, we still encourage you to apply: we’ve had incredible team members join us who didn’t check every box!
- Department: Platform Engineering
- Locations: Amsterdam (NKI-AvL), Zürich (Puls 5)
- Remote status: Hybrid Remote