Junior Data ML Engineer
About kaiko
kaiko.ai is building a next-generation agentic clinical AI assistant that helps clinicians reason across patient data, guidelines, and diagnostics.
Healthcare decisions are rarely made by a single person or from a single data source. kaiko’s assistant maintains longitudinal patient context across encounters, clinicians, and institutions, enabling collaboration, second opinions, and complex diagnostic workflows. The system is designed to operate safely in real clinical environments, with human oversight, auditability, and regulatory alignment at its core.
Our assistant core supports broadly applicable clinical tasks such as patient data navigation, guideline interaction, multimodal interaction (chat and voice), and care coordination. On top of this foundation, we are developing specialized diagnostic agents in areas such as oncology, radiology, and pathology.
We build in close collaboration with leading hospitals and research centers, including the Netherlands Cancer Institute (NKI). kaiko is a well-funded company with a growing international team, operating from Zurich and Amsterdam.
About the role
kaiko’s Multimodal Large Language Model (MLLM) is trained on domain-specific, high-complexity medical data. To reach clinical-grade performance, we’ll need to ramp up our data efforts to manage massive scale, ensure consistent quality, and tightly control data relevance and integrity.
As a Junior Data ML Engineer, you will design and implement our data‑sourcing, synthetic‑generation, and curation pipelines. High‑quality datasets are the fuel for frontier‑scale language models, and you will play a pivotal role in producing them.
You will build high‑throughput data pipelines that:
Ingest multi‑modal data at petabyte scale.
Generate large volumes of synthetic data.
Filter & rate content by topic, quality, and policy compliance.
You will work closely with ML researchers and help steer the development of our state‑of‑the‑art foundation models. You will be based in Zurich or Amsterdam, with the expectation of spending half of your time at the office.
About you
Strong programming skills in Python; familiarity with distributed frameworks such as Ray or Spark is a plus.
Experience contributing to ML research and associated data challenges, such as data cleaning, transformation, and validation.
Exposure to synthetic-data generation workflows or interest in working with LLM-related data pipelines.
Understanding of lakehouse paradigms (Delta, Iceberg) and columnar formats (Parquet, ORC).
Experience with core data‑processing primitives (hashing, deduplication, chunking, etc.) and their scalability/performance trade‑offs.
Strong communication skills and the ability to present experimental results and technical concepts clearly and concisely.
Nice To Have:
Experience with workflow orchestration tools such as Dagster.
Exposure to data‑quality & validation frameworks and monitoring/observability tooling.
Strong grasp of machine‑learning fundamentals (model architectures, training paradigms, evaluation metrics) to collaborate deeply with researchers and guide data‑driven choices.
We are excited to gather a broad range of perspectives in our team, as we believe it will help us build better products to support a broader set of people. If you’re excited about us but don’t fit every single qualification, we still encourage you to apply: we’ve had incredible team members join us who didn’t check every box!
Why kaiko
At kaiko, we believe the best ideas come from collaboration, ownership and ambition. We’ve built a team of international experts where your work has direct impact. Here’s what we value:
Ownership: You’ll have the autonomy to set your own goals, make critical decisions, and see the direct impact of your work.
Collaboration: You’ll approach disagreement with curiosity, build on common ground, and create solutions together.
Ambition: You’ll be surrounded by people who set high standards for themselves and others, who see obstacles as opportunities, and who are relentless in their work to create better outcomes for patients.
In addition, we offer:
An attractive and competitive salary, a good pension plan and 25 vacation days per year.
Great offsites and team events to strengthen the team and celebrate successes together.
A EUR 1000 learning and development budget to help you grow.
Autonomy to do your work the way that works best for you, whether you have a kid or prefer early mornings.
An annual commuting subsidy.
Our interview process
Our interview process is designed to assess mutual fit across skills, motivation, and values. It typically includes the following steps:
Screening call: A short conversation to align on your motivation, career goals, and initial fit for the role.
Codility test: An online coding assessment focused on core programming skills, problem-solving ability, and fundamental data structures and algorithms.
Codility review: A follow-up discussion to review your submission.
Onsite technical interview: An in-depth discussion of your problem-solving approach through a technical challenge, case study, or role-specific scenario, plus conversations with team members to assess collaboration dynamics, team fit, and day-to-day fit.
Locations: Amsterdam, Zürich (Puls 5)
Remote status: Hybrid