Job Description

The Machine Learning (ML) Systems Engineer is a key architect of the platform that empowers our teams to build, deploy, and operate AI models at scale. You will design and build the core infrastructure, pipelines, and tools supporting everything from traditional ML to Large Language Models (LLMs). This is a high-impact software engineering role for those passionate about building robust, scalable systems that improve developer velocity and enable the effective application of AI across the organization.

Key Responsibilities

Build the Core Platform: Design, build, and operate our ML infrastructure on Kubernetes for model training and inference.
Develop Force-Multiplier Tools: Create tooling to streamline the end-to-end ML/LLM lifecycle (e.g., experiment tracking, RAG systems, model observability).
Drive Best Practices: Design and implement MLOps and AIOps practices to improve automation, reliability, and security.
Collaborate and Enable: Work closely with Data Scientists and ML Engineers as your internal customers to understand their needs and accelerate their work.

Personal Attributes We Love to See

Pragmatism: While extensive knowledge of ML theory is highly valued, pragmatism wins over elaborate theory when it comes to shipping products that work.
Collaboration: We believe data science is a team sport, and we are after candidates who communicate well, share knowledge, and are open to ideas from anyone on the team. Having worked on shared codebases in a commercial environment is a big plus, but it's the attitude that matters most.
Technical Skills: A solid base of Python and Linux is key to this role. Beyond that, we're pretty flexible: we know tools are changing rapidly and will continue to do so for many years to come. Experience with tools like Kubernetes, Helm, PyTorch, Terraform, and Prometheus is highly valued, but not mandatory.
Attention to Detail: Showing attention to detail when it counts is important to us.