About the role

We are looking for an Infrastructure Engineer to join our growing infrastructure team, with experience operating Kubernetes at scale and with CI/CD and automation for internal ML and data engineering applications. In this role, you will manage and expand our globally distributed Kubernetes infrastructure running on bare-metal Linux across multiple data centres. You will work at the forefront of high-performance infrastructure, supporting petabyte-scale systems and enabling advanced MLOps capabilities. This includes contributing to MLOps pipelines, data engineering, system resilience, and the integration of machine learning services. You’ll report to the AI Infrastructure Lead and work closely with senior engineers across the infrastructure and AI platform teams.

Responsibilities, Duties and Expectations

Contribute to the architecture, implementation, and operations of a multi-node, distributed Kubernetes cluster.
Assist in building and scaling the physical infrastructure of our on-premises systems, including GPU worker banks, storage arrays, and compute nodes.
Collaborate with the team to build and continuously improve MLOps infrastructure and tooling.
Solve complex problems involving distributed systems, high-volume data processing, and real-time service integration.
Maintain and optimise hybrid infrastructure environments spanning on-premises and cloud systems.

Qualifications, Experience and Skills

Degree in Computer Science, Software Engineering, or a similar technical field, or equivalent experience.
5 years’ experience as an infrastructure or platform engineer, SRE, or equivalent technical role in production environments.
Experience with Kubernetes, including operating clusters on Linux. CKA/CKS certifications highly desirable.
Advanced knowledge of Linux internals, including L1/L2 networking, system performance, process monitoring, and shell scripting.
Experience with server assembly and hardware integration is also desirable.
Familiarity with CI/CD automation, ideally using GitLab.
Strong experience with Infrastructure as Code (IaC).
Familiarity with configuration management tools like Ansible, Puppet, or Chef.
Solid understanding of networking and security principles.

Who You Are

A systems thinker with a deep understanding of scalable infrastructure.
Passionate about automation, performance, and reliability.
Comfortable working with both development and operations teams to bridge technical gaps.
Curious, self-motivated, and always looking for ways to improve systems and practices.
Excited by working on infrastructure at the intersection of AI, defence, and distributed systems.

Note for recruitment agencies: We do not accept unsolicited candidates from external recruiters unless specifically instructed.