Altos Labs - Staff AI Platform Engineer
Requirements
• B.S. in Computer Science or a related quantitative field, or equivalent technical experience
• 5+ years of experience in a cloud infrastructure, DevSecOps, or systems-focused platform software engineering role
• 3+ years of experience designing microservices architectures and deploying and maintaining production-grade Kubernetes (k8s) infrastructure
• 2+ years of experience using AI-assisted software development methods
• Deep experience architecting and operating cloud solutions on AWS using Infrastructure as Code (IaC) practices (Terraform, CDK, etc.) and scripting languages such as Python, Ruby, or Bash
• Experience prototyping agentic AI concepts, including identity and memory management
• Proven track record of building infrastructure for GenAI/LLM systems, AI agents, and high-volume, event-driven workloads
• Experience with monitoring and observability tools and with FinOps best practices
• Excited to design, implement, and evangelize computing standards and culture across scientific and technical functions
• Enjoys building scalable, secure, and robust systems
• Strong communication, collaboration, and presentation skills, including presenting technical topics to non-technical audiences
• Experience supporting the development of scientific software in a life sciences environment
• Experience building platforms that accelerate the work of software, data, and machine learning engineers
• Experience mentoring junior software engineers
• Experience writing automated tests across different levels of the test pyramid
• Cambridge, UK salary range: £97,300 to £128,000
• Exact compensation may vary based on skills, experience, and location.
• For UK applicants: before submitting your application, please read the Altos Labs EU and UK Applicant Privacy Notice (bit.ly/eu_uk_privacy_notice). This Privacy Notice is not a contract, express or implied, and it does not set terms or conditions of employment.
• Equal Opportunity Employment
• We value collaboration and scientific excellence.
• We believe that a culture of belonging is foundational to scientific innovation and inquiry. At Altos Labs, exceptional scientists and industry leaders from around the world work together to advance a shared mission. Our intentional focus is on Belonging, so that all employees know that they are valued for their unique perspectives. We are all accountable for sustaining an inclusive environment.
Responsibilities
The Altos Cloud Platform Team is seeking an experienced platform software engineer to drive the design, development, and maintenance of the Altos Agentic AI Platform. You will be part of the team responsible for building and managing the agentic and autonomous compute systems that Altos scientists use to conduct cutting-edge research. You will also work closely with software engineers, researchers, and IT teams to build and support systems that are scalable, reliable, secure, and efficient.
• Implement platform technologies to create, deploy, and monitor AI-enabled scientific applications for internal and external use
• Implement network and security controls, including authentication and authorization, across dozens of AI-enabled and agentic applications
• Design, deploy, and maintain highly resilient, production-grade Kubernetes (k8s) clusters
• Implement and fine-tune dynamic compute optimization for just-in-time, node-level autoscaling of compute-heavy and streaming workloads
• Use a service mesh to manage complex traffic routing, enforce mutual TLS (mTLS), handle circuit breaking, and ensure secure, observable service-to-service communication
• Implement and maintain high-performance MCP and LLM proxies supporting fleets of hundreds of agentic tools
• Understand user needs across a wide range of scientific disciplines and build systems that scientists can use productively to create AI-enabled solutions
• Optimize system performance and utilization for scientific applications and the underlying infrastructure
• Work with stakeholders to improve cloud and AI financial operations, driving changes for efficient resource allocation and cost management
• Improve system reliability through automation, testing, and continuous integration and delivery (CI/CD)
• Continuously improve and maintain documentation for our development and deployment processes