Graphcore - Lead Engineer Support Linux Engineer
Responsibilities
• Guide, mentor, and cultivate a team of Linux Engineering Support Engineers, defining clear roles, responsibilities, and methods of collaboration
• Own and oversee support for Linux-based systems and engineering environments, ensuring stability, performance, and availability
• Act as a point of contact for complex technical issues and outages, providing hands-on support when a customer concern arises
• Diagnose and resolve high-impact system and interoperability issues across mixed and distributed environments
• Perform hands-on investigation and troubleshooting to understand issues and drive effective solutions
• Direct incident response efforts, encompassing triage, coordination, and resolution
• Take responsibility for and lead Root Cause Analysis (RCA) processes, ensuring preventative improvements are identified and applied
• Establish and improve incident management processes, driving operational maturity and reliability
• Drive adoption of automation and configuration-as-code practices across Linux systems
• Ensure system changes are delivered through controlled, auditable processes wherever possible
• Oversee development and implementation of automation solutions for system management and operational tasks
• Promote and support the use of workflows based on Git and CI/CD pipelines for configuration and operational processes
• Identify and prioritize opportunities to reduce manual effort through automation and improved tooling
• Collaborate with engineering teams to support development environments and system requirements
• Act as a senior technical liaison between engineering teams and infrastructure/platform functions
• Support onboarding of new systems, services, and environments using standardized and automated approaches
• Ensure system configurations stay consistent and aligned with established standards and governance
• Oversee integration points (e.g. identity, CI/CD, tooling) and ensure issues are resolved effectively
• Identify and drive improvements in system performance, scalability, and maintainability
• Contribute to and enforce documentation, standards, and operational guidelines
• Ensure systems meet audit, compliance, and governance requirements, with full traceability of changes

Candidate Profile
• Extensive experience managing and maintaining Linux-based systems in complex technical or engineering environments
• Strong troubleshooting skills across operating systems, networking, storage, and application layers
• Demonstrated ability to identify and solve intricate technical problems, including within diverse or distributed settings
• Demonstrated experience managing significant incidents and outages, including directing resolution efforts and participating in Root Cause Analysis (RCA)
• Extensive background in automation and scripting (e.g., Bash, Python, or similar)
• Extensive background in configuration management or infrastructure-as-code tools (e.g., Ansible, Terraform, Puppet, or similar)
• Experience working with configuration-as-code practices and workflows managed through Git
• Experience building, managing, or assisting with CI/CD pipelines for configuration and operational processes
• Strong understanding of system interoperability across distributed environments
• Experience working within defined standards, governance frameworks, and controlled processes
• Strong communication skills and ability to collaborate closely with engineering, platform, and infrastructure teams
• Experience mentoring or supporting the development of other engineers
• Capability to work efficiently across different time zones within a dispersed organization
• Demonstrated capability to work autonomously, establish goals, and achieve results

Desirable
• Experience managing or coordinating incident response activities
• Experience working alongside DevOps, platform, or infrastructure engineering teams
• Experience with monitoring, observability, and logging systems
• Experience supporting AI/ML or high-performance computing environments
• Understanding of identity and access management concepts
• Experience building or scaling operational processes or support functions
• Experience managing and maintaining Linux-based systems in a technical or engineering environment

We welcome people of different backgrounds and experiences; we’re committed to building an inclusive work environment that makes Graphcore a great home for everyone. We offer an equal opportunity process and understand that there are visible and invisible differences in all of us. We can provide a flexible approach to interview and encourage you to chat to us if you require any reasonable adjustments.