Thought Machine - Senior Site Reliability Engineer
Requirements
• You have a track record of delivering high-impact projects with a focus on long-term scalability, ensuring that human intervention scales sub-linearly with usage growth.
• You possess an up-to-date understanding of design patterns relevant to hosting and networking architectures.
• You proactively champion product development, driven by a desire to build truly exceptional products, not just solve immediate challenges.
• You're a high-agency individual who can independently drive projects to completion, scaling your individual output by delegating work appropriately to team members.
• You have a strong background in Python, Golang or Java, having used one of these languages to deliver a project or initiative of significant size.
• You have experience working with Kubernetes or other container orchestration systems.
• You have experience with automation/configuration management, e.g. Terraform, Puppet, Chef, Ansible.
• You have expertise in one or more of the following areas: Database Administration, Networking, Observability Tools (such as Prometheus, Jaeger) or automation infrastructure.
• You have extensive experience working with either GCP or AWS.
We actively hire candidates who demonstrate technical excellence in their field and welcome people of all ages and backgrounds, providing everyone with equal access to professional development. You are encouraged to apply even if your experience doesn't exactly match the job description. We also encourage applications from those with different abilities, including candidates with ADHD, autism, dyslexia or dyspraxia.
Responsibilities
• Supporting the product engineering teams in building highly fault-tolerant, scalable applications by participating in design discussions, RFCs and code reviews.
• Executing various department strategies: contributing to the design and scoping work for team members around disaster recovery, backup, redundancy and capacity planning activities.
• Being part of a global on-call rotation responsible for identifying and fixing bottlenecks in SaaS customer environments.
• Carrying out regular maintenance of production systems that host Vault products.
• Driving the evolution of our SaaS products by defining and designing features that foster exceptional reliability and an unparalleled user experience.
• Implementing and regularly testing DR strategies to ensure the highest level of resilience and fault tolerance of the platform.
• Maintaining and promoting high-quality written documentation of the assets, processes and runbooks that the team uses in its day-to-day operations.
• Working with your manager to grow team members' technical skills and their understanding of Vault products.
Benefits
• Pension plan (match up to 5%)
• Life insurance - three times annual salary
• Competitive maternity (six months fully paid) and paternity leave (four weeks fully paid)
• 25 days holiday and bank holidays
• Flexible working hours
• Cycle-to-work scheme
• Electric car scheme
• Season ticket loan
• Access to outstanding learning materials and courses
• Sports and hobby clubs, subsidised by Thought Machine
• All the latest tech you need
• Start the day properly with fresh fruit and cereals
• Huge range of healthy (and not-so-healthy) snacks, smoothies and drinks
• A talented and experienced team as your colleagues
• An environment where we encourage learning and progress
• Two charity days a year
• Weekly food pop-up