Heidi - Senior Site Reliability Engineer
Requirements
• At least 3 years of experience required, with 6+ years preferred. The role suits strong mid-level SREs ready to take on more ownership, and senior SREs who enjoy being hands-on in operations.
• Comfortable debugging live systems under pressure, handling high-stress situations without compromising system integrity or patient care quality.
• Working knowledge of Kubernetes and containerized workloads (explicitly required).
• Experience operating cloud infrastructure. AWS is preferred but not mandatory.
• Experience supporting production systems and participating in on-call rotations (explicitly required), including hands-on incident response and reliability work in live environments.
• Infrastructure as Code experience with Terraform or similar tools.
• Experience writing and maintaining runbooks. This is inferred from the responsibilities rather than stated as an explicit requirement.
• Experience improving deployments and rollback mechanisms. This is inferred from the responsibility to support safe change rather than stated explicitly.
• Experience participating in blameless post-mortems. This is inferred from the responsibility to contribute to operational practices rather than stated explicitly.
• Experience collaborating closely with engineers. This is inferred from the emphasis on collaboration rather than stated explicitly.
Responsibilities
• Participate in on-call and incident response: Respond to production incidents, contribute to service restoration, and support clear communication during incidents. Over time, take increasing responsibility for leading incidents end-to-end.
• Improve operational reliability: Identify recurring issues and reliability risks, and drive fixes through better alerting, automation, system changes, or process improvements.
• Own parts of the production environment: Operate and improve Kubernetes clusters, cloud infrastructure, and core platform services, with growing ownership as familiarity increases.
• Strengthen observability: Improve dashboards, alerts, logs, and traces so issues are detected earlier and diagnosed faster, with a strong focus on actionable signals.
• Reduce operational toil: Automate repetitive tasks, simplify runbooks, and improve tooling to make on-call and day-to-day operations easier and safer.
• Support safe change: Improve deployments, rollback mechanisms, and operational readiness to reduce the risk of incidents caused by change.
• Contribute to operational practices: Write and maintain runbooks, participate in blameless post-mortems, and help improve incident response processes over time.
• Collaborate closely with engineers: Work with product and feature teams to improve production readiness, service ownership, and reliability expectations.
Benefits
• Real product momentum. We’re not trying to generate interest, we’re channeling it.
• Equity from day one. When Heidi wins, you win. You’ll share directly in the success you help create.
• Unmatched impact. Play a pivotal role in defining and scaling customer success at a critical growth moment - all while working on a product that delivers tangible value to clinicians and patients every day.
• Work alongside world-class talent. Join a team of operators and builders who’ve scaled unicorns.
• Global reach. Help shape our international expansion as we bring Heidi to key international markets.
• Growth and balance. Enjoy a personal development budget, work from anywhere for a month, dedicated wellness days, and your birthday off to recharge.
• Flexibility that works. A hybrid environment, with 3 days in the office.
• Heidi’s commitment to Diversity, Equity and Inclusion