havocai

havocai - Senior Site Reliability Engineer

United States · $150k - $185k · 1mo ago
Remote · Senior · NA · Cloud Computing · Robotics · Site Reliability Engineer · Go · Python · Linux · Kubernetes · AWS


Requirements

• 7+ years of experience in SRE, infrastructure, or systems engineering roles.
• Strong experience operating large-scale distributed production systems.
• Deep understanding of Linux systems, networking, and distributed systems fundamentals.
• Hands-on experience with Kubernetes and container orchestration.
• Programming or scripting experience in Go, Python, or similar languages.
• Experience designing and operating observability systems for production environments.
• Proven ability to lead incident response and reliability improvements.
• Strong communication skills and ability to collaborate across engineering teams.
• Must be a US citizen.
• Must be eligible to obtain a government clearance, if required.
• Experience supporting autonomy, robotics, simulation, or real-time systems.
• Familiarity with AWS and large-scale cloud infrastructure.
• Experience with chaos engineering, fault injection, or resilience testing.
• Knowledge of CI/CD systems and progressive delivery practices.
• Experience working in high-reliability or safety-critical environments.

Responsibilities

RELIABILITY ENGINEERING & ARCHITECTURE
• Design and evolve reliability architecture for distributed and cloud-hosted systems.
• Define and implement SRE best practices, including SLIs, SLOs, error budgets, and capacity planning.
• Partner with platform and application teams to design systems for reliability, scalability, and operability.
• Identify and mitigate systemic reliability risks across infrastructure and services.

OPERATIONS & INCIDENT MANAGEMENT
• Lead incident response processes including on-call rotations, escalation, and post-incident reviews.
• Conduct root cause analysis for complex production incidents and drive long-term improvements.
• Improve operational readiness through runbooks, automation, and resilience testing.
• Reduce operational toil through tooling, automation, and process improvements.

OBSERVABILITY & PERFORMANCE
• Design and maintain observability systems for metrics, logging, tracing, and alerting.
• Ensure services and data pipelines are observable, debuggable, and performant in production.
• Drive performance analysis and tuning across infrastructure and service layers.

AUTOMATION & PLATFORM COLLABORATION
• Build automation to improve system reliability, deployment safety, and recovery processes.
• Partner with DevOps and Cloud Platform teams on CI/CD reliability, rollout strategies, and safe deployment patterns.
• Support and improve Kubernetes-based environments and containerized workloads.

SECURITY & RESILIENCE
• Collaborate with security teams to ensure secure and resilient system design.
• Participate in disaster recovery planning and testing.
• Maintain strong operational practices around access control, secrets management, and change management.

Benefits

• $150K – $185K
• Offers Equity
• Offers Bonus
• Our openings span more than one career level. The provided salary depends on many factors, such as work experience and transferable skills, business needs and impact, and market demands.
• We can only hire US Citizens due to our specific industry and government contracts.
