Backblaze External Website - Strategic Ops Engineer III
Requirements
• Strong expertise in Incident, Problem, and Change Management (ITIL or similar frameworks).
• Proven experience governing and optimizing operational processes.
• AI & Data Expertise: Strong knowledge of AI/ML concepts, including anomaly detection, predictive analytics, and data modeling.
• AIOps Experience: Hands-on experience with AIOps platforms or building AI-driven operational solutions (event correlation, alert prioritization).
• ITIL certification (Foundation or higher).
• Proficiency with platforms such as Jira, ServiceNow (SNOW), FireHydrant, and Moogsoft.
• Experience working in high-availability, large-scale environments.

Key Competencies:
• Positive Attitude!
• Strong analytical and problem-solving skills.
• Process-oriented mindset with a focus on governance and continuous improvement.
• Excellent stakeholder communication and leadership skills.
• Ability to drive change across cross-functional teams.

At this point, we hope you're feeling excited about the job description you're reading. Even if you don't meet every requirement, we still encourage you to apply. Learning, developing, and growing are key parts of our culture. We're eager to meet people who believe in our mission and can contribute to our team in various ways. We want people to feel comfortable expressing their true selves and to come, stay, and do their best work here.

To provide greater transparency to candidates, we share base pay ranges for all US-based job postings regardless of state. We set standard base pay ranges for all roles based on function, level, and country location, benchmarked against similar-stage growth companies. Final offer amounts are determined by multiple factors, including candidate location, skills, depth of work experience, and relevant licenses/credentials, and may vary from the amounts listed below.

The expected salary range for this role is $123,000 - $175,000.
Responsibilities
Incident Management
• Lead and govern the end-to-end incident management lifecycle, including detection, triage, escalation, and resolution.
• Drive major incident management (MIM) processes and communications.
• Improve MTTR (Mean Time to Resolution) through automation and process optimization.
• Establish and maintain incident response playbooks and runbooks.

Problem Management
• Maintain and improve intelligent heatmaps that use AI/ML to identify recurring technical themes and prioritize long-term remediation.
• Implement trend analysis and proactive problem identification using observability data and AI.
• Track and manage problem records through to closure.

Change Management
• Govern change management processes, including leading the Change Advisory Board (CAB), to ensure safe, compliant, and low-risk deployments.
• Define and enforce change policies, risk assessments, and approval workflows.
• Drive continuous improvement in release and deployment practices.

Observability & Service Reliability
• Maintain a strong understanding of system architecture and monitoring strategies, identifying gaps and opportunities for improvement.
• Partner with engineering teams to improve system resilience and performance.
• Reduce alert fatigue by improving the signal-to-noise ratio in monitoring systems.

AI-Driven Operations (AIOps)
• Leverage AI/ML for anomaly detection, predictive alerting, and automated root cause analysis.
• Implement AI-driven solutions to optimize incident response and operational workflows.
• Analyze large-scale operational data to identify patterns and recommend improvements.
• Bring hands-on experience with AIOps platforms or building AI-driven operational solutions.
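Under one common definition, the MTTR figure named in the incident management responsibilities is the total time from detection to resolution divided by the number of incidents in a period. The following is a minimal Python sketch, assuming each incident is a hypothetical (detected_at, resolved_at) pair with invented sample timestamps. Real tools such as Jira or FireHydrant expose their own fields, which are not modeled here.

```python
from datetime import datetime, timedelta

# Hypothetical incident records as (detected_at, resolved_at) pairs.
# The layout and sample values are illustrative assumptions only.
incidents = [
    (datetime(2024, 1, 3, 9, 0), datetime(2024, 1, 3, 10, 30)),   # 90 min
    (datetime(2024, 1, 7, 14, 0), datetime(2024, 1, 7, 14, 45)),  # 45 min
    (datetime(2024, 1, 9, 22, 15), datetime(2024, 1, 10, 0, 0)),  # 105 min
]


def mean_time_to_resolution(records):
    """Return MTTR as a timedelta: total resolution time / number of incidents."""
    if not records:
        raise ValueError("MTTR is undefined for zero incidents")
    # Sum each incident's duration, starting from a zero timedelta.
    total = sum((resolved - detected for detected, resolved in records), timedelta())
    return total / len(records)


print(mean_time_to_resolution(incidents))  # 1:20:00
```

Tracking this value per period gives one concrete way to check whether automation and playbook changes are actually shortening resolution time.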