Oowlish Technology - Senior Site Reliability Engineer (SRE)
Requirements
• 5+ years of professional experience in Site Reliability Engineering, DevOps, or Production Engineering roles.
• Strong understanding of Site Reliability Engineering principles and best practices.
• Experience supporting and operating production systems at scale.
• Strong knowledge of monitoring, observability, and reliability engineering concepts.
• Experience working in cloud-based environments.
• Strong troubleshooting and problem-solving skills.
• Experience working with distributed systems and modern application architectures.
• Proven Site Reliability Engineering experience.
• Experience with Service Level Objectives (SLOs).
• Experience with Service Level Indicators (SLIs).
• Experience leading or actively participating in Incident Command and Incident Response processes.
• Experience designing and implementing observability strategies.
• Experience with distributed tracing.
• Experience improving system reliability, availability, and operational excellence.
• Experience supporting mission-critical production environments.
• Experience with cloud platforms (AWS preferred).
• Strong automation mindset.
• Experience conducting root cause analysis and postmortems.
• Experience with Terraform or other Infrastructure as Code tools.
• Experience with containerized environments.
• Experience with distributed microservices architectures.
• Experience with performance engineering.
• Experience mentoring engineers on reliability practices.
• Experience working in highly regulated or high-availability environments.
Responsibilities
• Design, implement, and improve Site Reliability Engineering practices across production environments.
• Define, manage, and continuously improve Service Level Objectives (SLOs), Service Level Indicators (SLIs), and Error Budgets.
• Lead and participate in incident response and incident command processes.
• Build and evolve observability strategies, including monitoring, logging, alerting, and distributed tracing.
• Improve system reliability, availability, scalability, and operational efficiency.
• Partner with engineering teams to improve application performance and production readiness.
• Develop automation solutions that reduce operational overhead and improve reliability.
• Participate in root cause analysis and post-incident reviews.
• Drive continuous improvement initiatives based on operational insights and incident learnings.
• Help establish reliability best practices across teams and services.
Benefits
• Competitive compensation based on experience;
• Career plans that allow for extensive growth within the company;
• International projects;
• Oowlish English Program (Technical and Conversational);
• Oowlish Fitness with Total Pass;
• Games and competitions.

You can also apply here:
• Website: https://www.oowlish.com/work-with-us/
• LinkedIn: https://www.linkedin.com/company/oowlish/jobs/
• Instagram: https://www.instagram.com/oowlishtechnology/

We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, assessing responses, and identifying potential inconsistencies or verification signals in application materials based on available information. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are made by humans. If you would like more information about how your data is processed, please contact us.