serverobotics - Sr. Reliability Operations Engineer (Mexico)
Requirements
• Bachelor’s degree in Computer Science, Information Technology, Engineering, or equivalent practical experience.
• 5+ years of professional experience in Reliability Operations, Site Reliability Engineering, DevOps, IT Operations, or a related technical support function.
• Demonstrated experience owning or participating in Tier 2 or Tier 3 technical investigations, including triage, log analysis, and structured escalation.
• Experience supporting distributed systems, cloud-hosted services, or production operational environments.
• Hands-on experience participating in incident response processes.
• Strong proficiency with Linux, including navigating systems, reviewing logs, and performing diagnostics.
• Experience writing, executing, and maintaining runbooks, automations, and operational workflows.
• Ability to interpret metrics, logs, and traces using tools such as Grafana/Prometheus, Google Cloud Monitoring, and OpenTelemetry.
• Familiarity with modern cloud environments, preferably Google Cloud Platform (GCP), including basic debugging, permissions, and service-level triage.
• Ability to investigate and remediate issues following documented procedures, escalating effectively when needed.
• Understanding of CI/CD pipelines, deployed application behavior, and operational dependencies across microservices.
• Proficiency with Jira or similar platforms for ticketing and structured incident tracking.
• Exceptional communication skills, especially during high-pressure incidents where clear, concise updates are critical.
• Calm and methodical approach to troubleshooting, prioritization, and decision-making.
• Strong collaboration skills when coordinating with product engineering, SRE, and global support teams.
• High level of ownership, reliability, and accountability when handling operational
What Makes You Stand Out
• Experience acting as an incident commander or primary incident response lead for high-severity events.
• Hands-on experience with robot fleets, IoT devices, or edge systems operations.
• Experience building lightweight tools, scripts, or internal automations to increase operational efficiency.
• Familiarity with incident management tools such as PagerDuty, OpsGenie, Jira Service Management, or Grafana IRM.
• Background creating or improving operational documentation, runbooks, or support processes at scale.
• Ability to coach and mentor others, and to uplift operational maturity within a region or team.
• Strong networking fundamentals, including experience diagnosing connectivity issues across distributed systems. Familiarity with Tailscale or similar zero-trust networking tools is a major plus.
Responsibilities
• Serve as the primary incident lead during your region’s daytime hours, coordinating technical investigations, centralizing communication, and engaging the appropriate engineering and SRE teams when escalation is required.
• Respond to escalations from Tier 1 support, using runbooks, metrics, logs, and system diagnostics to investigate and remediate issues or determine when escalation to Tier 3 is necessary.
• Develop and update runbooks, workflows, and operational documentation to ensure consistent and reliable responses to recurring issues, collaborating with product teams to expand coverage over time.
• Write, maintain, and enhance automation scripts and tools that streamline common remediation steps, improve response times, and reduce manual operational overhead.
• Use metrics, logs, and tracing tools (Grafana/Prometheus, GCP Monitoring, OpenTelemetry) to proactively identify problems, validate system behavior, and support continuous improvement of detection mechanisms.
• Act as the central point of communication during active incidents, ensuring timely updates and clear routing to the correct product engineering and SRE stakeholders.
• Collaborate with reliability and product teams to share insights, recommend improvements, and help refine processes that enhance the stability and operability of our systems.
• Participate in a shared weekend on-call rotation to help maintain operational coverage for production systems, responding to incidents and escalations as needed and coordinating with engineering teams when issues arise.
• Help establish operational best practices, refine workflows, and prepare the foundation for a broader reliability operations function.
Benefits
• MX$757,214 – MX$1,400,000
• The salary range listed in this posting is representative of the range of levels being considered for this position. Total compensation will vary based on geographic location and level. Leveling, as well as positioning within a level, is determined by a range of factors, including, but not limited to, a candidate's relevant years of experience, domain knowledge, and interview performance.