Serve Robotics - Reliability Operations Engineer (Mexico)
Requirements
• 2–4 years of experience in Reliability Operations, Site Reliability Engineering, DevOps, IT Operations, or a related technical support function.
• Experience participating in Tier 1 or Tier 2 investigations, including log review, basic triage, and structured escalation.
• Exposure to operational environments supporting distributed or cloud-based systems.
• Participation in incident response workflows and/or on-call rotations.
• Proficiency with Linux, including navigating systems, reviewing logs, and performing basic diagnostics.
• Experience using and contributing to runbooks and operational workflows.
• Ability to interpret metrics, logs, and traces using tools such as Grafana/Prometheus, Google Cloud Monitoring, and OpenTelemetry.
• Familiarity with cloud platforms, preferably Google Cloud Platform (GCP).
• Ability to follow documented remediation steps, with good judgment around when to escalate.
• Understanding of CI/CD pipelines and how application deployments affect runtime behavior.
• Experience using Jira or similar ticketing systems.
• Clear and effective communicator, especially when providing updates during time-sensitive operational issues.
• Calm, organized approach to troubleshooting and prioritization.
• Collaborative mindset, working effectively with senior operations engineers, product teams, and SREs.
Responsibilities
• Lead incident investigations during your region’s daytime hours, providing timely updates, escalating appropriately, and supporting senior engineers leading the response.
• Respond to escalations from Tier 1 support using established runbooks, metrics, logs, and diagnostics to remediate issues or escalate to Tier 3 when needed.
• Update runbooks and operational documentation based on new issues, discoveries, and feedback, ensuring clarity and consistency across all procedures.
• Run existing automations and collaborate with senior team members to enhance tooling and scripts that streamline troubleshooting and remediation tasks.
• Use observability tools such as Grafana/Prometheus, GCP Monitoring, and OpenTelemetry to interpret metrics, logs, and traces, helping identify anomalies and validate system performance.
• Provide concise, accurate updates during incidents, ensuring information reaches the correct engineering and SRE contacts and supporting structured incident coordination.
• Participate in discussions around root causes, share operational insights, and contribute to process improvements that enhance system stability and supportability.
• Participate in a shared weekend on-call rotation to help maintain operational coverage for production systems, responding to incidents and escalations as needed and coordinating with engineering teams when issues arise.
• Proactively strengthen workflows, adopt best practices, and build the foundation of the Reliability Operations function as it evolves.
What Makes You Stand Out
• Prior experience participating in high-severity incident response or supporting operational incidents.
• Exposure to robot fleets, IoT systems, or other distributed physical device environments.
• Ability to write or modify lightweight scripts and automations to improve operational workflows.
• Familiarity with incident management platforms such as PagerDuty, OpsGenie, Jira Service Management, or Grafana IRM.
• Experience contributing to the creation or improvement of operational runbooks and support documentation.
• Strong networking fundamentals; familiarity with Tailscale or similar zero-trust networking tools is a plus.
• Demonstrated ability to learn quickly and contribute to improving operational maturity within a team.
Benefits
• MX$500K – MX$1M
• The salary range listed in this posting is representative of the range of levels being considered for this position. Total compensation will vary based on geographic location and level. Leveling, as well as positioning within a level, is determined by a range of factors, including, but not limited to, a candidate's relevant years of experience, domain knowledge, and interview performance.