Elastic - Site Reliability Engineer (Hosted Infra)
Requirements
• Experience building software with Golang. You are also comfortable reviewing others' code and offering constructive feedback.
• Production experience operating large-scale cloud compute (hundreds of hosts or more) via automated workflows.
• Deep experience with Linux systems: you are at home in the terminal debugging at the OS level.
• Proficiency working with containerized workloads in production.
• A customer-first, systems-thinking approach to operational problems: you care about root causes, not just symptoms.
• Comfortable working across time zones in both real-time and asynchronous contexts.
• You contribute clear and maintainable documentation such as software designs, runbooks, architecture diagrams/decisions, postmortems, etc.
• You communicate project status regularly and clearly, flag blockers early, and follow through on action items.
• A sensible approach to AI integration: identifying where AI tools genuinely reduce operational burden and embedding them into workflows without adding complexity.
• Production experience with any of: Terraform, Puppet, Ansible, Argo CD, Argo Workflows, CUE, Docker, Kubernetes, Ubuntu, or Ubuntu Live Patch.
• Experience being on-call during incidents and using observability tools (e.g. Elastic Stack, Graphite, Prometheus, Influx) to diagnose issues, quantify impact, and confirm mitigations.
• Hands-on experience engineering solutions with the Elastic Stack.
Additional Information - We Take Care of Our People
As a distributed company, diversity drives our identity. Whether you’re looking to launch a new career or grow an existing one, Elastic is the type of company where you can balance great work with great life. Your age is only a number. It doesn’t matter if you’re just out of college or your children are; we need you for what you can do.
We strive to have parity of benefits across regions, and while regulations differ from place to place, we believe taking care of our people is the right thing to do.
• Competitive pay based on the work you do here and not your previous salary
• Health coverage for you and your family in many locations
• Ability to craft your calendar with flexible locations and schedules for many roles
• Generous number of vacation days each year
• Increase your impact - We match up to $2000 (or local currency equivalent) for financial donations and service
• Up to 40 hours each year to use toward volunteer projects you love
• Embracing parenthood with a minimum of 16 weeks of parental leave
Security & Privacy Responsibilities
Take ownership of protecting the confidentiality, integrity, and availability of organizational data and systems by following applicable privacy and security policies, standards, and procedures. Ensure that all individual contributions follow Elastic’s Secure Software Development Framework (SSDF). Proactively participate in mandatory role-based training to ensure personal technical execution consistently aligns with the highest standards of data protection, data privacy, and system resilience.
Responsibilities
• Engineering software to automate large-scale systems: building internal tools and services, not just running scripts.
• Optimizing the reliability and lifecycle of hosts across multiple cloud providers.
• Strengthening our observability posture: crafting alerting and monitoring systems that drive incident prevention over incident response.
• Scaling global infrastructure and evolving the infrastructure management processes to meet growing demand.
• Contributing to code reviews, sharing your work, planning what we need to do next, and both mentoring and being mentored by teammates.
• Being part of a balanced SRE on-call rotation: responding to incidents, improving runbooks, participating in postmortems, and championing reliability improvements.