CertifyOS - Senior Site Reliability Engineer
Requirements
RELIABILITY ENGINEERING FUNDAMENTALS
• 5+ years in SRE, DevOps, Platform Engineering, or Infrastructure Engineering, operating production systems at scale where your infrastructure is someone else’s dependency and failures have real downstream consequences
• Track record of improving reliability end-to-end: you’ve debugged hard production problems, made them not happen again, and built the alerting to prove it
• Strong Linux systems administration, incident response, and root cause analysis skills
• Comfort influencing operational standards and mentoring teams on reliability practices

CLOUD INFRASTRUCTURE & PLATFORM ENGINEERING
• Deep hands-on experience with GCP: GKE, Cloud Run, and containerized workloads at scale
• Experience building and maintaining Infrastructure as Code with Terraform and/or Pulumi
• Fluency across deployment patterns and the judgment to know when each fits: rolling deployments, blue/green, and canary, along with the rollback story for each
• Experience with autoscaling, resource optimization, and infrastructure efficiency for distributed systems
• Experience managing infrastructure security, secrets, and access controls in regulated or security-conscious environments

OBSERVABILITY & OPERATIONAL EXCELLENCE
• Strong understanding of Golden Signals monitoring (latency, traffic, errors, saturation) and how to make them actionable rather than noisy
• Experience designing SLIs, SLOs, error budgets, alerting strategies, dashboards, and escalation workflows
• Hands-on experience with observability platforms: Google Cloud Monitoring, Datadog, Grafana, Prometheus, or similar
• Strong sense of data platform health: lineage, freshness, and correctness matter as much to you as throughput

AUTOMATION & SOFTWARE DELIVERY
• Experience building and maintaining CI/CD pipelines using GitHub Actions or similar
• Scripting or programming fluency in Python, Bash, Go, or similar; you reduce toil through code, not process
• Experience working with Git workflows and modern software delivery practices

COMMUNICATION & COMPLIANCE
• Strong written and verbal communication: you can explain an operational risk to an engineer and a product manager in the same conversation
• Experience operating systems handling sensitive data or PII in regulated or compliance-adjacent environments
• Experience operating large-scale distributed systems or microservices architectures
• Familiarity with healthcare, credentialing, or health-tech environments
• Experience leveraging AI-assisted observability or incident response tooling
• Familiarity with NodeJS, TypeScript, Java, or React application stacks

TECHNOLOGIES & TOOLS
• GCP (GKE, Cloud Run, BigQuery, Cloud Monitoring) · Terraform / Pulumi · Docker / Kubernetes · GitHub Actions / Cloud Build · Prometheus / Grafana / Datadog · Python / Bash / Go · Sentry · Snyk · SonarQube · Jira / Slack
Benefits
• At Certify, we’re building with intention and taking care of the people doing the work.
• Your well-being matters to us. We provide 100% coverage of health, dental, and vision insurance premiums for employees. Our US-based team benefits from unlimited PTO, with at least two weeks off each year to recharge. In India, employees are supported with health insurance, statutory leave benefits, and additional wellness (menstrual) leave for women.