wagey.gg

RapidSOS - Senior Site Reliability Engineer

Remote - New York (Remote) or Boston (Remote) · $160k - $195k + Equity · 2mo ago
Remote · Senior · NA · Cloud Computing · Site Reliability Engineer · AWS · Python · Kubernetes · Kafka · Terraform

Requirements

• 5+ years of professional engineering experience with deep expertise in Python
• Real cloud infrastructure experience with AWS: networking, managed databases, cost implications of traffic routing decisions, IAM, DNS-based routing and failover
• Hands-on Kubernetes experience with containerized workloads in production across EKS, ECS, or Fargate: you can read events, understand resource limits, know when to drain vs. delete a node, and understand the tradeoffs between orchestration models
• Strong understanding of distributed systems and how they fail, including resource exhaustion, replication lag, queue backpressure, and other common failure modes
• Experience operating high-throughput messaging systems (RabbitMQ, Kafka, AWS SNS/SQS, etc.) and the infrastructure around them, including infrastructure-as-code (e.g., Terraform) and CI/CD pipelines, with an emphasis on improving reliability and scalability
• Experience building or improving observability through logging, metrics, and alerting
• Demonstrable experience using AI to safely and securely enhance velocity and to improve the reliability and recoverability of services
• Strong communication and interpersonal skills; a team player with a positive attitude
• Highly self-motivated; able to adapt and learn quickly in a fast-paced environment, with a strong sense of ownership
• Strong proficiency in coding best practices: ability to write clean, maintainable, and testable code
• Demonstrated expertise in problem solving: comfortable working across both infrastructure and application layers to diagnose and resolve issues at the source
• Ability and willingness to collaborate in person a few times per quarter, or as needed

Nice-to-have experience (but not required!):

• Experience supporting production systems in an on-call or similar capacity where reliability matters
• Experience with observability and GitOps tooling: hands-on with Datadog (APM, alerting), Elasticsearch/OpenSearch, and ArgoCD-based GitOps deployments; comfortable modernizing legacy CI/CD pipelines (e.g., Concourse, Jenkins) toward cloud-native approaches

Responsibilities

• Own performance and reliability outcomes: Ownership of how application-level decisions create system-level impact, including connection pooling, database architecture, traffic routing patterns, and memory allocation. Collaboration with engineering teams that own specific domains, partnering directly to improve reliability and performance across their systems.
• Design for system resilience: Responsibility for strengthening reliability through proactive design decisions, including safer deployment patterns, failover strategies, and redundancy approaches that improve system behavior under stress.
• Build observability into system behavior: Proactively instrument services with structured logging, metrics, and alerting so systems are easier to understand and debug. The focus is on creating clear signals from production behavior before issues escalate.
• Own incidents from signal to resolution: Ownership of production issues from first signal through resolution, including investigation across infrastructure and application layers, root cause identification, and implementation of fixes that restore stability and strengthen system behavior long term.
• Work across the stack without a permission slip: You'll work across infrastructure-as-code, container orchestration, CI/CD pipelines, and service-level application code. When issues come up, you don't wait for a handoff; you take ownership directly and drive the issue through to resolution.

Benefits

• The chance to work with a passionate team on solving one of the largest challenges globally
• Competitive salary, benefits, and equity participation
• A dynamic, flexible, and fun start-up work environment with a highly talented team
• If you're curious to learn more about RapidSOS, you can check out https://rapidsos.com/blog/
• Starting pay for a successful applicant will depend on a variety of job-related factors, which may include experience, relevant skills, training, education, location, business needs, or market demands. The salary range for this role is $160,000 - $195,000. This role will also be eligible to receive equity options.
• RapidSOS is proud to be an equal opportunity workplace. We are committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, or Veteran status.
• Interested in the role but don't meet 100% of the requirements? We'd love to hear from you! We encourage you to apply; we'd be excited to see if your unique skill set and experience could be a match.


