IntusCare

IntusCare - Director of SRE (FTE)

Remote - USA · $175k - $200k · 3d ago
Remote · Director · NA · Cloud Computing · Software · Director of Engineering · Bash · Python · Azure · Kubernetes · Pipeline Management

Requirements

• Strong hands-on experience with cloud infrastructure, preferably Microsoft Azure, including AKS, networking, storage, IAM, and security services.
• Deep expertise in Kubernetes, containerized workloads, and production-scale distributed systems.
• Experience building and managing CI/CD pipelines using GitHub Actions, ArgoCD, Terraform, or similar DevOps tooling.
• Strong background in monitoring, logging, tracing, and observability platforms such as Grafana, Prometheus, Datadog, Splunk, or equivalent.
• Experience with scripting and automation using Python, Bash, PowerShell, or similar languages.
• Strong understanding of release engineering, automated testing frameworks, QA tooling, and shift-left quality practices.
• Experience supporting SaaS applications with uptime, scalability, and security requirements in regulated industries such as healthcare.
• Knowledge of HIPAA, SOC2, vulnerability management, access controls, and infrastructure security best practices.
• Familiarity with databases, APIs, networking, and troubleshooting across modern web application stacks.
• Exposure to AI-powered DevOps / AIOps tooling for incident management, automation, and engineering productivity is a plus.
• 12+ years of SRE, infrastructure, or platform engineering experience, with 5+ years of engineering leadership roles.
• Proven track record owning site reliability for complex, multi-tenant SaaS platforms with demanding availability requirements.
• Demonstrated experience defining SLA and SLO frameworks, error budgets, and incident management processes at scale.
• Experience managing vendor relationships for managed infrastructure or SRE services, including SLA governance and performance management.
• Track record leading QA or quality engineering functions, including test automation maturity and release gate ownership.
• Strong communication and cross-functional influence skills, with the ability to represent reliability to both technical and non-technical audiences.
• Experience in healthcare technology, HIPAA-compliant environments, or other highly regulated SaaS industries.
• Familiarity with FHIR-native or EMR/EHR platform architectures and their specific reliability requirements.
• Experience implementing AI-assisted SRE automation including runbook generation, anomaly detection, or incident triage tooling.
• Background working with Playwright or equivalent test automation frameworks in a QA leadership capacity.
• Experience building internal SRE capability alongside a managed services provider.
• Work location: This is a fully remote role based in the United States.
• Sponsorship: This position is not eligible for sponsorship.

Responsibilities

• Own and execute the SRE strategy and multi-quarter roadmap across reliability, observability, incident management, QA maturity, and release engineering.
• Define, measure, and continuously improve SLAs, SLOs, error budgets, uptime, performance, and operational health metrics across all products and services.
• Lead production reliability for the full platform, including monitoring, alerting, on-call operations, incident response, root cause analysis, and MTTR reduction.
• Establish release readiness standards, deployment safety controls, and quality gates to ensure stable and predictable product releases.
• Manage external SRE vendors and partners, including service delivery, SLA governance, escalations, performance reviews, and compliance expectations.
• Lead QA engineering strategy with a focus on automation, regression prevention, test coverage, and reducing escaped defects in production.
• Partner with Security and Engineering leaders to ensure cloud infrastructure, CI/CD pipelines, and operational tooling meet HIPAA, SOC2, and internal security standards.
• Oversee core platform operations including Azure AKS environments, Kubernetes, GitOps workflows, CI/CD pipelines, GitHub Actions, secrets management, access controls, and audit readiness.
• Drive observability maturity using tools such as Grafana, Prometheus, logging platforms, tracing tools, and automated alerting frameworks.
• Collaborate with Product, Platform, and Engineering teams to embed reliability and quality best practices throughout the software development lifecycle.
• Build, mentor, and scale high-performing SRE and QA teams while fostering a culture of ownership, accountability, learning, and continuous improvement.
• Drive adoption of AI-enabled automation and intelligent tooling to reduce manual toil, improve productivity, and strengthen operational excellence.

Benefits

• Own and build the SRE function from the ground up for a modern healthcare EMR platform serving PACE populations.
• Lead a blended team model combining managed services, internal QA, and internal SRE in a high-growth engineering organization.
• Work on systems where reliability directly impacts clinical care delivery for vulnerable patient populations.
• Shape engineering culture in a company that actively embraces AI-assisted software development with Claude Code.
• Fully remote, collaborative engineering environment with direct access to executive leadership.
