HighLevel - SRE / DevOps Engineer - Platform Team (Infrastructure)
Requirements
• 4+ years of experience operating large-scale systems
• Experience with GCP or other public cloud platforms
• Experience with Kubernetes (GKE) in production
• Ability to identify systemic issues and propose long-term fixes
• Experience leading incident response or reliability initiatives
• Strong understanding of reliability, security, and operational best practices
• Comfortable working in on-call and incident response environments
• Strong troubleshooting and communication skills
• Experience supporting or operating production systems
• Comfortable mentoring junior engineers and influencing peers
• Familiarity with Cloudflare, networking, or edge security
• Exposure to security tooling or vulnerability management
• Scripting or automation experience (Python, Go, Bash, etc.)
• Experience in compliance- or audit-driven environments (SOC2, ISO)
Responsibilities
• You will work closely with Cloud Infrastructure, Platform Engineering, Data Infrastructure, and Security teams to ensure systems are stable, resilient, and secure. This is a hands-on role with a strong operational and security mindset, critical to HighLevel’s platform maturity.
• Production Operations & Reliability:
  > Participate in 24/7 on-call rotations for core infrastructure systems
  > Execute incident response during production events, including triage, mitigation, and recovery
  > Maintain and improve runbooks, operational procedures, and escalation paths
  > Help reduce MTTR and prevent repeat incidents through engineering solutions
• Infrastructure Reliability Engineering:
  > Improve reliability of core infrastructure components, including Kubernetes (GKE) clusters, cloud networking and load balancing, and edge services (Cloudflare)
  > Identify systemic reliability issues and drive corrective actions
  > Support capacity planning, scaling, and resilience testing
• Security Operations & Remediation:
  > Execute security remediations across cloud and Kubernetes environments
  > Support enforcement of IAM least-privilege access, network security controls, and runtime security policies
  > Partner with Platform Security on vulnerability management and remediation
  > Support security incident response and post-incident reviews
• Automation & Tooling:
  > Automate repetitive operational and security tasks
  > Build tooling to improve incident response speed, operational visibility, and security posture enforcement
  > Reduce manual toil through scripts, tooling, and process improvements
• Change Management & Governance:
  > Support safe execution of infrastructure and configuration changes
  > Ensure changes follow defined change management and audit requirements
  > Contribute to incident reviews, postmortems, and continuous improvement initiatives
• Collaboration & Growth:
  > Work closely with Cloud Infrastructure, SRE, Platform, Data, and Security teams
  > Contribute to shared documentation and operational standards
  > Mentor junior engineers and lead small reliability or security initiatives