Zeta Global - Principal DevOps Engineer
Upload My Resume
Drop here or click to browse · PDF, DOCX, DOC, RTF, TXT
Requirements
• 10+ years of progressive experience in DevOps, SRE, Platform Engineering, or Infrastructure Engineering roles, with demonstrated impact at staff or principal level. • Expert-level Kubernetes knowledge, including cluster administration, Helm chart authoring, custom controllers/operators, network policies, RBAC, and multi-cluster management on AWS EKS. • Deep expertise in CI/CD pipeline architecture and advanced deployment strategies (canary, blue/green, progressive delivery, feature flag integration) at scale. • Strong proficiency with Infrastructure as Code using Terraform, including module design, state management, and multi-environment orchestration. • Expert knowledge of Docker containerization, including multi-stage builds, security hardening, image optimization, and container runtime management. • Production experience with Apache Kafka, including cluster management, topic design, consumer group strategies, and operational monitoring for high-throughput streaming workloads. • Strong networking fundamentals: DNS (Route 53, internal DNS), TCP/IP, routing, API Gateway, load balancing (ALB/NLB), service mesh, VPC peering, transit gateways, and network troubleshooting. • Extensive AWS experience spanning EKS, EC2, SQS, DynamoDB, IAM, VPC, CloudWatch, and related services in production environments. • Hands-on experience with observability platforms: Grafana (dashboards, alerting), Prometheus (metrics, PromQL), Loki (log aggregation), and Honeycomb (distributed tracing, BubbleUp analysis). • Working familiarity with multiple language stacks including Node.js, React, Python, Java, and Ruby, sufficient to understand build systems, dependency management, and runtime characteristics. • Experience operating within regulated environments, with practical knowledge of GDPR, CCPA, SOC 2, and compliance automation in MarTech or AdTech domains. • Proven ability to influence engineering culture, drive adoption of new practices, and communicate complex technical strategies clearly to both technical and non-technical stakeholders. • Demonstrated experience with GitLab CI/CD pipelines, including advanced pipeline features such as parent-child pipelines, dynamic environments, and security scanning integration. • AWS certifications: Solutions Architect Professional, DevOps Engineer Professional, or Security Specialty. • Experience with Statsig or similar feature flag and experimentation platforms for progressive delivery and A/B testing in production. • Background in chaos engineering tools and practices (Gremlin, Litmus, Chaos Monkey) for proactive resilience validation. • Experience building internal developer platforms (IDPs) or platform-as-a-product organizations. • Familiarity with FinOps practices and cloud cost optimization strategies. • Contributions to open-source DevOps/SRE tools or active participation in the broader infrastructure community. • Experience with service mesh technologies (Istio, Linkerd) for advanced traffic management and security.
Responsibilities
• CI/CD & Deployment Excellence • Design, build, and operate production-grade CI/CD pipelines enabling multiple developers on multiple teams to deploy concurrently to production, multiple times daily, with zero-downtime guarantees. • Implement and optimize advanced deployment strategies including canary releases, blue/green deployments, rolling updates, incremental rollouts, and feature flag-gated releases via Statsig. • Build self-service deployment tooling that empowers developers to own their release process while enforcing safety guardrails, automated rollback triggers, and automate compliance gates. • Establish deployment observability with real-time canary analysis, automated health scoring, and progressive delivery metrics integrated with Grafana, Prometheus, and Honeycomb. • Champion CI/CD workflows using GitLab CI/CD, Helm charts, and Terraform to ensure infrastructure and application deployments are version-controlled, auditable, and reproducible. • Platform Reliability & SRE • Define and enforce SLOs/SLIs/SLAs across services, establishing error budgets that balance velocity with reliability. • Lead incident response processes, including on-call rotations, runbook development, blameless postmortems, and incident command structure. • Design and implement robust observability stacks leveraging Grafana, Prometheus, Loki, and Honeycomb for metrics, logging, tracing, and alerting at scale. • Proactively identify and eliminate reliability risks through chaos engineering, load testing, capacity planning, and failure mode analysis. • Reduce operational toil through automation, self-healing infrastructure patterns, and intelligent alerting to minimize mean time to detection (MTTD) and recovery (MTTR). • Infrastructure & Architecture • Architect and manage Kubernetes clusters on AWS EKS at scale, including multi-cluster strategies, namespace governance, resource optimization, network policies, and security hardening. • Manage and optimize AWS infrastructure spanning EC2, SQS, DynamoDB, and related services with Infrastructure as Code (Terraform) best practices. • Design and operate Kafka-based event streaming infrastructure for high-throughput, low-latency data pipelines supporting real-time marketing and analytics workloads. • Ensure robust networking across the platform, including DNS management, service mesh configuration, load balancing, TCP/IP optimization, routing policies, and VPC architecture. • Manage containerization strategy using Docker, ensuring efficient image builds, vulnerability scanning, registry management, and runtime security. • Support data infrastructure operations across Snowflake, MySQL, and other database platforms, collaborating with data engineering teams on reliability and performance. • Compliance, Security & Governance • Embed compliance controls directly into CI/CD pipelines, ensuring automated enforcement of GDPR, CCPA, and SOC 2 requirements at every stage of the software delivery lifecycle. • Implement audit trails, change management controls, and deployment approval workflows required by regulatory frameworks in the MarTech and AdTech domains. • Collaborate with Security and Legal teams to ensure infrastructure and deployment processes meet global compliance obligations across all operating regions. • Maintain awareness of evolving privacy regulations (ePrivacy, state-level US laws, international data residency requirements) and proactively adapt infrastructure accordingly. • Technical Leadership & Influence • Serve as a technical leader and DevOps disruptor, challenging legacy processes and introducing modern practices that dramatically improve developer velocity and operational safety. • Influence software architecture decisions to simplify and streamline operational management, advocating for patterns that are deployment-friendly, observable, and resilient by design. • Clearly communicate complex technical strategies to engineering leadership, product stakeholders, and cross-functional teams to build alignment and drive adoption. • Develop reference architectures, internal standards, and golden path templates that codify best practices and accelerate onboarding of new services and teams. • Participate in on-call rotations and lead by example in incident response, demonstrating the operational discipline expected across the engineering organization.
Benefits
• Excellent medical, dental, and vision coverage • Employee Equity • Employee Discounts, Virtual Wellness Classes, and Pet Insurance And more!! • The salary range for this role is $180,000 - $210,000, depending on location and experience. • PEOPLE & CULTURE AT ZETA • Zeta considers applicants for employment without regard to, and does not discriminate on the basis of an individual’s sex, race, color, religion, age, disability, status as a veteran, or national or ethnic origin; nor does Zeta discriminate on the basis of sexual orientation, gender identity or expression. • We’re committed to building a workplace culture of trust and belonging, so everyone feels invited to bring their whole selves to work. We provide a forum for employees to celebrate, support and advocate for one another. Learn more about our commitment to diversity, equity and inclusion here: https://zetaglobal.com/blog/a-look-into-zetas-ergs/ • ZETA IN THE NEWS! • ZETA IN THE NEWS! • https://zetaglobal.com/press/?cat=press-releases
No credit card. Takes 10 seconds.