DevOps Team Lead
Requirements
• 7+ years of experience in cloud infrastructure and DevOps, with 3+ years in a technical leadership role
• Proven track record of building and leading high-performing infrastructure teams
• Strong experience with AWS, GCP, and Azure
• Deep expertise in Kubernetes, including multi-cluster management, GPU workload optimization, resource scheduling and autoscaling, and network policies and security
• Extensive experience with cloud networking, including VPC design, load balancer configuration, network security and segmentation, and cross-cloud networking solutions
• Strong CI/CD expertise, preferably with GitHub Actions
• Proficiency in Terraform
• Proficiency with GitOps tools (ArgoCD preferred)
• Experience with monitoring and observability tools
• Experience with FinOps practices and cloud cost optimization
• Excellent communication skills, with the ability to translate technical concepts for diverse audiences
• Experience with ML workflow tooling (MLflow, Kubeflow, or similar)
• Experience with FastAPI and backend applications
• Familiarity with data platforms such as Databricks or Snowflake
• Experience with SRE practices, or cloud security certifications
• Hands-on experience with Prometheus, Grafana, or Datadog
• Experience scaling infrastructure for AI/ML startups
Responsibilities
• Lead and mentor a team of DevOps engineers, fostering technical growth and collaboration
• Define and drive the infrastructure roadmap aligned with company objectives
• Architect and oversee cloud infrastructure design and implementation
• Establish best practices, standards, and processes for infrastructure development and operations
• Partner with Engineering, Research, and FDE to align infrastructure capabilities with business needs
• Drive the evolution of Kubernetes clusters optimized for GPU workloads, production SaaS hosting, and varied enterprise deployment models
• Champion GitOps practices using ArgoCD for continuous deployment
• Establish infrastructure-as-code standards using Terraform
• Define the monitoring and observability strategy for distributed systems
• Collaborate with ML engineers to optimize infrastructure for model training and serving
• Own infrastructure reliability, performance, and security posture
• Implement and maintain cost optimization strategies (FinOps) for cloud resources
Benefits
• Competitive compensation with salary and equity
• Comprehensive health coverage, including medical, dental, and vision, plus a 401(k)
• Fertility support, as well as paid parental leave for all new parents, inclusive of adoptive and surrogate journeys
• Relocation support for employees moving to join the team in one of our office locations
• A mission-driven, low-ego culture that values diversity of thought, ownership, and a bias toward action