Menlo - DevOps Engineer

San Francisco, California, United States - Hybrid · 1w ago
In Office · NA · Cloud Computing · Artificial Intelligence · DevOps Engineer · Kubernetes · Docker · Google GKE · Azure · Jenkins · Loki · Prometheus · Grafana · Linux · RHEL · Ubuntu · Terraform · Pulumi · PostgreSQL · Kafka · Redis · AWS · GCP · Plane · Google Workspace

Requirements

• Kubernetes -- deep, hands-on. Strong production experience with Kubernetes, fluent in workloads and controllers, networking (Services, Ingress, CNI basics), storage (PV/PVC, CSI), RBAC, and the autoscaling story end-to-end (HPA, VPA, Cluster Autoscaler, KEDA). Cloud-managed Kubernetes (GKE, EKS, AKS) is fine; on-premises / self-managed Kubernetes (kubeadm, Cluster API, k3s, etc.) is a strong plus.
• Networking -- design-level, not just operator-level. You have designed real network topologies at some point in your career -- hub-and-spoke, multi-AZ / multi-VPC, or an equivalent enterprise pattern -- and can defend the tradeoffs. Comfortable with VPCs, firewalls, load balancers, private cluster architecture, DNS, and routing. On-premises networking experience (VLANs, BGP, L2/L3 fabrics, pfSense / Fortinet / Palo Alto / Cisco) is a strong plus.
• CI/CD and Docker -- concepts over tooling. You can build and optimize Dockerfiles (multi-stage builds, layer caching, small/secure base images) and have owned full CI/CD pipelines end-to-end. Tooling is flexible -- GitHub Actions, GitLab CI, Azure Pipelines, Jenkins, Argo Workflows, etc. -- but you should be able to clearly articulate the full lifecycle of a typical pipeline, and explain how CI/CD changes when the deployment target is Kubernetes (ArgoCD / FluxCD, GitOps patterns, progressive delivery).
• Observability -- you have built this before. You have stood up a full observability stack from scratch and operated it in production -- metrics, logs, traces, alerting, on-call. Familiarity with the Grafana stack (Grafana, Mimir, Tempo, Loki, Pyroscope, OnCall, Prometheus) is a strong plus. Bonus points if you have experimented with agent-assisted SRE workflows or LLM-driven incident triage.
• SSO and identity. When you bring a new tool into the platform, your instinct is to wire it into a central IdP rather than leave it on local accounts. Comfortable with OpenID Connect, SAML, and traditional directory services (LDAP / Active Directory), and you have integrated tools with an IdP like Keycloak, Okta, Azure AD, or equivalent.
• Linux and automation fundamentals. Strong Linux proficiency (RHEL/Ubuntu or equivalent) including basic performance and networking debugging. Comfort with infrastructure-as-code (Terraform / Terragrunt / Pulumi or equivalent) and configuration management.
• Ownership mindset. Comfortable operating in a high-ownership environment where you make architecture decisions, push them to production, and own the outcomes.
• Optional but valuable: hands-on experience operating any of Kafka, Redis, PostgreSQL, OpenSearch -- at production scale, including HA, backup/restore, and upgrade planning.
• Experience with OpenStack in production: Nova, Neutron, Cinder, Trove, Horizon, and CLI administration.
• Experience with KVM virtualization and storage backends like Ceph or Rook-Ceph on Kubernetes.
• Familiarity with vLLM internals: PagedAttention, continuous batching, tensor parallelism.
• Background in AI/ML infrastructure or GPU cluster operations at scale.
• Experience with KEDA or event-driven autoscaling patterns in anger.
• Prior open-source contributions to Kubernetes, OpenStack, or adjacent projects.
• Kernel-level Linux debugging and performance tuning.

Responsibilities

• Operate and evolve our Kubernetes platform across multiple clusters and environments (Prod, Dev, hybrid on-prem and public cloud), covering control plane operations, node lifecycle, upgrades, and autoscaling at every layer (Cluster Autoscaler, HPA, KEDA).
• Architect and manage hybrid cloud infrastructure spanning on-premises and public clouds (GCP, AWS), including workload placement, cross-cloud networking, and unified resource management.
• Own the CI/CD and GitOps experience end-to-end: container build pipelines, image optimization, and progressive delivery via ArgoCD / FluxCD.
• Own the observability stack as a single pane of glass across all clusters: Grafana, Mimir, Tempo, Loki, Pyroscope, OnCall, Prometheus -- and help push toward agent-assisted SRE workflows.
• Manage and improve our inference platform: vLLM serving and AIBrix for multi-model orchestration and autoscaling across a fleet of NVIDIA GPUs.
• Operate platform services: Kafka, Redis, PostgreSQL, OpenSearch.
• Manage identity and access via Keycloak integrated with Google Workspace; harden SSO, RBAC, and secrets management across the platform.
• Harden network security across private load balancers, firewalls, and VPC segmentation; design and maintain hub-and-spoke / multi-AZ topologies.
• Support training infrastructure: self-service VM provisioning, RunPod burst capacity, Weights and Biases integration.
• Drive infrastructure reliability, cost efficiency, and capacity planning as the platform scales.

Benefits

Most infrastructure teams manage someone else's cloud. At Menlo, you own the metal. Menlo Cloud is a first-class investment built from the ground up, and it sits at the center of everything we do, from coding agents to humanoid robots. You will have genuine ownership over a platform that is technically ambitious, cost-conscious by design, and critical to the mission. If you want to build infrastructure that actually matters and have the autonomy to do it right, this is the place.

A Note on AI

You don't need deep AI expertise for every role, but we do expect everyone at Menlo to be intellectually curious, drawn to tinkering and discovery, and excited to use AI as a real collaborator in their work. For some roles, AI fluency is a core requirement. When that's the case, we'll say so explicitly in the qualifications. People who thrive here don't treat AI as a novelty. They use it to think better, and make their work easier for others to build on.

Equal Opportunity and Accommodations
