Lavendo

Lavendo - HPC Solutions Architect

San Francisco, California, United States - Hybrid · $225k - $315k + Equity · 1mo ago
In Office · Principal · NA · Solutions Architect · Bash · Python · Linux · Kubernetes · containerd

Requirements

• A Bachelor’s or Master’s in Computer Science, Engineering, or a related field (PhD is a plus).
• 3+ years actually building or running HPC or large GPU clusters: on-prem, cloud, or hybrid. You’ve owned outcomes, not just submitted jobs.
• 3+ years
• Strong Linux background, plus Kubernetes and container runtimes (containerd, CRI-O, Docker) in real environments, with CI/CD in the loop.
• A solid handle on HPC networking and RDMA: InfiniBand, RoCE, NVLink/NVSwitch. You understand why topology and fabric design matter, and you’ve seen what happens when they’re wrong.
• Experience with storage and I/O for big workloads: Ceph, Lustre, NFS at scale, GPUDirect Storage, or similar systems where throughput, latency, and contention actually matter.
• Comfort with Terraform, Ansible, Helm, and GitOps-style workflows to keep configurations reproducible and sane.
• Good scripting skills in Python or Bash; you’re happy to automate checks, glue systems together, or prototype tooling.
• You write and speak clearly, can lead a design review without losing the room, and can keep both engineers and non-technical stakeholders on the same page.
• Legal authorization to work in the U.S. on a full-time basis without visa sponsorship.
• Hands-on with the NVIDIA ecosystem: GPU Operator, MIG, DCGM, NCCL, Nsight, and managing CUDA stacks across production clusters.
• Experience with MLflow, Kubeflow, NeMo, or similar for AI/ML pipelines, or with distributed training frameworks like PyTorch DDP, DeepSpeed, or Megatron.
• Time spent with Slurm, LSF, PBS, or similar on real clusters, not just in a lab.
• Experience with multi-tenant GPU environments or “AI training farms.”
• Familiarity with observability stacks for HPC: Prometheus, DCGM Exporter, Grafana, and NGC tools.
• Any open-source work in HPC, CUDA, or Kubernetes is a strong plus.

Who This Role Suits

• You like understanding a workload deeply, then designing a cluster and config that fits it like a glove.
• You’re comfortable saying, “This is fast, but we can make it faster, and here’s how,” and then proving it with numbers.
• You enjoy working directly with customers and partners, but you still want to stay close to the technology.
• You prefer a low-ego, high-ownership environment where people care more about doing the right thing than about title.

Benefits

• Estimated OTE $225K – $315K
• Offers Equity
• Offers Bonus
• Medical, dental, vision; 401(k); PTO; Mobile & internet stipend
