Vultr - Staff AI/ML Infrastructure Engineer
Requirements
• 5+ years experience working with bare metal infrastructure and hardware automation
• Hands-on experience with modern NVIDIA/AMD GPU platforms and high-performance networking (RoCE, InfiniBand)
• Deep knowledge of BIOS, BMC, firmware, NICs, Redfish/IPMI, and PCIe systems
• Strong Linux systems experience including device drivers and package management
• Experience building infrastructure automation using Python and Bash
• Familiarity with GPU drivers, firmware ecosystems, and vendor collaboration
• Experience designing and delivering complex infrastructure products
• Proven ability to lead projects and mentor engineers
• Experience optimizing multi-cluster GPU environments
• Exposure to Machine Learning software stacks and GPU workloads
Responsibilities
• Design and maintain GPU and bare metal infrastructure in containerized and physical environments
• Build scalable GPU clusters in partnership with networking and provisioning teams
• Ensure reliable, high-performance provisioning of GPU infrastructure
• Develop automated testing systems for GPU-based platforms
• Implement infrastructure solutions for diverse AI/ML workloads
• Benchmark, test, and troubleshoot GPU performance at scale
• Collaborate with hardware vendors on drivers, firmware, and support
• Resolve hardware, software, and performance issues across environments
• Optimize rail and cluster performance across architectures
• Lead technical direction and mentor engineers on infrastructure best practices
Benefits
• $145,000 - $160,000
• This salary can vary based on location, years of experience, background and skill set.