Fluidstack - Network Engineer, Reliability & Observability
Requirements
• AI/HPC Fabric Operations: Experience operating AI/ML or HPC fabrics with RDMA (RoCEv2), lossless Ethernet (PFC, ECN), or other high-performance networking. You understand the operational precision required when network performance directly impacts workload completion.
• Reliability Engineering: You have observability and reliability engineering experience, gained in network operations or in manufacturing quality.
• Hardware Repair Experience: Hands-on experience coordinating hardware repairs, RMAs, and physical infrastructure work. You understand datacenter logistics, vendor escalation processes, and how to work effectively with onsite technicians.
• Observability & Monitoring: Familiarity with network monitoring platforms, alerting systems, and telemetry collection. You've used monitoring tools to diagnose issues proactively and tuned alerting to reduce noise. You have experience with SQL, MySQL, and building operations dashboards.
Benefits
• Competitive total compensation package (salary + equity).
• Retirement or pension plan, in line with local norms.
• Health, dental, and vision insurance.
• Generous PTO policy, in line with local norms.
• The base salary range for this position is $150,000 - $250,000 per year, depending on experience, skills, qualifications, and location. This range represents our good faith estimate of the compensation for this role at the time of posting. Total compensation may also include equity in the form of stock options.
• We are committed to pay equity and transparency.