NexGen Cloud - HPC Cluster Architect
Requirements
• Experience with Spectrum-X or next-generation Ethernet fabrics
• Prior involvement in large-scale cluster deployments (1,000+ GPUs) and performance benchmarking (NCCL, MLPerf)
• Exposure to both air-cooled and liquid-cooled HPC environments, and/or automation/infrastructure-as-code
Responsibilities
Rather than a long checklist, here’s what success in this role looks like:
• Own end-to-end cluster architecture for large-scale NVIDIA GPU deployments, from customer requirements through rack layouts, BOM, and power and cooling design to production handover
• Design high-performance network fabrics across compute (InfiniBand, RDMA, NVLink/NVSwitch), storage, and WAN, defining topology, oversubscription models, and scaling strategies
• Engage directly with OEMs and vendors: validating hardware configurations, reviewing quotes, and ensuring designs are both technically sound and commercially optimised
• Provide technical oversight during deployment and bring-up: supporting hardware validation and performance testing, and acting as the escalation point for complex integration issues
• Act as a senior technical leader across Solutions Architecture, Cloud Engineering, and data centre partners, contributing to standardised reference designs and building out the HPC engineering function
We’re more interested in how you think and work than in a perfect CV. You’ll likely bring a combination of the following:
Essential
• Deep hands-on knowledge of NVIDIA GPU platforms (H100/H200/B-series) and NVIDIA reference architectures
• Strong InfiniBand/RDMA design experience, covering topology, performance tuning, and high-performance Ethernet fabrics
• Solid grounding in Linux systems, PCIe topology, NUMA alignment, and server-level performance considerations
• Background from an OEM, hyperscaler, neo-cloud, or enterprise/research HPC environment, with demonstrable exposure to the full design-to-deployment lifecycle
• Confident engaging with customers, vendors, OEMs, and internal engineering teams as a technical authority, able to translate complex design trade-offs into clear decisions
Benefits
• Competitive salary and annual discretionary bonus scheme
• Employee wellbeing benefits
• 25 days of holiday, plus public holidays
• Flexible working arrangements (remote or hybrid, depending on role and location)
• Real ownership and autonomy, with the trust to take initiative and experiment
• The opportunity to make a visible, meaningful impact as we scale
• Clear career progression and growth opportunities in a fast-growing company
• A collaborative, international culture built on trust, transparency, and ownership
• The chance to help shape NexGen Cloud’s team, culture, and future alongside ambitious, mission-driven colleagues
MORE INFORMATION
Head over to our NexGen Cloud careers page to view current openings, and follow us on LinkedIn and X to learn more about our journey and newest releases and to hear exciting news in the neocloud space.