Pluralis Research - Machine Learning Engineer
Requirements
• Strong experience building and operating distributed systems in production.
• Hands-on expertise with distributed training frameworks (FSDP, DeepSpeed, Megatron, or similar).
• Deep understanding of model parallelism (data, tensor, pipeline parallelism).
• Expert-level Python with production experience (concurrency, error handling, retry logic, clean architecture).
• Strong networking fundamentals: P2P systems, gRPC, routing, NAT traversal, distributed coordination.
• Experience optimizing GPU workloads, memory management, and large-scale compute efficiency.
Responsibilities
• Distributed Training Architecture & Optimization
  • Design and implement large-scale distributed training systems optimized for heterogeneous hardware operating under low-bandwidth, high-latency conditions.
  • Develop and optimize model-parallel training strategies (data, tensor, pipeline parallelism) with custom sharding techniques that minimize communication overhead.
  • Optimize GPU utilization, memory efficiency, and compute performance across distributed nodes.
  • Implement robust checkpointing, state synchronization, and recovery mechanisms for long-running, fault-prone training jobs.
  • Build monitoring and metrics systems to track training progress, model quality, and system bottlenecks.
• Decentralized Networking & Resilience
  • Architect resilient training systems where nodes can fail, networks can partition, and participants can dynamically join or leave.
  • Design and optimize peer-to-peer topologies for decentralized coordination across non-co-located nodes.
  • Implement NAT traversal, peer discovery, dynamic routing, and connection lifecycle management.
  • Profile and optimize communication patterns to reduce latency and bandwidth overhead in multi-participant environments.
Benefits
• Equity-heavy compensation with meaningful ownership in a mission-driven company
• Competitive base salary for senior engineering roles in Australia
• Visa sponsorship available for exceptional candidates
• Remote-first with optional access to our Melbourne hub
• World-class team: teammates were previously at Google, Amazon, Microsoft, and leading startups
• Backed by Union Square Ventures and other tier-1 investors, we're a world-class, deeply technical team of ML researchers and engineers.

Pluralis is unapologetically ideological. We believe the world will be a better place if we succeed in what we are attempting, and we see Protocol Learning as the only plausible way to prevent a handful of massive corporations from monopolising model development, access, and release, and from achieving massive economic capture. If this resonates, please apply.