Pluralis Research

Pluralis Research - Machine Learning Engineer

San Francisco, California, United States · + Equity · 3w ago
In Office · Senior · NA · Artificial Intelligence · Machine Learning Engineer · Training Development · Python · NATS

Requirements

• Strong experience building and operating distributed systems in production.
• Hands-on expertise with distributed training frameworks (FSDP, DeepSpeed, Megatron, or similar).
• Deep understanding of model parallelism (data, tensor, pipeline parallelism).
• Expert-level Python with production experience (concurrency, error handling, retry logic, clean architecture).
• Strong networking fundamentals: P2P systems, gRPC, routing, NAT traversal, distributed coordination.
• Experience optimizing GPU workloads, memory management, and large-scale compute efficiency.

Responsibilities

Distributed Training Architecture & Optimization

• Design and implement large-scale distributed training systems optimized for heterogeneous hardware operating under low-bandwidth, high-latency conditions.
• Develop and optimize model-parallel training strategies (data, tensor, pipeline parallelism) with custom sharding techniques that minimize communication overhead.
• Optimize GPU utilization, memory efficiency, and compute performance across distributed nodes.
• Implement robust checkpointing, state synchronization, and recovery mechanisms for long-running, fault-prone training jobs.
• Build monitoring and metrics systems to track training progress, model quality, and system bottlenecks.

Decentralized Networking & Resilience

• Architect resilient training systems where nodes can fail, networks can partition, and participants can dynamically join or leave.
• Design and optimize peer-to-peer topologies for decentralized coordination across non-co-located nodes.
• Implement NAT traversal, peer discovery, dynamic routing, and connection lifecycle management.
• Profile and optimize communication patterns to reduce latency and bandwidth overhead in multi-participant environments.

Benefits

• Equity-heavy compensation with meaningful ownership in a mission-driven company
• Competitive base salary for senior engineering roles in Australia
• Visa sponsorship available for exceptional candidates
• Remote-first with optional access to our Melbourne hub
• World-class team: teammates were previously at Google, Amazon, Microsoft, and leading startups
• Backed by Union Square Ventures and other tier-1 investors, we're a world-class, deeply technical team of ML researchers and engineers.

Pluralis is unapologetically ideological. We believe the world will be a better place if we succeed in what we are attempting, and we see Protocol Learning as the only plausible way to prevent a handful of massive corporations from monopolising model development, access, and release, and from achieving massive economic capture. If this resonates, please apply.
