wagey.gg

Machine Learning Engineer — Inference Optimization

Featherless AI·Remote - (world)·+ Equity·1mo ago
Remote·WW·Artificial Intelligence·Machine Learning Engineer·CUDA·Triton·ONNX

Requirements

• Strong experience in ML inference optimization or high-performance ML systems
• Solid understanding of deep learning internals (attention, memory layout, compute graphs)
• Hands-on experience with PyTorch (or similar) and model deployment
• Familiarity with GPU performance tuning (CUDA, ROCm, Triton, or kernel-level optimizations)
• Experience scaling inference for real users (not just research benchmarks)
• Comfortable working in fast-moving startup environments with ownership and ambiguity
• Experience with LLM or long-context model inference
• Knowledge of inference frameworks (TensorRT, ONNX Runtime, vLLM, Triton)
• Experience optimizing across different hardware vendors
• Open-source contributions in ML systems or inference tooling
• Background in distributed systems or low-latency services

Responsibilities

• Optimize inference latency, throughput, and cost for large-scale ML models in production.
• Profile GPU/CPU inference pipelines and identify bottlenecks in memory usage, kernel execution, batching strategies, and I/O.
• Implement and tune reduced-precision and quantized formats (fp16, bf16, int8, fp8) to reduce model size and improve performance.
• Optimize KV-cache management and reuse in inference systems.
• Apply speculative decoding, batching, and streaming optimizations.
• Prune models or simplify architectures to improve inference efficiency.
• Collaborate closely with research engineers to bring new model architectures into production, making them fast and reliable enough for real user interaction.
• Build and maintain robust model-serving systems (e.g., Triton server or custom runtimes) that run on NVIDIA and AMD GPUs and across cloud infrastructure.
• Benchmark performance across hardware from vendors such as NVIDIA and AMD (GPUs and CPUs) and across cloud environments.
• Improve system reliability and observability under real workloads.
• Reduce the cost of inference in realistic user scenarios without compromising performance or accuracy.

Benefits

• Real ownership over performance-critical systems
• Direct impact on product reliability and unit economics
• Close collaboration with research, infra, and product
• Competitive compensation + meaningful equity at Series A
• A team that cares about engineering quality, not hype

© 2026 Dominic Morris. All rights reserved.