perplexity - Member of Technical Staff (AI Inference Engineer)

London, United Kingdom · Equity · 1w ago
In Office · Mid · EMEA · Artificial Intelligence · AI Engineer · Staff Engineer · JAX · CUDA · Triton · Rust · Python

Requirements

• Deep experience with GPU programming and performance work (CUDA, Triton, CUTLASS, or similar). Any other deep systems programming experience is a plus.
• You understand modern LLM architectures and can bring them up reliably in a production environment.
• You've built and operated production distributed systems under real load, ideally performance-critical ones.
• Comfortable working across languages and layers: Rust for the serving runtime, Python for model code, CUDA/CuTe DSL for kernels.
• You own problems end-to-end. You can read a research paper on Monday, write a kernel on Wednesday, and debug a production incident on Friday.
• Self-directed. You do well in fast-moving environments where the path forward isn't laid out for you.
• ML compilers and framework internals: PyTorch internals, torch.compile, custom operators.
• Distributed GPU communication: NCCL, NVLink, InfiniBand, RDMA libraries, model/tensor parallelism.
• Low-precision inference: INT8/FP8/FP4 quantization, mixed-precision serving.
• Profiling and debugging tools: Nsight Compute/Systems, CUDA-GDB, PTX/SASS analysis.
• Container orchestration: Kubernetes, GPU scheduling, autoscaling inference workloads.
• 3+ years of professional software engineering experience with meaningful work on ML inference or high-performance systems.
• Familiarity with at least one deep learning framework (PyTorch, JAX, TensorFlow).
• Understanding of GPU architectures (memory hierarchy, warp scheduling, tensor cores).
• Understanding of common LLM architectures and inference optimization techniques (e.g. quantization, speculative decoding, prefill-decode disaggregation).
• Final offer amounts are determined by multiple factors, including experience and expertise.
• Equity: in addition to the base salary, equity may be part of the total compensation package.

Responsibilities

• New model support. Support transformer-based retrieval, text-generation, and multimodal models in our inference infrastructure, from weight loading, request scheduling, and KV-cache management through to support in the API Gateway.
• GPU kernel migration to CuTe DSL. Port our in-house CUDA kernels to NVIDIA's CuTe DSL so they run on GB200 today and are portable to Vera Rubin racks tomorrow.
• Rust-native serving runtime. Develop our internal Rust-based inference server to eliminate the pain points of Python-based serving and keep up with rapidly growing traffic.
• Performance optimisation. Profile and fix bottlenecks all the way from network ingress through continuous batching to GPU kernel interleaving.
• Reliability and observability. Build dashboards, alerts, and automated remediation so we catch regressions before users do. Respond to and learn from production incidents.