
Fundamental - Model Serving Engineer

United States · Equity · 2mo ago
In Office · Senior · NA · Artificial Intelligence · Cloud Computing · Support Engineer · Python · Triton · CUDA · Kubernetes · Helm

Requirements

• Bachelor's or Master's degree in Computer Science, Engineering, or a related field (or equivalent practical experience)
• 5+ years of experience in model serving, ML infrastructure, or a closely related backend engineering role
• Deep, production-level experience with Triton Inference Server, including custom Python backends, batching configuration, and model repository management
• Expert-level Python skills with a thorough understanding of the GIL, multi-threading, multiprocessing, and async concurrency patterns
• Strong understanding of neural network inference mechanics: forward passes, batching strategies, memory management, and numerical precision tradeoffs
• Hands-on experience with other inference frameworks (TorchServe, TensorFlow Serving, ONNX Runtime, vLLM, etc.) and the ability to evaluate the tradeoffs between them
• Experience profiling and optimizing inference code for latency and throughput at production scale
• Experience with GPU kernel-level optimizations or CUDA profiling tools
• Familiarity with model quantization, pruning, or compilation toolchains (TensorRT, torch.compile, ONNX)
• Experience with KServe or other Kubernetes-native serving platforms
• Experience serving tabular or structured-data models, including classical ML models such as XGBoost and CatBoost
• Experience with observability tooling such as Prometheus, Grafana, or Datadog for inference monitoring

Responsibilities

• Design, build, and maintain production model serving infrastructure using Triton Inference Server as the primary framework
• Implement and optimize inference pipelines in Triton, including custom backends, dynamic batching strategies, and model ensemble configurations
• Optimize Python inference code for performance, with a strong focus on GIL contention, multi-threading, and concurrency patterns
• Tune throughput and latency across the full serving stack: batching policies, thread pool sizing, model instance groups, and memory layout
• Work closely with the research team to understand new model architectures at a computational level (batching behavior, dynamic shapes, memory access patterns, etc.)
• Own the full resource observability and control loop for production inference: instrument GPU memory, CPU, batch queue depth, and latency metrics, and actively tune model instance groups, concurrency limits, memory budgets, and batching configuration in response to observed behavior
• Evaluate and integrate alternative inference frameworks and runtimes as the model ecosystem evolves
• Contribute to GPU utilization improvements and resource efficiency across the serving fleet

Benefits

• Competitive compensation with salary and equity
• Comprehensive health coverage (medical, dental, and vision) and a 401(k)
• Paid parental leave for all new parents, inclusive of adoptive and surrogate journeys
• Relocation support for employees moving to join the team at one of our office locations
• A mission-driven, low-ego culture that values diversity of thought, ownership, and a bias toward action


Similar roles

AB InBev | Growth Group - Support Engineer · 1w ago
Remote - Americas
Remote · NA · Cloud Computing · Higher Education · Support Engineer · Azure · Kubernetes · Docker · AWS · Java · Python · New Relic · Grafana · Change Management
Serve Robotics - Lead Engineer, Reinforcement Learning & Scenario Generation · 5mo ago
Redwood City, California, United States · $225k - $300k/year + Equity
Remote · NA · Staff · Cloud Computing · Artificial Intelligence · Support Engineer · Machine Learning Engineer · CUDA · Base · Learning & Development · Ray · Kubernetes
Northbeam - Senior Support Engineer · 5mo ago
Remote - ET (Eastern) · $100k - $100k/year + Equity
Remote · NA · Senior · Data Analytics · E-commerce · Support Engineer · Documentation · SQL · JavaScript · Django · Python
Sayari - Forward Deployed Engineer, CE · 2mo ago
Remote - USA · $100k - $100k/year + Equity
Remote · NA · Senior · Cloud Computing · Artificial Intelligence · Support Engineer · Mobile Engineer · Python · Pandas · SQL · AWS · GCP
n8n - Senior Support Engineer | Remote | North America · 1mo ago
Remote - United States · Equity
Remote · NA · Senior · Cloud Computing · Support Engineer · JavaScript · Node.js · Docker · AWS · Linux
Aerospike - Senior Support Engineer · 2mo ago
Remote, Canada - Hybrid
In Office · NA · Senior · Diagnostics · Cloud Computing · Support Engineer · Linux · NoSQL · Base · AWS · Docker
FOSSA - Senior Support Engineer - East Coast · 1w ago
Remote - ET (Eastern) · $100k - $120k/year + Equity
Remote · NA · Senior · Software · Support Engineer · Bash · Python · Windsurf · Cursor · Claude · B2B · Records Management · Jenkins · Documentation · SAFe · Stakeholder Management
Blackpoint Cyber - Support Engineer II (Bilingual) · 2w ago
Remote - USA
Remote · NA · Mid · Cloud Computing · Support Engineer · Bash · Zendesk · Python · Ansible · Learning & Development · Reporting · Docker · Azure · AWS · Account Management · Documentation
meridianlink - Sr. Software Engineer - Engineering Enablement · 1w ago
Remote - US Remote
Remote · NA · Senior · Fintech · Cloud Computing · Software Engineer · Support Engineer · Git · Jenkins · Claude · Python · TypeScript · Cursor · AWS · Kubernetes · Azure · Docker · Terraform · Helm · Harness · Pulumi · Jira · Confluence · Documentation
