wagey.gg

fal-ai - Machine Learning Engineer, Reliability

Hybrid - Asia-Pacific · 2d ago
In Office · Junior · APAC · Artificial Intelligence · Machine Learning Engineer · Transformers · Kubernetes · Python

Requirements

• 3+ years of professional experience, including 1 year of experience operating production ML or high-scale API systems, ideally with on-call ownership
• Strong systems fundamentals: distributed systems, networking, observability, and incident management
• Working knowledge of modern generative models (diffusion, transformers) and their failure modes in production
• Familiarity with security and safety practices for ML systems; experience in abuse prevention, content safety, or trust & safety engineering is a strong plus
• A bias toward automation, measurement, and blameless postmortems
• Location: Remote (India, Australia, New Zealand)

Responsibilities

• Own availability, latency, and throughput SLOs across a large fleet of generative media model APIs serving production traffic at scale
• Build the monitoring, alerting, and observability needed to catch ML-specific failures (output quality degradation, pipeline breakage, model regressions) before customers do
• Harden model deployment workflows with canary releases, shadow testing, automated rollbacks, and validation gates so new model versions ship safely
• Drive the security posture of the model fleet: secure model serving, abuse and misuse detection, rate limiting, and protection against adversarial usage patterns
• Operationalize safety systems for generative media (content moderation pipelines, safety classifiers, and guardrails) that run reliably at inference time without compromising performance
• Lead incident response for model API outages and degradations, run postmortems, and drive the engineering work that prevents recurrence
• Improve capacity planning, autoscaling, and GPU fleet efficiency for inference workloads under highly variable traffic
• Partner with model and infrastructure teams to make reliability, security, and safety requirements part of how new models get onboarded to the platform
• You will have access to our massive GPU cluster for inference and evaluation
• Some core technologies we use include Python, torch, diffusers, Kubernetes, and the fal Python SDK
• You'll work alongside a team dedicated to quickly iterating on and deploying new AI breakthroughs; your job is to make sure that speed never comes at the cost of reliability

Similar roles

relationrx - Machine Learning Scientist – Sequence Modelling · 2mo ago
London, United Kingdom
In Office · EMEA · Genomics · Artificial Intelligence · Machine Learning Engineer · Python · Transformers
cantina - Machine Learning Engineer (Singapore) · 1mo ago
Singapore · Equity
In Office · APAC · Cloud Computing · Artificial Intelligence · Machine Learning Engineer · Python · Ray · Airflow · Docker · Kubernetes
Gather AI - Senior Machine Learning Engineer (Ops) · 3mo ago
Remote - India
Remote · APAC · Senior · Cloud Computing · Artificial Intelligence · Machine Learning Engineer · MLOps · Docker · Kubernetes · Python · Airflow
Kronos Research - Machine Learning Researcher · 5mo ago
Singapore
In Office · APAC · Artificial Intelligence · Fintech · Nonprofit · Machine Learning Engineer · C++ · Python · Market Research · Training Development · Transformers
Smart Working Solutions - Machine Learning Engineer (Remote, Full-Time) [AS207] · 3mo ago
Remote - India / Ahmedabad / Bangalore / Chennai / Delhi NCR / Hyderabad / Mumbai
Remote · APAC · Junior · Cloud Computing · Artificial Intelligence · Machine Learning Engineer · AWS · MLOps · Python · SQL · Snowflake
BetterHelp - Machine Learning Engineer · 3d ago
Remote - US · $150k - $180k/year
Remote · NA · Mid · Artificial Intelligence · Machine Learning Engineer · Python · Transformers · SQL
Deliveroo - Senior Machine Learning Engineer · 3mo ago
London, City of, United Kingdom, Hybrid
In Office · EMEA · Senior · Artificial Intelligence · Machine Learning Engineer · Python · Learning & Development · Docker · Kubernetes · Transformers · Go · Mentoring
Mozilla - Senior Machine Learning Engineer, AI Platform · 2w ago
Remote - Canada · Equity
Remote · NA · Senior · Artificial Intelligence · Machine Learning Engineer · Python · Learning & Development · Docker · Kubernetes
Insider One - Senior Machine Learning Engineer (Agentic AI) · 1mo ago
Remote - Istanbul, Turkiye · Equity
Remote · EMEA · Senior · Cloud Computing · Artificial Intelligence · Software · Machine Learning Engineer · Python · SQL · AWS · Kubernetes