Gather AI - Senior Machine Learning Engineer (Ops)

Remote - India, posted 3mo ago

Tags: Remote, Senior, APAC, Cloud Computing, Artificial Intelligence, Machine Learning, Engineer, MLOps, Docker, Kubernetes, Python, Airflow

Requirements

• 6+ years of industry experience (outside academia) in ML engineering, MLOps, or infrastructure engineering
• Deep operational fluency with Kubernetes and Docker for ML workload orchestration
• Strong production-grade Python skills with a track record of hardening research code into scalable microservices
• Hands-on experience with CI/CD for ML (e.g., GitHub Actions, GitLab CI) and model serving frameworks (e.g., KServe, SageMaker, Vertex AI Endpoints)
• Experience with pipeline orchestration and model lifecycle tools such as Airflow, MLflow, Kubeflow, or Flyte
• Proven ownership of production system reliability, including SRE principles, observability stacks, and automated failure safeguards
• Prior experience building end-to-end MLOps pipelines (data, model, and inference) from scratch
• Domain experience in logistics, supply chain, or robotics-adjacent cloud platforms
• Familiarity with feature stores and training/serving data consistency patterns
• Experience with Infrastructure as Code tools such as Terraform

Responsibilities

• Migrate box and barcode detection pipelines to cloud infrastructure following MLOps best practices
• Build and maintain CI/CD pipelines for deployment across production and non-production environments
• Implement automated rollback, canary, and blue-green deployment strategies for ML microservices
• Build out a multi-tenant MLOps platform using tools like Prefect, ZenML, or similar orchestration frameworks
• Establish a centralized model registry and versioning system for all production assets
• Instrument observability across the ML stack (logging, metrics, and distributed tracing) to ensure reliability at scale