Gather AI

Gather AI - Senior Machine Learning Engineer (Ops)

Remote - India · 1mo ago
Remote · Senior · APAC · Cloud Computing · Artificial Intelligence · Machine Learning Engineer · MLOps · Docker · Kubernetes · Python · Airflow

Requirements

• 6+ years of industry experience (outside academia) in ML engineering, MLOps, or infrastructure engineering
• Deep operational fluency with Kubernetes and Docker for ML workload orchestration
• Strong production-grade Python skills with a track record of hardening research code into scalable microservices
• Hands-on experience with CI/CD for ML (e.g., GitHub Actions, GitLab CI) and model serving frameworks (e.g., KServe, SageMaker, Vertex AI Endpoints)
• Experience with pipeline orchestration and model lifecycle tools such as Airflow, MLflow, Kubeflow, or Flyte
• Proven ownership of production system reliability, including SRE principles, observability stacks, and automated failure safeguards
• Prior experience building end-to-end MLOps pipelines (data, model, and inference) from scratch
• Domain experience in logistics, supply chain, or robotics-adjacent cloud platforms
• Familiarity with feature stores and training/serving data consistency patterns
• Experience with Infrastructure as Code tools such as Terraform

Responsibilities

• Migrate box and barcode detection pipelines to cloud infrastructure following MLOps best practices
• Build and maintain CI/CD pipelines for deployment across production and non-production environments
• Implement automated rollback, canary, and blue-green deployment strategies for ML microservices
• Build out a multi-tenant MLOps platform using tools like Prefect, ZenML, or similar orchestration frameworks
• Establish a centralized model registry and versioning system for all production assets
• Instrument observability across the ML stack (logging, metrics, and distributed tracing) to ensure reliability at scale
