AI Serving: Triton, vLLM, custom inference services
Responsibilities
1. AI / MLOps (Models in Production)
GPU Infrastructure: Deploy and maintain high-performance GPU clusters.
AI Lifecycle: Manage the full lifecycle of AI services, from inference deployment (Triton, vLLM, custom services) through autoscaling to seamless rollout/rollback strategies; illustrative sketches for the items in this list follow below.
Data Management: Handle model storage, artifact versioning, caching, and high-speed data access via S3-compatible object storage.
Observability: Monitor performance metrics including latency, throughput, error budgets, resource limits, and cost/performance ratios.
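To make the inference-deployment item concrete, here is a minimal sketch using vLLM's offline Python API. The model name, prompts, and sampling parameters are placeholders, not part of this posting; a production deployment would typically sit behind an OpenAI-compatible server or Triton instead.

```python
# Minimal offline-inference sketch with vLLM.
# Model name and prompts are placeholders.
from vllm import LLM, SamplingParams

# Load a (placeholder) model onto the available GPU(s).
llm = LLM(model="facebook/opt-125m")

# Sampling parameters control generation behaviour.
params = SamplingParams(temperature=0.7, max_tokens=64)

prompts = ["What is an error budget?"]
for output in llm.generate(prompts, params):
    # Each result carries the original prompt and its completions.
    print(output.prompt, output.outputs[0].text)
```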
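For the data-management item, a hypothetical sketch of pulling a versioned model artifact from S3-compatible storage with boto3. The endpoint URL, bucket name, object key, and local cache path are invented placeholders.

```python
# Fetch a versioned model artifact from S3-compatible storage.
# All names below are placeholders for illustration only.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://storage.example.internal",  # placeholder S3-compatible endpoint
)

# Download a specific artifact version into a local cache directory.
s3.download_file(
    Bucket="model-artifacts",                   # placeholder bucket
    Key="llama/v1.2.0/model.safetensors",       # placeholder versioned key
    Filename="/models/cache/model.safetensors", # placeholder cache path
)
```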
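For the observability item, a sketch of exporting latency and error metrics from a custom inference service using prometheus_client. The metric names, port, and stand-in workload are illustrative assumptions.

```python
# Expose latency and error metrics for Prometheus to scrape.
import time
from prometheus_client import Counter, Histogram, start_http_server

# Illustrative metric names; a real service would align these with
# its dashboards and alerting rules.
REQUEST_LATENCY = Histogram("inference_latency_seconds", "Inference request latency")
REQUEST_ERRORS = Counter("inference_errors_total", "Failed inference requests")

def instrumented(run_inference, payload):
    """Wrap an inference call so latency and errors are recorded."""
    start = time.perf_counter()
    try:
        return run_inference(payload)
    except Exception:
        REQUEST_ERRORS.inc()
        raise
    finally:
        REQUEST_LATENCY.observe(time.perf_counter() - start)

if __name__ == "__main__":
    start_http_server(8000)  # expose /metrics on port 8000
    while True:
        # Stand-in workload so the exporter has something to report.
        instrumented(lambda p: p.upper(), "ping")
        time.sleep(1)
```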
2. PSP / Fintech Reliability
High Availability: Ensure fault tolerance for payment services through SLA/SLO management, redundancy, disaster-recovery planning, and regular recovery testing; a back-of-the-envelope error-budget sketch follows.
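To make the SLO/error-budget part concrete, a small calculation; the 99.95% availability target and 30-day window are assumed figures, not taken from this document.

```python
# Error-budget arithmetic under assumed figures:
# a 99.95% availability SLO over a 30-day rolling window.
SLO_TARGET = 0.9995
WINDOW_MINUTES = 30 * 24 * 60  # minutes in a 30-day window

error_budget_minutes = (1 - SLO_TARGET) * WINDOW_MINUTES
print(f"Allowed downtime per 30-day window: {error_budget_minutes:.1f} minutes")
# At 99.95%, that is about 21.6 minutes of downtime per 30 days.
```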