Baseten - Customer Engineer
Requirements
• Deep Kubernetes troubleshooting expertise, including resource debugging, pod/runtime analysis, and log-based diagnostics with observability tooling (Grafana, Loki, Prometheus).
• Strong infrastructure debugging across container orchestration, networking, and service dependencies, with hands-on production cluster experience.
• Experience managing high-severity incidents with major customers: SLAs, war rooms, post-incident reviews, and clear executive-level communication throughout.
• Proven project management skills with an ownership mindset: you can run multiple complex, multi-stakeholder initiatives in parallel without dropping threads.
• Ability to translate recurring technical pain points into roadmap-level insights and product improvements.
• Strong communication skills and executive presence during high-visibility situations, ensuring both technical clarity and customer confidence.
• 3+ years of experience in a fast-paced, high-growth, or customer-facing engineering environment.
• Familiarity with high-performance AI model serving, including troubleshooting ML pipelines from preprocessing through inference.
• Experience with ticketing and incident-response platforms such as Pylon or Zendesk.
• Hands-on experience with Helm, Flux, CI/CD tooling, or scripting automations for deployment and operational workflows.
• Background in SRE, DevOps, or forward-deployed engineering roles at an infrastructure company.
Responsibilities
• Technical Support & Debugging
  • Serve as the first responder to all post-sales customer issues via ticketing (Pylon) and Slack, triaging and resolving Tier 1 and Tier 2 issues independently.
  • Diagnose runtime issues related to latency, memory behavior, GPU utilization, concurrency, and model lifecycle management.
  • Debug infrastructure problems across Kubernetes (pods, controllers), networking, observability, and alerting systems.
  • Pull logs, read error traces, and correlate signals across Grafana, Loki, and Prometheus to pinpoint root causes, even when the real issue is buried layers deep.
• Incident Response & Escalation
  • Lead incident response during outages and escalations, coordinating across Product, SRE, Sales, and Engineering.
  • Own customer communication through resolution, even when the fix is handed off to SRE or Infra, including delivering root-cause analyses after every P0/P1.
  • Escalate to SRE or other engineering teams with structured context (customer, affected models, what you've already ruled out, specific ask) so nothing gets lost in translation.
  • Drive post-incident alerting reviews: why did the customer find this before we did, and what instrumentation or process change prevents it next time?
• Proactive Account Ownership
  • Serve as the technical owner for top enterprise accounts with strict SLAs and high responsiveness expectations.
  • Set up and maintain proactive monitoring and alerts for all customer production models within 24 hours of handoff from the SA (Solution Architect).
  • Drive the QBR process and proactive reengagement for expansion opportunities.
  • Track recurring failure patterns across accounts and push for durable fixes, not just incident closure.
  • Monitor internal feedback channels and route product-level issues to the right teams.
• Cross-Functional Collaboration
  • Own the SA-to-CE handoff for new customers: validate architecture, confirm production-readiness milestones, and establish escalation paths.
  • Maintain and improve runbooks, knowledge bases, and diagnostic best practices so the team scales with the customer base.
  • Translate user feedback into roadmap signals, documentation improvements, and product enhancements.
  • Coordinate end-to-end on projects spanning feature requests, new deployments, and operational debugging: scoping, execution, communication, and stakeholder alignment.
Benefits
• $165K – $330K
• Offers Equity
• Competitive compensation. We aim to provide 90th percentile (or better) salaries and equity grants for every team member commensurate with their experience.