Smartsheet - Senior Manager, Engineering - Observability Platform
Requirements
• 10+ years of software or platform engineering experience, with strong fundamentals in distributed systems, infrastructure, and backend services.
• 3 years of engineering management experience, including direct team building, performance management, and cross-functional delivery ownership.
• Deep hands-on expertise with observability tooling: Datadog (APM, metrics, logs, alerting), OpenSearch or Elasticsearch, distributed tracing (OpenTelemetry or equivalent), and SLO/SLA management at scale.
• Proven experience operating observability platforms for high-availability, high-throughput production environments.
• Experience building and scaling engineering teams in distributed or international settings.
• Strong execution track record on complex, cross-functional infrastructure programs with high ambiguity.
• Clear, direct communication (written and verbal) with both technical and non-technical audiences, including leadership and executive stakeholders.
• Proactive risk identification and status communication without prompting.
• Experience managing vendors, external delivery partners, and third-party integrations in a platform context.

Preferred
• Hands-on experience with AI/ML observability: MLflow tracing, LLM evaluation pipelines, or observability for agentic AI systems.
• Familiarity with Amazon Bedrock, ECS Fargate, or LangGraph-based multi-agent architectures.
• Experience with cloud cost governance and FinOps practices for observability tooling.
• Exposure to data platform observability and data quality monitoring in a lakehouse context.
• Experience establishing internal developer platforms, shared libraries, or platform-as-a-service offerings for application teams.
• Prior work in SaaS environments with enterprise compliance requirements (SOC 2, FedRAMP, HIPAA).

Education & Eligibility
• CS, Engineering, or equivalent degree, or commensurate practical experience.
• Legally eligible to work in the U.S. on an ongoing basis.
Responsibilities
Team & Platform Leadership
• Lead a team of engineers focused on observability platform engineering, driving build-out of a unified observability stack used by all engineering teams at Smartsheet.
• Own and evolve the platform's technical roadmap, consolidating multiple tooling platforms and AI observability tooling into a coherent, scalable capability.
• Define platform standards, contribute to architectural direction, and ensure the team operates with engineering rigor and strong operational habits.
• Build and scale the team, hiring senior engineers and establishing effective global practices across distributed stakeholders.

Observability Engineering
• Lead design and delivery of centralized observability infrastructure covering metrics pipelines, distributed tracing, alerting frameworks, and log analytics across Smartsheet services.
• Drive SLO/SLA definition and tooling for platform-wide reliability visibility, partnering closely with infrastructure, platform engineering, and on-call teams.
• Own governance including instrumentation standards, cost optimization, and rollout of advanced capabilities such as APM, RUM, and custom dashboards.
• Lead architecture, scaling, and operational practices for log analytics across high-throughput production workloads.
• Establish shared observability libraries, agents, and SDKs that reduce instrumentation burden for application engineering teams.

AI Observability
• Build and maintain AI/ML observability integrations in partnership with the AI Platform team.
• Partner with the Data & AI Platform team to integrate MLflow tracing, Inference Tables, and LLM-as-judge evaluation pipelines into the observability stack.
• Develop dashboards and alerting for agentic AI workloads, including latency, token consumption, error rates, and evaluation metric drift.
• Contribute to the AI governance and cost observability program, providing telemetry for model usage, cost attribution, and compliance reporting.

Cross-Functional Partnership & Execution
• Serve as the primary engineering partner for platform consumers across Data & AI, Commerce, Infrastructure, and Security teams, ensuring observability needs are met across workstreams.
• Lead complex, cross-functional observability projects with high ambiguity, managing delivery risk, communicating clearly to senior stakeholders, and building alignment across teams.
• Partner with delivery partners to coordinate instrumentation across platform modernization and migration workstreams.
• Contribute to quarterly and annual platform goals, reporting on key reliability and observability metrics to engineering leadership.
• Communicate platform status, risks, and roadmap progress to audiences at the Engineering leadership level and above in a clear, executive-ready format.

Operational Excellence
• Embed on-call culture and incident management discipline into the team, ensuring clear runbooks, fast MTTR, and post-incident learning loops.
• Drive cost governance for observability tooling, including spend optimization and efficient resource management.
• Champion AI-assisted engineering practices within the team, applying tooling and automation to reduce toil and accelerate delivery.
Benefits
• Employer-subsidized medical/vision and dental coverage for full-time employees
• 401(k) match to help you save for your future (50% of your contribution up to the first 6% of your eligible pay)
• Monthly stipend to support your work and productivity
• Flexible Time Away Program, plus Sick Time Off
• US employees are automatically covered under Smartsheet-sponsored life insurance and short-term and long-term disability plans
• US employees receive 12 paid holidays per year
• Up to 24 weeks of Parental Leave
• Personal paid Volunteer Day to support our community
• Opportunities for professional growth and development, including access to Udemy online courses
• Company-funded perks, including a counseling membership, local retail discounts, and your own personal Smartsheet account
• Teleworking options from any registered location in the U.S. (role specific)

Smartsheet provides a competitive base salary range for roles that may be hired in the different geographic areas from which we are licensed to operate our business. Actual compensation is determined by several factors including, but not limited to, level of professional and educational experience, skills, and specific candidate location. In addition, this role will be eligible for a market-competitive incentive opportunity.

$205,000 - $275,000 USD

Get to Know Us:
At Smartsheet, your ideas are heard, your potential is supported, and your contributions have real impact. You’ll have the freedom to explore, push boundaries, and grow beyond your role. We welcome diverse perspectives and nontraditional paths, because we know that impact comes from individuals who care deeply and challenge thoughtfully. When you’re doing work that stretches you, excites you, and connects you to something bigger, that’s magic at work. Let’s build what’s next, together.