Software Engineer, Observability
Requirements
• 6+ years of software engineering experience, with 3+ years focused on observability or infrastructure at scale.
• Demonstrated success implementing and running production-grade logging, metrics, or tracing systems.
• Proficiency in distributed systems concepts, data streaming pipelines, and container orchestration (Kubernetes).
• Deep hands-on knowledge of tools such as Prometheus, Grafana, Datadog, OpenTelemetry, ELK Stack, Loki, or ClickHouse.
• Comfort with at least one programming language (e.g., Go, Python, Java) to build and maintain observability tooling.
• Experience mentoring engineers and collaborating across multiple teams.
• Strong communication skills to effectively present technical trade-offs and architectural plans.
• Eagerness to own high-impact initiatives from design through production and maintenance.
• Proven ability to balance short-term fixes with long-term strategic vision.
• A passion for enabling all of Airtable’s engineering organization through reliable, intuitive observability tools.
• Commitment to measuring success by the velocity and confidence with which product teams can ship.
Responsibilities
• Architect and scale core observability
• Lead the design and evolution of logging, metrics, and tracing pipelines to handle massive data volumes
• Evaluate and integrate new technologies (e.g., OpenTelemetry, ClickHouse, ELK stack) that enhance Airtable’s observability posture
• Guide and mentor a growing team of infrastructure engineers; share best practices in distributed tracing, monitoring, and logging
• Define and uphold coding standards and operational excellence across the org
• Partner with Deploy Infrastructure, Service Orchestration, and Product teams to embed observability throughout the development lifecycle
• Align infrastructure decisions with business goals to detect issues before they impact customers
• Own end-to-end reliability for observability tools and establish SLAs, SLOs, and error budgets
• Optimize performance and cost of large-scale data pipelines and storage
• Shape the observability roadmap, prioritizing initiatives like improved tracing coverage, advanced monitoring dashboards, and next-gen logging pipelines
• Continuously explore emerging trends to keep Airtable’s monitoring capabilities at the cutting edge
• Extend observability to LLM and AI features
• Instrument prompts, model calls, and RAG pipelines to capture latency, reliability, cost, and safety signals
• Design online and offline evaluation loops for LLM quality, including canary analysis and drift detection
• Build dashboards and alerts for token usage, error rates, guardrail triggers, and model performance; connect these signals to tracing for prompt lineage
• Partner with AI and Product teams to define SLOs for AI features and close the feedback loop from incidents to model and prompt improvements
Benefits
• High Impact: Lead the modernization of Airtable’s observability stack, influencing how every engineer monitors and debugs mission-critical systems.
• Room to Innovate: Define and execute on a multi-year roadmap, introducing advanced logging, tracing, and metrics solutions that shape the entire developer experience.
• Career Growth: As a Sr. Software Engineer, you’ll drive major projects across the engineering organization to build platforms and services that solve observability problems.
• Collaborative Culture: Work alongside talented platform engineers, product teams, and leadership to make data-driven decisions and ensure platform reliability.