6+ years of software engineering experience, with 3+ years focused on observability or infrastructure at scale.
Demonstrated success implementing and running production-grade logging, metrics, or tracing systems.
Proficiency in distributed systems concepts, data streaming pipelines, and container orchestration (Kubernetes).
Deep hands-on knowledge of tools such as Prometheus, Grafana, Datadog, OpenTelemetry, ELK Stack, Loki, or ClickHouse.
Comfort with at least one programming language (e.g., Go, Python, Java) to build and maintain observability tooling.
Experience mentoring engineers and collaborating across multiple teams.
Strong communication skills to effectively present technical trade-offs and architectural plans.
Eagerness to own high-impact initiatives from design through production and maintenance.
Proven ability to balance short-term fixes with long-term strategic vision.
A passion for enabling Airtable’s entire engineering organization through reliable, intuitive observability tools.
Commitment to measuring success by the velocity and confidence with which product teams can ship.
Responsibilities
Architect and scale core observability
Lead the design and evolution of logging, metrics, and tracing pipelines to handle massive data volumes
Evaluate and integrate new technologies (e.g., OpenTelemetry, ClickHouse, ELK Stack) that enhance Airtable’s observability posture
Guide and mentor a growing team of infrastructure engineers; share best practices in distributed tracing, monitoring, and logging
Define and uphold coding standards and operational excellence across the org
Partner with Deploy Infrastructure, Service Orchestration, and Product teams to embed observability throughout the development lifecycle
Align infrastructure decisions with business goals to detect issues before they impact customers
Own end-to-end reliability for observability tools and establish SLAs, SLOs, and error budgets
Optimize performance and cost of large-scale data pipelines and storage
Shape the observability roadmap, prioritizing initiatives like improved tracing coverage, advanced monitoring dashboards, and next-gen logging pipelines
Continuously explore emerging trends to keep Airtable’s monitoring capabilities at the cutting edge
Extend observability to LLM and AI features
Instrument prompts, model calls, and RAG pipelines to capture latency, reliability, cost, and safety signals
Design online and offline evaluation loops for LLM quality, including canary analysis and drift detection
Build dashboards and alerts for token usage, error rates, guardrail triggers, and model performance; connect these signals to tracing for prompt lineage
Partner with AI and Product teams to define SLOs for AI features and close the feedback loop from incidents to model and prompt improvements
Benefits
High Impact
Lead the modernization of Airtable’s observability stack, influencing how every engineer monitors and debugs mission-critical systems.
Room to Innovate
Define and execute on a multi-year roadmap, introducing advanced logging, tracing, and metrics solutions that shape the entire developer experience.
Career Growth
As a Senior Software Engineer, you’ll drive major projects across the engineering organization to build platforms and services that solve observability problems.
Collaborative Culture
Work alongside talented platform engineers, product teams, and leadership to make data-driven decisions and ensure platform reliability.