8+ years of strong experience with at least one major programming language (Python, .NET, Java, Go, Rust, etc.)
Demonstrated experience operating and monitoring high-scale production systems running on Kubernetes, including on-call participation, incident response, and postmortem practices
Familiarity with observability tooling (e.g. Grafana)
Strong understanding of time-series data, metrics cardinality challenges, and cost/performance tradeoffs/optimizations in observability systems
Experience in a hands-on technical leadership role - setting technical direction, leading project teams, and influencing architectural decisions beyond your immediate team
Deep understanding of distributed systems concepts including scalability, consistency, high availability, and failure modes in large-scale systems
Experience writing clean, maintainable, robust, and performant software
Experience delivering projects from start to finish in a self-driven manner
Excellent problem-solving and debugging skills
Strong mentoring and leadership skills
Experience operating or scaling Prometheus in high-cardinality, multi-tenant environments
Experience working with OpenTelemetry Collector pipelines or similar telemetry ingestion systems
Certified Kubernetes Administrator (CKA), Certified Kubernetes Application Developer (CKAD), or any other Kubernetes-related certification from CNCF
Experience developing Kubernetes operators, controllers, or custom resources
Strong understanding of metrics collection, visualization, and alerting concepts
Experience contributing to or maintaining open source projects, with evidence of successful pull requests and community collaboration
Experience designing and building observability backends for various systems and applications
Responsibilities
At Grafana Labs, our engineers have a dedicated career path and do not have to become managers to progress in their careers. Staff Software Engineers at Grafana have extensive experience across multiple areas. They are able to estimate, plan, coordinate, and deliver large tasks spanning multiple systems. They actively coach and mentor other members of their team and are able to identify and resolve issues with technology and product processes.
You will bring your passion for observability and software engineering expertise to help us take our infrastructure monitoring capabilities within Grafana Cloud to the next level. This will include working with our Kubernetes monitoring solution.
Design and implement high-quality, scalable integrations for various infrastructure components, applications, and data ingestion pipelines
Create middleware components and libraries that simplify development and maintenance of observability solutions
When necessary, represent Grafana Labs in open source forums, working groups, and events
Work with product teams, in addition to design and docs, to develop features that align with wider product strategy and customer needs
Lead the technical direction and vision of the team, contributing to strategic discussions and future development of observability solutions
Work with other departments including Sales, Product, and Support teams to deliver a holistic product experience
Take ownership of the services you’re running by deploying well-tested, clean code
Embrace our open-source culture and contribute to other projects that may not directly fall within your team’s scope
As we are remote-first and our engineering organization is entirely remote, we provide guidance and meet regularly using video calls, so an independent attitude, good communication skills, and transparency are a must.
We invest heavily in developer productivity. You can use modern AI coding assistants as part of your daily workflow (your choice of tools, within security guidelines), backed by a company-funded usage budget so you can iterate quickly without unnecessary friction. We encourage pragmatic AI-assisted development: faster prototyping, test generation, refactors, documentation, and incident follow-ups—always paired with strong code review and quality standards. You’ll also have access to frontier models (e.g., GPT-Codex 5/3, Claude Opus 4.6, Gemini 3 Pro).
What Makes You a Great Fit:
You have a passion for observability and like to share your knowledge by writing documentation and blog posts.
You love to engage with customers and help them out.
You have excellent communication skills.
You have relevant open source experience, ideally in the observability domain.
You are willing to become an active member of the OpenTelemetry and Prometheus communities.
You’re curious and you enjoy learning new programming languages and frameworks, setting up examples, and figuring out how things work.
You have a good understanding of typical production environments. Ideally, you have been responsible for operating production services and organizing on-call rotations.
You actively mentor other team members, identifying areas for focus and improvement.
Benefits
100% Remote, Global Culture - As a remote-only company, we bring together talent from around the world, united by a culture of collaboration and shared purpose.
Scaling Organization – Tackle meaningful work in a high-growth, ever-evolving environment.
Transparent Communication – Expect open decision-making and regular company-wide updates.
Innovation-Driven – Autonomy and support to ship great work and try new things.
Open Source Roots – Built on community-driven values that shape how we work.
Empowered Teams – High trust, low ego culture that values outcomes over optics.
Career Growth Pathways – Defined opportunities to grow and develop your career.
Approachable Leadership – Transparent execs who are involved, visible, and human.
Passionate People – Join a team of smart, supportive folks who care deeply about what they do.
In-Person Onboarding - We want you to thrive from day 1, so you will join your fellow new ‘Grafanistas’ to learn all about what we do and how we do it.
Balance is Key - We operate a global annual leave policy of 30 days per annum. Three days of your annual leave entitlement are reserved for Grafana Shutdown Days to allow the team to really disconnect. *We will comply with local legislation where applicable.