Grafana Labs

Grafana Labs - Staff Backend Engineer - Application Core Services, Stacks | Canada | Remote

Remote - ET (Eastern) · 4d ago
Remote · Staff · NA · Cloud Computing · Software · Backend Engineer · Grafana · AWS · GCP · Azure · Kubernetes

Requirements

• You have at least 1 year of fully remote work experience
• You have worked on a large SaaS platform and dealt with common distributed systems problems (e.g. scalability, multi-tenancy, data isolation, HA, …)
• You have professional experience with Golang and are willing to work across both backend service and application code
• You care deeply about developer and user experience and the quality of the products you work on
• You have some experience delivering projects in a self-driven way, from gathering requirements and brainstorming ideas to shipping a product into the customer’s hands
• You write clean, robust, well-tested software that other engineers can understand, operate, and maintain
• You have experience mentoring junior engineers in a collaborative but asynchronous environment
• You can take on complex challenges and break them down to achieve tight learning loops: analyze, design, and build modular solutions, deliver MVPs, gather data and feedback, and then progress iteratively
• You are willing to work across teams. Your work has to be aligned with the needs of other squads and external stakeholders. You make your plans transparent, bring stakeholders on board, and are open to feedback and suggestions
• Strong Kubernetes experience in AWS, GCP, or Azure, and familiarity with infrastructure-as-code tooling (Helm, Terraform, Jsonnet, etc.)
• Experience participating in blameless incident response and writing high-quality post-incident reviews
• Experience with TypeScript/Node.js
• Experience with Kubernetes control-plane patterns, operators, reconcilers, or desired-state systems
• Experience with Jsonnet/Tanka, Terraform, Flux, Argo, or similar deployment/configuration tooling
• Experience working on SaaS provisioning, tenancy, regional expansion, plugin rollout, or customer lifecycle systems
• Experience with incident response involving configuration drift, partial failure, or cross-service state mismatch

Responsibilities

• The AppCore Stacks squad owns the systems that create, configure, reconcile, migrate, and operate Grafana Cloud stacks at scale. A stack is the customer-facing Grafana Cloud environment that connects an organization to Grafana and the backend services it uses, including Mimir, Loki, Tempo, plugins, dashboards, data sources, and stack-level configuration.
• Our work sits at the intersection of product, platform, and operations. We build the control-plane services and workflows that keep stack state aligned across grafana.com, Stack State Service (SSS), Hosted Grafana, cloud regions, and the underlying Grafana Cloud infrastructure. When this domain works well, customers get reliable stack creation, safe configuration rollout, predictable migrations, and fewer manual operational interventions.
• Design, build, and operate reconciliation systems, including the SSS backend, to track desired stack state and to detect and repair drift across stack templates, grafana.com state, Hosted Grafana, and actual customer stack configuration
• Collaborate across SSS, grafana.com, and deployment configurations to ensure stack lifecycle workflows remain reliable, observable, and resilient
• Improve operational efficiency by reducing deployment complexity (e.g., aiming for single-PR regional SSS deployment) and contributing to the Stack Config Reconciliation project
• Manage rollout mechanisms for provisioned plugins, dashboards, data sources, Grafana versions, release channels, and stack-level configuration
• Support new region and cluster rollouts, including the operational paths required to bring stacks online safely in new Grafana Cloud regions
• Improve incident response and recovery paths for stack misalignment, reconciliation failures, plugin rollout issues, and Hosted Grafana integration failures
• Partner with Product, Hosted Grafana, Infrastructure, Support, and adjacent AppCore squads on customer-impacting stack lifecycle work
• Contribute to roadmap planning, technical design, OnCall improvements, and long-term simplification of stack operations
• You will help own the production behavior of the systems you build. That includes improving runbooks, dashboards, alerts, reconciliation safety, rollout controls, and recovery procedures. You should be comfortable debugging across service boundaries and making careful changes in systems that affect customer stacks
• There is an on-call component to this role, and we take it seriously. As a company, we hire globally (remote-first) to ensure our on-call remains healthy and aligned to approximately 12 daylight hours per day. You will work closely with counterparts in other regions to provide balanced coverage and shared ownership.
• We invest heavily in developer productivity. You can use modern AI coding assistants as part of your daily workflow (your choice of tools, within security guidelines), backed by a company-funded usage budget so you can iterate quickly without unnecessary friction. We encourage pragmatic AI-assisted development: faster prototyping, test generation, refactors, documentation, and incident follow-ups, always paired with strong code review and quality standards. You’ll also have access to frontier models (e.g., GPT-Codex 5/3, Claude Opus 4.6, Gemini 3 Pro).

What Makes You a Great Fit:
• At Grafana, we actively embrace AI-assisted and agentic development practices, integrating these technologies into both our engineering workflows and the systems we deliver. We encourage our engineers to thoughtfully leverage AI tools to enhance every stage of the lifecycle, from design and implementation to testing, documentation, and operations. We also look for strategic opportunities to embed agentic capabilities within our services to eliminate toil, bolster reliability, and ensure that complex customer workflows remain resilient and safe.
• We are seeking a Staff Backend Engineer who thrives on building production systems where correctness, scalability, and operational clarity are paramount. As a remote-first organization, we expect you to be comfortable collaborating asynchronously across time zones and taking full ownership of the critical systems powering Grafana Cloud. Our team is small and operates with a high degree of independence; you will be expected to lead major projects, coordinate across service boundaries, and help define the technical direction for our domain.
• You will be particularly successful in this role if you enjoy solving challenges related to stateful systems, eventual consistency, and reconciliation loops. We value engineers who can take ambiguous lifecycle requirements and transform them into explicit, modular solutions. You should be adept at breaking down complex systems work into safe, iterative increments while clearly communicating technical tradeoffs to both internal stakeholders and adjacent product teams.

Some things you might be expected to do include:
• Writing efficient, readable, and easy-to-maintain code
• Designing new microservices or systems
• Collaborating with teammates and other departments to reach consensus on proposed solutions
• Coordinating with product and UX when needed
• Responding to customer requests and feedback
• When ready, participating in our follow-the-sun OnCall rotation
• Participating in team decisions, such as roadmap planning and prioritization

Benefits

• 100% Remote, Global Culture - As a remote-only company, we bring together talent from around the world, united by a culture of collaboration and shared purpose.
• Scaling Organization - Tackle meaningful work in a high-growth, ever-evolving environment.
• Transparent Communication - Expect open decision-making and regular company-wide updates.
• Innovation-Driven - Autonomy and support to ship great work and try new things.
• Open Source Roots - Built on community-driven values that shape how we work.
• Empowered Teams - High trust, low ego culture that values outcomes over optics.
• Career Growth Pathways - Defined opportunities to grow and develop your career.
• Approachable Leadership - Transparent execs who are involved, visible, and human.
• Passionate People - Join a team of smart, supportive folks who care deeply about what they do.
• In-Person onboarding - We want you to thrive from day 1 with your fellow new ‘Grafanistas’ to learn all about what we do and how we do it.
• Balance is Key - We operate a global annual leave policy of 30 days per annum. 3 days of your annual leave entitlement are reserved for Grafana Shutdown Days to allow the team to really disconnect. *We will comply with local legislation where applicable.
