Spotify - Site Reliability Engineer
Requirements
• Cloud Native & AI Curious: Brings hands-on experience with cloud infrastructure (GCP or AWS) and IaC tools like Terraform, with an interest in LLMs, RAG, or agents in an operational context.
• Systems Thinker: Understands distributed systems principles and how to operate them reliably at scale, specifically addressing the challenges posed by non-deterministic AI workloads.
• Polyglot Practitioner: Experienced with at least one modern programming language (e.g., TypeScript, Java, Go, Python) and comfortable navigating codebases where AI-generated PRs are the norm.
• Quality & Automation: Prioritizes code quality and reliability, looking for ways to build systems that test themselves and improve through automated feedback loops.
• Growth Mindset: Eager to evolve as an engineer in a landscape where the definition of "operations" changes rapidly. Familiarity with open-source projects or building "coding assistant" bots is a plus.
• This role is based in NYC.
• We offer you the flexibility to work where you work best. There will be some in-person meetings, but the role still allows flexibility to work from home.
Responsibilities
• Orchestrate the Fleet: Maintain and improve Portal’s SaaS infrastructure for reliability, security, and scalability. This covers the runtime environments supporting the platform and workflows powered by large language models.
• Modern Infra-as-Code: Collaborate with senior engineers to build infrastructure on GCP and AWS using Terraform and emerging infrastructure-from-code patterns where agents assist in defining the stack.
• Support Fullstack Systems: Operate in a modern web stack environment (TypeScript, React, Python). While this isn’t a frontend-heavy role, comfort with debugging fullstack systems and web infrastructure is key.
• Reliability Engineering: Participate in on-call rotations to ensure systems meet reliability and availability goals, employing AI assistants to accelerate root cause analysis and incident resolution.
• Collaborate & Innovate: Participate in the planning and delivery of technical projects, defining how infrastructure evolves to support the next wave of generative AI features.