Site Reliability Engineer
Requirements
• A foundational understanding of Linux systems, processes, and basic networking concepts.
• Familiarity with at least one scripting or programming language, such as Python, Bash, or Go.
• An interest in site reliability, monitoring, and operating production infrastructure.
• Clear written and verbal communication skills, with a willingness to ask questions and learn.
• The ability to remain calm, methodical, and responsive during incidents or operational events.
• Exposure to cloud platforms such as AWS or GCP.
• Familiarity with containerization or orchestration technologies, including Docker or Kubernetes.
• Basic understanding of blockchain or Web3 concepts, such as nodes, RPC services, or validators.
• Experience with monitoring and observability tools such as Grafana, Prometheus, Datadog, or ELK-based stacks.
Responsibilities
As a Site Reliability Engineer (SRE) at Polygon Labs, you will play a key role in helping operate and support the production infrastructure that powers the Polygon network. Working alongside experienced SREs and protocol engineers, you will gain hands-on exposure to running large-scale, distributed blockchain systems while learning best practices for reliability, observability, and incident response.
This is an ideal role for someone early in their SRE or infrastructure career who is curious about how production systems work, motivated to learn through real-world operational challenges, and excited to grow within a collaborative and mentorship-driven environment. Your work will directly contribute to the reliability and performance of critical public infrastructure used by developers and users globally.
• Monitoring production systems, alerts, dashboards, and logs across Polygon networks, including Polygon PoS and the Agglayer.
• Assisting with incident detection, triage, escalation, and resolution under the guidance of senior engineers.
• Supporting on-call and operational coverage through structured rotations, with training and mentorship.
• Following, maintaining, and improving runbooks and standard operating procedures.
• Assisting with routine operational tasks such as service restarts, upgrades, and configuration changes.
• Helping maintain and improve monitoring, logging, and alerting systems, including dashboards for network health, RPC performance, and node metrics.
• Learning to improve alert signal quality and reduce operational noise.
• Supporting cloud-based and containerized infrastructure, including nodes, RPC endpoints, and supporting services.
• Collaborating with protocol, product, and cross-functional teams to understand production issues and user impact.
• Participating in post-incident reviews and contributing to root-cause analysis documentation.
• Continuously building knowledge of blockchain fundamentals, distributed systems, and networking.
Benefits
The goal of the Polygon Labs total rewards program is to support the health and well-being of you and your family. Our comprehensive compensation plan includes the following benefits for our full-time employees:
• Remote-first global workforce
• Industry-leading medical, dental, and vision health insurance*
• 401k with 3% company match*
• $1,500 home office setup allowance (lifetime max)
• $75 monthly internet or phone reimbursement
• Flexible time off
• Company-issued laptop
• Egg freezing, mental health, and employee wellness benefits
* In certain countries, medical, dental, and vision is fully covered for employees and their dependents. This is country and plan specific.
* 401k is for United States employees only.