k-ID

k-ID - Lead Site Reliability Engineer

Singapore · Equity · 2mo ago
In Office · Staff · APAC · Cloud Computing · Site Reliability Engineer · Go · Python · TypeScript · AWS · Kubernetes

Requirements

• 7 or more years of experience in site reliability engineering, infrastructure engineering, platform engineering, or software engineering with significant production ownership
• Strong experience operating production systems in AWS
• Strong hands-on experience with Kubernetes, containerized services, and modern infrastructure tooling
• Experience building and improving observability across metrics, logs, tracing, alerting, and service health
• Deep understanding of distributed systems, service failure modes, traffic management, capacity planning, and recovery design
• Experience designing or running incident response programs, on-call operations, escalation frameworks, and post-incident review processes
• Experience leading or managing NOC, production operations, or support functions in a high-availability environment
• Strong experience with infrastructure as code, such as Terraform
• Experience improving CI and CD workflows, release safety, rollback practices, and change management
• Ability to write code or automation in one or more languages such as Go, Python, or TypeScript
• Strong written and verbal communication skills, especially in high-pressure operational settings
• Experience working in fast-moving startup environments is strongly preferred

Responsibilities

• Own the reliability and operational health of k-ID’s production systems and critical services
• Lead the NOC function, including shift structure, escalation paths, incident handling standards, readiness processes, and operational reporting
• Act as the senior escalation point for major incidents and serve as incident commander for high-severity events when needed
• Design and improve monitoring, alerting, and operational tooling so the NOC can detect issues early and respond effectively
• Drive root cause analysis and post-incident review practices that produce real corrective action rather than superficial summaries
• Partner with engineering teams to improve system resilience, deployment safety, service ownership, and production readiness
• Identify systemic risks across infrastructure, services, dependencies, and operational processes, then drive plans to reduce them
• Improve platform performance, availability, and recovery time through architecture changes, better automation, and stronger operating discipline
• Build and maintain runbooks, readiness checklists, service health standards, and escalation playbooks across the organization
• Help define service level objectives, operational metrics, and reliability targets that align with business needs
• Support and mentor senior NOC engineers and other operations team members, helping raise technical depth and decision quality across the function
• Contribute hands-on to infrastructure and reliability engineering work where needed, especially in high-leverage areas

Benefits

• A competitive startup salary aligned with experience and market benchmarks
• Employee Stock Ownership Plan, so you participate directly in the long-term upside of the company

HEALTH AND WELLBEING
• Comprehensive family health coverage, including medical, dental, and vision benefits
• Mental health and wellness support

PROFESSIONAL DEVELOPMENT
• Hands-on exposure to key clients in a scaling global tech company
• Opportunities for continuous learning through real ownership rather than formal training alone
• Direct collaboration with the founders and the tech leadership team

CULTURE AND WAYS OF WORKING
• A collaborative, inclusive, and low-politics work environment
• Flexible, trust-based working culture shaped by a US startup operating model
• A mission-driven company focused on improving online experiences for kids and teens globally

Applicant Privacy Policy: https://k-id.com/job-applicants-privacy-notice
