Agoda - Senior Lead Engineer (DevOps and Process Optimization/Automation) – Bangkok based, relocation provided
Requirements
• Expertise in one or more programming languages (e.g., Golang, Python) with a proven ability to design and implement robust, maintainable, and scalable software solutions.
• Proven experience designing, implementing, optimizing, and owning the full Software Development Lifecycle (SDLC) for complex, large-scale systems.
• Deep theoretical and practical understanding of computer science fundamentals and their application in building and operating distributed systems.
• Expertise in containerization and orchestration technologies (e.g., Docker, Kubernetes, Helm), including design, deployment, security, and operational management at scale.
• Demonstrated experience leading complex, cross-functional technical projects and initiatives from conception to completion.
• Strong ability to troubleshoot and perform root cause analysis on complex distributed systems, networking, and performance issues.
• Experience mentoring other engineers and driving the adoption of technical best practices within a team or organization.
• Deep hands-on experience with Agile/XP practices (e.g., Scrum, TDD, CI/CD) and driving process improvements.
• Excellent communication and collaboration skills, with the ability to articulate complex technical concepts clearly and influence diverse audiences.
• Proven track record of designing, building, and operating large-scale, highly available distributed systems.
• Hands-on experience designing and implementing Service Mesh and Service Discovery solutions (e.g., Istio, Linkerd, Consul, Envoy) in complex environments.
• Extensive experience with configuration management and Infrastructure as Code (IaC) tools (e.g., Ansible, Puppet, Chef, Terraform).
• Experience with advanced monitoring, logging, and observability platforms (e.g., Prometheus, Grafana, Datadog, Splunk, ELK stack).
• Experience with cloud platforms (GCP, AWS, Azure) and managing cloud resources efficiently.
• Please review our Hiring Process Guidelines before your interview — click here to learn how interviewing at Agoda works.
Responsibilities
• Design, lead, and evolve Agoda’s private cloud platform architecture across multiple data centers, with a strong focus on scalability, resilience, performance, and operational excellence.
• Lead the operation and continuous improvement of large-scale Kubernetes environments spanning multiple clusters and regions.
• Drive architectural decisions around cluster networking, service communication, and platform reliability in high-demand distributed environments.
• Contribute directly to the implementation of platform capabilities through hands-on coding, automation, and system design.
• Improve continuous delivery and deployment capabilities to enable reliable, low-friction software releases from source control to production.
• Support and enhance progressive delivery practices such as canary rollouts, deployment health monitoring, and uptime protection.
• Lead the adoption and evolution of service mesh capabilities, including traffic management, service connectivity, and platform observability.
• Define and improve secrets and key management practices using platforms such as HashiCorp Vault, AWS SSM, and KMS, ensuring strong security and audit readiness.
• Drive compliance-aligned infrastructure capabilities to support internal controls, audit requirements, and future financial-grade security expectations.
• Lead the design and execution of Agoda’s Platform Identity Access Management (PIAM) initiative, expanding identity and access controls beyond Kubernetes to cover VMs, bare metal infrastructure, laptops, mobile devices, and other platform connections.
• Partner with security, infrastructure, and engineering stakeholders to gather requirements, define architecture, and develop a roadmap for PIAM implementation.
• Solve complex technical challenges across platform engineering, infrastructure automation, distributed systems, security, and reliability.
• Set a high technical bar through strong engineering judgment, code quality, system design, and operational excellence.
• Capture, document, and share best practices to improve platform consistency, maintainability, and engineering effectiveness.
What You’ll Need to Succeed
• Extensive experience in platform engineering, infrastructure engineering, site reliability engineering, or DevOps roles in large-scale production environments.
• Proven experience in senior, staff, principal, or equivalent technical leadership roles with significant system ownership and architectural responsibility.
• Strong hands-on expertise with Kubernetes and containerized platforms, including operating multi-cluster or multi-region environments at scale.
• Strong software engineering ability with hands-on coding experience in Golang; proficiency in Python, Bash, or Shell scripting is also highly valuable.
• Proven ability to design, build, and operate robust platform systems through code, not only through architecture or coordination.
• Strong experience designing, building, and operating complex distributed systems with high reliability and performance requirements.
• Strong understanding of networking, service-to-service communication, and infrastructure architecture in large distributed environments.
• Experience leading and improving continuous delivery / CI/CD capabilities and deployment workflows for engineering organizations.
• Hands-on experience with service mesh technologies such as Istio, and a solid understanding of traffic management and platform observability patterns.
• Strong experience with secret management, encryption, and platform security controls, including technologies such as HashiCorp Vault, AWS SSM, and KMS or equivalent solutions.
• Experience working with compliance, audit, or security-sensitive infrastructure requirements.