• Design, implement, and operate cloud-native infrastructure on GCP, AWS, or Azure using Terraform.
• Take full ownership of MongoDB Atlas in production, including:
    • Cluster architecture and scaling
    • Replication and high availability
    • Backup and disaster recovery strategies
    • Performance tuning and query optimisation
    • Security and access control
• Architect and manage containerised and serverless workloads (e.g., Cloud Run, ECS, Kubernetes, or equivalents).
• Design and operate event-driven systems (e.g., Pub/Sub, SQS/SNS, EventBridge, or equivalents).
• Build and maintain CI/CD pipelines with a strong focus on automation, reliability, and scalability.
• Develop reusable Infrastructure as Code (Terraform) modules and manage multi-environment setups.
• Collaborate with engineering teams on system architecture, scalability, and performance optimisation.
• Implement robust monitoring, alerting, and observability across distributed systems.
• Lead incident response and root cause analysis, driving long-term improvements.
• Own infrastructure decisions end-to-end, including architecture, cost optimisation, and performance.
• Document systems, create runbooks, and establish best practices.
• Mentor engineers and promote DevOps best practices across the organisation.