Infrastructure Engineer
Requirements
• 4+ years of experience in infrastructure, DevOps, or platform engineering
• Strong proficiency with AWS services and architecture patterns
• Hands-on experience with Terraform for infrastructure as code
• Solid understanding of Kubernetes concepts: deployments, services, ingress, RBAC, Helm charts
• Experience with containerization using Docker and container registries
• Proficiency in TypeScript and Bash scripting
• Familiarity with CI/CD tools (GitHub Actions, GitLab CI, ArgoCD)
• Understanding of networking fundamentals (DNS, load balancing, firewalls, VPNs)
• Strong troubleshooting and problem-solving skills
• Good communication skills and ability to work collaboratively
• Knowledge of GitOps practices and tools (ArgoCD, Flux)
• Familiarity with observability stacks (Prometheus, Grafana, Datadog, ELK)
• AWS certifications (Solutions Architect, DevOps Engineer)
• CKA/CKAD certification
• Experience with service mesh (Istio, Linkerd)
• Background in cost optimization and FinOps practices
Responsibilities
• Design, implement, and manage AWS cloud infrastructure including VPCs, EC2, RDS, S3, IAM, Lambda, and other services
• Develop and maintain Terraform modules to provision and manage infrastructure in a repeatable, version-controlled manner
• Deploy, manage, and optimize Kubernetes clusters (EKS) including workload orchestration, autoscaling, and resource management
• Build and improve continuous integration and deployment pipelines to enable fast, safe releases
• Implement logging, monitoring, and alerting solutions to ensure system health and rapid incident response
• Apply infrastructure security best practices, manage secrets, and ensure compliance with security policies
• Write scripts and tools to automate operational tasks and improve developer experience
• Create and maintain runbooks, architecture diagrams, and technical documentation
• Partner with development teams to improve platform reliability and streamline deployments
• Participate in an on-call rotation schedule to respond to production incidents and ensure system reliability