Infrastructure Engineer
Requirements
• Hands-on experience managing production AWS environments.
• Strong experience with Infrastructure as Code (Terraform and/or Pulumi).
• Proven experience designing and maintaining high-availability and fault-tolerant systems.
• Solid understanding of cloud security, data privacy, and compliance principles (SOC 2 required; FedRAMP or similar a plus).
• Experience operating high-throughput systems in an enterprise or scale-up environment.
• Practical experience with monitoring, logging, and observability in distributed systems.
• Familiarity with PostgreSQL and comfort working with application code (Node.js, TypeScript, Python) when required.
• Experience with AWS services supporting AI/ML development and inference, such as managed model hosting, GPU-backed workloads, vector storage, or large-scale data processing pipelines.
• Experience with automated testing tools, with a strong emphasis on load and performance testing in cloud environments.
Responsibilities
• Designing, implementing, testing, documenting, maintaining, monitoring, troubleshooting, and optimizing infrastructure to ensure the reliability, availability, and scalability of IT systems.
• Collaborating with cross-functional teams (developers, operations staff, security professionals, business analysts, and others) to design and implement new or modified applications and systems that meet organizational goals while ensuring performance, efficiency, reliability, availability, scalability, and compliance.
• Managing the infrastructure lifecycle from planning to decommissioning, with a focus on optimizing cost without compromising service quality.
• Implementing and maintaining disaster recovery plans that cover system failure or data loss in critical systems and applications.
• Monitoring the IT environment with a range of tools to identify performance bottlenecks and security vulnerabilities and keep infrastructure components operating optimally.
• Troubleshooting hardware and software failures promptly, keeping downtime to a minimum for affected users and systems.
• Optimizing resource utilization (CPU, memory, storage) across the IT environment so that performance requirements are met without overprovisioning and incurring unnecessary cost.
• Ensuring compliance with industry standards and regulations on data security and privacy throughout infrastructure design and implementation.
Benefits
• Equity options.
• Paid time off (PTO), including vacation time, holidays, and sick leave.
• Medical, dental, vision, and life insurance for employees and their dependents.
• Company car.
• Free meals and snacks during work hours.
• Remote work options.