Knock - Infrastructure Engineer
Responsibilities
As an early-stage company, everyone (including you) is involved in building every part of the company, from the product and the infrastructure it runs on to how we get work done internally. Here is a collection of hats we need you to be OK with wearing:

1. Adopting a Terraform-backed EKS cluster, then modernizing and maintaining it for elastic scale, reliability, performance, security, etc.
2. Going deep into troubleshooting Postgres performance and queues of every shape and size, and coming out the other side with a plan for scaling another 10x to 100x.
3. Identifying and correcting scaling issues before they affect our customers by relying on and improving our telemetry and traces in Datadog, AWS CloudWatch, and Honeycomb. If you see a blind spot, you are comfortable getting into the codebase to fix it.
4. Maintaining and improving upon our >99.95% uptime track record.
5. Supporting our product engineering team in moving fast to deliver customer value, and improving the day-to-day developer experience through canaries, faster cycle time, blue/green deploys, etc.
6. Joining on-call rotations on a schedule with the rest of the engineering team.

This position is both high autonomy and high accountability: you will have a lot of room to work and raise our existing standards, while also communicating those changes and bringing the rest of the team along for the ride, often in the form of runbooks and internal documentation.

• 4+ years of experience as a DevOps engineer or similar in a startup or mid-sized company, working with complex systems that operate at scale.
• Experience working in and on production Kubernetes clusters using infrastructure as code (we use Terraform, but others like Pulumi or CloudFormation are fine too).
• Experience working on complex AWS deployments (multi-account, complex VPC structure to support EKS, EKS experience).
• Experience operating and scaling different database technologies. We use Aurora Postgres, Mongo, and ClickHouse, so significant experience with at least one of these is a must.
• Some past experience or familiarity with operating and scaling different queues and streams across SQS, Kinesis, Kafka, or similar.
• Strong problem-solving skills with a focus on reliability, scalability, and performance.
• Strong communication skills, with the ability to work in a fully distributed, remote-first team. We love to write long-form documents for ourselves, our future selves, and our AI companions.

A NOTE ON AI AT KNOCK

We’re a team that has fully embraced AI tools to help us in our day-to-day. We use these tools to accelerate our work, but remain clear-eyed about where they shine and where the pitfalls lie.

As a member of the Knock team, you will be expected to be familiar with tools like Cursor, Claude Code, Codex, or similar to assist you in your job. You’ll be allowed to use these tools in some parts of your interview loop, but there will be times when we’ll ask that you refrain.