Roboflow - Infrastructure Engineer

NY, SF or Remote · + Equity · 1mo ago

Remote, NA, Health Insurance, Insurance, Infrastructure Engineer, CTO, MLOps, Kubernetes, AWS, Node.js, GCP

Requirements

• Primarily, you like to make great things with passionate colleagues. You are someone who likes to own outcomes, not only inputs. You’re motivated by having responsibility and accountability. You’re eager to ‘do the work,’ big and small.
• You’re curious and like learning about new technologies, perhaps as an early tinkerer with MLOps products. You show more than you tell.
• You’re motivated by the question, “How can I improve this?” and have a track record of doing so, even in ways adjacent to your role. Much of our current team is made up of former founders who thrive on the level of autonomy at Roboflow. Maybe you had a side hustle in high school or college.
• Many Roboflowers have used our tools before joining. One of the best ways to stand out among other applicants is to write about something you have built with Roboflow or contribute to one of our open source projects (https://roboflow.com/open-source). Likewise, we highly value users with meaningful contributions to successful open source devtool and security projects.
• We're looking for a versatile engineer excited by high-impact challenges. At Roboflow, we are AI-native: we expect our team to use AI to accelerate everything from writing code and fixing bugs to analyzing security, cost, and performance. Experience in some or all of the following areas will be crucial:
• Production experience with Kubernetes: Building and managing containerized applications at scale.
• Infrastructure-as-Code (IaC): Using Terraform, Helm charts, bash scripting, and Python to automate everything.
• Scale & Site Reliability: Operating, monitoring, and scaling large-scale applications (especially in ML/AI) in AWS and/or GCP.
• Development Skills: Proficiency in Node.js and Python, with the ability to collaborate with full-stack developers on designing and operating SaaS applications.
• ML/Big Data Ops: Hands-on experience with the infrastructure required for machine learning at scale (GPUs, Docker, Kubernetes) and familiarity with libraries like PyTorch or TensorFlow.
• CI/CD Automation: Experience with tools like GitHub Actions or Spacelift to build and deploy code efficiently.
• Pragmatic Security: Awareness of security best practices for cloud operations and how they can be applied to startup environments.
• AI-Native Engineering: Leveraging LLMs and AI tools to accelerate the development lifecycle, from writing and refactoring code to identifying security vulnerabilities and optimizing infrastructure costs.

A GLIMPSE OF YOUR WORK

• No two days will be the same. Your tasks will be a blend of strategic projects and hands-on implementation. Examples include:
• Running and optimizing a high-availability machine learning inference service.
• Collaborating with customer security teams to ensure secure integration.
• Developing creative IaC solutions to scale our platform cost-effectively.
• Working with the engineering team to define SLOs/SLAs and participating in incident response.
• Improving the Observability and Alerting stack and the processes built around it.
• Diving deep into our stack to identify and act on cost-optimization opportunities.
• Contributing code (Python, JavaScript, etc.) as part of a team designing and deploying new product features.
• Fixing security vulnerabilities and bugs.
• Hardening our systems and processes to meet SOC 2, HIPAA, and GDPR requirements, making us audit-ready.
• Participating in an on-call rotation to ensure platform reliability.

Responsibilities

• As a member of our infrastructure team, you'll be at the heart of a fast-paced startup environment. Your primary focus will be on striking the right balance between rapid delivery, high reliability, and robust security. This isn't a traditional, siloed role; you'll need to wear many hats, acting as an infrastructure engineer one moment and a developer or even a security analyst the next.
• You will be securing, scaling, and maintaining the core infrastructure that powers our product. This includes our cloud architecture, databases, file storage, search clusters, microservices, and machine learning pipelines. You'll work closely with our product team and collaborate across the company on product, operations, and customer-facing projects, constantly context-switching to solve the next critical challenge.
• Learn all about computer vision, our product, company, customers, and vision.
• Ship something substantial to an end user.
• Start learning our infrastructure and security practices.
• Onboard in person with your manager.
• Build your first computer vision project with Roboflow (if you haven't already).
• Start contributing to infra-as-code.
• Start working with customers to help with their security questions and onboarding.
• Understand the architecture of Roboflow.
• Attend your first all-company onsite (https://blog.roboflow.com/remote-company-onsite-ideas/).
• Be ramped up on other relevant parts of the Roboflow product.

WHO YOU'LL BE WORKING WITH

• Our team of ~100 attracts talent like executives who wanted to return to building, founders with a 100M+ exit, Roboflow users turned team members, open source contributors, a cyclist who biked across the United States, prolific high school hackers, and a CTO from a 100+ engineer organization, among many exceptional others.
• You will be working directly with our Engineering Lead and a team of product, infrastructure, and security engineers.

WHERE YOU'LL WORK

• Roboflow is distributed across the US and Europe. We currently have Hubs in New York City and San Francisco (and plan to open more as we grow density in new cities). We provide opportunities (like team on-sites in different cities) and resources (like a $4000/yr travel stipend; see https://blog.roboflow.com/how-we-work-together-at-roboflow/) to work in person with other team members as much as you'd like, while also supporting remote team members. You can work from one of our Hubs (we offer a relocation bonus), work from home, work at co-working spaces, etc. We want you to work where you work best!

WHEN YOU'LL WORK

• Roboflow primarily operates during daytime hours in the US, and there are some synchronous meetings you’ll be expected to attend each week. Apart from that, we have a flexible schedule that allows you to work collaboratively with other team members and asynchronously when needed.

What You'll Receive

• 💰 The target compensation for this role is USD $165,000 to $200,000 base.
• 📈 In addition to our cash compensation, we offer generous perks and benefits. Below are some of the highlights:
• $4000/yr Travel Stipend to travel anywhere anytime to work alongside other Roboflowers
• $350/mo Productivity Stipend to spend on things that make your work environment more productive, like high-speed internet at home or a co-working space
• Up to 100% of your health insurance costs covered for you and your partner or family
• Equity in the company so we are all invested in the future of computer vision

INTERVIEW PROCESS (~5 HOURS)

• Below is the interview process you can expect for this role. We are all motivated to work with an exceptional team and don't currently have in-house recruiters. You will be speaking directly with our team about what it's like to work and thrive at Roboflow. We like to be decisive and work fast, so don't be surprised if all the conversations below happen over a day or two.
• Before the Interview:
• We’ll review your application, LinkedIn, GitHub, etc.
• The best way to stand out is to write about something you’ve built with Roboflow, contribute to one of our open source projects (https://roboflow.com/open-source), or highlight your contributions to open source devtools, infrastructure, or security engineering projects.
• We may send you a technical screen if applicable.
• Introduction Phase:
• [45m] Introduction with the hiring manager, Sachin Agarwal, to assess overall mindset and skill set. This first interview is a time to learn more about the role, let us get to know you better, and ensure it's a good fit for both parties to continue in the process.
• Team Interview Phase:
• [45m] Meet with our CTO, Brad Dwyer
• [90m] Meet with hiring manager and team for a hands-on technical infrastructure interview
• Final Interview Stage:
• [45m] Meet with Kate Wagner, Head of Operations, for a culture discussion
• [60m] Meet with Joseph Nelson, CEO
• We check references and conduct a background check
• Note: you are welcome to request additional conversations with anyone you would like to meet, and we will accommodate as best we can.

Benefits

• Tell us about a time you were responsible for improving the scalability or reliability of a production system. What specific problem were you solving, what changes did you make, and how did you measure the impact?
