Vultr - Senior Manager, Platform, Lifecycle, & Troubleshooting

Remote - United States · $120k - $140k + Equity · 1mo ago

Requirements

This is your opportunity to join our fast-growing team and leave your mark on Vultr and the future of Cloud Infrastructure. You will lead the team responsible for keeping thousands of production servers running reliably, owning complex platform troubleshooting, large-scale migrations (including OS/distribution changes), and post-onboard lifecycle, directly contributing to Vultr’s uptime, performance leadership in GPUs and bare metal, and ability to support demanding AI and enterprise workloads.

• 8+ years of experience in Linux systems administration, platform engineering, or SRE-style operations in cloud or large-scale infrastructure environments.
• Deep expertise in troubleshooting GPU, storage, RDMA, and high-performance networking issues.
• Proven track record leading technical teams, including on-call rotations and complex migrations.
• Strong scripting/automation skills (Python, Bash, Ansible, etc.) and experience with monitoring tools.
• Excellent problem-solving, documentation, and cross-team communication abilities.
• Bachelor’s degree in Computer Science, Engineering, or equivalent experience.

Responsibilities

• Lead the Platform, Lifecycle & Troubleshooting team in resolving complex incidents and platform issues.
• Own server repurposing, migrations (e.g., OS/distribution upgrades), and deeper lifecycle management.
• Perform and guide advanced troubleshooting for RDMA links, GPU, storage, and server-side networking.
• Validate firmware choices and handle complex/ongoing firmware updates.
• Provide 24/7 on-call leadership and drive incident response improvements.
• Develop runbooks, automation, and self-healing processes to reduce toil and improve MTTR.
• Collaborate closely with Hardware and Onboarding teams on handoffs and mixed tickets.
• Partner with Engineering, Networking, and Solutions teams on technical escalations and improvements.
• Mentor senior engineers and build a high-performing team focused on root-cause analysis.
• Track key metrics (uptime, incident trends, migration success) and drive operational maturity.

Benefits

• $120,000 - $140,000
• Final compensation will vary depending on years of experience, background/skill set, location, and applicable laws.


