Telnyx - Senior Infrastructure Engineer (Bare Metal)
Requirements
• Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
• 3–5 years of production infrastructure engineering experience.
• Strong Kubernetes production experience, preferably on bare metal environments.
• Experience developing Kubernetes Operators, controllers, or other Kubernetes-native automation components.
• Strong programming and software engineering experience focused on infrastructure automation, reconciliation loops, and idempotent systems management.
• Experience building distributed systems, infrastructure automation, or platform engineering tooling.
• Experience deploying and operating Rook-managed Ceph clusters and high-performance NVMe-backed storage platforms in production Kubernetes environments.
• Strong Linux systems administration and Linux kernel troubleshooting knowledge.
• In-depth knowledge of Linux networking and distributed systems.
• Experience with container-native virtualization platforms such as KubeVirt.
• Experience with SDN technologies, WireGuard, and container networking technologies including Calico, Flannel, and/or Cilium eBPF.
• Experience implementing Kubernetes network policies and workload isolation strategies.
• Experience deploying and managing GPU-enabled infrastructure and associated drivers/operators in Kubernetes environments.
• Strong understanding of high-performance networking technologies including RoCE, InfiniBand, Mellanox SR-IOV, and virtual function (VF) networking.
• Strong problem-solving and troubleshooting skills.
• Experience with NVIDIA datacenter GPUs including H200 and B200/B300 platforms.
• Experience with AMD MI300 series accelerators.
• Familiarity with NVLink and GPU interconnect technologies.
• Experience with NVMe-oF (NVMe over Fabrics) architectures and high-performance storage networking.
• Experience with VXLAN, software-defined networking, SR-IOV technologies, and advanced NIC offloading.
• Familiarity with the BGP protocol and configuring FRR or Bird.
• Experience contributing to open-source infrastructure or Kubernetes ecosystem projects.
• Experience designing or operating AI/HPC infrastructure platforms.
• Familiarity with eBPF-based observability, networking, or security tooling.
• Experience with GitOps workflows and Kubernetes operational models.
Responsibilities
• Design, deploy, and manage highly available, scalable, and secure infrastructure solutions, including Kubernetes on bare metal and Rook-managed Ceph storage platforms.
• Design and maintain Kubernetes and Rook/Ceph platforms for engineering team consumption.
• Deploy and operate GPU-accelerated infrastructure for AI and high-performance compute workloads using NVIDIA and AMD datacenter GPUs, including H200, B200/B300, and AMD MI300 series hardware.
• Architect and maintain high-performance networking stacks leveraging RoCE, InfiniBand, NVLink, Mellanox SR-IOV, virtual functions (VFs), and advanced NIC technologies.
• Design and operate high-performance storage architectures leveraging NVMe-oF (NVMe over Fabrics) technologies for low-latency distributed storage workloads.
• Develop and operate storage infrastructure using Rook for Ceph lifecycle management within Kubernetes environments.
• Build and maintain Kubernetes-native infrastructure platforms using KubeVirt, software-defined networking (SDN), WireGuard, and container networking technologies such as Calico, Flannel, and Cilium with eBPF.
• Design and implement Kubernetes network policies to isolate workloads, secure east-west traffic, and uphold infrastructure security best practices.
• Develop Kubernetes Operators, controllers, and automation services for infrastructure lifecycle management and platform orchestration.
• Contribute to infrastructure software engineering efforts focused on infrastructure reconciliation, idempotent automation, and declarative systems management.
• Develop internal tooling, APIs, and automation frameworks to support large-scale bare metal and AI infrastructure operations.
• Manage Linux kernel-level performance tuning, hardware enablement, and low-level systems troubleshooting.
• Deploy and maintain GPU, networking, and hardware drivers using Kubernetes Operators and containerized lifecycle management techniques.
• Evaluate and recommend new technologies and tools to improve the efficiency, performance, and scalability of infrastructure platforms.
• Ensure the reliability, performance, and scalability of our edge data centers.
• Troubleshoot and resolve complex infrastructure issues across compute, networking, storage, and Kubernetes layers.
• Participate in architecture design, technical planning, and documentation for new infrastructure initiatives.