Software Engineer, Platform Systems

OpenAI · London, Greater London, United Kingdom · 1mo ago
In Office · EMEA · Education · Artificial Intelligence · Software Engineer · Platform Engineer · Training Development

Requirements

• Skills needed: Designing distributed failure detection, tracing, and profiling systems; developing tools for identifying slow/faulty nodes.
• Years of experience: Not explicitly stated.
• Education: Systems engineering background preferred (Bachelor's degree or equivalent).
• Certifications: None mentioned.
• Must-haves: Experience with hardware, operating systems, networking, concurrency and distributed systems; understanding high-performance computing is a plus.

Responsibilities

• Design and build distributed failure detection, tracing, and profiling systems for large-scale AI training jobs
• Develop tooling to identify slow, faulty, or misbehaving nodes and provide actionable visibility into system behavior
• Improve observability, reliability, and performance across OpenAI’s training platform
• Debug and resolve issues in complex, high-throughput distributed systems
• Collaborate with systems, infrastructure, and research teams to evolve platform capabilities
• Extend and adapt failure detection and tracing systems to support new training paradigms and workloads

You may be a good fit if you:

• Care deeply about performance, stability, and observability in distributed systems
• Enjoy finding and fixing issues in large-scale systems and automating operational workflows
• Have experience writing low-level software where system details matter
• Understand hardware, operating systems, networking, concurrency, and distributed systems
• Have a background in high-performance computing or low-level systems engineering
• Are excited to work on critical infrastructure that powers frontier AI research

Benefits

• Equity is mentioned as part of the compensation package.
• Paid time off (PTO) is included.
• Insurance coverage is provided to employees.
• A remote work option is mentioned, although the role is listed as In Office in London.

© 2026 Dominic Morris. All rights reserved.