Cerebras Systems - Principal Engineer, AI Inference Reliability

Remote - California, United States; Sunnyvale, CA or Toronto, Canada · 4mo ago
Remote · Principal · NA · Semiconductors · Principal Engineer · AI Engineer · C++ · Go · Rust · Python

Requirements

• Bachelor's or master's degree in computer science or a related field.
• 7+ years of experience in backend, infrastructure, or reliability engineering for large-scale distributed systems.
• Strong programming skills in at least one popular backend programming language such as Python, C++, Go, or Rust.
• Deep, hard-earned experience with reliability principles: SLO/SLI/SLA design, incident response, and postmortem culture.
• Excellent communication and cross-functional leadership skills.
• Bonus: prior experience building large-scale AI infrastructure systems.
• This offer is contingent upon Cerebras successfully obtaining an export license from the U.S. Department of Commerce’s Bureau of Industry and Security authorizing the release to you of certain software source code and/or technology that is subject to the Export Administration Regulations. However, we can make no assurances with respect to the final disposition of an export license application.

Responsibilities

• Define and drive reliability strategy: establish SLOs and ensure alignment across engineering.
• Design and implement reliability mechanisms: build and evolve systems for fault detection, graceful degradation, failover, throttling, and recovery across multiple regions and data centers.
• Lead large-scale incident management: own postmortems, root-cause analysis, and prevention loops for reliability-related incidents.
• Architect for reliability and observability: influence system design for redundancy, durability, and debuggability.
• Develop reliability tooling: create internal tools and frameworks for chaos testing, load simulation, and distributed fault injection.
• Collaborate broadly: work across software, infrastructure, and hardware teams to ensure reliability is embedded into every layer of our inference service.
• Monitor and communicate reliability metrics: build dashboards and alerts that measure service health and provide actionable insights.
• Mentor and influence: guide engineers and set best practices for designing, testing, and operating reliable large-scale systems.

Benefits

People who are serious about software make their own hardware. At Cerebras we have built a breakthrough architecture that is unlocking new opportunities for the AI industry. With dozens of model releases and rapid growth, we’ve reached an inflection point in our business. Members of our team tell us there are five main reasons they joined Cerebras:

• Build a breakthrough AI platform beyond the constraints of the GPU.
• Publish and open source their cutting-edge AI research.
• Work on one of the fastest AI supercomputers in the world.
• Enjoy job stability with startup vitality.
• Our simple, non-corporate work culture that respects individual beliefs.

Read our blog: Five Reasons to Join Cerebras in 2026.

