Abnormal - Staff Software Engineer Platform Infrastructure

Unknown - USA · $210k - $210k + Equity · 1mo ago

In Office · Staff · NA · Cloud Computing · Data Analytics · Staff Engineer · Performance Management · Kubernetes · Kafka · PostgreSQL · Redis

Requirements

• Tackles complex, ambiguous problems and turns them into actionable plans.
• Leads by example and dives deep when needed.
• Embodies our VOICE values and builds software that delights customers.
• Earns trust across Engineering, Product, and Design through thoughtful collaboration.
• Team mission: Build and evolve the core infrastructure (compute, orchestration, and data platform) that powers Abnormal’s AI/ML products at scale. We treat platforms as products: usable, reliable, secure, and cost-efficient.
• Proven experience building and scaling data-intensive, distributed backend systems in high-growth environments.
• 5+ years as a Senior/Staff engineer building platforms, tools, or infrastructure that materially increase engineering velocity and reliability.
• A strong track record as a change agent: reshaping infra strategy and shipping impactful, self-service platform offerings in startup settings.
• Depth in at least two of the following three areas:
  • Compute (e.g., EC2, autoscaling, container runtimes, networking, security hardening)
  • Orchestration (e.g., Kubernetes/EKS, controllers/operators, scheduling, policies, multi-cluster)
  • Data Platform (e.g., Kafka/Kinesis/SQS; Spark/Databricks/DBT/Airflow; S3; PostgreSQL/MySQL; DynamoDB/RocksDB/Redis/OpenSearch; data governance/quality/lineage)
• Hands-on with our stack (or equivalent): Python, Golang, Terraform/Terragrunt, PostgreSQL, Kafka, Redis, OpenSearch, AWS, Kubernetes. Strong IaC, observability, and SRE fundamentals (SLOs, error budgets, incident management, postmortems, capacity planning).
• Experience building multi-tenant or regulated (e.g., FedRAMP-like) platforms, isolation boundaries, and guardrails.
• Background with feature stores, offline/online consistency, model serving, and evaluation/feedback loops.
• Prior leadership of cross-org migrations (e.g., to Kubernetes, event-driven architectures, or a unified data platform).
• How we work:
  • Product mindset: platform as a product with clear APIs, docs, SLAs, and adoption metrics.
  • Automation first: paved paths and golden configs over bespoke snowflakes.
  • Measured outcomes: reliability, latency, cost, and developer experience over vanity metrics.
• Actual compensation will be determined based on several non-discriminatory factors including skills, experience, qualifications, and geographic location. In addition to base salary, this role may be eligible for bonus or incentive compensation, equity, and a comprehensive benefits package.

Responsibilities

• Shape the core areas of Platform Infrastructure such as compute (EC2/EKS, autoscaling, container runtime) and orchestration (Kubernetes, workload APIs, multi-cluster, policy/quotas), as well as data platform (streaming, batch, durable storage, data tooling), with demonstrated depth in at least two of these.
• Design and drive platform architecture & roadmap to support Abnormal’s expanding AI/ML portfolio, scaling seamlessly across services, tenants, and regions.
• Partner deeply with product & ML workflows to make pragmatic trade-offs, accelerating our shift to a platform-first operating model and enabling self-service.
• Raise the bar on operational excellence (SLOs, availability, performance, incident response, change management, on-call hygiene) and help teams consistently meet it.
• Act as the team’s technical lead: define quarterly roadmaps, de-risk delivery, mentor engineers, and land high-leverage, cross-team initiatives.
• Champion AI-native software development, guiding teams on architecture, data gravity, feature stores, model/service interfaces, and evaluation pipelines.
• Own cost-conscious engineering, optimizing design and operations to balance performance, reliability, and spend (capacity planning, right-sizing, caching, storage tiers).
• Instill strong platform product practices: crisp APIs, great docs, clear SLAs/SLOs, telemetry by default, and paved paths that increase developer velocity.

Benefits

• $210,400 - $302,500 USD
