Trase Systems - Principal Software Engineer (Platform Architecture & Execution Model)
Requirements
• 12-15+ years of experience building distributed/platform systems, including significant experience defining architecture across teams or domains
• 10+ years owning mission-critical runtimes or workflow/orchestration systems
• Deep expertise with durable execution (e.g., state machines, event sourcing, saga/compensation, idempotency, exactly/at-least-once semantics)
• Proven track record with security & governance in production systems (auth, RBAC, audit, policy)
• Hands-on with observability (Grafana or equivalent), including trace correlation across async boundaries
• Strong systems design across storage, queues, schedulers, and evented architectures; performance tuning under load
• Excellence in a modern language (e.g., Go, Rust, Java, or TypeScript) and cloud-native stacks (containers, CI/CD, IaC)
• Comfortable operating in regulated or high-assurance environments; bias toward correctness, clarity, and documentation
• Proven ability to influence technical direction across an organization and drive adoption of architectural standards
• Ability to incorporate advanced LLM capabilities into system design and platform architecture decisions where appropriate
• Prior work on workflow engines (Temporal/Cadence, AWS Step Functions, Argo, Airflow) or serverless runtimes
• Experience with policy engines (OPA), secrets/KMS, or data-handling controls (PII/PHI)
• ML/LLM evaluation frameworks, tool/plugin architectures, or embedding model governance into execution
• Government or healthcare experience (HIPAA, audit readiness) and multi-tenant isolation
• Salary Range: $240,000-$290,000. This represents the typical salary range for this position based on experience, skills, and other factors.
Responsibilities
• Architect & lead the core execution model (state machine, lifecycle, resource model, failure semantics)
• Design platform APIs/SDKs connecting workflows, agents, tools, and product surfaces; drive versioning & compatibility
• Guarantee correctness via idempotency, deterministic replays, compensating actions, and data integrity
• Engineer reliability at scale: concurrency controls, rate limits, backpressure, sharding/partitioning, and workload isolation
• Build security & governance into the core: RBAC/ABAC, policy enforcement, fine-grained audit & lineage
• Deliver observability: distributed tracing, structured logs, metrics, and evaluation hooks; build an “explainable trail” of agent actions
• Own quality: design reviews, test strategy (unit, property, chaos), performance baselines, SLOs, incident response, and postmortems
• Mentor & unblock senior engineers; partner with Product, Security, and Customer teams to translate requirements into durable primitives
• Make pragmatic choices on storage, queueing, and compute; create paved roads that accelerate all other teams
• Define system boundaries and reduce cross-service coupling through clear architectural patterns
• Drive platform-wide standards for correctness, reliability, and API design across teams
• Balance short-term delivery with long-term architectural integrity, ensuring the platform evolves without accumulating systemic risk
Principal-level Technical Leadership
• Define and drive the long-term technical architecture of Trase OS across teams and domains
• Influence company-wide technical direction for platform and product systems
• Lead cross-team initiatives that shape how workflows, agents, and platform primitives are built and evolve
• Partner with leadership to align technical architecture with product and business strategy
• Mentor senior and staff engineers and raise the bar for system design and architectural thinking
Benefits
• Trase OS is an orchestration-heavy system coordinating long-lived workflows, agents, and tools across multiple services and environments.
• As the platform evolves, the primary risks shift from implementation to system design quality:
  • Poor abstractions create tight coupling across services
  • Workflow execution becomes difficult to reason about under failure
  • Platform capabilities fragment instead of becoming reusable primitives
  • Scaling introduces complexity instead of leverage
• This role exists to:
  • Define clean, durable abstractions for the platform execution model
  • Ensure correctness and determinism in workflow execution
  • Translate evolving product requirements into coherent platform architecture
  • Enable teams to build on Trase OS without introducing systemic complexity
What Makes This Role Hard
• You are designing systems where failure is the norm, not the exception, and correctness must be preserved across retries, restarts, and partial execution
• You must balance clean abstractions with real-world constraints (performance, security, multi-tenant environments)
• Decisions made here become foundational primitives used across all products and teams
• The system must remain understandable and auditable, even as complexity and scale increase
For full-time roles only
• Career-track opportunity with potential for rapid advancement, based on strong performance, as the firm grows
• 100% employer-paid, comprehensive health care, including medical, dental, and vision, for you and your family
• Paid maternity and paternity leave for 14 weeks at the employee's normal pay
• Unlimited PTO, with management approval
• Opportunities for professional development and continued learning
• Optional 401(k), FSA, and equity incentives available
• Mental health benefits are available through Tara Mind