
Best Egg - Lead Software Engineer II, AI Operations

Remote / Flexible · $150k - $170k · 3mo ago
Remote · Staff · WW · Payments · Artificial Intelligence · Cloud Computing · Software Engineer · AI Engineer · Python · Team Leadership · Ollama · AWS · Metaflow

Requirements

• Experience: 5–10 years of professional software engineering (or equivalent) with 2+ years building AI/LLM applications; portfolio of shipped AI projects (links to code, demos, or case studies).
• Exploration: Demonstrated passion for relentless exploration of the latest AI models, frameworks, and tooling, ensuring constant adoption of state-of-the-art innovations in the workflow.
• LLM product engineering: Hands-on with some or all of OpenAI, Bedrock, Hugging Face/Ollama/vLLM; MCP servers and function/tool calling, multi-turn orchestration, streaming, and prompt/version management.
• RAG expertise: Practical experience designing and tuning retrieval systems (chunking, embeddings, hybrid search, reranking), integrating with vector databases, and measuring retrieval quality.
• Full-stack or equivalent backend depth: Comfortable building APIs/services and simple UIs where needed; strong fundamentals in Python and modern packaging/testing.
• DevOps & deployment: CI/CD, containers, cloud fundamentals (AWS), and runtime performance tuning; experience operating services in production.
• Platform & orchestration: Metaflow (Outerbounds) preferred; Databricks familiarity is a plus; ability to integrate data/feature pipelines and schedule/operate flows.
• Observability & testing for AI: Tracing and logging; expertise in tools like Datadog, Dynatrace, or Grafana for AI monitoring is essential.
• Cost, quality, and risk mindset: Comfortable optimizing latency/throughput/cost, and implementing guardrails for PII/safety/compliance.
• Collaboration & mentorship: Partner effectively with data scientists, analysts, and engineers; promote best practices and high-leverage abstractions.
• Bonus points: Fine-tuning or distillation experience; Kubernetes or FastAPI exposure; familiarity with Snowflake or similar warehousing for retrieval sources.
• $150,000 - $170,000 a year
• In addition to semi-monthly salary payments, this position is also eligible for an annual incentive bonus based on individual and company performance. The yearly incentive bonus target is 20% of base salary. This position may also be eligible for long-term cash incentives.
• This role sits in AI Operations and focuses on making AI safe, fast, and economical to scale, unlocking multiple use cases through one high-leverage engineering hire.
• Please include links to your portfolio (GitHub, write-ups, or demos) with your application.

Responsibilities

• Build and ship LLM apps & agents: Deliver internal copilots and customer/agent-facing automations with clear SLAs, rollbacks, and observability from day one.
• Own RAG pipelines: Design ingestion, chunking, embeddings, indexing, hybrid search/rerank, and retrieval evaluation; track retriever quality via offline golden sets and online metrics.
• AWS Infrastructure & Orchestration: Design and implement scalable AWS architectures, including AWS AI features such as Bedrock, IAM, knowledge bases, secure secrets and policy enforcement, automated provisioning, and resource-usage governance as core platform capabilities.
• Observability & SRE for AI: Add tracing, prompt/agent version lineage, eval dashboards, and regression alerts; establish golden datasets and canary tests.
• Guardrails & governance: Enforce PII redaction, safety filters, role-based access, audit logs, and human-in-the-loop review paths to control quality and risk.
• CI/CD for AI artifacts: Version and deploy prompts, tools, agents, and retrieval pipelines; support blue/green and shadow deploys with automatic rollback triggers.
• Cost & performance: Cut run-rate spend through caching, truncation, batching, autoscaling, and model routing; establish clear unit economics per workflow.
• Developer enablement: Provide templates, SDKs, and high-quality abstractions that let product teams ship safely without bespoke plumbing; improve developer experience.
• Platform integration: Build primarily in Python and Metaflow (Outerbounds); deploy on AWS (Bedrock + core services) and OpenAI; use Cursor in daily workflows; help evaluate and, when appropriate, run on Databricks.
• Production posture: Participate in on-call, author runbooks, and remove single-thread risk for AI services; drive reliability and resilience akin to ML Ops.

Benefits

Best Egg offers many additional benefits for our employees, including (but not limited to):
• Pre-tax and post-tax retirement savings plans with competitive company matching
• Generous paid time-off plans including vacation, personal/sick time, paid short-term and long-term disability leaves, paid parental leave, and paid company
• Multiple health care plans to choose from, including dental and vision options
• Flexible Spending Plans for Health Care, Dependent Care, and Health Reimbursement Accounts
• Company-paid benefits such as life insurance, wellness platforms, employee assistance programs, and Health Advocate programs
• Other great discounted benefits include identity theft protection, pet insurance, fitness center reimbursements, and many more!

In compliance with the CCPA, Best Egg is fully committed to handling the personal information and data of employees and job applicants responsibly, with respect and due care. Review our CCPA Employee Policy.
