Elicit - Evaluation Engineer

Oakland, CA, United States - Hybrid · $170k/year + Equity · 3mo ago
In Office · Mid · NA · Pharmaceuticals · Developer Tools · Software Engineer · Test Engineer · Python · Front-end · TypeScript · ACCA

Requirements

• At least 3 years of experience as a professional software engineer, with demonstrated experience building complex backend systems (e.g., the backend for a complex website, data pipelines, etc.)
• Aptitude and interest in evaluating how Elicit helps with pharma decision-making. There's no particular experience you must have, but we'll evaluate your aptitude.

Will make you more competitive for the role

• Knowledge of statistics (e.g., for calculating power and credence intervals for evals)
• Experience with advanced Python (asyncio/trio and parallel processing strategies)
• Front-end experience and strong UX sensibility (you'll be building dashboards). TypeScript experience is a plus.
• Experience building developer tools (ML engineers are one of your most important clients)
• Previous experience as a data engineer or working on AI infrastructure
• Knowledge of pharma/biomed
• Experience building language-model-based systems (helps with understanding Elicit and how to evaluate it)

This is a diverse list of nice-to-haves. We expect the candidate we select to have some, but not all, of these. Other team members can fill in for skills you lack.

Location and travel

We have a lovely office in Oakland, CA, but we’re flexible about where you work. You’re welcome to work remotely, from our Oakland headquarters, or in a hybrid setup. The only in-person requirement is attending our quarterly team retreats, typically held on the west coast.

About the role

We need someone to own the technical foundation of our auto-evaluation systems. Our evals are currently much slower than they need to be, and our interfaces aren't optimized for the diverse set of people who need to use them: ML engineers iterating on models, product managers monitoring quality, and customers assessing trust in results.

The right person for this role won't just build infrastructure. You'll think deeply about what it actually means for Elicit to help with decision-making in pharma and encode that understanding into our evaluation systems.

What you'll own

The core auto-eval platform

You'll build a comprehensive system that runs fast, is easy to use, and supports quickly building new evals:

• Speed: You’ll build a lightning-fast basic evals infrastructure that schedules tasks with practically no added latency, and then you’ll figure out clever ways to address the fundamental sources of latency (building a version of Elicit, running it on a query, and evaluating it using LMs).
• Interfaces: ML engineers need evals to kick off automatically on relevant commits, with results they can see at a glance and drill into. Product managers need dashboards showing performance over time and what's going wrong in production.
• Architecture: Your code must be well-architected so other team members and ML engineers can understand and build on it. An engineer starting on a new feature should be able to quickly add examples and run an eval.

Ensuring evaluations are accurate and reliable

• We need to evaluate how well Elicit actually helps with decision-making in pharma, not just measure what's easy to measure. This requires encoding real knowledge about how pharma customers make decisions (for example, choosing appropriate gold standards).
• You'll provide appropriate statistical tests and confidence intervals so we can trust our results.

A month in your life

In a typical month, expect to spend:

• 60% working on the core eval platform
• 15% working closely with the evals team to build and improve specific evals (e.g., an eval of our paper search within our systematic review flow)
• 10% mentoring our evals engineering intern
• The rest on learning how people interact with the eval system so you can make it work better for them, and understanding what our users want from Elicit so evals measure what matters

Benefits

In addition to working on important problems as part of a productive and positive team, we also offer great benefits (with some variation based on location):

• Flexible work environment: work from our office in Oakland or remotely with time zone overlap (between GMT and GMT-8), as long as you can travel for in-person retreats and coworking events
• Fully covered health, dental, vision, and life insurance for you, and generous coverage for the rest of your family
• Flexible vacation policy, with a minimum recommendation of 20 days/year + company holidays
• 401K with a 6% employer match
• A new Mac + $1,000 budget to set up your workstation or home office in your first year, then $500 every year thereafter
• $1,000 quarterly AI Experimentation & Learning budget, so you can freely experiment with new AI tools, take courses, purchase educational resources, or attend AI-focused conferences and events
• A team administrative assistant who can help you with personal and work tasks

For all roles at Elicit, we use a data-backed compensation framework to keep salaries market-competitive, equitable, and simple to understand. For this role, we target starting ranges of:

• Career (L3): $140-170k + equity
• Senior (L4): $165-200k + equity


