Speak - Machine Learning Engineer, Assessments
Requirements
• Domain expertise in spoken language proficiency assessment (linguistics, applied linguistics, pedagogy, or equivalent experience)
• Strong experience designing and running evaluation and validation for assessment/scoring systems, and tailoring approaches to a specific product use case
• 4+ years building automatic proficiency assessment systems (or equivalent depth in closely related scoring/evaluation domains)
• Proven ability to ship ML models to production (not only research), including reliability, monitoring, and iteration
• Strong generalist ML/analysis skills (statistics, Python, PyTorch/model training)
• Ability to operate cross-functionally and communicate clearly with non-technical partners (Content/LD, PM, leadership)
• Experience with psychometrics concepts (reliability/validity, calibration)
How we work (collaboration expectations)
This role is designed to be highly collaborative with the Assessment Design Lead. Success depends on a tight loop where constructs/rubrics and model outputs co-evolve, not a sequential handoff.
Responsibilities
• Ship and own assessment ML systems end-to-end
  • Build, deploy, and maintain scoring models/pipelines (feature extraction → model training → inference → feedback generation)
  • Own monitoring, regression tests, and ongoing iteration to maintain accuracy targets
• Define and operationalize evaluation
  • Implement validation/evaluation frameworks for assessments, including metrics, test sets, and offline/online analysis
  • Translate assessment requirements into measurable acceptance criteria and guardrails
• Partner deeply with the Assessment Design Lead
  • Co-develop the strategy, together with the Content team, to grow assessments into a core platform at Speak
  • Work in a tight weekly loop to deliver incremental improvement
• Drive near-term delivery across products
  • Stand up or improve summative assessments (spoken language ability) and bring them reliably to production
  • Prototype and validate formative assessment approaches to measure improvement over weeks/months
• Support data and labeling strategy
  • Help define data needs for training/evaluation (including psychometric measurement needs)
  • Build or improve pipelines that support label collection and analysis (especially for efficacy studies)
Benefits
• Do your life's work with people you'll love working with: we care deeply about our craft and want every person at Speak to feel like they're growing every day. We believe that working with people you both enjoy and respect makes everything better, so we hire thoughtfully and only work with people we admire deeply.