Turing - Research Engineer

Remote - Brazil (2mo ago)

Remote, Mid, LATAM, Software, Research Engineer, C++, Java, Go, Rust, Python

Requirements

• 4–5 years of experience building or improving deep learning systems where data quality mattered materially (training, post-training, evals, or agentic systems).
• Strong intuition for the “data ingredients” that drive model improvements: what to collect, what to filter, what to synthesize, and how to measure.
• Ability to communicate clearly with researchers and engineers: turning research objectives into concrete specs, and turning messy outputs into actionable insights.
• Demonstrated ability to be extremely detail-oriented in diagnosing subtle data quality issues and failure modes.
• Solid programming ability with a bias for shipping:
  • Python proficiency required
  • Comfort with SQL/structured data workflows strongly preferred
  • For coding-focused work: proficiency in one or more major languages (e.g., C++, Java, Go, Rust, JS/TS) is a plus
• Comfort designing quality systems:
  • Rubrics, validation scripts, gold sets, sampling strategies
  • Statistical checks and slice-based evaluation
  • Human-in-the-loop review loops grounded in measurable criteria
• Strong pluses:
  • RL or post-training experience (any of: RLHF/RLAIF, verifier training, reward modeling, RL fine-tuning, environment design).
  • Experience with agentic evaluation (tool use, multi-step workflows, long-horizon tasks, trajectory analysis).
  • Multimodal expertise (document understanding, charts, diagrams, OCR, UI/vision grounding; audio/video optional).
  • STEM depth (math/physics/engineering) with an eye for verifiability and rigorous correctness.
  • Modern embodied AI / VLM-driven agent experience (vision-language(-action) models, interaction datasets, embodied evals, long-horizon grounding, tool/sensor/action interfaces).
  • Systems thinking: ability to “simulate” an application’s API/data schema and design tasks that realistically reflect real-world constraints and workflows.

Responsibilities

• 1) Own data and environment quality from an AI researcher perspective
  • Translate ambiguous research goals into clear data requirements: target skills, failure modes, difficulty calibration, coverage, and success metrics.
  • Define what “good” looks like by creating detailed rubrics, counterexamples, and boundary cases (what to include vs. exclude).
  • Perform deep, detail-oriented audits of produced data: spot subtle errors, reward hacking opportunities, leakage, ambiguity, inconsistent assumptions, and distribution shifts.
  • Drive iterative improvements using evidence: error taxonomies, slice-based quality metrics, and model-behavior-informed refinements.
• 2) Design and build datasets and RL environments for your capability area(s)
  • Contribute to or lead the design of:
    • Task suites (single-step and long-horizon workflows)
    • Ground-truth signals (verifiers, unit tests, structured checks, reward functions, automatic validators)
    • Environment interfaces (APIs, tool schemas, state abstractions, database schemas, simulator-like dynamics)
  • Depending on your mapped capability area(s), you may focus on:
    • Coding / SWE agents: data reflecting real development work (codebase navigation, bug localization, patching, tests, code reviews, CI-like constraints, refactors, security fixes).
    • Multimodality: tasks that test true multimodal reasoning (chart reading, document QA, UI understanding, diagram-based STEM reasoning, OCR-aware tasks).
    • STEM: tasks with verifiable solutions (symbolic checks, reference solvers, numerical validation, step consistency, unit sanity).
    • Modern embodied AI / VLM-driven agents: interaction data and environments for vision-language(-action) models (long-horizon tasks, instruction following grounded in visual context, robust action selection, safety/constraint adherence, adversarial state coverage).
• 3) Build robust validation, denoising, and synthetic data systems
  • Implement automated validation and filtering to achieve frontier-grade signal-to-noise:
    • Deduplication, decontamination, leakage checks
    • Consistency checks (format, schema, invariants)
    • Difficulty and diversity controls (coverage, novelty, long-tail)
  • Develop synthetic data generation and augmentation pipelines where appropriate:
    • Programmatic task generators
    • Controlled perturbations to create hard negatives
    • Scenario templating with diversity constraints
    • Simulator-/tool-driven rollouts for trajectory data
  • Create documentation and data cards: dataset intent, known limitations, recommended use, and evaluation linkage.
• 4) Use evaluations and training runs to prove impact
  • Design and run evals that reflect the customer’s intended usage.
  • Produce analysis that connects data to outcomes:
    • Pre/post comparisons on targeted capability slices
    • Error breakdowns and “why the model failed” narratives
    • Ablations to identify which data attributes drive lift
  • When needed, run in-house fine-tuning or RL-style experiments (or partner with research) to demonstrate that the data/environment improves model behavior in measurable ways.
• 5) Collaborate effectively with large production teams without being ops-heavy
  • Work with cross-functional teams (engineers, researchers, QAs, domain SMEs, and large-scale data production groups) by providing:
    • Clear specs, examples, and edge cases
    • Fast feedback loops based on audits and quantitative signals
    • Structured review processes focused on quality, not throughput alone
  • You are expected to be highly engaged in reviewing and improving outputs from large annotation/creation efforts, but not primarily responsible for hiring, staffing, or people operations.

Benefits

• Work directly with the world’s leading AI labs and enterprises at the cutting edge of post-training and RL environment design.
• Real impact (path to AGI): your datasets and environments will directly influence the trajectory toward Artificial General Intelligence and, ultimately, Superintelligence.
• Real impact (GDP): the systems you help build and evaluate target high-value workflows across industries, where even incremental improvements translate to significant productivity gains.
• Talent-dense team, where you'll find high autonomy, rapid iteration, and an exceptional learning curve.
• Values:
  • We are client first: We put our clients at the center of everything we do, because their success is the ultimate measure of our value.
  • We work at Start-Up Speed: We move fast, stay agile, and favor action, because momentum is the foundation of perfection.
  • We are AI forward: We help our clients build the future of AI and implement it in our own roles and workflows to amplify productivity.
• Advantages of joining Turing:
  • Amazing work culture (super collaborative and supportive work environment; 5 days a week)
  • Awesome colleagues (surround yourself with top talent from Meta, Google, LinkedIn, etc., as well as people with deep startup experience)
  • Competitive compensation
  • Flexible working hours