LILT

LILT - Research Engineer, Evaluations, Applied AI

Argentina · 1mo ago
In Office · Senior · LATAM · Artificial Intelligence · Research Engineer · AI Engineer · Python · Transformers · Vector · Data Quality · Salesforce


Requirements

• Education: B.S. in Computer Science, AI, or a related field, or 5+ years of relevant experience in a high-growth AI/research environment.
• Deep Technical Proficiency: Expert-level Python skills and hands-on experience with modern AI frameworks (PyTorch, Transformers, LangChain/LlamaIndex).
• Evaluation Experience: Experience building model evaluation suites (e.g., MMLU-style benchmarks, custom RAG metrics, or human-preference alignment).
• Domain Expertise: Deep understanding of RAG architectures, vector-database retrieval logic, and agentic workflows. Experience with RLHF/RLAIF environments and the mechanics of preference signaling and reward modeling.
• Multimodal & Multilingual Rigor: Experience handling data quality at scale across different languages and modalities (images, video, or audio).
• Precision and Quality Orientation: You find bugs in model reasoning that others miss, and you are comfortable being the final quality arbiter for technical deliverables that others produce.
• Fluency in multiple languages (highly preferred for multilingual model calibration).
• Experience in frontier labs or high-tier AI research environments.
• At least one of: a portfolio of research contributions, an example of evals or “model-breaking” samples, or use of open-source AI evaluation tools.

Our Story

Our founders, Spence and John, met at Google while working on Google Translate. As researchers at Stanford and Berkeley, they both worked on language technology to make information accessible to everyone. While together at Google, they were amazed to learn that Google Translate wasn’t used for enterprise products and services inside the company. The quality just wasn’t there. So they set out to build something better, and LILT was born.

LILT has been a machine learning company since its founding in 2015. At the time, machine translation didn’t meet the quality standard for enterprise translations, so LILT assembled a cutting-edge research team tasked with closing that gap. While meeting customer demand for translation services, LILT has prioritized investments in Large Language Models, human-in-the-loop systems, and now agentic AI. With AI innovation accelerating and enterprise demand growing, the next phase of LILT’s journey is just beginning.

Our Tech

What sets our platform apart:
• Brand-aware AI that learns your voice, tone, and terminology to ensure every translation is accurate and consistent
• Agentic AI workflows that automate the entire translation process, from content ingestion to quality review to publishing
• 100+ native integrations with systems like Adobe Experience Manager, Webflow, Salesforce, GitHub, and Google Drive to simplify content translation
• Human-in-the-loop reviews via our global network of professional linguists, for high-impact content that requires expert review

LILT in the News

• Featured in The Software Report’s Top 100 Software Companies!
• LILT makes it onto the Inc. 5000 list.
• LILT continues to be an intellectual powerhouse, holding numerous patents that help power the most efficient and sophisticated AI and language models in the industry.
• Check out all our news on our website.

Information collected and processed as part of your application process, including any job applications you choose to submit, is subject to LILT’s Privacy Policy at https://lilt.com/legal/privacy.

At LILT, we are committed to a fair, inclusive, and transparent hiring process. As part of our recruitment efforts, we may use artificial intelligence (AI) and automated tools to assist in the evaluation of applications, including résumé screening, assessment scoring, and interview analysis. These tools are designed to support human decision-making and help us identify qualified candidates efficiently and objectively. All final hiring decisions are made by people. If you have any concerns, require accommodations, or would like to opt out of the use of AI in our hiring process, please let us know at [email protected].

Responsibilities

• Eval Architecture & Benchmarking: Design and implement automated and human-in-the-loop evaluation frameworks to measure model performance across multiple modalities (text, code, image, etc.).
• Calibration & Peer Review: Act as the gold-standard reviewer for other engineers. You will calibrate their data generation and evaluation contributions, providing technical feedback to ensure scientific consistency and high-fidelity output.
• Frontier Sample Generation: Write and refine complex prompts and golden-response pairs for frontier-model training, focusing specifically on edge cases in reasoning and multilingual contexts.
• Quality Control (End-to-End): Develop the logic for multimodal QC checks, ensuring that high-volume data samples are correct across diverse domains and languages.
• Technical Mentorship: Bring new knowledge and best practices on model evaluations to our established delivery and forward-deployed engineering teams.
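To give a flavor of the evaluation work described above: automated eval frameworks typically reduce to scoring model outputs against a suite of prompt/golden-response pairs. A minimal illustrative sketch in Python (all names and the exact-match metric are hypothetical, not LILT's actual stack):

```python
from dataclasses import dataclass


@dataclass
class EvalCase:
    """One prompt paired with its golden (reference) response."""
    prompt: str
    golden: str


def exact_match(prediction: str, golden: str) -> bool:
    # Normalize whitespace and case before comparing. Real suites would
    # swap in task-specific metrics (RAG faithfulness, LLM-as-judge, etc.).
    return prediction.strip().lower() == golden.strip().lower()


def run_eval(model, cases) -> float:
    """Score a model callable against a suite; returns accuracy in [0, 1]."""
    results = [exact_match(model(case.prompt), case.golden) for case in cases]
    return sum(results) / len(results)


# Toy "model" and two-case suite for illustration only.
suite = [
    EvalCase(prompt="2+2=", golden="4"),
    EvalCase(prompt="Capital of France?", golden="Paris"),
]
toy_model = lambda p: {"2+2=": "4", "Capital of France?": "Rome"}.get(p, "")
print(run_eval(toy_model, suite))  # 0.5 (one of two cases matches)
```

In practice the metric function is where most of the engineering lives; the harness itself stays simple so new benchmarks and modalities can be plugged in.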
