
lilt-production - AI Benchmark Engineer - Native Language Specialist | Czech

Remote - Prague, Czech Republic · 1mo ago
Remote · Senior · EMEA · Artificial Intelligence · Higher Education · AI Engineer · Technical Support Specialist · Python · Shell · Quality Assurance · Quality Control


Requirements

• Experience: 5+ years of industry experience in software engineering.
• Background: Proven track record at leading technology companies and/or a degree from a top-tier engineering university.
• Language: Native or near-native fluency in Czech, with a deep understanding of its grammar, register, and phrasing rules; high English proficiency.
• Technical Stack: Strong proficiency in Python, standard shell scripting, and data processing.
• Workflow: Extensive experience with Terminal/CLI-based development workflows and working familiarity with coding agents.
• Domain Expertise: Deep technical understanding of multilingual text processing pitfalls, including:
  • Encoding/decoding robustness and Unicode normalization.
  • Locale-dependent conventions (collation, casing, non-Gregorian dates).
  • Text I/O, toolchain interoperability, and safe string operations.
  • (For specific languages) Bidirectional/RTL handling, font fallbacks, and rendering/typography in UI or artifacts.
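The normalization and collation pitfalls listed above are easy to demonstrate with Czech text. A minimal Python sketch, standard library only — note that correct Czech collation (where "ch" sorts as its own letter after "h") needs a real collation library such as ICU, which is an assumption here, not tooling named in the posting:

```python
import unicodedata

# NFC vs NFD: the same Czech word has two valid code-point sequences.
nfc = unicodedata.normalize("NFC", "příliš")  # precomposed: 6 code points
nfd = unicodedata.normalize("NFD", nfc)       # decomposed: 9 code points
assert nfc != nfd                             # naive string comparison fails
assert unicodedata.normalize("NFC", nfd) == nfc  # normalize before comparing

# Encoding robustness: Czech text round-trips through UTF-8, but any
# implicit Latin-1/ASCII assumption breaks on ř, í, š, etc.
assert nfc.encode("utf-8").decode("utf-8") == nfc

# Collation: Czech orders "ch" after "h", so "cibule" should sort
# before "chata" — naive code-point sorting gets this backwards.
naive = sorted(["chata", "cibule"])
assert naive == ["chata", "cibule"]  # code-point order, NOT Czech order
```

In practice a verifier or task environment would pin the normalization form and use ICU-backed collation rather than `sorted()` for any language-ordered output.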

Responsibilities

• Task Engineering: Design benchmark tasks for evaluating coding agents.
• Asset Creation: Build realistic task environments using datasets and files in your native language. Crucially, these assets must remain in the target language to genuinely measure multilingual handling.
• Prompting & Translation: Identify failure points where AI models break down in your native language.
• Implementation & Verification: Support the development of robust reference implementations and write highly reliable, deterministic verifier scripts, using rubric-based judging only when strictly necessary.
• Calibration & Execution: Analyze execution logs and calibrate task difficulty (Easy to Very Hard) using standard Terminal-Bench run configurations against various model tiers (Haiku, Sonnet, Opus).
• Quality Assurance: Participate in a rigorous, four-layer human quality control process (creation, human review, calibration review, and audit) alongside automated LLM-based checks to ensure fairness, grammatical accuracy, and benchmark integrity.
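As a sketch of what a deterministic verifier for such a task might look like — the file names, expected contents, and the task itself are hypothetical illustrations, not taken from the posting:

```python
import os
import tempfile
import unicodedata
from pathlib import Path

# Hypothetical task: the agent must produce a UTF-8 file containing
# these Czech lines in correct Czech order ("ch" sorts after "h").
EXPECTED = [unicodedata.normalize("NFC", s)
            for s in ("cibule", "chata", "příliš")]

def verify(path: str) -> bool:
    """Deterministic pass/fail check — no rubric or LLM judging."""
    p = Path(path)
    if not p.exists():
        return False
    try:
        text = p.read_text(encoding="utf-8", errors="strict")
    except UnicodeDecodeError:
        return False  # reject mojibake / wrong encodings outright
    # Normalize to NFC so NFC and NFD submissions compare equal.
    lines = [unicodedata.normalize("NFC", ln) for ln in text.splitlines()]
    return lines == EXPECTED

# Demo: correct and NFD-normalized files pass; a legacy encoding fails.
with tempfile.TemporaryDirectory() as d:
    good = os.path.join(d, "vystup.txt")
    Path(good).write_text("\n".join(EXPECTED), encoding="utf-8")
    assert verify(good)

    nfd = os.path.join(d, "nfd.txt")
    Path(nfd).write_text(
        "\n".join(unicodedata.normalize("NFD", s) for s in EXPECTED),
        encoding="utf-8",
    )
    assert verify(nfd)  # passes thanks to the NFC normalization step

    bad = os.path.join(d, "spatny.txt")
    Path(bad).write_bytes("příliš".encode("cp1250"))  # legacy Czech encoding
    assert not verify(bad)
```

The design intent is that every check is exact and reproducible — missing file, wrong encoding, and wrong content each fail for a specific, inspectable reason, which keeps calibration runs comparable across model tiers.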

Benefits

• Your schedule, your rules. As an independent contractor, work when you want, as much or as little as you want. No fixed hours, no check-ins, no micromanaging.
• Get paid quickly and fairly. We respect your time and your expertise. Competitive rates, prompt payments, no chasing invoices.
• Work on projects that actually matter. Contribute to cutting-edge AI and language technology that is shaping how humans and machines communicate.
• Be part of something bigger. Join a global community of linguists, subject matter experts, and language professionals who are advancing human knowledge together.
• Grow without limits. As a Lilt contractor you get access to diverse, innovative projects that expand your portfolio and sharpen your skills across industries and domains.
• Have fun doing what you love. Bring your language skills to life on projects that are as interesting as they are impactful.

About the Project

We are building a rigorous, verifiable evaluation suite of Terminal-Bench tasks designed to test the limits of large language models on multilingual software challenges. Our goal is to measure multilingual robustness across prompt language effects, non-English data processing, and complex locale/encoding edge cases in terminal workflows.
