Reka

Reka - Member of Technical Staff (Data Intelligence)

Remote - US, UK, Singapore · Posted 2w ago
Remote · Staff · APAC · Staff Engineer · Senior Data Scientist · Airflow · Ray · Python · Data Quality


Requirements

• Strong ML and deep learning fundamentals, with experience building and operating large-scale data and/or compute systems
• Comfortable moving between research questions and production engineering: you can dig into data, run analyses, and also ship reliable systems
• Demonstrated research experience with data composition, data quality, and dataset releases
• Ability to design and execute experiments with convincing, unbiased outcomes
• Practical experience with distributed processing and orchestration (Spark, Ray, Airflow, or equivalents)
• Solid Python skills and familiarity with the tooling around modern model training workflows (datasets, checkpoints, experiment tracking)
• Strong instincts around data quality: how to measure it, how to monitor it, and how to prevent regressions as things scale
• Able to work in a fast-moving environment, prioritize what matters, and communicate clearly with both researchers and engineers
• Bonus: experience with large video datasets, dataset curation for training, or building internal tooling for evaluation/analysis in ML environments

Responsibilities

• Work with model researchers to define what “good data” means for our models, including quality metrics, validation checks, and acceptance thresholds
• Explore open source datasets and create internal ones best suited to building fundamental World Models
• Build algorithms for automated data quality assessment, data domain mixtures, and domain adaptation from synthetic to real data
• Track datasets, metadata, provenance, and versions so experiments are reproducible and it’s clear what data went into which training and evaluation runs
• Own CI/CD and development tooling for the data stack (GitHub, Python, PyTorch), and automate repetitive workflows to reduce friction
• Track and optimize throughput, storage, and compute utilization across pipelines and related assets
