Cantina - Member of Technical Staff, Data & ML Infrastructure for Video Models
Requirements
• 3+ years of experience in machine learning, applied ML, data pipelines, or related engineering roles, ideally working on large-scale multimodal, video, or vision-based systems.
• Strong programming skills in Python and solid experience building reliable data processing and preprocessing pipelines for ML workflows.
• Hands-on experience preparing training data for ML models, including parsing, filtering, dataset curation, quality control, and large-scale data handling using tools such as AWS S3 and DynamoDB.
• Familiarity with annotation and labeling workflows, including task design, vendor or crowd-platform orchestration such as MTurk or Prolific, and methods for ensuring label quality.
• Experience working with Kubernetes for orchestrating distributed workloads, including data preprocessing, pipeline execution, and dataset delivery to training clusters.
• Comfort working across cloud and on-demand compute environments such as AWS and RunPod, with the ability to port and optimize pipelines across infrastructure.
• Familiarity with distributed data processing frameworks and experience designing systems that operate reliably at scale across many nodes or workers.
• Working knowledge of PyTorch and the broader deep learning stack, with the ability to read, debug, and optimize research model inference code for use in production preprocessing pipelines.
• Ability to work cross-functionally with research and engineering teams and translate experimental ideas into robust, scalable systems.
• Bachelor's, Master's, or PhD in Computer Science, Machine Learning, Engineering, Mathematics, or a related technical field; experience in generative video, computer vision, or multimodal ML is strongly preferred.
• Bonus: Experience training, evaluating, or fine-tuning smaller ML models used for classification, filtering, ranking, quality assessment, or other supporting tasks in an ML pipeline.
Responsibilities
• Build and maintain data pipelines for large video generation models, including data ingestion, parsing, filtering, preprocessing, and dataset curation at scale, using tools such as AWS S3 and DynamoDB.
• Design and run annotation workflows across platforms such as MTurk (Amazon Mechanical Turk) and Prolific, including task design, quality control, and label validation.
• Train, evaluate, and improve smaller supporting models used for data filtering, quality assessment, preprocessing, or other parts of the ML pipeline.
• Partner closely with research and engineering teams to turn experimental workflows into scalable, repeatable systems that support model training and evaluation.
• Own data quality across the pipeline by identifying bottlenecks, failure modes, and low-quality sources, and continuously improving tooling and processes.
• Build internal tools and automation that make it easier to prepare datasets, launch annotation jobs, monitor outputs, and support model development end to end.
• Drive larger pipeline projects from start to finish, such as new dataset creation efforts or upgrades to labeling and preprocessing infrastructure.
• Work within a Kubernetes-based training infrastructure, ensuring datasets are properly prepared, formatted, and delivered to training clusters.
• Profile and optimize research model inference scripts used in preprocessing steps, ensuring that model-driven filtering and transformation stages run within practical time and cost constraints when applied to large-scale raw data.
Benefits
• The anticipated annual base salary range for this role is $200,000-$260,000 (€170,000-€225,000). Compensation is determined by a number of factors, including skills, experience, job scope, location, and competitive compensation market data.
• Competitive salary and generous company equity
• Medical, dental, and vision insurance – 99.99% of premiums covered by Cantina
• 42 days of paid time off, including:
  • 15 company holidays
  • 2 floating holidays
• Generous parental leave & fertility support
• 401(k) retirement savings plan
• Lifestyle spending account – $500/month to use however you’d like
• Complimentary lunch and snacks for in-office employees
• One Medical membership, and more!