Snorkel AI

Snorkel AI - Research Scientist, RL Training

Redwood City, CA (Hybrid); San Francisco, CA (Hybrid); United States (Remote) · Hybrid · $200k - $275k · Posted 2w ago
In Office · NA · Cloud Computing · Artificial Intelligence · Research Scientist · Training Specialist · Python · AWS · GCP · Kubernetes

Requirements

• Deep expertise in reinforcement learning from human or AI feedback, reward modeling, and credit attribution, ideally with a clear perspective on what data makes these techniques work.
• Experience training or fine-tuning 30B+ parameter large language models at scale, including familiarity with distributed training infrastructure.
• Strong proficiency in Python and ML frameworks, especially PyTorch and HuggingFace, and hands-on experience with RL frameworks such as Verl and SkyRL.
• Solid software engineering fundamentals: you can build research prototypes that others can run, extend, and integrate into data production workflows.
• Familiarity with ML infrastructure, cloud platforms, and tools (AWS, GCP, Kubernetes, Slurm, etc.); experience with large-scale RL training pipelines is a strong plus.
• Comfort operating in a high-iteration environment with open-ended research questions and shifting, customer-driven technical constraints.
• Ph.D. in machine learning, reinforcement learning, or a related field strongly preferred; exceptional industry experience considered.

Responsibilities

• Research and implement reinforcement learning techniques, including GRPO, RLHF, RLAIF, DPO, and reward modeling, and translate them into data products (preference datasets, reward signals, verifiable rewards) that customers can use to train and fine-tune large language models.
• Design and build data pipelines that generate high-quality training signal for RL workflows, including AI-assisted data annotation and curation pipelines, to improve model generalization to unseen benchmarks.
• Prototype and iterate on end-to-end RL training recipes that inform what data Snorkel ships as part of its data-as-a-service deliveries.
• Work closely with research scientists, ML engineers, and delivery teams to translate RL research into customer-ready data products.
• Stay current with the latest developments in large-scale multi-node LLM training, alignment research, and scalable RL methods (on complex environments such as Terminal-Bench), bringing relevant advances into Snorkel's data-as-a-service approach.
• Contribute to Snorkel's research publications and internal knowledge base in RL and model training.

Benefits

• $200,000 - $275,000 USD
• Be Your Best at Snorkel
