RL Environments Engineer

Preference Model · Remote - PT (Pacific) · + Equity · 1mo ago

Requirements

Strong Python skills with a focus on engineering quality.
Docker experience is preferred but not required.
An understanding of LLMs and their limitations is required.
Ability to meet throughput expectations and respond quickly to feedback.
Either CUDA/Pallas kernel development ability or expert knowledge in a specific DL research area, such as architectures (SSMs, KANs), generative modeling (diffusion, flow matching), reasoning methods (neuro-symbolic methods), mechanistic interpretability (circuit analysis, causal discovery), or the foundations of learning theory and control optimization.
Experience with ML for science applications, such as physics-informed neural networks or computational neuroscience, is a plus.
Familiarity with numerical and simulation methods, such as stochastic time series modeling, fluid dynamics simulation, Bayesian inference, or Monte Carlo methods, is beneficial but not required.
Broad research interests and the ability to translate complex papers into RLVR problems are preferred.

Responsibilities

Design MLE environments in which LLMs learn better reasoning and advanced concepts from modern ML.
Optimize non-trivial neural modules with CUDA or Pallas kernels, where applicable.
Develop and maintain RLVR problems based on current AI research areas, focusing on math-heavy topics that do not require massive compute. Examples include architectures such as SSMs, KANs, tensor networks, and hypernetworks; generative modeling techniques such as diffusion, flow matching, and probabilistic programming; and reasoning methods such as neuro-symbolic approaches and algorithmic reasoning.
Contribute to RL environments in which models encounter research and engineering problems and learn iteratively from realistic feedback loops.
Maintain a clear understanding of LLMs' current limitations, meet throughput expectations, and respond quickly to feedback. Assigned tasks may involve circuit analysis or causal discovery.

This is a remote contractor role.