OKX - Staff AI Engineer, Model Post-Training and Alignment

Singapore · 2mo ago

In Office Staff APAC Artificial Intelligence Oil & Gas AI Engineer

Requirements

• Strong hands-on experience across the full post-training pipeline for large models.
• Deep familiarity with preference learning and alignment techniques, including DPO, GRPO, and RL-based post-training methodologies.
• Proven experience designing domain-specific data strategies and training methodologies.
• Experience training and post-training specialized small models from scratch.
• Solid understanding of reinforcement learning fundamentals and their application to model alignment.
• Experience deploying models in low-latency production environments using frameworks such as vLLM, SGLang, or similar.

Responsibilities

• Lead and execute the full post-training pipeline for large language models (LLMs), including supervised fine-tuning, preference optimization, and reinforcement learning–based methods.
• Design and implement advanced training paradigms such as DPO (Direct Preference Optimization) and GRPO (Group Relative Policy Optimization).
• Develop domain-specific data recipes, curation strategies, and augmentation pipelines to optimize task performance.
• Train and post-train specialized small models from scratch, including architecture selection, dataset construction, and optimization strategy.
• Build and refine reward models to support alignment and downstream optimization.
• Design and implement RLAIF (Reinforcement Learning from AI Feedback) closed-loop systems.
• Optimize inference efficiency and deploy models using low-latency serving frameworks such as vLLM and SGLang.
• Evaluate model performance using both automated benchmarks and human/AI feedback loops.
• Collaborate with research and infrastructure teams to productionize training and deployment workflows.

What We Look For In You

• Bachelor's degree in Computer Science, AI, Machine Learning, or a related field, with at least 8 years of industry experience.

Benefits

• Competitive total compensation package
• L&D programs and education subsidy for employees' growth and development
• Various team-building programs and company events
• Wellness and meal allowances
• Comprehensive healthcare schemes for employees and dependants
• More that we would love to tell you about along the way!

All official OKX vacancies are published on this website. While roles may appear on selected third-party platforms from time to time, information on other sites may be inaccurate or outdated. If in doubt, please apply directly through our official careers website.

Information collected and processed as part of the recruitment process for any job application you choose to submit is subject to OKX's Candidate Privacy Notice.