Bjak - Principal Machine Learning Engineer

Beijing, China · 5mo ago

In Office · Principal · APAC · Artificial Intelligence · Machine Learning Engineer · JAX · Ray

Requirements

• Strong background in deep learning and transformer-based architectures.
• Hands-on experience training, fine-tuning, or deploying large-scale ML models in production.
• Proficiency with at least one modern ML framework (e.g. PyTorch, JAX), and the ability to learn others quickly.
• Experience with distributed training and inference frameworks (e.g. DeepSpeed, FSDP, Megatron, ZeRO, Ray).
• Strong software engineering fundamentals: you write robust, maintainable, production-grade systems.
• Experience with GPU optimization, including memory efficiency, quantization, and mixed precision.
• Comfort owning ambiguous, zero-to-one ML systems end-to-end.
• A bias toward shipping, learning fast, and improving systems through iteration.
• Experience with LLM inference frameworks such as vLLM, TensorRT-LLM, or FasterTransformer.
• Contributions to open-source ML or systems libraries.
• Background in scientific computing, compilers, or GPU kernels.
• Experience with RLHF pipelines (PPO, DPO, ORPO).
• Experience training or deploying multimodal or diffusion models.
• Experience with large-scale data processing (Apache Arrow, Spark, Ray).

Our organization is very flat and our team is small, highly motivated, and focused on engineering and product excellence. All members are expected to be hands-on and to contribute directly to the company’s mission.

Responsibilities

• Build and own end-to-end ML pipelines spanning data, training, evaluation, inference, and deployment.
• Fine-tune and adapt models using state-of-the-art methods such as LoRA, QLoRA, SFT, DPO, and distillation.
• Architect and operate scalable inference systems, balancing latency, cost, and reliability.
• Design and maintain data systems for high-quality synthetic and real-world training data.
• Implement evaluation pipelines covering performance, robustness, safety, and bias in partnership with research leadership.
• Own production deployment including GPU optimization, memory efficiency, latency reduction, and scaling policies.
• Collaborate closely with application engineering to integrate ML systems cleanly into backend, mobile, and desktop products.
• Make pragmatic trade-offs and ship improvements quickly, learning from real usage under real production constraints of latency, cost, reliability, and safety.