hcompany - Member of Technical Staff (Inference)
Requirements
• MS or PhD in Computer Science, Machine Learning, or a related field
• Proficiency in at least one of the following programming languages: Python, Rust, or C/C++
• Experience in GPU programming, such as CUDA, OpenAI Triton, or Metal
• Experience with model compression and quantization techniques
• Collaborative mindset, thriving in dynamic, multidisciplinary teams
• Strong communication and presentation skills
• Eagerness to explore new challenges
• Experience with LLM serving frameworks such as vLLM, TensorRT-LLM, SGLang, or llama.cpp
• Experience with CUDA kernel programming and NCCL
• Experience with deep learning inference frameworks (PyTorch/ExecuTorch, ONNX Runtime, GGML, etc.)
• Location: Paris or London
• This role is hybrid, and you are expected to be in the office 3 days a week on average
• The final decision on the hybrid arrangement lies with the hiring manager for each individual role
Responsibilities
• Develop scalable, low-latency, and cost-effective inference pipelines
• Optimize model performance (memory usage, throughput, and latency) using advanced techniques such as distributed computing, model compression, quantization, and caching mechanisms
• Develop specialized GPU kernels for performance-critical tasks such as attention mechanisms and matrix multiplications
• Collaborate with H research teams on model architectures to improve inference efficiency
• Review state-of-the-art papers to improve memory usage, throughput, and latency (FlashAttention, PagedAttention, continuous batching, etc.)
• Prioritize and implement state-of-the-art inference techniques
Benefits
• Join the exciting journey of shaping the future of AI, and be part of the early days of one of the hottest AI startups
• Collaborate with a fun, dynamic and multicultural team, working alongside world-class AI talent in a highly collaborative environment
• Unlock opportunities for professional growth, continuous learning, and career development
• If you want to change the status quo in AI, join us.