Tech Holding - ML / AI Data Engineer (Contract)
Requirements
• 5+ years of experience in data engineering, ML pipelines, or distributed systems.
• Strong experience building scalable data pipelines for large datasets (video/audio preferred).
• Hands-on experience with cloud platforms (AWS, Azure, or GCP).
• Experience working with GPU-based environments and distributed computing.
• Strong programming skills in Python, Scala, or similar languages.
• Experience with data processing frameworks (Spark, Ray, Kafka, Airflow, or similar).
• Understanding of ML workflows, training pipelines, and inference systems.
• Experience designing fault-tolerant, high-availability systems.
• Strong knowledge of data storage systems (data lakes, object storage, distributed file systems).
• Ability to handle high-throughput, large-scale data ingestion and processing.
Good to Have
• Experience with multimodal AI (video, audio, NLP) systems.
• Familiarity with annotation tools and data labeling workflows.
• Experience with containerization and orchestration (Docker, Kubernetes).
• Knowledge of cost optimization strategies for large-scale cloud workloads.
Responsibilities
• Design, deploy, and scale large ML and data processing pipelines across cloud infrastructure.
• Build systems to ingest, process, and serve 250,000+ hours of multimodal data (video, audio, metadata).
• Architect and optimize GPU-based compute environments (e.g., NVIDIA Tesla clusters) for distributed training and inference.
• Develop high-throughput backend systems for video ingestion from desktop and mobile platforms.
• Implement distributed processing workflows, including job scheduling, fault tolerance, and resource allocation.
• Design and build human-in-the-loop and automated annotation systems to ensure data quality and scalability.
• Translate ML and multimodal research into scalable, production-grade cloud architectures.
• Optimize pipelines for performance, reliability, and cost efficiency across compute, storage, and networking layers.
• Collaborate with ML, data, and engineering teams to deliver end-to-end data workflows.