Big Data Platform Engineer, AI Agent Platform
Requirements
• 5+ years of experience building large-scale data platforms (Hadoop/Spark/Flink or equivalent)
• Deep expertise in distributed storage and compute systems (MaxCompute, Hologres, ClickHouse, Hive)
• Strong software engineering skills in Java, Scala, or Python; experience with API-first design
• Hands-on experience with task scheduling systems (Airflow, DolphinScheduler, or in-house equivalents)
• Solid understanding of multi-cloud architectures and cost governance
• Familiarity with LLM integration patterns: tool calling, RAG pipelines, context management
• Experience with MCP or similar agent-tool frameworks is a strong plus
• Passion for building systems that make other engineers 10x more productive
• IELTS score of 7 or above in all four components
• 3+ years of work experience in English-speaking regions
• Experience in cross-border e-commerce and familiarity with multi-country, multi-language architectural design is a plus
Responsibilities
• Platform Core: Design and operate large-scale distributed data systems
• Own the big data compute and storage infrastructure (MaxCompute/ODPS, Hologres, Spark)
• Build and maintain multi-site task orchestration that dynamically selects engines and enforces policy
• Drive reliability and performance improvements across batch and real-time pipelines
• AI Integration: Build the AI-native platform layer
• Develop and expose MCP (Model Context Protocol) tool interfaces so AI agents can interact with platform APIs
• Build the scheduling and cost-optimization agents that auto-tune resource allocation and alert severity
• Instrument platform telemetry to feed AI-driven SLA monitoring and anomaly detection
• Design context retrieval pipelines (RAG / vector search) for SQL code and config knowledge bases
• Tooling & DX: Evolve the developer experience
• Own the internal data development platform: IDE integrations, code review automation, deployment tooling
• Build API-first tools (backfill, ingestion automation) designed for future MCP integration
• Collaborate with data warehouse and service teams to define platform contracts
• Ops & Governance: Drive operational excellence
• Establish SLA benchmarks, cost metrics, and latency dashboards as AI optimization targets
• Build automated incident response and root-cause analysis pipelines
• Define and enforce infrastructure policies across multi-cloud environments
Benefits
• L&D programs and education subsidy for employees' growth and development
• Various team-building programs and company events
• Wellness and meal allowances
• Comprehensive healthcare schemes for employees and dependants
• More that we love to tell you along the process!

All official OKX vacancies are published on this website. While roles may appear on selected third-party platforms from time to time, information on other sites may be inaccurate or outdated. If in doubt, please apply directly through our official careers website.

Information collected and processed as part of the recruitment process of any job application you choose to submit is subject to OKX's Candidate Privacy Notice.