Yassir - Senior Data Engineer
Requirements
• PySpark batch and streaming
• GCP: Dataproc, Dataflow, Datastream, Dataplex, Pub/Sub, BigQuery, and Cloud Storage
• NoSQL (preferably MongoDB)
• Programming languages: Scala/Python
• Great Expectations or a similar data-quality framework
• Familiarity with workflow management tools such as Airflow, Prefect, or Luigi
• Understanding of data governance, data warehousing, and data modelling
• Good SQL knowledge
• Able to communicate effectively and distill technical knowledge into digestible messages in a succinct, visual way
• Proactively identify and contribute to team development initiatives, and support junior team members
• Infrastructure-as-Code, preferably Terraform
• Docker and Kubernetes
• AI/ML engineering knowledge
• Lineage or relevant tools, e.g. Atlan
Responsibilities
• Build a centralized data lake on GCP data services by integrating diverse data sources across the enterprise.
• Develop, maintain, and optimize Spark-powered batch and streaming data processing pipelines; leverage GCP data services for complex data engineering tasks and ensure smooth integration with other platform components.
• Design and implement data validation and quality checks to ensure data accuracy, completeness, and consistency as it flows through the pipelines.
• Work with the Data Science and Machine Learning teams on advanced analytics.
• Collaborate with cross-functional teams, including data analysts, business users, and operational and marketing teams, to extract insights and value from data.
• Collaborate with the product team to design, implement, and maintain data models for analytical use cases.
• Design, develop, and maintain data dashboards for various teams using Looker Studio.
• Engage in technology exploration, research and development, and POCs, and conduct deep investigations and troubleshooting.
• Design and manage ETL/ELT processes, ensuring data integrity, availability, and performance.
• Troubleshoot data issues and conduct root cause analysis when reporting data is in question.