SOUM - Data Engineer

Remote / Cairo / Tashkent · 1mo ago

Remote, Mid, EMEA, Cloud Computing, Artificial Intelligence, Data Engineer, SQL, Python, AWS, Azure, GCP

Requirements

• Proficiency in Python and SQL for data engineering tasks.
• Strong understanding of ETL/ELT processes, data warehousing, and data modeling.
• Hands-on experience with cloud platforms (AWS, GCP, or Azure) and data storage solutions (BigQuery, Redshift, Snowflake, etc.).
• Familiarity with data orchestration and integration tools (Airflow, Airbyte) is a must.
• Experience with Dataform is a must.
• Experience with containerization and deployment tools (Docker, Kubernetes) is a plus.
• Knowledge of data governance, security, and best practices for handling sensitive data.
• Familiarity with Git and GitHub.
• Strong skills in eliciting requirements from cross-functional stakeholders and translating them into actionable data engineering tasks.
• 2+ years in data engineering, building and maintaining data pipelines.
• 2+ years in SQL and Python development for production environments.
• Experience working in fast-growing startup environments is a plus.
• Exposure to real-time data processing frameworks (Kafka, Spark, Flink) is a plus.

We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, or assessing responses. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are made by humans. If you would like more information about how your data is processed, please contact us.

Responsibilities

• Data Pipeline Development & Optimization:
  • Design, build, and maintain scalable and reliable data pipelines to support analytics, ML models, and business reporting.
  • Collaborate with data scientists and analysts to ensure data is available, clean, and optimized for downstream use.
  • Implement data quality checks, monitoring, and validation processes.
• Data Architecture & Integration:
  • Work with cross-functional teams to design efficient ETL/ELT workflows using modern data tools.
  • Integrate data from multiple sources (databases, APIs, third-party tools) into centralized storage solutions (data lakes/warehouses).
  • Support cloud-based infrastructure for data storage and retrieval.
• Performance & Scalability:
  • Monitor, troubleshoot, and optimize existing data pipelines to handle large-scale, real-time data flows.
  • Implement best practices for query optimization and cost-efficient data storage.
  • Ensure data is available and accessible for business-critical operations.
• Collaboration & Documentation:
  • Partner with product, engineering, and business stakeholders to understand data requirements.
  • Document data workflows, schemas, and best practices.
  • Support a culture of data reliability, governance, and security.