Valtech - Senior Data Engineer

Poland - Remote - Hybrid - 1mo ago

In Office, Senior, EMEA, Cloud Computing, Artificial Intelligence, Senior Data Engineer, Governance, Apache Spark, Python, SQL, AWS

Requirements

• Strong hands-on experience with Apache Spark and Delta Lake, and strong programming skills in Python and SQL. Proven experience building batch and streaming data pipelines and production-grade data platforms, with a solid understanding of data modeling, data quality, and governance principles.
• Cloud & Platforms (Key Requirement): Experience with one or more major cloud platforms, with preference for Microsoft Azure / Fabric, as well as AWS or GCP. Familiarity with modern data platforms such as Databricks and Snowflake is expected.
• Architecture & Systems Thinking: Experience with lakehouse architectures and distributed data systems, and a strong understanding of scalability, reliability, and performance considerations in data pipelines.
• Mindset: Strong problem-solving skills focused on scalability and reliability, with a collaborative approach to working in cross-functional teams. Experience in Agile or consulting environments is beneficial.
• Experience with GenAI and AI data systems (e.g., RAG pipelines, vector databases, LLM data preparation), as well as CI/CD for data pipelines and infrastructure-as-code tools such as Terraform, ARM, or CloudFormation.
• Additional exposure to streaming technologies (e.g., Kafka), Spark optimization, or advanced analytics and ML workloads (including causal or experimentation platforms) is valuable. Experience building data products or large-scale analytics platforms is also beneficial.
• Commitment to reaching all kinds of people: We design experiences that work for all kinds of people - and that starts with our own teams. At Valtech, we’re intentional about building an inclusive culture where everyone feels supported to grow, thrive and achieve their goals. No matter your background, you belong here. Explore our Diversity & Inclusion site to see how we’re creating a more equitable Valtech for all.

Responsibilities

• Build & Data Platform Engineering: Design and implement scalable data platforms and pipelines across cloud environments (Azure/Fabric, AWS, GCP, Databricks, Snowflake). This includes developing reliable batch, streaming, and near-real-time pipelines using technologies such as Spark and Delta Lake, and building ingestion, transformation, and curation workflows for both structured and unstructured data.
• You will implement modern data architectures including lakehouse patterns and medallion layering (bronze, silver, gold), ensuring systems are reusable, scalable, and aligned with enterprise needs.
• Enable AI, GenAI & Data Products: Deliver high-quality datasets that support analytics, machine learning, causal modeling, and optimization systems. You will enable data pipelines for GenAI use cases (including LLMs, RAG pipelines, and vector-based data flows), as well as agent-based architectures and intelligent workflows, ensuring that data is model-ready and production-grade.
• Data Modeling, Orchestration & Automation: Design scalable logical and physical data models for analytical and operational use cases, ensuring consistency across domains. Orchestrate workflows using tools such as Airflow, dbt, Lakeflow, or equivalents, with a strong focus on automation, reliability, and maintainability of end-to-end pipelines.
• Architecture, Governance & Observability: Establish strong data observability, including monitoring of data freshness, pipeline reliability, and SLA adherence, ensuring systems remain trustworthy and production-ready.
• Data Serving, Integration & Optimization: Enable data serving layers (APIs, feature inputs, analytical endpoints) to support downstream systems, including ML and AI platforms. Continuously monitor and optimize pipelines and infrastructure for performance, scalability, and cost efficiency.
• Collaboration: Work closely with data scientists, ML engineers, analysts, and business stakeholders to translate requirements into robust data solutions. Support adoption of data products and contribute to best practices across the data and AI ecosystem.