Aspora - Senior Data Platform Engineer
Requirements
• 5+ years of data engineering experience, including 2+ years on large-scale big data platforms.
• Hands-on expertise with Apache Spark: performance tuning, partitioning, broadcast joins, and execution plans (see the sketch after this list).
• Deep Databricks experience: workspace configuration, Unity Catalog, Delta Live Tables, or equivalent.
• Solid Apache Airflow experience: DAG authoring, custom operators, XCom, pools, and sensor patterns.
• Production experience implementing CDC pipelines (Debezium, Kafka Connect, or AWS DMS).
• Strong proficiency in Python and SQL.
• Experience designing analytical data models for large datasets (star schemas, wide tables, aggregation layers).
• A track record of building reliable, observable, and testable pipelines in production.

What Great Looks Like
• Hands-on experience with modern data lake table formats such as Delta Lake or Apache Iceberg, including compaction, time travel, and schema evolution.
• Experience building and operating streaming data pipelines using Apache Spark Structured Streaming, Apache Flink, or Kafka Streams.
• Proficiency with dbt for data transformations and lineage management.
• Experience working with cloud data infrastructure on Amazon Web Services, Google Cloud Platform, or Microsoft Azure.
• Familiarity with infrastructure-as-code tools such as Terraform or AWS CloudFormation.
• Experience owning data platform reliability end to end, including monitoring, alerting, and building self-healing systems.
• A strong data-as-a-product mindset, with emphasis on clear contracts, versioned schemas, SLOs, and well-documented datasets.
• A bias toward automation: proactively reducing operational toil by building scalable frameworks and tooling.
• Solid engineering fundamentals: writing testable code, participating in rigorous code reviews, and maintaining high standards of operational excellence.
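For illustration, a minimal PySpark sketch of the broadcast-join and partitioning work the Spark bullet refers to. The bucket paths, table names, and columns are hypothetical placeholders, and the snippet assumes a Spark 3.x session with Delta Lake configured:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("join-tuning-sketch").getOrCreate()

# Hypothetical inputs: a large fact table and a small dimension table.
orders = spark.read.format("delta").load("s3://bucket/orders")
currencies = spark.read.format("delta").load("s3://bucket/currencies")

# Broadcasting the small side avoids shuffling the large fact table;
# the formatted physical plan should show a BroadcastHashJoin.
joined = orders.join(broadcast(currencies), on="currency_code", how="left")
joined.explain(mode="formatted")

# Partition the output on a pruning-friendly key to control file layout.
(joined.repartition("order_date")
       .write.format("delta")
       .partitionBy("order_date")
       .mode("overwrite")
       .save("s3://bucket/orders_enriched"))
```

The same pattern applies whenever one side of a join fits comfortably in executor memory.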
Responsibilities
Big Data Platform & Infrastructure
• Design, build, and operate large-scale data processing infrastructure using Spark on Databricks, ensuring reliability, performance, and cost efficiency at scale.
• Architect and maintain lakehouse solutions (Delta Lake, Iceberg), including partitioning strategies, Z-ordering, and compaction jobs (see the Delta maintenance sketch after this section).
• Own cluster management, autoscaling policies, and resource governance across Databricks workspaces.
• Drive platform-level improvements: query optimisation, caching strategies, compute-storage separation, and shuffle tuning.

ETL / ELT Pipeline Engineering
• Design and build robust, idempotent, and testable data pipelines handling batch and near-real-time workloads.
• Manage and extend our Airflow-based orchestration layer: DAG authoring standards, dependency management, alerting, and SLA enforcement (see the DAG sketch after this section).
• Implement and maintain CDC pipelines (Debezium, Kafka Connect, or native database replication), ensuring low-latency, high-fidelity data propagation.
• Define data pipeline contracts (schemas, SLAs, quality assertions) and enforce them via automated data quality frameworks.

Analytical Storage & Computation
• Model and manage analytical data stores: dimensional models, one-big-table (OBT) patterns, and aggregation layers optimised for BI and self-serve analytics.
• Own the evolution of our analytical warehouse/lakehouse stack: performance benchmarking, cost modelling, and technology selection.
• Build and maintain efficient data serving layers for dashboards, ML feature stores, and reverse ETL use cases.
• Implement data retention, archival, and lifecycle management policies across hot/warm/cold storage tiers.

Platform Engineering & Developer Experience
• Define and enforce data platform engineering best practices: code standards, CI/CD for pipelines, automated testing, and observability.
• Build internal tooling and libraries that make data engineers faster: reusable Spark utilities, pipeline templates, and local dev environments.
• Champion data reliability engineering: lineage tracking, incident response playbooks, pipeline SLO monitoring, and root cause analysis.

Tech Stack
| Area            | Tools                                        |
| Compute         | Apache Spark, Databricks, PySpark, Scala     |
| Orchestration   | Apache Airflow, dbt                          |
| Ingestion & CDC | Debezium, Kafka, Kafka Connect               |
| Storage         | Delta Lake, Iceberg, S3/GCS, Snowflake       |
| Languages       | Python, SQL, Scala                           |
| Observability   | Great Expectations, OpenLineage, Monte Carlo |
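By way of illustration, a hypothetical Delta table maintenance job covering the compaction and Z-ordering work above. The table path and Z-order column are placeholders, and the OPTIMIZE / VACUUM statements assume Databricks or a recent open-source Delta runtime:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("delta-maintenance-sketch").getOrCreate()

# Compact small files and co-locate rows on a frequently filtered column
# (hypothetical table path and column).
spark.sql("OPTIMIZE delta.`s3://bucket/orders_enriched` ZORDER BY (customer_id)")

# Remove files no longer referenced by the table, respecting a 7-day retention window.
spark.sql("VACUUM delta.`s3://bucket/orders_enriched` RETAIN 168 HOURS")
```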
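Likewise, a minimal Airflow 2.x DAG sketch showing the authoring patterns referenced above: a sensor gate, retries and alerting via default_args, and a task-level SLA. The DAG id, file path, and alert address are hypothetical placeholders:

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.sensors.filesystem import FileSensor

# Shared retry and alerting policy for all tasks in the DAG.
default_args = {
    "owner": "data-platform",
    "retries": 2,
    "retry_delay": timedelta(minutes=5),
    "email_on_failure": True,
    "email": ["data-alerts@example.com"],  # placeholder address
}

with DAG(
    dag_id="orders_daily",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    default_args=default_args,
    catchup=False,
) as dag:
    # Gate the run on the upstream export landing (hypothetical path).
    wait_for_export = FileSensor(
        task_id="wait_for_export",
        filepath="/data/exports/orders_{{ ds }}.parquet",
        poke_interval=300,
        timeout=60 * 60,
    )

    def transform(**context):
        # Placeholder for the actual Spark/dbt transformation step.
        pass

    run_transform = PythonOperator(
        task_id="run_transform",
        python_callable=transform,
        sla=timedelta(hours=2),  # alert if the task runs past its SLA window
    )

    wait_for_export >> run_transform
```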
Benefits
• Work on a high-impact product that is redefining banking for immigrants worldwide.
• Own backend design and execution, solving complex engineering problems at scale.
• Work alongside a top-tier global team of engineers in a fast-paced environment.
• Competitive ESOPs: align your growth with Aspora’s long-term vision.
• Health insurance, strong leave policies, and career growth opportunities in a high-impact startup.