Black Duck Software, Inc. - Principal Data Engineer
Requirements
• Strong SQL and Python skills, used to build pipelines, services, and automation.
• Hands-on experience running cloud systems on AWS and Google Cloud (IaaS level: compute, storage, networking, IAM).
• Practical experience with both operational databases (RDS-style) and analytics stores (columnar/OLAP), including performance tuning.
• Strong data modeling ability, including schema evolution, conformed dimensions, and "one source of truth" metric definitions.
• Track record of delivering data products that other teams or customers depend on, with clear contracts and reliability expectations.
• Ability to make sound engineering tradeoffs across latency, accuracy, cost, and security without creating brittle complexity.
• Experience with lakehouse patterns and open table formats (or similar), including governance and table maintenance.
• Experience with orchestration and streaming systems used in production (batch and real-time), including managing backfills safely.
• Familiarity with ML data needs (training/serving splits, feature-ready datasets, evaluation datasets) and AI-adjacent workflows.
Preferred
• Experience building self-service data platforms (catalog, discoverability, access controls) used by multiple teams.
• Experience in regulated or security-sensitive environments, including retention, auditing, and data access controls.
Work model, location & travel
• Location: Belfast, UK
• Reports to: VP of Data Engineering
• Work model: Hybrid (details TBD)
• Collaboration hours: Flexible; overlap with UK and US time zones
• Travel: Minimal
Responsibilities
• Lead the design and build-out of cross-product data services for multiple product lines from one governed data plane.
• Define the "customer data plane" model: canonical customer identifiers, shared dimensions, and consistent facts used across products.
• Build and operationalize ingestion patterns for batch, streaming, and event data, with repeatable onboarding for new sources.
• Own the operational playbook for data reliability: data contracts, quality checks, lineage, monitoring, and incident response.
• Implement and run access methods that make data usable: curated datasets, secure query interfaces, and product-ready data APIs where needed.
• Productize customer-facing data products (datasets, metrics, exports, and feeds) with versioning, documentation, and clear ownership.
• Design data models that fit both operational systems (RDS) and analytics stores (columnar/OLAP), including performance and cost tuning.
• Ensure data products also power ML workflows: trusted training datasets, feature-ready outputs, and consistent definitions for decision-making.
• Enable AI automation by delivering reliable, low-latency, governed data products that can be used safely in automated workflows.
• Partner closely with product, engineering, and security stakeholders to align data products with roadmap priorities and customer outcomes.
• Raise the technical bar through architecture reviews, standards, and mentoring, while staying hands-on in key systems.