Data Engineer
Requirements
• A Bachelor’s or Master’s degree in engineering, computer science, or a related discipline.
• Proficiency in Python and SQL, with experience using libraries such as Pandas, Polars, Dask, PySpark, and NumPy.
• Experience with containerisation (e.g., Docker).
• Familiarity with Linux environments.
• Knowledge of data modelling, including relational databases, object storage, and non-relational databases.
• Understanding of data governance, including access control and licensing.
• Experience with data visualisation tools such as Plotly, Seaborn, or D3.js (desirable).
• Experience with orchestration frameworks such as Airflow or Prefect.
• Exposure to cloud environments — AWS experience is a strong advantage.
• Familiarity with distributed computing frameworks such as Spark, Databricks, or Kubernetes (desirable).
• Experience working with biological datasets or in biotech/healthcare data environments (bonus).
• Familiarity with data cataloguing tools (bonus).

Personally, You Are

• Curious (e.g., eager to learn about new data types or scientific use cases).
• A collaborative team player, able to communicate effectively with stakeholders from diverse backgrounds and technical abilities (Data Science, ML, Governance, Scientists).
• Passionate about science and making a meaningful impact for patients.
• Self-motivated and driven to succeed.
• User-focused: you consider how datasets will be used downstream by both humans and machines.
• Comfortable working independently and taking initiative.
• Compliance-aware: you understand the importance of secure, traceable, and auditable data workflows.

Relation Therapeutics is a committed equal opportunities employer.

RECRUITMENT AGENCIES: Please note that Relation Therapeutics does not accept unsolicited resumes from agencies. Resumes should not be forwarded to our job aliases or employees. Relation Therapeutics will not be liable for any fees associated with unsolicited CVs.
Responsibilities
• Build, maintain, and monitor pipelines to transfer lab data into appropriate self-hosted and cloud environments, ensuring clear visibility for stakeholders.
• Ingest and integrate bioinformatics and multi-modal data from diverse external sources.
• Develop and support reusable ETL workflows to standardise data formats, enrich metadata, and implement versioning and lineage logic.
• Enhance FAIR compliance by collaborating with Data Science, Machine Learning, and Wet Lab teams to adopt best practices in data management.
• Work with users and stakeholders to integrate data assets into the internal data catalogue and further develop this solution.
• Connect lab, external, analytical, and ML data assets into a unified ecosystem.
• Automate data quality checks and validation layers during ingestion and transformation to ensure accuracy and reliability.
• Enable downstream data exploration tools and visualisation dashboards to promote accessibility and usability for non-technical users.
• Support data governance by implementing best practices in access control, data lifecycle management, and compliance (e.g., licensing, privacy).
• Develop and refine scalable, fit-for-purpose data models and workflows.