Skills needed: Data Engineering and DevOps experience; cloud infrastructure knowledge preferred. Proficiency in AWS services such as EC2, S3, and Lambda is a plus. Familiarity with containerization technologies such as Docker and Kubernetes is required. Experience on large-scale data processing projects using tools such as Hadoop or Spark is beneficial.
Years of experience: 5+ years, preferably in Data Engineering/DevOps within the cloud infrastructure domain; AWS experience is preferred but not mandatory. Familiarity with containerization technologies is required regardless of prior experience level. Experience on large-scale data processing projects using Hadoop or Spark is beneficial but not a strict requirement, provided the candidate can demonstrate relevant skills through their resume/portfolio.
Education: Bachelor's degree required; a Master's degree (MSc) or higher (e.g., MBA or PhD) in Data Engineering, DevOps, Cloud Computing, Computer Science, Information Systems, or a related field is preferred but not mandatory.
Certifications: AWS certification preferred; certifications from other cloud providers such as Azure or Google Cloud are also beneficial but not strictly required, provided the candidate can demonstrate relevant skills through their resume/portfolio or during the interview process.
Must-haves: Proficiency in containerization technologies such as Docker and Kubernetes, regardless of prior experience level. Experience with AWS services such as EC2, S3, and Lambda is preferred but not mandatory, and familiarity with large-scale data processing tools such as Hadoop or Spark is beneficial rather than strictly required.
Responsibilities
Data Platform Development & Engineering
Design & Implement ETL/ELT: Develop, optimize, and maintain scalable data pipelines using Python, SQL, and core Azure data services (a brief illustrative sketch follows this list).
Azure Data Services Management: Architect and manage key Azure data components, including:
Data Lakes: Provisioning and structuring data within Azure Data Lake Storage (ADLS Gen2).
Data Processing: Implementing data transformation and analysis logic using Azure Data Factory (ADF), Azure Synapse Pipelines, and Azure Databricks (using Spark/PySpark).
Data Warehousing: Designing and optimizing the enterprise Data Warehouse in Azure Synapse Analytics (SQL Pool).
Data Modeling and Quality: Define and enforce data modeling standards and implement data quality checks within the pipelines.
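To picture the day-to-day pipeline work, here is a minimal, illustrative PySpark sketch of the kind of transformation and data quality check built in Azure Databricks or Synapse. The storage paths, column names, and quality rule are hypothetical placeholders, not part of this posting.

```python
# Illustrative only: a minimal PySpark ETL step with a simple data quality check.
# Paths, column names, and the quality rule below are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders-etl-sketch").getOrCreate()

# Extract: read raw data landed in the data lake (ADLS Gen2 path is a placeholder).
raw = spark.read.parquet("abfss://raw@<storage-account>.dfs.core.windows.net/orders/")

# Transform: normalize types, derive a reporting column, and deduplicate.
orders = (
    raw.withColumn("order_date", F.to_date("order_date"))
       .withColumn("net_amount", F.col("amount") - F.coalesce(F.col("discount"), F.lit(0.0)))
       .dropDuplicates(["order_id"])
)

# Data quality check: fail fast if required keys are missing.
bad_rows = orders.filter(F.col("order_id").isNull() | F.col("order_date").isNull()).count()
if bad_rows > 0:
    raise ValueError(f"Data quality check failed: {bad_rows} rows with null keys")

# Load: write curated data back to the lake, partitioned for downstream queries.
(orders.write.mode("overwrite")
       .partitionBy("order_date")
       .parquet("abfss://curated@<storage-account>.dfs.core.windows.net/orders/"))
```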
Cloud Infrastructure & DevOps Automation
Infrastructure as Code (IaC): Design, manage, and provision all Azure data resources (ADLS, Synapse, ADF, Databricks Clusters) using Terraform or Azure Resource Manager (ARM) Templates/Bicep (an illustrative provisioning sketch follows this list).
CI/CD Implementation: Build and maintain automated Continuous Integration/Continuous Deployment (CI/CD) pipelines for all code (data, infrastructure, and application) using Azure DevOps or GitHub Actions.
Containerization & Compute: Utilize Docker and manage deployment environments using Azure Kubernetes Service (AKS) or Azure Container Instances (ACI) when required for data applications.
Monitoring, Logging, & Security: Configure comprehensive monitoring and alerting using Azure Monitor and Log Analytics. Implement network security and access controls (RBAC) across the data platform (an illustrative query sketch follows this list).
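In this role the data resources themselves would normally be declared in Terraform or Bicep; purely as an illustration of the kind of resource being managed, the following is a minimal sketch using the Azure SDK for Python to create an ADLS Gen2-capable storage account. The subscription, resource group, account name, location, and SKU are placeholder assumptions.

```python
# Illustrative only: creating an ADLS Gen2-capable storage account with the
# Azure SDK for Python. In practice this resource would be declared in
# Terraform or Bicep; all identifiers below are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient

credential = DefaultAzureCredential()
client = StorageManagementClient(credential, "<subscription-id>")

# begin_create returns a long-running-operation poller.
poller = client.storage_accounts.begin_create(
    "<resource-group>",
    "<storageaccountname>",
    {
        "location": "westeurope",
        "kind": "StorageV2",
        "sku": {"name": "Standard_LRS"},
        "is_hns_enabled": True,  # hierarchical namespace -> ADLS Gen2
    },
)
account = poller.result()
print(account.name, account.provisioning_state)
```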
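For the monitoring responsibility, a minimal sketch of pulling recent errors from a Log Analytics workspace with the azure-monitor-query client is shown below. The workspace ID and the KQL query are hypothetical placeholders.

```python
# Illustrative only: querying a Log Analytics workspace for recent failures.
# The workspace ID and KQL query are hypothetical placeholders.
from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient, LogsQueryStatus

client = LogsQueryClient(DefaultAzureCredential())

response = client.query_workspace(
    workspace_id="<log-analytics-workspace-id>",
    query="AzureDiagnostics | where Level == 'Error' | take 20",
    timespan=timedelta(hours=24),
)

if response.status == LogsQueryStatus.SUCCESS:
    for table in response.tables:
        for row in table.rows:
            print(row)
```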