The following skills and qualifications are required for this role:
Educational Background: A Bachelor's degree in Computer Science, Engineering, or a related technical field.
Professional Experience: 3-5 years of hands-on experience in a data engineering role.
Technical Proficiencies:
Programming: Advanced proficiency in Python; experience with Java or Scala is a plus.
SQL: Expertise in writing complex, highly-optimized SQL queries across large datasets.
Cloud Platforms: Demonstrable experience with at least one major cloud platform, such as AWS (Redshift, S3, EC2), Google Cloud Platform (BigQuery, Dataproc), or Azure.
Big Data Technologies: Hands-on experience with big data tools like Apache Spark, Hadoop, and Kafka.
Data Warehousing: Experience with modern data warehousing solutions such as Snowflake, Redshift, or BigQuery.
Excellent problem-solving and analytical skills.
Strong communication and collaboration abilities, with a knack for explaining complex technical concepts to non-technical audiences.
A proactive and self-motivated work ethic, with a strong sense of ownership and a commitment to delivering high-quality results.
While not mandatory, the following qualifications will be highly regarded:
A Master's degree in a relevant technical field.
Professional certifications in cloud technologies (e.g., AWS Certified Data Analytics, Google Professional Data Engineer).
Experience with data orchestration tools such as Airflow or Prefect.
Familiarity with containerization technologies like Docker and Kubernetes.
A solid understanding of data modeling principles and best practices.
Responsibilities
Infrastructure Auditing & Optimization
Conduct comprehensive audits of the existing data infrastructure to assess its efficiency, scalability, and performance.
Identify and analyze performance bottlenecks, and propose and implement optimization strategies.
Re-design and modernize data pipelines for greater scalability and reduced latency.
Implement monitoring and alerting systems to proactively identify and address infrastructure issues.
Data Pipeline & ETL Development
Design, build, and maintain robust and scalable ETL/ELT processes to ingest data from a wide variety of sources.
Assemble large, complex datasets that meet both functional and non-functional business requirements.
Automate manual data processes to improve efficiency and reduce the potential for human error.
Ground Truth Data Source Management
Establish and maintain a centralized, reliable source of truth for all key business data.
Implement rigorous data quality checks and validation processes to ensure data accuracy and consistency.
Develop and enforce data governance best practices, including data lineage and metadata management.
Collaboration & Support
Work closely with data scientists, data analysts, and other stakeholders to understand their data requirements and deliver the datasets they need.
Build analytical tools and provide technical support to assist teams in leveraging the data infrastructure effectively.
Act as a subject matter expert on data engineering best practices and advocate for their adoption across the organization.
Benefits
Equity options as part of the compensation package.
Paid Time Off (PTO), including paid vacation days, holiday pay, and sick leave for full-time employees.
Health insurance coverage through the employer's group plan.
Perks such as a gym membership and free snacks.
Flexible remote work options for eligible employees, with the ability to work from home or another suitable location.