Data Engineer
Requirements
• Bachelor’s degree in Computer Science, Information Systems, Data Engineering, or a related technical field.
• Extensive experience with database systems such as Redshift, Snowflake, or similar cloud-based solutions.
• Advanced proficiency in SQL and experience optimizing complex queries for performance.
• Hands-on experience building and managing data pipelines using tools such as Apache Airflow, AWS Glue, or similar technologies.
• Solid understanding of ETL (Extract, Transform, Load) processes and best practices for data integration.
• Experience with infrastructure automation tools (e.g., Terraform, CloudFormation) for managing data ecosystems.
• Knowledge of programming languages such as Python, Scala, or Java for pipeline orchestration and data manipulation.
• Strong analytical and problem-solving skills, with the ability to troubleshoot and resolve data flow issues.
• Familiarity with containerization (e.g., Docker) and orchestration (e.g., Kubernetes) technologies for deploying data infrastructure.
• Collaborative team player with strong communication skills for working with cross-functional teams.
Responsibilities
• Designing, building, and optimizing scalable data pipelines to process and integrate data from various sources in real-time or batch modes.
• Developing and managing ETL/ELT workflows to transform raw data into structured formats for analysis and reporting.
• Integrating and configuring database infrastructure, ensuring performance, scalability, and data security.
• Automating data workflows and infrastructure setup using tools like Apache Airflow, Terraform, or similar.
• Collaborating with data scientists, analysts, and other stakeholders to ensure efficient data accessibility and usability.
• Monitoring, troubleshooting, and improving the performance of data pipelines and infrastructure to ensure data quality and flow consistency.
• Working with cloud infrastructure (AWS, GCP, Azure) to manage databases, storage, and compute resources efficiently.
• Implementing best practices for data governance, data security, and disaster recovery in all infrastructure designs.
• Staying current with the latest trends and technologies in data engineering, pipeline automation, and infrastructure as code.
Benefits
• Opportunity. We are at the forefront of developing a web-scale crawler and knowledge graph that allows ordinary people to participate in the process and share in the benefits of AI development.
• Culture. We’re a lean team working together toward a very ambitious goal: improving access to public web data and distributing the value of AI to the people. We prioritize low ego and high output.
• Compensation. You’ll receive a competitive salary, benefits, and equity package.