Software Engineer II, Data
Requirements
• Minimum of 8 years of related experience with a Bachelor’s degree; or 6 years with a Master’s degree; or 3 years with a PhD; or equivalent experience.
• Proven ability to design flexible, maintainable ETL systems.
• Experience with data pipeline orchestration tools such as Prefect, Airflow, Argo, Databricks, or Spark.
• Understanding of the ML model lifecycle; prior work with scientific or ML workflows is a plus.
• Hands-on experience with multi-terabyte-scale data processing.
• Familiarity with AWS; Kubernetes experience is a bonus.
• Knowledge of data lake technologies such as Parquet, Iceberg, and AWS Glue.
• Strong Python software engineering skills.
• Pragmatic mindset: able to evaluate tradeoffs and find solutions that empower ML researchers to move quickly.
• Background in bioinformatics or chemistry is a plus.
Responsibilities
• Design and improve data pipelines that process large, multi-modal datasets from a variety of internal and external sources into training datasets for AI models.
• Evolve our data storage layer to support analytics, schema evolution, reproducibility, and efficient data access.
• Collaborate with ML engineers to improve the performance and reliability of Python-based data processing workflows.
Benefits
• We offer our team industry-leading, competitive pay, company-paid healthcare, flexible spending accounts, voluntary life insurance, 401(k) matching, and uncapped vacation. We are in a brand-new, state-of-the-art facility in beautiful San Diego with an onsite gym, dining, and easy access to great places to live and play.