Samba - Data Scientist

Warsaw · 2mo ago

In Office · Mid · EMEA · Cloud Computing · Data Analytics · Artificial Intelligence · Data Scientist · CAO · Documentation · Python · SQL · AWS · GCP

Requirements

• Bachelor's degree required in Statistics, Data Science, Computer Science, Mathematics, or a related quantitative field; Master's strongly preferred
• 3–5 years of hands-on data science experience with demonstrated ability to own and deliver complex, multi-sprint projects independently
• Advanced Python with production-quality code, testing, and documentation; strong SQL and PySpark for billion-row datasets
• Databricks workflows, Delta Lake, and job orchestration; working knowledge of cloud platforms (AWS or GCP)
• Solid command of core ML (regression, classification, clustering, model evaluation, and experimental design) applied to complex, high-volume data
• Proficiency with MLOps practices: experiment tracking, pipeline orchestration (Airflow), and reproducible model deployment
• Exposure to modern AI methodologies: RAG systems, LLM-augmented models, vector databases, and semantic search
• Strong communicator, able to translate technical work into clear documentation, user stories, and cross-functional conversations
• Demonstrated ability to mentor junior data scientists and contribute to team standards
• Hands-on experience with knowledge graph construction, entity resolution, or semantic data modeling (RDF, OWL, SPARQL, or equivalent graph frameworks)
• Familiarity with probabilistic record linkage, identity graph approaches, or embedding-based entity matching at scale
• Experience with causal inference methods (A/B testing, synthetic control, uplift modeling)
• Experience with deduplication, enrichment, or web-to-TV linkage problems
• Background in media, ad tech, or measurement: TV viewership (ACR/STB data), digital audience modeling, cross-platform measurement (linear + CTV/OTT), or identity resolution in privacy-constrained environments
• Familiarity with the measurement and identity vendor landscape (Nielsen, Comscore, LiveRamp, The Trade Desk)

Salary: 180,000 zł - 330,000 zł a year

Responsibilities

• Own end-to-end delivery of significant data science projects, from problem scoping and approach design through to production deployment
• Make sound, independently reasoned decisions on methodology, model selection, and evaluation; document them clearly in technical solution documents covering problem statement, approach, metrics, and timeline
• Lead solution design for your own initiatives; break down complex epics into well-scoped user stories with clear acceptance criteria, adopting DataOps and MLOps best practices throughout: experiment tracking, pipeline orchestration, model monitoring, and reproducibility
• Build production-quality Python and PySpark code on Databricks that is well tested, documented, and reusable; implement advanced ML and AI-powered workflows including entity resolution, probabilistic record linkage, embedding-based matching, semantic similarity, and LLM-augmented pipelines
• Develop and maintain reusable tools, libraries, and documentation that improve team efficiency and technical standards; conduct code reviews with constructive, specific feedback that raises the bar
• Mentor junior data scientists on technical execution, code quality, and career development; lead internal talks or workshops on ML topics
• Collaborate cross-functionally with product, engineering, and operations: translate business requirements into technical specifications, partner with data engineering on scalable pipeline design, and participate in cross-functional design reviews and working groups