protege

protege - Solutions Applied Data Scientist, Healthcare

United States · 1mo ago
Remote · NA · Solutions Engineer · Data Scientist · Data Quality · Data Analysis · Python · SQL · Ruby


Requirements

• evaluate feasibility of cohort logic

This work helps ensure that projects are grounded in what data actually exists.

Delivery Tooling & Workflow Improvements

As delivery patterns emerge, you will help develop tools and reusable workflows that improve efficiency. Examples include:
• reusable SQL templates for cohort construction
• automated validation checks
• scripts for dataset preparation
• tools that reduce manual delivery work

This role is an important bridge between manual dataset delivery and scalable data infrastructure.

What Success Looks Like

• 30 days: Learn the delivery motion and source-data reality. Build working knowledge of Solutions workflows, healthcare data partners, common cohort patterns, and how complex requests get escalated. Shadow active projects, understand existing QA approaches, and start contributing to scoped feasibility and validation work.
• 60 days: Own scoped technical escalations and create early leverage. Independently support complex cohort-definition and dataset-construction work, write and validate SQL / Python workflows, and help Solutions Leads answer hard feasibility questions with clear tradeoffs.
• 90 days: Become a trusted technical partner across delivery. Handle the hardest dataset problems with limited oversight, improve QA and repeatability, and propose workflow or platform improvements that reduce bespoke work across engagements.

• Experience working with large structured healthcare datasets
• Strong SQL and Python skills and experience writing complex queries
• Experience joining and transforming large datasets
• Experience performing data validation and exploratory analysis
• Strong Python skills for data analysis and scripting
• Experience working with structured file formats (CSV, Parquet, etc.)
• Ability to translate ambiguous requirements into concrete data logic
• Strong communication skills and ability to collaborate with technical and non-technical stakeholders

Working with Protege

• We move fast - thoughtfully. Speed matters in what we're building, and so does intention. We're biased toward action and always learning.
• We're a lean, high-trust team. Everyone has real ownership. Clarity and autonomy drive our best work.
• We take our work seriously, not ourselves. We solve hard problems with humility and celebrate wins - big and small.
• We're kind, direct, and inclusive. We give feedback early and often, with the goal of helping one another grow.
• We're builders at heart. Every person at Protege is hands-on, resourceful, and focused on creating momentum.
• We grow fast - together. You'll be surrounded by people who care about impact, who challenge you to think bigger, and who are genuinely excited about what comes next.

Responsibilities

Technical Escalation & Delivery Collaboration

During delivery projects, Solutions Leads may encounter complex data challenges that require deeper analysis or technical problem-solving. You will act as a technical partner, helping solve problems such as:
• Complex cohort definitions that require multi-source joins
• Linking datasets across different data partners
• Investigating unexpected gaps or anomalies in delivered data
• Evaluating whether requested variables or labels exist in available datasets
• Determining whether a dataset can realistically satisfy model requirements

You will work collaboratively with Solutions Leads to unblock delivery challenges while keeping projects moving toward successful completion. When solutions require infrastructure or pipeline changes, you will partner with the Solutions Engineer and internal platform engineering teams to implement the required workflows.

Cohort Definition & Dataset Construction

Work with Solutions Leads to translate customer requirements into concrete dataset logic. You will help ensure that datasets accurately represent the intended population and meet customer specifications.
• Writing complex SQL queries to construct cohorts
• Implementing inclusion and exclusion logic
• Joining datasets across multiple data sources
• Validating linkage between datasets
• Identifying and resolving inconsistencies or missing fields
• Partnering with Solutions Leads to resolve complex data questions that arise during project delivery
• Escalating to or collaborating with delivery engineers when dataset construction requires pipeline changes or large-scale data processing

Data Quality Validation & Completeness Analysis

Before complex datasets are delivered to customers, you will help validate that they meet required standards, working closely with Solutions Leads to ensure that the datasets meet agreed acceptance criteria. You will also review bespoke QA methodology and suggest platform improvements to Product and Engineering to decrease custom work across engagements.
• Performing data completeness analysis
• Investigating missing or anomalous data
• Verifying cohort logic results
• Validating row counts and dataset structure
• Creating summary statistics and validation outputs

Data Feasibility

Many customer projects involve AI researchers who are defining the healthcare datasets required to train or evaluate models. You will work with these customer teams to translate research goals into practical dataset specifications.
• Reviewing dataset requests from AI researchers and model development teams
• Helping clarify and refine requirements for model training or evaluation datasets
• Evaluating whether requested variables or labels exist in available data sources
• Identifying proxy variables or alternative dataset structures when ideal variables are unavailable
• Assessing feasibility of requested cohort definitions given real-world data constraints
• Explaining data limitations, tradeoffs, and potential biases to technical stakeholders
• Iterating with researchers to converge on datasets that are both scientifically meaningful and operationally feasible

This role requires someone who is comfortable engaging with technically sophisticated stakeholders while grounding conversations in the realities of messy, real-world data.

Data Partner & Source Data Analysis

Many datasets originate from external healthcare data partners. You will help analyze partner datasets to:
• understand schema and field availability
• assess data quality and completeness

