wagey.gg
protege - Senior Software Engineer, Data Processing

Remote - USA · 3w ago
Remote · Senior · NA · Cloud Computing · Artificial Intelligence · Senior Software Engineer · Senior Data Engineer · AWS · Airflow · Dagster · Data Quality

Requirements

• 5+ years building and operating production backend or data systems, with real experience in data processing at scale
• Hands-on experience designing and running large-scale data pipelines
• Experience with distributed data processing
• Strong proficiency with AWS
• Comfort with messy, varied, high-volume data and high ambiguity, with a knack for finding patterns in complex environments
• Attention to detail without losing speed, and a bias to action
• Excited to work on a product built around moving and processing large volumes of data
• Curious, tenacious, and proactive
• Experience processing one or more specific modalities at scale: medical imaging (e.g., DICOM), text, audio, or video
• Background working with sensitive or regulated data environments (HIPAA, healthcare compliance, PHI handling)
• Experience with streaming systems or workflow orchestration (e.g., Airflow, Dagster)
• Prior startup experience as a founding or early engineer
• Familiarity with ML, NLP, or LLM-based systems, including embeddings and fine-tuning

Responsibilities

INGESTION & PROCESSING SYSTEMS
• Design, build, and operate the ingestion systems that process large volumes of multimodal data into usable, well-structured datasets
• Own the ingestion path end to end, from how data lands to how it is validated, processed, tracked, and made available downstream
• Build modality-specific processing steps for real-world source data, such as medical imaging processing, audio and video metadata extraction, quality validation, and notes processing
• Build parsers, validators, and normalization logic that can systematically handle messy, non-standard, and high-variance source formats
• Turn repeated one-off data handling work into reusable processing patterns, internal tooling, and platform capabilities

SCALE, PERFORMANCE & RELIABILITY
• Build for high volume and high throughput, optimizing systems for reliability, cost, and speed
• Work across distributed and parallel compute systems to process workloads that do not fit well on a single machine
• Choose the right execution model for the workload, including batch processing, distributed execution, and modern compute patterns for unstructured data and inference-heavy processing
• Diagnose and resolve bottlenecks across ingestion and processing systems, and keep performance from degrading as volume and modality complexity grow

DATA QUALITY, SECURITY & COMPLIANCE
• Build validation and quality checks that catch bad, incomplete, or malformed data before it propagates downstream
• Handle sensitive and regulated data, including PHI, with the security and care the domain demands, including de-identification where required
• Track provenance, metadata, and usage constraints through the ingestion path so downstream use remains compliant and auditable
• Raise the quality bar for observability, debuggability, and operational reliability across the ingestion layer

CROSS-FUNCTIONAL PARTNERSHIP
• Partner with product and Data Lab to support new modalities, new partner requirements, and non-standard source data
• Work directly with partner engineering teams when needed to translate source-system realities into robust ingestion and processing design
• Surface recurring patterns that are worth standardizing into reusable transforms, validators, and internal tooling
• Help shape how Protege handles new data types as the platform expands into more complex data environments

WHAT SUCCESS LOOKS LIKE
• Get productive in the codebase and ship your first improvements to existing pipelines
• Build a working map of the ingestion and processing stack, the major data flows, and how we handle each modality
• Meet the engineering, product, and Data Lab teams to understand how the function operates across the company

60 DAYS: TAKE OWNERSHIP
• Own a processing pipeline or modality end to end, from ingestion through delivery of AI-ready output
• Develop depth in how we handle one or two data types at scale
• Start raising the bar on data quality, observability, and processing best practices

90 DAYS: OPERATE INDEPENDENTLY
• Own a significant part of the ingestion and processing layer and lead design on new modalities or scaling challenges
• Ship reliably with minimal hand-holding, and help unblock others working in the data layer
• Identify at least one leverage opportunity (a reusable transform, tool, or architectural improvement) worth investing in, and drive it


Similar roles

The Allen Institute for Artificial Intelligence - 3 open Senior Software Engineer roles · 1mo ago
Seattle, WA · $126k - $189k/year
In Office · NA · Senior · Cloud Computing · Artificial Intelligence · Senior Software Engineer · Senior Data Engineer · Python · Data Quality · Airflow · AWS · Docker

Gusto, Inc. - Senior Software Engineer, Data Platform · 2mo ago
Denver, CO; San Francisco, CA; New York, NY; Los Angeles, CA; Seattle, WA; Toronto, Ontario, CAN - Remote - Hybrid · $163k - $204k/year + Equity
In Office · NA · Senior · Cloud Computing · Data Analytics · Senior Software Engineer · Senior Data Engineer · AWS · ClickHouse · Redshift · Airflow · Python

apella - Senior Software Engineer, Data Platform · 4mo ago
Remote - United States · $175k - $225k/year
Remote · NA · Senior · Cloud Computing · Data Analytics · Senior Software Engineer · Senior Data Engineer · dbt · Kafka · Dagster · SQL · Airflow

Vendelux - Senior Software Engineer, Data & AI - Northeast · 3mo ago
United States
In Office · NA · Senior · Cloud Computing · Artificial Intelligence · Senior Software Engineer · Senior Data Engineer · Senior Data Scientist · SQL · Python · Airflow · AWS · GCP

hive.co - Senior Software Engineer, Data · 2mo ago
Remote - Toronto, Ontario, Canada · $91k - $138k/year + Equity
Remote · NA · Senior · Cloud Computing · Artificial Intelligence · Senior Software Engineer · Senior Data Engineer · AWS · Python · MongoDB · Django · Redshift

Natera - Senior Software Engineer, Data & AI Solutions · 4d ago
Remote - US Remote · $125k - $156k/year
Remote · NA · Senior · Insurance · Pharmaceuticals · Senior Software Engineer · Senior Data Engineer · SQL · Python · AWS · Snowflake · Airflow · dbt · Dagster · Terraform · HIPAA Compliance · Power BI · Data Quality · Tableau · Qlik · Data Visualization · Reporting · Vector

Life360 - Senior Data Engineer (AI Native) · 3w ago
Remote - USA · $104k - $192k/year + Equity
Remote · NA · Senior · Cloud Computing · Artificial Intelligence · Senior Data Engineer · AWS · Kafka · Apache Spark · Databricks · Airflow

spellbook.legal - Senior Data Engineer · 1mo ago
Remote - Canada · $104k - $155k/year + Equity
Remote · NA · Senior · Cloud Computing · Artificial Intelligence · Senior Data Engineer · AWS · Docker · Dagster · TypeScript · CDK

The Voleon Group - Senior Software Engineer, Strategy Research Analytics · 1mo ago
United States · $200k - $15k/year
Remote · NA · Senior · Cloud Computing · Artificial Intelligence · Senior Software Engineer · SQL · Python · AWS · Data Governance · Airflow
