RxSense - Senior Data Engineer

Remote - USA · 2mo ago

Tags: Remote, Senior, NA, Cloud Computing, Health Insurance, Logistics, Insurance, Senior Data Engineer, Python, Data Governance, SQL, SQL Server, Snowflake


Requirements

• 5+ years of professional data engineering experience with strong proficiency in SQL and data pipeline development for production systems.
• Deep hands-on experience with Snowflake, including data modeling, performance optimization, warehouse management, and cost governance.
• Strong experience with SQL Server, including complex query development, stored procedures, and understanding of transactional database patterns.
• Proficiency in Python for data pipeline development, automation, and integration with cloud services and APIs.
• Experience with ETL/ELT tools and frameworks (dbt, Matillion, Airflow, or equivalent) and data orchestration patterns.
• Experience with AWS cloud services relevant to data engineering (S3, Glue, Lambda, IAM, or equivalent).
• Strong understanding of data governance, data quality frameworks, and access control models, particularly in environments with sensitive or regulated data.
• Experience building data catalogs, lineage documentation, or metadata management systems.
• Excellent problem-solving skills and a track record of building data systems that are reliable, well-documented, and maintainable by teams beyond the original author.
• Bachelor’s degree in Computer Science, Data Science, Engineering, or a related field, or equivalent practical experience.

Preferred

• Experience in healthcare, pharmacy benefits, health insurance, or a related regulated industry. Understanding of claims data structures, pharmacy transactions, or PBM operations is highly valued.
• Experience with healthcare data standards (HL7, FHIR, NCPDP) and HIPAA compliance requirements for data infrastructure.
• Experience building data infrastructure that supports AI/ML workloads, including feature engineering, training data management, and model serving pipelines.
• Familiarity with data de-identification techniques and strategies for managing PHI in non-production environments.
• Experience with Redis or distributed caching systems and understanding of how cached data layers interact with analytical systems.
• Track record of migrating or modernizing legacy data architectures into cloud-native platforms.
• Experience with data observability tools (Monte Carlo, Metaplane, or equivalent) and SLA-driven pipeline monitoring.

Responsibilities

Data Pipeline Design and Operations

• Design, build, and maintain production-grade ETL/ELT pipelines that move data between SQL Server (operational), Snowflake (analytical), and downstream consumers including AI systems, reporting tools, and business intelligence platforms.
• Optimize data ingestion and transformation patterns for healthcare-scale volumes: millions of claims, pricing transactions, and member records processed daily.
• Implement data quality checks, validation rules, and monitoring that catch issues before they propagate to analytics, AI models, or regulatory reports.
• Build and maintain data models in Snowflake that support self-service analytics, enabling product, clinical, actuarial, and operations teams to answer their own questions.
• Manage pipeline scheduling, orchestration, and SLA monitoring to ensure data freshness targets are met across all business-critical data products.

Data Governance and Access

• Implement role-based access controls (RBAC) and data governance frameworks that enable squad-level and group-level data access rather than ad hoc individual permissions.
• Build and maintain data catalogs and lineage documentation that make it clear what data exists, where it comes from, what transformations have been applied, and who has access.
• Design data access patterns specifically for AI agents, ensuring agents can retrieve the data they need with appropriate authorization, audit trails, and containment boundaries.
• Ensure all data infrastructure complies with HIPAA requirements, including data de-identification for non-production environments, PHI access logging, and encryption at rest and in transit.
• Collaborate with security and IT teams to implement secrets management best practices for database credentials, API keys, and service accounts used in data pipelines.

Platform and Infrastructure

• Architect Snowflake environments for cost-effective performance, including warehouse sizing, clustering, materialized views, and query optimization strategies.
• Support the lower-environment data strategy by implementing alternatives to full production data replication, including data subsetting, synthetic data generation, and lookback-window-based approaches.
• Collaborate with DevOps and infrastructure teams on AWS-based data infrastructure, including S3 storage optimization, IAM policies for data access, and cost management across data storage tiers.
• Evaluate and implement data integration tools and frameworks that reduce pipeline development time while maintaining reliability and observability.

AI and Analytics Enablement

• Partner with the AI team to build data foundations for AI workloads, including feature stores, training data pipelines, and governed access to claims and pricing data for model development.
• Build data pipelines that support real-time and near-real-time use cases for AI-driven pricing, claims analysis, and clinical intelligence.
• Develop data products that leverage RxSense’s longitudinal claims data as a compounding competitive advantage, enabling trend analysis, formulary optimization, and cost management insights.
• Support the development of financial visibility tools that enable reporting on per-customer cost and spend, closing a critical gap in current business intelligence capabilities.
