Edgesource Corporation - Data Engineer

Chantilly, Virginia, United States - Hybrid - 1mo ago

In Office, Mid, NA, Cloud Computing, Artificial Intelligence, Data Engineer, Learning & Development, SQL, Python, Linux, AWS

Requirements

• Active TS/SCI FSP
• 3–5+ years of professional experience in data engineering or related roles.
• Strong collaboration skills to work effectively with Data Scientists, Analysts, and Engineering teams.
• Ability to communicate complex technical concepts to non-technical stakeholders.
• Detail-oriented, curious, and committed to data quality.
• Capable of managing multiple priorities in a fast-paced environment.
• Python, SQL, and PySpark (highly desired) for data processing and pipeline development.
• Elastic/OpenSearch for search and analytics solutions.
• Experience with AWS cloud services and Linux environments.
• Git for version control and collaborative development.
• Understanding of machine learning workflows and MLOps concepts.
• Hands-on experience modeling, querying, and optimizing graph databases; Neo4j highly desired.

Working at Edgesource

As an ISO 9001:2015 certified and CMMI Level 3 appraised small business, Edgesource specializes in providing a variety of technical solutions, including software development, database services, enterprise networking, data center virtualization, and management support. We are always seeking top talent to join our team in helping to address the most critical technical challenges facing our nation.

At Edgesource, we understand that our employees are our greatest asset, and as such we offer a wide array of benefits to support the well-being of our staff, including:

• Flexible PTO Policy + 11 Paid Holidays
• Flexible Work Schedules (Remote / Hybrid)
• Medical / Dental / Vision / Flexible Spending Account (FSA)
• 401k Plan with Match
• Tuition & Professional Development Support

Responsibilities

• Design, develop, and maintain ETL/ELT pipelines for batch and real-time processing using Python and SQL.
• Integrate data from multiple sources, including databases, APIs, streaming platforms, PDFs, and MS Office files.
• Build scalable data architectures to support analytics and machine learning workloads.
• Optimize data processing and queries for performance and cost efficiency in AWS S3.
• Exposure to PySpark or other big data frameworks is a plus for future pipeline scalability.
• Develop and implement web scraping and data ingestion workflows to collect open-source data, integrating content and producing structured datasets and visualizations for analytics and stakeholder consumption.

Data Management & Optimization

• Collect, clean, and validate large volumes of structured and unstructured data.
• Track data versions, implement data quality checks, and ensure data reliability.
• Design and optimize data storage in AWS S3, including raw, intermediate, and final datasets.
• Implement data governance practices, including documentation, cataloging, lineage, and security.
• Ensure compliance with security standards.

Collaboration with Data Science & Stakeholders

• Work closely with Data Scientists, Analysts, and stakeholders to understand data requirements.
• Prepare clean, structured, and feature-ready datasets for analytics and machine learning.
• Support feature engineering, aggregations, and transformations at scale.
• Assist in deploying ML models to production, ensuring monitoring, versioning, and performance optimization.

APIs, Containers & CI/CD

• Integrate with REST APIs.
• Utilize Docker, Kubernetes, Git, and CI/CD pipelines to deploy and manage workflows.

Documentation & Communication

• Document pipelines, data schemas, and transformations clearly.
• Communicate technical concepts effectively with cross-functional teams.
• Participate in code reviews and promote best practices across the team.