The Motley Fool - Senior Data Engineer
Requirements
• Python Mastery: 5+ years of professional Python development. Comfortable with object-oriented design and data manipulation libraries (pandas, NumPy).
• Familiarity with financial research data vendors and feed/API products such as CapIQ Xpressfeed, FactSet, Bloomberg, Thomson Reuters/Refinitiv/LSEG, Russell, or MSCI.
• Familiarity with financial business data and feed/API products from Broadridge, Morningstar, custodian banks, and fund administrators.
• ETL & Orchestration: Proven experience designing and operating ETL/ELT pipelines. Apache Airflow and Lambda experience is a plus.
• Snowflake & Advanced SQL: Deep expertise in Snowflake architecture (clustering keys, micro-partitions, Snowpipe, Stages). Able to write complex analytical SQL (window functions, CTEs, recursive queries) and optimize it for cost and performance.
• Infrastructure as Code: Hands-on experience with AWS CDK or Terraform. You define infrastructure in code, not in the AWS Console.
• Exposure to LLM integration patterns: markdown files and prompt engineering. Experience with RAG, knowledge bases, and embeddings is a plus.
Nice-to-Have/Pluses:
• Cloud Fluency (AWS): Working knowledge of the AWS ecosystem: Lambda, ECS Fargate, Step Functions, S3, EventBridge, CloudWatch, and RDS.
• Experience with data visualization tools (Tableau, Streamlit, or similar) for self-service analytics.
• Background in data governance, data cataloging, or data lineage tooling.
• CI/CD Experience: Demonstrated experience maintaining CI/CD pipelines for automated testing and deployment of data engineering and application code using GitHub Actions and Terraform.
• SQL & Database Tuning: Experience profiling and optimizing queries across both OLAP (Snowflake) and OLTP (PostgreSQL/Aurora) systems. Familiarity with EXPLAIN plans, indexing strategies, and database-level performance tuning.
• Machine Learning Fundamentals: Practical experience building, evaluating, and deploying ML models. Familiarity with common frameworks (scikit-learn, XGBoost) and an understanding of when and how to apply ML to business problems.
Responsibilities
You will help lead the transformation of our data infrastructure, moving the team from manual processes and spreadsheet-based workflows into scalable, governed, automated data systems. This role sits at the center of investment operations, analytics, reporting, and emerging AI initiatives.
What Strategic Initiatives Will You Drive?
• The “Golden Source” Transformation: Migrating data reliance from spreadsheets and manual processes into a governed Snowflake warehouse with documented lineage, quality checks, and self-service analytics.
• Automated Ingestion Pipelines: Replacing manual file drops with event-driven Airflow DAGs and AWS Lambda functions that ingest, validate, and transform data from external vendors and internal systems in near real-time.
• Infrastructure as Code: Defining all cloud infrastructure with Terraform or AWS CDK so that development, staging, and production environments are reproducible, version-controlled, and auditable.
• Data-Powered AI Initiatives: Establishing the data foundations that enable AI across the business: clean, governed, and accessible datasets that feed AI agents, natural-language interfaces, and intelligent automation. Machine learning techniques such as anomaly detection, classification, and forecasting will augment these initiatives where appropriate.
• Automate how we tell our story with data: Build reusable templates and automation frameworks to close the loop between our database and the materials our team uses to win and retain business. That means pulling live data into branded one-pagers, generating narrative-driven slide decks and websites, populating email campaigns, and producing social-ready content.
Okay, but what will you actually do in this role?
Data Engineering & ETL (35%)
• Pipeline Design & Orchestration: Design, build, and maintain robust ETL/ELT pipelines using Apache Airflow (MWAA). Author DAGs that handle complex dependencies across external data vendors, internal models, and downstream consumers.
• Data Integration: Ingest data from diverse sources including SFTP feeds, REST APIs, flat files, and third-party financial data providers. Normalize and conform data into a consistent analytical model.
• Data Quality & Monitoring: Build “circuit breakers” into pipelines: automated data quality checks that halt downstream processing and alert the team via CloudWatch and Slack when anomalies are detected.
• Serverless Processing: Implement AWS Lambda functions for lightweight, event-driven tasks such as triggering ingestion when files land in S3 or validating data payloads before loading.
• Documentation: Maintain and document the data catalog so institutional knowledge lives in the system, not in your head.
Snowflake Data Warehouse & SQL (35%)
• Warehouse Architecture: Serve as the subject-matter expert for Snowflake. Design schemas, manage data loading via Stages and Snowpipe, and implement role-based access controls.
• Complex SQL Development: Write advanced analytical SQL (window functions, CTEs, recursive queries, pivots) to support investment reporting, performance attribution, and ad-hoc analysis.
• Query Tuning & Optimization: Profile and optimize slow-running queries. Leverage clustering keys, micro-partition pruning, materialized views, and result caching to minimize compute cost and maximize performance.
Cloud Infrastructure & CI/CD (10%)
• Infrastructure as Code: Define and deploy cloud resources using Terraform or AWS CDK. Treat infrastructure as software with version control, peer review, and automated testing.
• CI/CD Pipelines: Help design and maintain CI/CD workflows with GitHub Actions for automated testing, linting, and deployment of data pipelines, infrastructure, and application code.
Analytics, Reporting & AI (20%)
• Analytical Modeling: Partner with investment and business teams to translate questions into data models, Tableau dashboards, and reports that drive strategic decisions.
• Analytics Presentation: Design and build automated pipelines that pull data from source systems and render it into production-ready marketing outputs: one-pagers, pitch decks, email campaigns, and social content.
• AI Integration: Be a resource for software engineers building an AI layer on top of existing data infrastructure, enabling LLMs to securely query fund performance data via APIs and answer natural-language questions for internal stakeholders.
Benefits
• By applying on this site, you acknowledge that The Motley Fool will be collecting the personal data you provide for our recruiting purposes. Please see our Applicant Privacy Notice for additional information about how we process, transfer, and store your data, including where that data is stored, and about any additional privacy rights you may have based on your jurisdiction.