Chainalysis - Senior Data Engineer, Exposure
Responsibilities
• Build cloud-native data ingest and aggregation processes that intake gigabytes of data per day.
• Develop and optimize high-performance Spark jobs in Python to detect activities for market manipulation, fraud, behavioral patterning, and more.
• Build and improve batch-based applications as well as real-time streaming pipelines processing billions of records per day.
• Architect and maintain scalable data lakehouse environments using formats like Parquet, Iceberg, and Delta Lake.
• Collaborate on building scalable API services on AWS that interface with our data layer to handle 1,000s of requests per second.
• Help the team modernize our data stack to operate at 10x current capacity, moving toward highly automated, serverless architectures.
• Debug production data quality issues and performance bottlenecks across distributed systems and microservices.
We’re looking for candidates who have:
• Experience in designing and implementing cloud-native, distributed data processing systems in a major cloud provider (AWS preferred).
• Deep expertise in Python and Apache Spark, with a strong understanding of performance tuning and distributed computing principles.
• A bias to ship and iterate alongside product management and design partners to turn raw data into actionable insights.
• A technical background with extensive experience working directly on backend systems and large-scale data architecture.
• Pride in materializing complex product ideas into stable, production-grade data pipelines.
• Exposure to or interest in the cryptocurrency technology ecosystem and the unique data challenges it presents.
• Experience mentoring other engineers, leading cross-team data initiatives, and driving design and technology decisions is a plus.
• Experience with Terraform and Kubernetes (EKS) for orchestrating data workloads is a plus.
• A genuine excitement for significantly scaling large data systems and exploring the latest in "Modern Data Stack" technologies.
Technologies we use (experience not required):
• Languages: Python, Java
• Big Data: Spark (PySpark), Flink, Databricks
• Storage: Parquet, Iceberg, Delta Lake, Paimon
• Cloud/Infra: AWS (Serverless, EMR, Lambda), Kubernetes, Terraform
• CI/CD: GitHub including GitHub Actions
• Database: PostgreSQL, Snowflake/Redshift
Blockchain technology is powering a growing wave of innovation. Businesses and governments around the world are using blockchains to make banking more efficient, connect with their customers, and investigate criminal cases. As adoption of blockchain technology grows, more and more organizations seek access to all this ecosystem has to offer. That’s where Chainalysis comes in. We provide complete knowledge of what’s happening on blockchains through our data, services, and solutions. With Chainalysis, organizations can navigate blockchains safely and with confidence.
You belong here.