wynd-labs - Research Crawling Engineer

United States · Equity · Posted 2mo ago
Remote · NA · Artificial Intelligence · Developer Tools · Research Engineer · Go · Rust · Java · C++ · Python

Requirements

• Strong programming experience in one or more of: Go, Rust, Python, Java, or C++
• Experience building web crawlers or large-scale data pipelines
• Solid understanding of HTTP, networking, and browser behavior
• Familiarity with distributed systems and parallel processing
• Experience working with large datasets (TB–PB scale preferred)
• Ability to debug unstable or adversarial environments

Preferred / Bonus:
• Experience with NLP pipelines or dataset curation for ML
• Familiarity with LLM pretraining data or retrieval systems
• Experience with headless browsers (e.g., Chrome DevTools Protocol, Playwright, Puppeteer)
• Knowledge of proxy systems, IP rotation, and large-scale request orchestration
• Background in data quality evaluation or benchmarking
• Experience running workloads on cloud or bare-metal infrastructure

What This Role Involves:
• Operating at the boundary of scale and reliability
• Adapting to constantly changing web environments
• Balancing throughput, coverage, and data quality
• Owning end-to-end data acquisition pipelines

Evaluation Criteria:
• Ability to design systems that scale without degrading quality
• Practical problem-solving under real-world constraints
• Speed of iteration and ownership
• Measurable improvements in data coverage, quality, or efficiency

Responsibilities

• Build and maintain large-scale web crawlers across diverse domains
• Design high-throughput, fault-tolerant systems for data collection (millions to billions of URLs/day)
• Handle anti-bot systems, rate limits, and dynamic/JS-heavy sites
• Develop pipelines for cleaning, deduplication, filtering, and normalization
• Construct and maintain datasets for research and model training
• Monitor crawl performance, coverage, and data quality; iterate quickly
• Collaborate with research teams to align data collection with modeling needs
• Optimize infrastructure for cost, latency, and reliability

Benefits

• Based on experience and demonstrated ability to operate at scale

Example Projects:
• Build a distributed crawler for a continuously updated, high-quality web project
• Design a system to classify and filter billions of pages for pretraining
• Extract structured data from dynamic, JS-heavy sites at scale
• Improve deduplication and quality scoring across multimodal datasets

• Opportunity. We are at the forefront of developing a web-scale crawler and knowledge graph that improves access to public web data and extends the value of AI to the people.
• Culture. We're a lean team with a high bar. We come to work not to be comfortable, but to find out what we're capable of and to do work that matters. We're not calling for people who keep things moving. We're calling for people who make everyone around them better.
• We prioritize low ego and high output. This is a fully remote team.
• Compensation. You'll receive a competitive salary, benefits and equity package.
