OnHires - Senior Scraping Infrastructure Engineer
Requirements
• Core Experience: Proven, hands-on professional experience in high-volume web scraping and data extraction using Python.
• Anti-Blocking Expertise: Deep, practical knowledge of anti-bot solutions, including CAPTCHA solving, browser fingerprinting, and effective proxy/IP management strategies.
• Technical Depth: Solid understanding of HTML parsing, browser automation techniques, and asynchronous programming.
• Frameworks: Proficiency with leading web scraping frameworks (e.g., Playwright, Scrapy, or Selenium).
• Web Knowledge: Strong knowledge of REST APIs, HTTP protocols, and effective proxy management.
• Database Skills: Familiarity with both SQL and NoSQL databases for efficient data storage and processing.
• Infrastructure: Experience with Docker, Linux environments, and version control (Git).
• Communication: Fluent in English (written and spoken).
• Mindset: Self-driven, pragmatic, and capable of taking full ownership of critical, high-impact infrastructure projects.
• Experience with advanced async libraries (e.g., asyncio).
• Understanding of data quality validation and pipeline monitoring tools.
What they offer
• Impact & Ownership: A high degree of freedom and the opportunity to have a meaningful, measurable impact on a growing scale-up business.
• Flexibility: A high degree of flexibility: our client is a remote-first company and actively supports remote work.
• Growth: A competitive compensation package and dedicated support for your personal and professional development (ongoing training and coaching).
• Team & Atmosphere: A great work atmosphere within a small, talented, and international team.
• Office (Optional): A modern office located on the campus of Wildau Tech University, easily accessible by public transport (just outside Berlin).
Responsibilities
• Infrastructure Strategy & Architecture: Architect, build, and maintain the core infrastructure for a large-scale asynchronous data extraction system.
• Advanced Resilience Engineering: Design, implement, and continuously optimize sophisticated anti-blocking strategies, IP rotation, fingerprint management, and anti-bot bypass techniques to ensure high reliability and consistent uptime against modern web blocking.
• Operational Excellence & Monitoring: Implement robust monitoring, alerting, and logging systems to proactively debug, troubleshoot, and continuously improve scraper performance, reliability, and data quality across the platform.
• Core Development: Develop, test, and deploy robust, fault-tolerant web scraping components using advanced Python tools (Scrapy, Playwright, Selenium, Requests, etc.).
• Integration & Pipelines: Manage and automate high-volume data ingestion pipelines and seamless integrations with internal and external REST APIs.
• DevOps & Automation: Drive DevOps best practices, including managing infrastructure with Docker and CI/CD pipelines; Nomad knowledge is a plus.
• Collaboration & Mentorship: Partner with other engineers to set standards, enhance core infrastructure tooling, and mentor junior team members.