nessolabs

nessolabs - Web Scraping Engineer — European Public Procurement

Indonesia · 1w ago
In Office · APAC · Cloud Computing · Web Developer · Infrastructure Engineer · Procurement · Python · Documentation · Selenium · Playwright


Requirements

• Strong async Python: you think in asyncio, not time.sleep()
• Playwright or Selenium experience: you've intercepted XHR responses, handled SPAs, and debugged timing issues
• Resilience mindset: retry with backoff, graceful degradation, circuit breakers. Your scraper doesn't crash at 3 AM.
• Comfort with messy HTML: you can write a multi-strategy extractor that handles three different markup structures on the same site
• Data parsing skills: Italian locale, date formats, CIG validation, document type detection
• Bonus: experience with Italian PA (Pubblica Amministrazione) portals, ANAC/PVL datasets, or OCDS data formats
• Stack: Python 3.11+ · Playwright · httpx · BeautifulSoup · Pydantic · SQLAlchemy 2.0 · PostgreSQL · Prefect · AWS S3 · Supabase
• No whiteboard algorithms. We'll send you a hands-on technical assessment: a mock procurement portal with real-world challenges. You build a scraper. We evaluate the code.
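To illustrate what "asyncio, not time.sleep()" and "retry with backoff" mean in practice, here is a minimal sketch. It assumes a hypothetical `TransientError` that your HTTP layer raises for retryable failures (timeouts, HTTP 429/503). `retry_with_backoff` and `fetch_all` are illustrative names, not part of Playwright, httpx, or any other library in the stack.

```python
import asyncio
import random
from typing import Awaitable, Callable, Iterable, List, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for retryable failures (timeouts, HTTP 429/503, dropped sessions)."""


async def retry_with_backoff(
    op: Callable[[], Awaitable[T]],
    *,
    attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
) -> T:
    """Await op(), retrying TransientError with capped exponential backoff and full jitter."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return await op()
        except TransientError:
            if attempt == attempts - 1:
                raise  # give up; the caller decides how to degrade gracefully
            # Full jitter: wait a random time in [0, min(max_delay, base * 2**attempt)].
            delay = random.uniform(0, min(max_delay, base_delay * 2 ** attempt))
            await asyncio.sleep(delay)  # yields to the event loop; time.sleep() would block it
    raise AssertionError("unreachable")


async def fetch_all(
    urls: Iterable[str],
    fetch: Callable[[str], Awaitable[T]],
    *,
    limit: int = 4,
    **retry_opts,
) -> List[T]:
    """Fetch all URLs concurrently, with at most `limit` requests in flight at once."""
    sem = asyncio.Semaphore(limit)

    async def one(url: str) -> T:
        async with sem:
            return await retry_with_backoff(lambda: fetch(url), **retry_opts)

    # gather() returns results in the same order as `urls`.
    return await asyncio.gather(*(one(u) for u in urls))
```

Holding the semaphore while backing off is a deliberate choice in this sketch: a struggling portal receives fewer concurrent requests rather than a burst of retries.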
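A circuit breaker complements retries by pausing requests to a portal that keeps failing, instead of hammering it. The class below is a minimal, illustrative three-state breaker (closed, open, half-open); the `CircuitBreaker` name, the thresholds, and the injectable clock are assumptions for the sketch, not a specific library's API.

```python
import asyncio
import time
from typing import Any, Awaitable, Callable, Optional


class CircuitOpenError(Exception):
    """Raised instead of calling the portal while the breaker is open."""


class CircuitBreaker:
    """Closed -> open after N consecutive failures -> half-open after a cool-down."""

    def __init__(
        self,
        failure_threshold: int = 5,
        reset_timeout: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock  # injectable so tests can fake time
        self.failures = 0
        self.opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "half-open"  # cool-down elapsed: let a trial call through
        return "open"

    async def call(self, fn: Callable[..., Awaitable[Any]], *args: Any, **kwargs: Any) -> Any:
        if self.state == "open":
            raise CircuitOpenError("portal marked unhealthy; skipping request")
        try:
            result = await fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial call, or too many consecutive failures, (re)opens the breaker.
            if self.state == "half-open" or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        self.failures = 0
        self.opened_at = None  # any success closes the breaker
        return result


async def _demo() -> None:
    breaker = CircuitBreaker(failure_threshold=2, reset_timeout=5.0)

    async def flaky_portal() -> str:
        raise ConnectionError("503 from portal")

    for _ in range(3):
        try:
            await breaker.call(flaky_portal)
        except (ConnectionError, CircuitOpenError) as exc:
            print(type(exc).__name__, breaker.state)


if __name__ == "__main__":
    asyncio.run(_demo())
```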

Responsibilities

• Build and maintain async scrapers (Python + Playwright) against Italian and, later, European public procurement portals (Maggioli PortaleAppalti, ANAC, MePA, and others)
• Handle real-world challenges: JSESSIONID session management, FriendlyCaptcha/Mosparo anti-bot protection, Cloudflare WAF, and IP rotation with rate-limit backoff
• Parse Italian data formats: amounts (€ 1.234.567,89), dates (DD/MM/YYYY and textual), and CIG/CUP identifiers with placeholder detection
• Extract and process documents: PDF, .p7m (PKCS#7 signed), and ZIP/7Z archives, with OCR fallback
• Integrate scrapers into our Prefect orchestration pipeline with monitoring, alerting, and anomaly detection
• Work with PostgreSQL, Supabase, ClickHouse, and S3 for dual-sink storage with upsert/idempotency patterns
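The amount format `€ 1.234.567,89` uses `.` to group thousands and `,` for decimals. A minimal parsing sketch, assuming every amount follows that Italian convention and that the input is a single amount field (the name `parse_it_amount` is hypothetical), converts it to a `Decimal` so currency values never pass through binary floats:

```python
import re
from decimal import Decimal, InvalidOperation
from typing import Optional

# Keep digits, separators and a minus sign; drop "€", "EUR", spaces, NBSPs, letters.
_NOISE = re.compile(r"[^\d.,-]")


def parse_it_amount(text: str) -> Optional[Decimal]:
    """Parse an Italian-formatted amount such as '€ 1.234.567,89'.

    Expects the text of one amount field, not a whole sentence.
    Returns None when nothing parseable is left.
    """
    cleaned = _NOISE.sub("", text)
    if not cleaned:
        return None
    # Italian convention: drop thousands dots, then turn the decimal comma into a dot.
    cleaned = cleaned.replace(".", "").replace(",", ".")
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        return None
```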
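For dates, a small sketch can recognise `DD/MM/YYYY` (also tolerating `.` or `-` separators) and textual forms like `12 marzo 2024`. It uses an explicit month table rather than `locale.setlocale`, on the assumption that the scraper host may not have an Italian locale installed; `parse_it_date` is a hypothetical helper.

```python
import re
from datetime import date
from typing import Optional

_MONTHS = {
    "gennaio": 1, "febbraio": 2, "marzo": 3, "aprile": 4,
    "maggio": 5, "giugno": 6, "luglio": 7, "agosto": 8,
    "settembre": 9, "ottobre": 10, "novembre": 11, "dicembre": 12,
}
# DD/MM/YYYY, also accepting "." or "-" as separators.
_NUMERIC = re.compile(r"\b(\d{1,2})[/.-](\d{1,2})[/.-](\d{4})\b")
# "12 marzo 2024", month name matched case-insensitively.
_TEXTUAL = re.compile(
    r"\b(\d{1,2})\s+(" + "|".join(_MONTHS) + r")\s+(\d{4})\b", re.IGNORECASE
)


def parse_it_date(text: str) -> Optional[date]:
    """Return the first Italian-style date found in `text`, or None."""
    if m := _NUMERIC.search(text):
        day, month, year = int(m.group(1)), int(m.group(2)), int(m.group(3))
    elif m := _TEXTUAL.search(text):
        day, month, year = int(m.group(1)), _MONTHS[m.group(2).lower()], int(m.group(3))
    else:
        return None
    try:
        return date(year, month, day)
    except ValueError:  # impossible dates such as 31/02/2024
        return None
```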
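CIG placeholder detection could look like the sketch below. It assumes a CIG is exactly 10 uppercase alphanumeric characters and treats all-same-character values plus a few filler strings as placeholders. Both rules and the filler list are illustrative assumptions, not ANAC's official validation; `normalize_cig` is a hypothetical name.

```python
import re
from typing import Optional

# Assumption: a CIG is exactly 10 uppercase alphanumeric characters.
# This is a shape check only, not ANAC's official validation rules.
_CIG_SHAPE = re.compile(r"[0-9A-Z]{10}")

# Hypothetical filler values a portal might put in the CIG field.
_PLACEHOLDERS = {"", "-", "N/A", "NA", "N.D.", "ND", "NON DISPONIBILE", "NON PREVISTO"}


def normalize_cig(raw: Optional[str]) -> Optional[str]:
    """Return a cleaned CIG, or None if missing, malformed, or a placeholder."""
    if raw is None:
        return None
    value = raw.strip().upper()
    if value in _PLACEHOLDERS:
        return None
    value = re.sub(r"\s+", "", value)  # tolerate stray inner spaces
    if not _CIG_SHAPE.fullmatch(value):
        return None
    if len(set(value)) == 1:  # "0000000000", "XXXXXXXXXX", ...
        return None
    return value
```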
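Document type detection is usually more reliable from leading bytes than from file extensions. The sketch below checks the well-known signatures for PDF (`%PDF-`), ZIP (`PK\x03\x04`) and 7z. The `.p7m` rule (a DER/BER ASN.1 SEQUENCE tag `0x30` followed by a typical length byte, or else the file extension) is a rough heuristic, and `sniff_document_type` is a hypothetical name.

```python
from typing import Optional

# Well-known leading-byte signatures ("magic numbers").
_SIGNATURES = (
    (b"%PDF-", "pdf"),
    (b"PK\x03\x04", "zip"),
    (b"7z\xbc\xaf\x27\x1c", "7z"),
)


def sniff_document_type(data: bytes, filename: str = "") -> Optional[str]:
    """Guess a document's type from its first bytes, falling back to the filename."""
    for magic, kind in _SIGNATURES:
        if data.startswith(magic):
            return kind
    # Heuristic only: DER/BER-encoded PKCS#7 starts with an ASN.1 SEQUENCE tag (0x30),
    # commonly followed by a long-form (0x82, 0x83) or indefinite (0x80) length byte.
    if data[:1] == b"\x30" and data[1:2] in (b"\x80", b"\x82", b"\x83"):
        return "p7m"
    # Base64-wrapped .p7m files will not match the bytes above, so use the extension.
    if filename.lower().endswith(".p7m"):
        return "p7m"
    return None
```

A signed file such as `x.pdf.p7m` would then be unwrapped and its payload sniffed again.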
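The upsert/idempotency pattern can be sketched with the standard-library `sqlite3` module standing in for PostgreSQL, since both accept `INSERT ... ON CONFLICT ... DO UPDATE` (SQLite 3.24+). The table layout, the `content_hash` column and `upsert_tender` are assumptions for illustration: re-running a scrape keeps one row per tender and only rewrites it when its content changed.

```python
import hashlib
import json
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS tenders (
    source       TEXT NOT NULL,
    external_id  TEXT NOT NULL,
    payload      TEXT NOT NULL,
    content_hash TEXT NOT NULL,
    PRIMARY KEY (source, external_id)
)
"""

# Insert new tenders; update existing ones only when the content hash differs.
UPSERT = """
INSERT INTO tenders (source, external_id, payload, content_hash)
VALUES (?, ?, ?, ?)
ON CONFLICT (source, external_id) DO UPDATE
SET payload = excluded.payload, content_hash = excluded.content_hash
WHERE tenders.content_hash <> excluded.content_hash
"""


def upsert_tender(conn: sqlite3.Connection, source: str, external_id: str, record: dict) -> bool:
    """Store `record`; return True if a row was inserted or changed."""
    # sort_keys makes the JSON, and therefore the hash, deterministic.
    payload = json.dumps(record, sort_keys=True, ensure_ascii=False)
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    before = conn.total_changes
    conn.execute(UPSERT, (source, external_id, payload, digest))
    return conn.total_changes > before
```

With PostgreSQL the same `ON CONFLICT` clause applies; only the placeholder style depends on the driver.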

Benefits

• IDR 17M – IDR 21M per month
• Bonus offered
