Design, develop, and maintain scalable web scraping solutions to extract data from a wide range of websites and online platforms
Build robust data pipelines and automation workflows for data collection, cleaning, validation, and transformation
Process scraped data and deliver it in MDR production-ready formats, meeting strict quality and timeline requirements
Monitor and troubleshoot scraping jobs, handling anti-bot mechanisms, CAPTCHAs, rate limiting, and site structure changes
Collaborate with cross-functional teams to understand data requirements, prioritize sources, and define scraping specifications
Document scraping processes, data schemas, and technical decisions for knowledge sharing and continuity
Identify opportunities for process improvement and automation to increase efficiency and reduce turnaround time
Support the transition of work from external vendors, ensuring uninterrupted data deliveries
Requirements
8+ years of professional experience in web scraping, data extraction, or data engineering
Strong proficiency in Python, with hands-on experience using scraping libraries and frameworks (Scrapy, BeautifulSoup, Selenium, Playwright, or similar)
Experience building and scheduling automated data pipelines (cron, Airflow, or equivalent orchestration tools)
Solid understanding of HTML, CSS, DOM structure, and browser developer tools for inspecting and reverse-engineering web pages
Familiarity with REST APIs, JSON, and techniques for extracting data from API endpoints
Experience with relational databases (PostgreSQL, MySQL) and proficiency in SQL
Ability to work around anti-scraping measures using proxy rotation, headless browsers, CAPTCHA handling, and request throttling
Strong problem-solving skills and attention to data quality and accuracy