USA Home Listings is a profitable company building an API platform for real estate data. This role takes ownership of the data acquisition infrastructure, modernizing legacy web scraping and data pipeline systems into a reliable, scalable data engine that powers the platform.
Responsibilities:
- Data Pipeline Ownership: Take over the existing web scraping infrastructure, audit it, document it, and modernize it into a maintainable, scalable system
- Web Scraping: Architect and maintain robust scraping spiders using Zyte (formerly Scrapinghub), Scrapy, or equivalent tools to ingest property, vendor, and market data at scale
- Data Processing: Build ETL pipelines that clean, normalize, and enrich raw scraped data into structured datasets stored in PostgreSQL with PostGIS extensions
- Database Engineering: Optimize queries, design schemas, and manage data integrity for large-scale property and campaign datasets
- Infrastructure: Manage background task queues via Celery/Redis, monitor pipeline health, and ensure data freshness and reliability
- API Support: Work with the Senior Full-Stack Engineer to expose processed data through the platform API for internal and external consumption
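To give a concrete flavor of the data-processing work above, here is a minimal sketch of the clean/normalize step in plain Python. The field names, price formats, and rules are illustrative assumptions, not taken from the actual codebase:

```python
import re
from dataclasses import dataclass
from typing import Optional

@dataclass
class Listing:
    """Normalized property record, ready for loading into PostgreSQL."""
    address: str
    price_usd: Optional[int]
    beds: Optional[int]

def normalize(raw: dict) -> Listing:
    """Clean one raw scraped record: collapse whitespace in the address,
    parse messy price strings like '$1,250,000' or '1.25M', and coerce
    bedroom counts, leaving None where a field cannot be parsed."""
    address = " ".join(raw.get("address", "").split())

    price_usd = None
    price_text = (raw.get("price") or "").strip().lstrip("$").replace(",", "")
    m = re.fullmatch(r"(\d+(?:\.\d+)?)([kKmM]?)", price_text)
    if m:
        scale = {"k": 1_000, "m": 1_000_000}.get(m.group(2).lower(), 1)
        price_usd = int(float(m.group(1)) * scale)

    try:
        beds = int(str(raw.get("beds", "")).strip())
    except ValueError:
        beds = None

    return Listing(address=address, price_usd=price_usd, beds=beds)
```

For example, `normalize({"address": "  123 Main St \n Austin, TX ", "price": "1.25M", "beds": "3"})` yields a record with the address collapsed to one line, the price expanded to 1250000, and beds parsed as 3.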
Requirements:
- 3+ years of Python experience. Strong fundamentals in data processing, scripting, and automation
- Web scraping expertise. Hands-on experience with Scrapy, Zyte, Selenium, Playwright, or similar tools. You understand proxies, headful browsing, rate limiting, and anti-bot bypasses
- Database proficiency. Strong PostgreSQL skills including query optimization, indexing, and schema design. PostGIS experience preferred
- ETL pipeline experience. You have built and maintained data pipelines that process large volumes of messy, real-world data
- Legacy system comfort. You can read undocumented code, trace data flows, and incrementally improve systems without breaking them
- Celery/Redis experience. You have managed background task queues and understand scheduling, retries, and failure handling
Nice to have:
- Experience with Zyte (Scrapinghub) specifically
- Django experience (the platform is Django-based)
- Real estate, property, or public records data experience
- Data quality monitoring and alerting
- Experience with DigitalOcean or similar cloud platforms
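The retry and failure-handling expectations can be illustrated with a small pure-Python sketch. The decorator and its parameters are hypothetical stand-ins for the behavior Celery provides via task retries, not the platform's actual code:

```python
import time
import functools

def retry(max_attempts=3, base_delay=1.0, backoff=2.0, sleep=time.sleep):
    """Retry a flaky callable with exponential backoff. After the final
    attempt the exception propagates so monitoring can see the failure.
    `sleep` is injectable so tests can run without real delays."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            delay = base_delay
            for attempt in range(1, max_attempts + 1):
                try:
                    return fn(*args, **kwargs)
                except Exception:
                    if attempt == max_attempts:
                        raise  # retries exhausted: surface the failure
                    sleep(delay)
                    delay *= backoff
        return wrapper
    return decorator
```

A scrape that fails twice with a transient error and then succeeds would return normally on the third attempt, having slept twice with growing delays in between.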