ZoomInfo is where careers accelerate, and they are seeking an exceptional Principal Software Engineer to serve as the Technical Lead for their Web Data Acquisition team. This role involves designing and leading the web crawling and data extraction infrastructure, solving complex distributed systems challenges, and mentoring a high-performing engineering team.
Responsibilities:
- Design and lead the web crawling and data extraction infrastructure
- Solve complex distributed systems challenges at scale
- Set technical direction and drive architecture decisions for the team
- Mentor and develop a high-performing engineering team
Requirements:
- Proven experience building web crawling or large-scale data systems from scratch
- Strong architectural skills designing scalable, fault-tolerant distributed systems
- Track record leading complex technical initiatives and driving architecture direction for teams
- Demonstrated ability to evolve production systems incrementally while maintaining reliability
- Deep background in large-scale data engineering (terabytes daily)
- Hands-on experience with cloud data warehouses (BigQuery, Snowflake)
- Experience with Apache Kafka, Kubernetes (GKE/EKS), and orchestration tools (Airflow)
- Familiarity with multi-cloud environments (GCP + AWS)
- Expertise designing and operating ETL/ELT pipelines
- Deep expertise in web crawling technologies and advanced scraping (Scrapy or similar)
- Experience extracting structured/unstructured web data and SERP extraction
- Knowledge of proxy infrastructure management, anti-bot detection, and ethical crawling
- Familiarity with crawling vendors and AI/LLM-based extraction approaches
- Experience mentoring engineers at all levels and fostering collaborative culture
- Strong ability to influence technical direction and establish best practices
- Track record hiring, coaching, and developing senior engineers
- Startup or small company experience wearing multiple hats
- Comfortable operating in ambiguity and pioneering new capabilities
- Entrepreneurial mindset with bias toward action and iteration
- Excellent communicator who explains complex technical concepts to diverse audiences
- 10+ years software engineering experience
- 5+ years focused on data engineering
- 3+ years in senior/principal-level technical leadership
- Strong CS fundamentals (algorithms, data structures, distributed systems)
- Self-starter who thrives in fast-paced environments
- B2B data company or data-as-a-product experience a plus
Tech Stack:
- Python & Java
- Apache Kafka
- GCP (BigQuery, GKE, Vertex AI)
- Snowflake & Starburst/Trino
- Terraform
- Scrapy / Web Scraping Frameworks
- Proxy Management Systems
- Distributed Systems & Kubernetes
- Apache Airflow
- Large-Scale ETL Pipelines
- Apache Spark (expert-level)