Serv, a global executive recruitment partner, is hiring on behalf of Mercator.ai, a company focused on building scalable data infrastructure for data-driven decision making. The Staff Data Engineer will lead the design and development of distributed data pipelines, ensuring scalability and reliability while integrating modern tools to enhance engineering output.
Responsibilities:
- Lead the architecture and evolution of scalable, distributed data pipelines, ensuring high availability and performance at scale
- Design and implement robust data models to support reporting and advanced data applications
- Build and maintain distributed web scraping systems using tools such as Playwright, Selenium, and BeautifulSoup
- Develop systems capable of handling anti-scraping measures, proxy rotation, and high-volume data extraction
- Integrate AI and LLMs into engineering workflows for code generation, automation, and optimization
- Apply prompt engineering techniques to improve data processing, documentation, and troubleshooting
- Identify and implement system and process improvements to optimize performance and efficiency
- Manage and scale cloud-based data infrastructure, including data warehouses, object storage, and search systems
- Deploy and maintain containerized workloads using Kubernetes
- Implement data quality monitoring and governance processes to ensure accuracy and reliability
- Mentor junior engineers through code reviews, documentation, and knowledge sharing
- Communicate technical concepts clearly and provide business context for engineering decisions
Requirements:
- 5+ years of experience in Data Engineering with a track record of scaling systems
- Expert proficiency in Python and advanced SQL, including performance tuning and optimization
- Strong experience with workflow orchestration tools such as Airflow or Prefect, and with transformation tools such as dbt
- Proven experience building resilient web scraping systems using Playwright, Selenium, and BeautifulSoup
- Deep understanding of relational and NoSQL databases, including Postgres, MongoDB, and Elasticsearch
- Experience working with large-scale analytical data systems such as BigQuery
- Strong proficiency with CI/CD pipelines, Git, and Docker
- Experience designing and maintaining distributed systems with high availability and fault tolerance
- Experience with GCP or AWS and Kubernetes for infrastructure management
- Familiarity with LLMs such as ChatGPT, Claude, or Gemini for engineering workflows
- Experience with prompt optimization and AI-assisted development