Sayari is a venture-backed global corporate data provider and commercial intelligence platform. The company is seeking a Principal Data Engineer to serve as a technical anchor for complex data challenges, with a focus on hands-on Spark and graph data work, system architecture, and technical mentorship.
Responsibilities:
- Design and implement complex Spark data logic, focusing on performance optimization, tuning for large data volumes, and robust execution
- Own the architectural design of graph build pipelines, ensuring they are scalable, automated, and highly resilient
- Plan and oversee the strategic re-architecture of data pipelines to meet evolving business needs and scale
- Optimize infrastructure-as-code and schema designs to reduce cloud costs and improve pipeline latency
- Act as a technical consultant for the team, fostering a collaborative and engineer-led approach to design decisions
- Support the development of the engineering team through code reviews, design docs, and architectural best practices
- Ensure the accuracy of mission-critical data outputs
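For a flavor of the graph build work described above, here is a minimal sketch in plain Python (field names and records are hypothetical; in practice this logic would be expressed as Spark transformations over partitioned data):

```python
from collections import defaultdict

def build_ownership_graph(records):
    """Build a directed adjacency map from (owner, subsidiary, pct) tuples.

    Illustrative only: a production graph build pipeline would implement
    the same aggregation as distributed Spark jobs, not an in-memory dict.
    """
    graph = defaultdict(list)
    for owner, subsidiary, pct in records:
        graph[owner].append((subsidiary, pct))
    return dict(graph)

# Hypothetical corporate ownership records
records = [
    ("HoldCo", "SubA", 100.0),
    ("HoldCo", "SubB", 60.0),
    ("SubA", "SubC", 51.0),
]
print(build_ownership_graph(records)["HoldCo"])  # [('SubA', 100.0), ('SubB', 60.0)]
```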
Requirements:
- 8+ years of experience in the big data space, with a proven track record of implementing large-scale features and leading process redesigns
- Expert-level command of Apache Spark for large-scale data processing
- Strong experience with orchestration tools such as Airflow, and with cloud computing environments
- Hands-on experience architecting and managing data flows into databases such as Elasticsearch, Memgraph, and Cassandra
- Demonstrated ability in system architecture, including Infrastructure as Code (IaC) and schema design
- A 'builder' mindset with experience evolving and improving existing architectures to meet new scale requirements
- Experience working specifically with graph data or graph databases
- Prior experience with entity resolution or identity resolution systems
- Experience evaluating and selecting modern analytical database architectures
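Entity resolution, as referenced above, means clustering records that refer to the same real-world entity. A minimal union-find sketch in plain Python (record fields and the normalization rule are hypothetical; production systems add blocking and probabilistic matching at Spark scale):

```python
def resolve_entities(records, key):
    """Cluster record indices that share a normalized match key, via union-find.

    `records` is a list of dicts and `key` extracts a comparable identifier;
    both are illustrative stand-ins for the matching stage of a real
    entity-resolution pipeline.
    """
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(i, j):
        parent[find(i)] = find(j)

    # Records that produce the same match key are merged into one entity.
    seen = {}
    for i, rec in enumerate(records):
        k = key(rec)
        if k in seen:
            union(i, seen[k])
        else:
            seen[k] = i

    clusters = {}
    for i in range(len(records)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())

# Hypothetical records: the first two should resolve to one entity.
records = [
    {"name": "ACME Corp."},
    {"name": "acme corp"},
    {"name": "Globex LLC"},
]
normalize = lambda r: r["name"].lower().rstrip(".")
print(resolve_entities(records, normalize))  # [[0, 1], [2]]
```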