Sayari is a venture-backed global corporate data provider and commercial intelligence platform. The company is seeking a Principal Data Engineer to serve as a technical anchor for complex data challenges, with a focus on hands-on Spark and graph data work, system architecture, and technical mentorship.
Responsibilities:
- Design and implement complex Spark data logic, focusing on performance optimization, tuning for large data volumes, and robust execution
- Own the architectural design of graph build pipelines, ensuring they are scalable, automated, and highly resilient
- Plan and oversee the strategic re-architecture of data pipelines to meet evolving business needs and scale
- Optimize infrastructure-as-code and schema designs to reduce cloud costs and improve pipeline latency
- Act as a technical consultant for the team, fostering a collaborative and engineer-led approach to design decisions
- Support the development of the engineering team through code reviews, design docs, and architectural best practices
- Ensure the accuracy of mission-critical data outputs
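For a flavor of the graph build work described above, here is a minimal sketch in plain Python (field names and records are hypothetical; in practice this logic would be expressed as Spark transformations over partitioned data):

```python
from collections import defaultdict

def build_ownership_graph(records):
    """Build a directed adjacency map from (owner, subsidiary, pct) tuples.

    Illustrative only: a production graph build pipeline would implement
    the same aggregation as distributed Spark jobs, not an in-memory dict.
    """
    graph = defaultdict(list)
    for owner, subsidiary, pct in records:
        graph[owner].append((subsidiary, pct))
    return dict(graph)

# Hypothetical corporate ownership records
records = [
    ("HoldCo", "SubA", 100.0),
    ("HoldCo", "SubB", 60.0),
    ("SubA", "SubC", 51.0),
]
print(build_ownership_graph(records)["HoldCo"])  # [('SubA', 100.0), ('SubB', 60.0)]
```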
Requirements:
- 8+ years of experience in the big data space, with a proven track record of implementing large-scale features and leading process redesigns
- Expert-level command of Apache Spark for large-scale data processing
- Strong experience with orchestration tools such as Airflow, and with cloud computing environments
- Hands-on experience architecting and managing data flows into databases such as Elasticsearch, Memgraph, and Cassandra
- Demonstrated ability in system architecture, including Infrastructure as Code (IaC) and schema design
- A 'builder' mindset with experience evolving and improving existing architectures to meet new scale requirements
- Experience working specifically with graph data or graph databases
- Prior experience with entity resolution or identity resolution systems
- Experience evaluating and selecting modern analytical database architectures
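Entity resolution, as referenced above, means clustering records that refer to the same real-world entity. A minimal union-find sketch in plain Python (record fields and the normalization rule are hypothetical; production systems add blocking and probabilistic matching at Spark scale):

```python
def resolve_entities(records, key):
    """Cluster record indices that share a normalized match key, via union-find.

    `records` is a list of dicts and `key` extracts a comparable identifier;
    both are illustrative stand-ins for the matching stage of a real
    entity-resolution pipeline.
    """
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(i, j):
        parent[find(i)] = find(j)

    # Records that produce the same match key are merged into one entity.
    seen = {}
    for i, rec in enumerate(records):
        k = key(rec)
        if k in seen:
            union(i, seen[k])
        else:
            seen[k] = i

    clusters = {}
    for i in range(len(records)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())

# Hypothetical records: the first two should resolve to one entity.
records = [
    {"name": "ACME Corp."},
    {"name": "acme corp"},
    {"name": "Globex LLC"},
]
normalize = lambda r: r["name"].lower().rstrip(".")
print(resolve_entities(records, normalize))  # [[0, 1], [2]]
```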