Luxury Presence is building the AI growth platform for real estate, backed by top investors and on track for significant revenue growth. As a Sr. Data Engineer, you'll lead the development of high-throughput data pipelines and enhance data quality using AI-driven solutions.
Responsibilities:
- Build and scale high-throughput streaming pipelines
  - Design, implement, and operate pipelines ingesting 400M+ monthly MLS updates across 350+ integrations using Airflow, Spark Streaming, Kafka, and Iceberg, ensuring reliability, performance, and data correctness
- Model and deliver high-quality, production-grade real estate datasets
  - Develop and maintain datasets that power core product experiences, with a focus on data modeling, transformation logic, and balancing freshness, accuracy, and cost
- Strengthen data quality and observability
  - Implement and improve data quality checks, monitoring, and alerting to detect issues early and reduce downstream impact
- Leverage AI to improve data operations
  - Contribute to AI-driven tooling that helps triage, debug, and resolve data quality issues, increasing team efficiency and reducing manual intervention
Requirements:
- 6+ years of professional data engineering or software engineering experience
- Strong experience with distributed data processing and streaming systems (Spark / PySpark, Kafka)
- Proficiency in Python (Pydantic experience preferred); familiarity with Node/TypeScript is a plus
- Experience building and maintaining data pipelines on AWS using tools like Airflow, Spark Streaming, and Iceberg
- Solid understanding of data modeling and working with large-scale datasets
- Familiarity with event-driven systems and ingestion patterns (Kafka, SQS)
- Experience implementing data quality checks, monitoring, and debugging data issues
- Proven track record leading high-impact initiatives from concept through production in a SaaS environment
- Expert-level grasp of software design principles and experience with multi-tenant platform architectures
- Interest in applying AI/ML or automation to improve data workflows is a plus