Greystar is a leading global real estate platform specializing in property management and investment management. We are seeking a Backend AI-Forward Data Engineer to design and build scalable data infrastructure that enables AI-powered products and analytics across our global portfolio.
Responsibilities:
- Design, build, and maintain scalable data pipelines that ingest, transform, and serve data from dozens of source systems (PMS, CRM, financial systems, IoT, web/mobile analytics, and third-party providers)
- Develop and operate our Data Management Platform (DMP) on Databricks, ensuring data is governed, validated, and available for AI/ML workloads
- Build data models optimized for both analytical queries and AI consumption — including feature stores, embedding pipelines, and real-time serving layers
- Implement data quality frameworks including automated testing, lineage tracking, anomaly detection, and regression testing for critical data assets
- Build and maintain MCP (Model Context Protocol) server integrations that expose Greystar’s data to LLM-powered tools and AI agents across the organization
- Design APIs and data interfaces that allow AI products (GPS, Greystar.com, internal tools) to query and act on data in real time
- Partner with Data Science and Product teams to operationalize ML models — building the infrastructure for model training, evaluation, deployment, and monitoring
- Evaluate and integrate AI-powered data tooling (e.g., AI-assisted data cataloging, automated schema detection, intelligent data quality monitoring)
- Collaborate with other engineers on AI integration patterns, prompt engineering, and modern development practices. We are an AI-forward team moving fast: we test, iterate, share, and repeat
- Implement and enforce data governance policies including access controls, PII handling, data classification, and compliance requirements across global operations
- Build observability into data systems: monitoring, alerting, SLA tracking, and data freshness guarantees
- Contribute to Greystar’s AI governance framework, ensuring data used by AI systems is accurate, compliant, and appropriately scoped
- Document data models, pipeline architectures, and integration patterns to enable self-service for business unit analytics teams
Requirements:
- 5+ years of professional data engineering experience building and operating production data platforms
- Deep expertise with Databricks, Spark, or similar distributed data processing frameworks
- Strong SQL skills and experience with data modeling for both analytical (star schema, data vault) and AI/ML workloads
- Deep experience with AI coding tools such as Cursor, Codex, and Claude Code
- Proficiency in Python; experience with orchestration tools (Airflow, Dagster, or Databricks Workflows)
- Experience with cloud data platforms, Azure preferred (ADLS, Synapse, Azure ML); comparable AWS or GCP experience is acceptable
- Experience building data infrastructure that supports ML workflows: feature stores, training pipelines, embedding generation, and model serving
- Familiarity with LLM integration patterns including RAG architectures, vector databases (Pinecone, Weaviate, or similar), and MCP or tool-use frameworks
- Understanding of how AI/ML models consume data and the engineering requirements for reliable, low-latency AI data serving
- Awareness of AI governance considerations: data provenance, bias detection, and responsible AI data practices
- AI-first mindset — you leverage AI tools in your own workflow and think about how data infrastructure should evolve as AI capabilities advance
- Strong ownership mentality; you care about data quality as a product, not just a pipeline
- Clear communicator; able to explain data architecture decisions to product managers, analysts, and business stakeholders
- Collaborative approach across engineering, product, analytics, and business teams
Tech Stack:
- Databricks, Spark, Delta Lake, Unity Catalog
- Python, SQL, dbt or similar transformation frameworks
- Azure cloud services (ADLS, Azure ML, Synapse) or equivalent
- Git, CI/CD, infrastructure as code (Terraform or similar)
- Familiarity with data catalog, lineage, and observability tools (Monte Carlo, Great Expectations, or similar)
Nice to Have:
- Experience in real estate, property management, financial services, or asset management (a strong plus)
- Familiarity with multi-source data environments where data arrives in heterogeneous formats with varying quality
- Experience building data products that serve multiple business units with different access and governance requirements