Greystar is a leading global real estate platform specializing in property management and investment management. We are seeking a Backend AI-Forward Data Engineer to design and build scalable data infrastructure that enables AI-powered products and analytics across our global portfolio.
Responsibilities:
- Design, build, and maintain scalable data pipelines that ingest, transform, and serve data from dozens of source systems (PMS, CRM, financial systems, IoT, web/mobile analytics, and third-party providers)
- Develop and operate our Data Management Platform (DMP) on Databricks, ensuring data is governed, validated, and available for AI/ML workloads
- Build data models optimized for both analytical queries and AI consumption — including feature stores, embedding pipelines, and real-time serving layers
- Implement data quality frameworks including automated testing, lineage tracking, anomaly detection, and regression testing for critical data assets
- Build and maintain MCP (Model Context Protocol) server integrations that expose Greystar’s data to LLM-powered tools and AI agents across the organization
- Design APIs and data interfaces that allow AI products (GPS, Greystar.com, internal tools) to query and act on data in real time
- Partner with Data Science and Product teams to operationalize ML models — building the infrastructure for model training, evaluation, deployment, and monitoring
- Evaluate and integrate AI-powered data tooling (e.g., AI-assisted data cataloging, automated schema detection, intelligent data quality monitoring)
- Collaborate with other engineers on AI integration patterns, prompt engineering, and modern development practices. We are an AI-forward team moving fast: we test, iterate, share, and repeat
- Implement and enforce data governance policies including access controls, PII handling, data classification, and compliance requirements across global operations
- Build observability into data systems: monitoring, alerting, SLA tracking, and data freshness guarantees
- Contribute to Greystar’s AI governance framework, ensuring data used by AI systems is accurate, compliant, and appropriately scoped
- Document data models, pipeline architectures, and integration patterns to enable self-service for business unit analytics teams
Requirements:
- 5+ years of professional data engineering experience building and operating production data platforms
- Deep expertise with Databricks, Spark, or similar distributed data processing frameworks
- Strong SQL skills and experience with data modeling for both analytical (star schema, data vault) and AI/ML workloads
- Deep experience with AI coding tools such as Cursor, Codex, and Claude Code
- Proficiency in Python; experience with orchestration tools (Airflow, Dagster, or Databricks Workflows)
- Experience with cloud data platforms, Azure preferred (ADLS, Synapse, Azure ML); comparable AWS or GCP experience is acceptable
- Experience building data infrastructure that supports ML workflows: feature stores, training pipelines, embedding generation, and model serving
- Familiarity with LLM integration patterns including RAG architectures, vector databases (Pinecone, Weaviate, or similar), and MCP or tool-use frameworks
- Understanding of how AI/ML models consume data and the engineering requirements for reliable, low-latency AI data serving
- Awareness of AI governance considerations: data provenance, bias detection, and responsible AI data practices
- AI-first mindset — you leverage AI tools in your own workflow and think about how data infrastructure should evolve as AI capabilities advance
- Strong ownership mentality; you care about data quality as a product, not just a pipeline
- Clear communicator; able to explain data architecture decisions to product managers, analysts, and business stakeholders
- Collaborative approach across engineering, product, analytics, and business teams
Tech Stack:
- Databricks, Spark, Delta Lake, Unity Catalog
- Python, SQL, dbt or similar transformation frameworks
- Azure cloud services (ADLS, Azure ML, Synapse) or equivalent
- Git, CI/CD, infrastructure as code (Terraform or similar)
- Familiarity with data catalog, lineage, and observability tools (Monte Carlo, Great Expectations, or similar)
Nice to Have:
- Experience in real estate, property management, financial services, or asset management (a strong plus)
- Familiarity with multi-source data environments where data arrives in heterogeneous formats with varying quality
- Experience building data products that serve multiple business units with different access and governance requirements