Together AI is a research-driven artificial intelligence company on a mission to lower the cost of modern AI systems. They are seeking a Staff Data Warehouse Engineer to design and operate a medallion data warehouse, ensuring high data quality and governance across the organization.
Responsibilities:
- Architect and operate a medallion/curated data warehouse stack (bronze/silver/gold) for product, usage, billing, and operational data
- Build and maintain Airflow-orchestrated pipelines and dbt transformation projects (modular, tested, documented)
- Design analytics-ready models: SCD Type 2, star schemas, and appropriate normalization for upstream canonical layers
- Lead Master Data Management (MDM) patterns (golden records, reference data, deduplication, identity resolution)
- Implement and automate data quality checks (freshness, nulls, referential integrity, distribution drift, anomaly detection)
- Establish data governance practices: data stewardship, ownership, SLAs, and clear definitions of the 'source of truth'
- Build and maintain a business semantic layer (consistent metric definitions, dimensions, and reusable logic) used by notebooks/BI
- Partner with stakeholders (Product, Engineering, Finance, GTM, Ops) to translate questions into durable datasets and metrics
- Use SQL, Python, and Spark where scale demands it; optimize for correctness, performance, and cost
- Mentor engineers and contribute to standards (code review, design docs, runbooks), with a path to tech lead
Requirements:
- Strong warehouse fundamentals and production experience delivering trusted datasets and metrics
- Expert SQL (window functions, dimensional modeling, performance tuning)
- Hands-on experience with dbt (models, tests, docs, snapshots, macros) and Airflow (DAG design, backfills, reliability)
- Solid Python for data tooling and automation; experience with Spark (PySpark/SQL) is a plus
- Practical experience with SCD2, star schemas, and handling slowly changing business entities
- Strong stakeholder management: you can drive alignment on definitions, tradeoffs, and delivery timelines
- High standards for data quality, reliability, and maintainability