Together AI is a research-driven artificial intelligence company on a mission to lower the cost of modern AI systems. They are seeking an early-career Data Warehouse Engineer to help design and operate their data warehouse and ETL pipelines and to improve data quality and governance.
Responsibilities:
- Contribute to building and maintaining a medallion/curated data warehouse stack (bronze/silver/gold) for product, usage, billing, and operational data
- Build and maintain Airflow-orchestrated pipelines and dbt transformation projects (modular, tested, documented)
- Help design analytics-ready models: SCD Type 2, star schemas, and appropriate normalization for upstream canonical layers
- Learn and apply Master Data Management (MDM) patterns (golden records, reference data, deduping, identity resolution)
- Implement data quality checks (freshness, nulls, referential integrity, distribution drift, anomaly detection)
- Contribute to data governance habits: data stewardship, ownership, SLAs, and clear definitions for “source of truth”
- Help build and maintain a business semantic layer (consistent metric definitions, dimensions, and reusable logic) used by notebooks and BI tools
- Partner with stakeholders (Product, Engineering, Finance, GTM, Ops) to translate questions into durable datasets and metrics
- Use SQL, Python, and Spark where scale demands it; optimize for correctness, performance, and cost
Requirements:
- 0–4 years of professional experience (or strong internships/projects) working with data warehouses, pipelines, or analytics engineering
- Solid SQL fundamentals — you're comfortable writing queries and have some exposure to window functions or dimensional modeling concepts
- Some hands-on experience with dbt or Airflow, or strong eagerness to learn — coursework and personal projects count
- Basic Python for scripting and data tooling; any exposure to Spark (PySpark/SQL) is a plus
- Familiarity with data modeling concepts like SCD Type 2 or star schemas, even if only from coursework
- Good communication skills: you can ask clarifying questions, explain your reasoning, and work with stakeholders to understand their needs
- High standards for data quality, reliability, and maintainability — you care about getting things right