Job Title: AI Data Engineer
Location: Austin, TX (Hybrid)
Duration: 12 months
Job Description:
Responsibilities:
- Understand the data needs of stakeholder teams in terms of key data models and reporting, and translate those needs into technical requirements
- Define, build and manage key data pipelines in dbt that transform raw logs into canonical datasets
- Establish high data integrity standards and SLAs to ensure timely, accurate delivery of data
- Develop reliable dashboards that track the performance of core metrics and deliver insights to the whole company
- Build foundational data products, dashboards and tools to enable self-serve analytics to scale across the company
- Influence the future roadmap of Product and GTM teams from a data systems perspective
- Become an expert in our organization's data models and the company's data architecture
You might be a good fit if you have:
- 5+ years of experience as an Analytics Data Engineer or in a similar Data Science & Analytics role, preferably partnering with GTM and Product leads to build and report on key company-wide metrics.
- A passion for the company's mission of building helpful, honest, and harmless AI.
- Expertise in building multi-step ETL jobs and robust data models with tooling like dbt; proficiency with workflow management platforms like Airflow and with version control through GitHub.
- Expertise in using SQL and Python to transform data into accurate, clean data models.
- Experience building reports and dashboards in visualization tools like Hex to serve multiple cross-functional teams.
- A bias for action and urgency, not letting perfect be the enemy of the effective.
- A full-stack mindset, not hesitating to do what it takes to solve a problem end-to-end, even if it requires going outside the original job description.
- Experience building an Analytics Data Engineering (or similar) function at start-ups.
- A strong ability to thrive in ambiguity, taking initiative to create clarity and forward progress.
Key Responsibilities
- Prompt Engineering Excellence: Design, test, and optimize system prompts and feature-specific prompts that shape Claude's behavior across consumer and API products.
- Evaluation Development: Build and maintain comprehensive evaluation suites that ensure model quality and consistency across product launches and updates.
- Cross-functional Collaboration: Partner closely with product teams, research teams, and safeguards to ensure new features meet quality and safety standards.
- Model Launch Support: Play a critical role in model releases, ensuring smooth rollouts and catching regressions before they impact users.
- Infrastructure Contribution: Help build and improve the frameworks and tools that allow teams to develop and test prompts and features with confidence.
- Knowledge Transfer: Mentor product engineers on prompt engineering best practices and help teams build their first evaluations.
- Rapid Iteration: Work in a fast-paced environment where model capabilities advance daily, requiring quick adaptation and creative problem-solving.
Required Qualifications
- 5+ years of software engineering experience with Python or similar languages.
- Demonstrated experience with LLMs and prompt engineering (through work, research, or significant personal projects).
- Strong understanding of evaluation methodologies and metrics for AI systems.
- Excellent written and verbal communication skills; you'll need to explain complex model behaviors to diverse stakeholders.
- Ability to manage multiple concurrent projects and prioritize effectively.
- Experience with version control, CI/CD, and modern software development practices.
Preferred Qualifications
- Experience with Claude or other frontier AI models in production settings.
- Background in machine learning, NLP, or related fields.
- Experience with A/B testing and experimentation frameworks (e.g., Statsig).
- Familiarity with AI safety and alignment considerations.
- Experience building tools and infrastructure for ML/AI workflows.
- Track record of improving AI system performance through systematic evaluation and iteration.