Zeta Global is an AI-Powered Marketing Cloud that uses advanced artificial intelligence to improve customer acquisition and retention for marketers. The Lead AI Engineer will oversee the post-training lifecycle of large language models, focusing on Supervised Fine-Tuning and optimization, and will work with data engineering, platform, and product teams to integrate these models into production systems.

Responsibilities:

Lead Supervised Fine-Tuning (SFT) of large language models in production, shaping instruction-following, reasoning quality, tone, and domain-specific behavior
Extend SFT pipelines with instruction tuning and preference-based optimization (e.g., RLHF-style approaches or direct preference optimization)
Design, curate, and maintain high-quality SFT and preference datasets, combining human-labeled and synthetic data tailored to real-world marketing and decisioning use cases
Own model evaluation and benchmarking, including offline behavioral evals (instruction adherence, reasoning depth, hallucination rates), online experiments and A/B tests, and continuous regression detection and performance monitoring
Develop and operate agentic LLM systems, enabling multi-step reasoning, tool use, workflow orchestration, and decision execution
Implement and optimize prompting, retrieval-augmented generation (RAG), memory, and tool-calling strategies, with a clear understanding of when to solve problems via SFT versus prompting
Partner closely with data engineering, platform, and product teams to integrate fine-tuned models into high-throughput, low-latency systems
Establish best practices for LLM versioning, experimentation, deployment, rollback, governance, and safety
Provide technical leadership and mentorship to engineers working on applied AI and LLM systems

Requirements:

Significant hands-on experience with Supervised Fine-Tuning (SFT) of LLMs in production, beyond prompt-only approaches
Direct experience using OpenAI APIs and/or AWS Bedrock for SFT, post-training, and deployment
Strong understanding of LLM post-training workflows, including data preparation, instruction tuning, evaluation methodologies, and common failure modes
Experience building and operating agentic LLM systems (tool use, multi-step reasoning, workflow orchestration)
Proficiency in Python and modern ML frameworks (e.g., PyTorch)
Experience operating ML systems in distributed, production environments
Strong intuition for trade-offs between model quality, latency, cost, safety, and scalability
