ZenBusiness is dedicated to helping entrepreneurs launch and manage their businesses with simplicity and support. The Conversational AI & Prompt Engineer role focuses on enhancing AI interactions by crafting effective dialogue flows and optimizing prompts to ensure high-quality customer experiences.
Responsibilities:
- Analyze conversation transcripts and user feedback to identify areas of confusion, failure, and prompt leakage
- Work with the Customer Impact Team Product Lead to define and track conversational KPIs (e.g., resolution rate, containment rate, user satisfaction)
- Optimize prompts and model selection for cost efficiency, response latency, and scalability in production environments
- Collaborate with engineers to improve conversation-specific evaluation criteria (e.g., NLU accuracy, intent recognition)
- Design and maintain evaluation frameworks to measure prompt performance using golden datasets and automated scoring (e.g., LLM-as-judge, rubric-based scoring, precision/recall of intent routing)
- Implement guardrails to reduce hallucinations, prevent prompt injection, and ensure compliant, safe responses
- Collaborate on designing, mapping, and implementing complex conversation flows, including error recovery and contextual handoffs (escalation to human support)
- Own the continuous optimization of system prompts and instructions for LLMs (Gemini, OpenAI) to ensure Velo's responses are accurate, consistent in tone, and on-brand
- Design and optimize structured outputs, function calling, and tool-routing logic to ensure accurate data capture and downstream system integrations
Requirements:
- 5+ years of experience, including 2+ years in Conversational AI, Applied LLM Engineering, Prompt Engineering, or NLP systems in production environments
- Deep experience designing and optimizing prompts for GPT, Gemini, or similar models, including structured outputs and function calling
- Practical experience designing and tuning RAG pipelines (chunking, embeddings, retrieval evaluation)
- Experience building evaluation datasets and running prompt experiments (A/B testing, automated scoring, regression testing)
- Proficiency in Python or TypeScript; experience integrating LLM APIs in production systems
- Ability to analyze conversational performance using data and logs to drive measurable improvements
- Strong systems thinking, empathy for users, and ability to translate business logic into scalable AI behavior
- Experience with agentic systems similar to Decagon, Agentforce, Fin, or Sierra