Amira Learning is a leader in third-generation edtech, focused on accelerating literacy outcomes through AI technology. The Gen AI Engineer will design and develop LLM-powered systems, ensuring accuracy and reliability while collaborating across teams to implement generative AI solutions.
Responsibilities:
- Design, build, and continuously improve LLM-powered systems across Amira's product and operations — from internal tools to customer-facing features
- Own RAG pipelines end-to-end: document ingestion, chunking strategy, embedding selection, retrieval tuning, and response synthesis
- Develop and enforce guardrails, grounding strategies, and confidence thresholds to mitigate hallucination and ensure output reliability
- Architect prompt chains and agent workflows that are robust, maintainable, and cost-effective at scale
- Design and operate evaluation frameworks to measure system accuracy, helpfulness, hallucination rate, and task completion across generative AI features
- Fine-tune and adapt foundation models for domain-specific tasks, including data curation, training pipeline setup, and performance benchmarking
- Implement automated and human-in-the-loop review processes to catch and correct problematic outputs
- Monitor production traffic, identify failure modes, and iterate rapidly on retrieval, prompting, and generation strategies
- Integrate LLM-powered features with internal systems and third-party platforms (e.g., Salesforce and other CRM tools) via APIs, connectors, and data sync workflows
- Contribute to shared ML infrastructure and tooling used across Amira's AI systems
- Help explore and implement solutions that make generative AI economically viable within the budget constraints typical of public schools and education SaaS
- Partner with learning design, content, product, and customer success teams to ensure AI systems are grounded in accurate, up-to-date domain knowledge
- Translate business needs into well-scoped generative AI solutions and communicate tradeoffs clearly to non-technical stakeholders
Requirements:
- 2+ years of hands-on experience building and deploying LLM-based systems in production
- Deep familiarity with RAG architectures: embedding models, vector databases, retrieval strategies, and response grounding
- Demonstrated experience with evaluation and benchmarking of LLM outputs — including hallucination mitigation, confidence filtering, output validation, and fallback strategies
- Practical experience with prompt engineering, prompt chaining, and/or agent orchestration frameworks (LangChain, LlamaIndex, or similar)
- Proficiency in Python and experience working with hosted LLM APIs (Anthropic, OpenAI, etc.) and open-source models
- Experience building and maintaining ML or data pipelines in AWS (Lambda, S3, RDS, etc.) or similar cloud infrastructure
- Degree in computer science or a related technical field, or equivalent practical experience
- Experience fine-tuning foundation models or running RLHF / preference-based feedback loops for domain-specific improvement
- Experience in education SaaS or with education-sector customers (districts, schools, state agencies)
- Familiarity with Salesforce or similar CRM platforms and their API/data ecosystems
- Experience with evaluation tooling, custom eval harnesses, or LLM-as-judge approaches
- Background working with conversational AI, chatbots, or customer-facing generative AI features
- Proven ability to operate in a fast-paced, goal-oriented startup environment and manage multiple concurrent workstreams