Tags: AWS, Cyber Security, Python, AI, Machine Learning, ML, LLM, Large Language Models, OpenAI, Anthropic, RAG, LangChain, Agentic, LangGraph, Bedrock
Role Overview
Design, develop, and improve agentic AI systems using LangGraph and LangChain, with a focus on reliability, traceability, and measurable output quality.
Advance existing capabilities, including the DAG-driven AI workflows and chatbots, by improving reasoning, tool-calling accuracy, and inference confidence (illustrative sketches of these systems follow this list).
Develop and mature evaluation frameworks to assess AI-generated outputs across dimensions including groundedness, faithfulness, relevance, context precision, and context recall.
Improve and extend the multi-LLM ensemble system, including its consensus scoring methods, model weighting, and aggregation strategies.
Design and implement fine-tuning and prompt optimization pipelines for domain-specific cybersecurity and compliance use cases.
Develop AI/ML components for RAG systems, including embedding strategies, retrieval optimization, chunking, and re-ranking.
Partner with Product and Engineering teams to identify high-leverage opportunities to introduce AI across onboarding, deployment, monitoring, configuration, and remediation workflows.
Develop and document reusable AI-native components, integration patterns, and deployment blueprints, including single-tenant and multi-tenant LLM serving architectures on AWS Bedrock, Cohere, Anthropic, and OpenAI-compatible providers.
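To ground the agentic-systems bullets above, a minimal LangGraph sketch: one tool-calling node feeding a generation node through a conditional edge. The state fields, node names, and routing logic are illustrative assumptions, not the team's actual graph.

```python
# Minimal LangGraph sketch: tool node -> conditional route -> generation node.
# All state fields, node names, and routing logic are illustrative assumptions.
from typing import TypedDict

from langgraph.graph import END, StateGraph


class AgentState(TypedDict):
    question: str
    tool_result: str
    answer: str


def call_tool(state: AgentState) -> dict:
    # Stand-in for a real tool call (retriever, scanner API, etc.).
    return {"tool_result": f"lookup({state['question']})"}


def generate_answer(state: AgentState) -> dict:
    # Stand-in for an LLM call grounded in the tool output.
    return {"answer": f"Based on {state['tool_result']}: ..."}


def route(state: AgentState) -> str:
    # End early if the tool produced nothing to reason over.
    return "generate" if state["tool_result"] else END


graph = StateGraph(AgentState)
graph.add_node("tool", call_tool)
graph.add_node("generate", generate_answer)
graph.set_entry_point("tool")
graph.add_conditional_edges("tool", route, {"generate": "generate", END: END})
graph.add_edge("generate", END)

app = graph.compile()
print(app.invoke({"question": "Which hosts are unpatched?", "tool_result": "", "answer": ""}))
```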
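For the evaluation bullet, a toy rendering of two of the named metrics, context precision and context recall, assuming binary relevance labels per retrieved chunk; this is a simplified stand-in for a full evaluator such as RAGAS, not a claim about how the team computes them.

```python
# Toy context precision / context recall over binary relevance labels.
# A simplified stand-in for a full evaluation framework such as RAGAS.

def context_precision(retrieved: list[str], relevant: set[str]) -> float:
    """Fraction of retrieved chunks that are actually relevant."""
    return sum(c in relevant for c in retrieved) / len(retrieved) if retrieved else 0.0

def context_recall(retrieved: list[str], relevant: set[str]) -> float:
    """Fraction of relevant chunks that made it into the retrieved set."""
    return len(relevant.intersection(retrieved)) / len(relevant) if relevant else 1.0

retrieved = ["c1", "c3", "c7"]
relevant = {"c1", "c2", "c3"}
print(context_precision(retrieved, relevant))  # 0.666...
print(context_recall(retrieved, relevant))     # 0.666...
```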
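For the ensemble bullet, one common aggregation strategy is a weighted vote with a consensus score (the weight behind the winning answer as a fraction of total weight); the model names, weights, and answers below are invented for illustration.

```python
# Weighted-vote aggregation across model outputs, plus a consensus score.
# Model names, weights, and answers are invented for illustration.
from collections import defaultdict

def aggregate(answers: dict[str, str], weights: dict[str, float]) -> tuple[str, float]:
    tally: defaultdict[str, float] = defaultdict(float)
    for model, answer in answers.items():
        tally[answer] += weights.get(model, 1.0)
    winner = max(tally, key=tally.get)       # highest-weighted answer
    return winner, tally[winner] / sum(tally.values())

answers = {"model_a": "CVE-2024-0001", "model_b": "CVE-2024-0001", "model_c": "CVE-2024-0002"}
weights = {"model_a": 0.5, "model_b": 0.3, "model_c": 0.2}
print(aggregate(answers, weights))  # ('CVE-2024-0001', 0.8)
```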
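For the prompt-optimization bullet, the simplest pipeline is a search loop that scores candidate prompts on a labeled eval set and keeps the best; `llm()` below is a hard-coded stand-in for a real model call, and the candidates and eval set are invented.

```python
# Toy prompt-selection loop: score each candidate prompt on a labeled eval
# set and keep the highest scorer. llm() is a stand-in for a real model call.
def llm(prompt: str, question: str) -> str:
    return "non-compliant" if "auditor" in prompt else "compliant"  # stand-in

def select_prompt(candidates: list[str], eval_set: list[tuple[str, str]]) -> str:
    def accuracy(prompt: str) -> float:
        return sum(llm(prompt, q) == gold for q, gold in eval_set) / len(eval_set)
    return max(candidates, key=accuracy)

candidates = ["You are a strict compliance auditor. {q}", "Answer briefly. {q}"]
eval_set = [("Is port 23 open to 0.0.0.0/0 acceptable?", "non-compliant")]
print(select_prompt(candidates, eval_set))
```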
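For the RAG bullet, a numpy-only sketch of the pipeline stages: fixed-size chunking with overlap, cosine-similarity retrieval, and a crude re-rank. `embed()` is a deterministic random projection standing in for a real embedding model, and the re-rank is keyword overlap standing in for a cross-encoder.

```python
# Toy RAG pipeline: chunking -> cosine retrieval -> crude re-rank.
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    # Deterministic random projection keyed on the text: a stand-in for a
    # real embedding model, with no semantic meaning.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    vec = rng.standard_normal(dim)
    return vec / np.linalg.norm(vec)

def chunk(doc: str, size: int = 200, overlap: int = 50) -> list[str]:
    step = size - overlap
    return [doc[i:i + size] for i in range(0, max(len(doc) - overlap, 1), step)]

def retrieve(query: str, chunks: list[str], k: int = 5) -> list[str]:
    q = embed(query)
    scores = np.stack([embed(c) for c in chunks]) @ q   # cosine: unit vectors
    candidates = [chunks[i] for i in np.argsort(scores)[::-1][:k]]
    # Re-rank: keyword overlap with the query, standing in for a cross-encoder.
    return sorted(candidates, key=lambda c: -len(set(query.split()) & set(c.split())))

chunks = chunk("Patch management policy requires remediation within 30 days. " * 20)
print(retrieve("patch policy", chunks, k=3)[0][:60])
```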
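For the serving-architectures bullet, a hedged boto3 sketch of one single-tenant pattern: a per-tenant routing table in front of Bedrock's `invoke_model` call. The tenant table is purely illustrative; the request body follows Bedrock's Anthropic Messages format.

```python
# Hedged sketch: per-tenant model routing in front of AWS Bedrock.
# The routing table is illustrative; the body uses Bedrock's Anthropic
# Messages format.
import json

import boto3

TENANT_MODELS = {  # illustrative single-tenant routing table
    "tenant_a": "anthropic.claude-3-haiku-20240307-v1:0",
}

def complete(tenant: str, prompt: str) -> str:
    client = boto3.client("bedrock-runtime")
    resp = client.invoke_model(
        modelId=TENANT_MODELS[tenant],
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 512,
            "messages": [{"role": "user", "content": prompt}],
        }),
    )
    return json.loads(resp["body"].read())["content"][0]["text"]

print(complete("tenant_a", "Summarize CIS benchmark drift for host web-01."))
```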
Requirements
5+ years of experience in applied machine learning, data science, or AI engineering, with at least 2 years working with large language models in production.
Hands-on experience building agentic AI systems with tool-calling agents, multi-step reasoning pipelines, or DAG-based orchestration (LangGraph, LangChain, or equivalent).
Strong proficiency in Python and the modern ML/AI ecosystem.
Experience with Retrieval-Augmented Generation, including vector databases, embedding models, chunking strategies, and retrieval evaluation.
Experience with foundation model APIs across multiple providers: AWS Bedrock, OpenAI, Anthropic, Cohere, and Meta.
Proven ability to design and implement quantitative evaluation frameworks for LLM or agentic system outputs.
Demonstrated ability to communicate AI concepts and recommendations to non-technical stakeholders and cross-functional teams.
Strong command of statistical reasoning, probabilistic modeling, and experiment design.
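As one concrete instance of the experiment-design requirement above, a sketch comparing two prompt variants' pass rates on a shared eval set with a two-proportion z-test; the counts are invented for illustration.

```python
# Two-proportion z-test for comparing pass rates of two LLM variants on the
# same eval set. The counts below are invented for illustration.
from math import sqrt
from statistics import NormalDist

def two_proportion_z(pass_a: int, n_a: int, pass_b: int, n_b: int) -> tuple[float, float]:
    p_a, p_b = pass_a / n_a, pass_b / n_b
    pooled = (pass_a + pass_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    return z, 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided p-value

print(two_proportion_z(pass_a=172, n_a=200, pass_b=151, n_b=200))  # z ≈ 2.66
```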