About this role
Role Overview
Develop sophisticated, production-scale AI systems, including multi-step agentic workflows and multi-agent orchestration platforms
Build tools and agents with advanced capabilities in reasoning, planning, and adaptive tool utilization to address complex business challenges
Own the complete AI/ML lifecycle, from implementation and comprehensive testing through deployment and continuous operational monitoring, delivering projects on schedule and to specification
Produce high-quality, maintainable code for model training pipelines, evaluation frameworks, and inference services that meet production standards
Partner strategically with cross-functional stakeholders including product leaders, data scientists, application teams, vendors, and partners to align on requirements, iterate on solutions, and deliver successful outcomes
Provide hands-on technical leadership, driving architectural decisions and championing best practices across AI development, LLMOps, quality assurance, and production deployment
Design and implement responsible AI frameworks including hallucination detection, safety guardrails, comprehensive evaluation systems, and observability infrastructure to ensure model reliability, accuracy, and ethical deployment
Establish comprehensive evaluation frameworks for Large Language Models and agent-based systems, measuring model quality, task success rates, safety compliance, and operational effectiveness
Proactively identify and resolve technical blockers that could impact project timelines or deliverables
Communicate technical strategy and progress to executive leadership and key stakeholders with clarity and confidence
Engage directly in development and problem-solving, particularly on high-complexity technical challenges, to maintain project velocity and quality
Drive innovation through research and experimentation with emerging AI technologies and frameworks, evaluating and integrating new capabilities that advance our platform
Requirements
7+ years of proven expertise in designing, building, and deploying AI/ML solutions at scale, with 1-2 years of production experience in Generative AI technologies
Strong foundation in machine learning including statistical modeling, supervised and unsupervised learning algorithms
Advanced skills in prompt engineering with deep understanding of optimization techniques and best practices for LLM interactions
Expert-level programming proficiency in Python and AI/ML development ecosystems
Deep expertise in modern AI frameworks including LLM application development and agentic systems (LangChain, CrewAI, or similar)
Comprehensive MLOps experience with hands-on implementation of CI/CD pipelines, model monitoring, versioning, and lifecycle management for both models and agent-based systems
Production deployment experience on major cloud platforms (AWS, Azure, or GCP) with a demonstrated ability to architect and scale cloud-native ML solutions
Versatile ML skillset spanning traditional techniques (classification, regression, clustering) and cutting-edge deep learning approaches
Production-grade generative AI experience deploying and maintaining LLMs and multi-modal models in live environments
Exceptional analytical capabilities with a track record of solving complex technical problems and thriving in ambiguous, rapidly evolving situations
Proficiency with industry-standard ML libraries including PyTorch, TensorFlow, scikit-learn, NumPy, and pandas
Outstanding communication and collaboration skills with the ability to translate complex technical concepts for diverse audiences and drive cross-functional alignment
Track record of partnering across organizational levels, from individual contributors to senior leadership, building trust and delivering results
Proven ability to influence and lead in matrix organizations where collaboration and relationship-building are essential to achieving outcomes
Tech Stack
AWS
Azure
Cloud
Google Cloud Platform
NumPy
pandas
Python
PyTorch
scikit-learn
TensorFlow
Benefits
A bonus and/or long-term incentive units may be offered as part of the compensation package
Full range of medical, financial, and/or other benefits, dependent on the level and position offered