About this role
Role Overview
Develop sophisticated, production-scale AI systems, including multi-step agentic workflows and multi-agent orchestration platforms.
Build tools and agents with advanced capabilities in reasoning, planning, and adaptive tool utilization to address complex business challenges.
Take complete ownership of the AI/ML lifecycle, from implementation and comprehensive testing through deployment and continuous operational monitoring, delivering projects on schedule and to specification.
Produce high-quality, maintainable code for model training pipelines, evaluation frameworks, and inference services that meet production standards.
Partner strategically with cross-functional stakeholders including product leaders, data scientists, application teams, vendors, and partners to align on requirements, iterate on solutions, and deliver successful outcomes.
Provide hands-on technical leadership, driving architectural decisions and championing best practices across AI development, LLMOps, quality assurance, and production deployment.
Design and implement responsible AI frameworks including hallucination detection, safety guardrails, comprehensive evaluation systems, and observability infrastructure to ensure model reliability, accuracy, and ethical deployment.
Establish comprehensive evaluation frameworks for Large Language Models and agent-based systems, measuring model quality, task success rates, safety compliance, and operational effectiveness.
Proactively identify and resolve technical blockers that could impact project timelines or deliverables.
Communicate technical strategy and progress to executive leadership and key stakeholders with clarity and confidence.
Engage directly in development and problem-solving, particularly on high-complexity technical challenges, to maintain project velocity and quality.
Drive innovation through research and experimentation with emerging AI technologies and frameworks, evaluating and integrating new capabilities that advance our platform.
Requirements
7+ years of proven expertise in designing, building, and deploying AI/ML solutions at scale, with 1-2 years of production experience in Generative AI technologies.
Strong foundation in machine learning, including statistical modeling and supervised and unsupervised learning algorithms.
Advanced skills in prompt engineering, with a deep understanding of optimization techniques and best practices for LLM interactions.
Expert-level programming proficiency in Python and AI/ML development ecosystems.
Deep expertise in modern AI frameworks including LLM application development and agentic systems (LangChain, CrewAI, or similar).
Comprehensive MLOps experience with hands-on implementation of CI/CD pipelines, model monitoring, versioning, and lifecycle management for both models and agent-based systems.
Production deployment experience on major cloud platforms (AWS, Azure, or GCP) with demonstrated ability to architect and scale cloud-native ML solutions.
Versatile ML skillset spanning traditional techniques (classification, regression, clustering) and cutting-edge deep learning approaches.
Production-grade generative AI experience deploying and maintaining LLMs and multi-modal models in live environments.
Exceptional analytical capabilities with a track record of solving complex technical problems and thriving in ambiguous, rapidly evolving situations.
Proficiency with industry-standard ML libraries including PyTorch, TensorFlow, scikit-learn, NumPy, and Pandas.
Outstanding communication and collaboration skills, with the ability to translate complex technical concepts for diverse audiences and drive cross-functional alignment.
Record of success partnering across organizational levels, from individual contributors to senior leadership, building trust and delivering results.
Proven ability to influence and lead in matrix organizations where collaboration and relationship-building are essential to achieving outcomes.
Tech Stack
AWS
Azure
Cloud
Google Cloud Platform
NumPy
Pandas
Python
PyTorch
scikit-learn
TensorFlow
Benefits
A bonus and/or long-term incentive units may be provided as part of the compensation package, in addition to the full range of medical, financial, and/or other benefits.