The Judge Group is seeking an experienced Machine Learning Engineer to build advanced algorithmic and AI capabilities. This role combines deep technical modeling expertise with infrastructure engineering to develop and operate end-to-end ML/AI systems at scale, working closely with Data Science, Data Engineering, and Architecture teams.
Responsibilities:
- Design and optimize machine learning models including deep learning architectures, LLMs, and BERT-based classifiers
- Build distributed training workflows using PyTorch and similar frameworks
- Fine‑tune large language models and optimize inference performance (Neuron Compiler, ONNX, vLLM)
- Optimize models for GPU, TPU, and AWS Inferentia/Trainium
- Design AI services for both real‑time and batch processing use cases
- Lead development of ML infrastructure covering data ingestion, feature engineering, training, and serving
- Build scalable inference systems for real-time and batch predictions
- Deploy models across EC2, EKS, SageMaker, and specialized inference hardware
- Implement and maintain core MLOps capabilities, including feature stores, observability, governance, and automated pipelines
- Build Infrastructure‑as‑Code workflows for training, evaluation, and deployment
- Develop MLOps tooling to simplify workflows for data science teams
- Create CI/CD pipelines for ML models and infrastructure components
- Monitor and optimize ML systems for performance, accuracy, latency, and cost efficiency
- Implement system profiling and observability across the ML lifecycle
- Partner with Data Engineering to ensure reliable, high-quality data availability for ML workloads
- Collaborate with Architecture, Governance, and Security teams to meet enterprise standards
- Provide technical guidance on modeling methods and AI infrastructure best practices
Requirements:
- Experienced Machine Learning Engineer (contract role) with a track record of building advanced algorithmic and AI capabilities across Personalization, Generative AI, Forecasting, and Decision Science
- Deep modeling expertise combined with the infrastructure engineering skills to develop and operate end-to-end ML/AI systems at scale
- Experience designing and optimizing machine learning models, including deep learning architectures, LLMs, and BERT-based classifiers
- Hands-on experience building distributed training workflows with PyTorch or similar frameworks
- Experience fine-tuning large language models and optimizing inference performance (Neuron Compiler, ONNX, vLLM)
- Experience optimizing models for GPU, TPU, and AWS Inferentia/Trainium, and designing AI services for both real-time and batch processing use cases
- Demonstrated ability to lead development of ML infrastructure spanning data ingestion, feature engineering, training, and serving
- Experience building scalable inference systems and deploying models across EC2, EKS, SageMaker, and specialized inference hardware
- Experience implementing core MLOps capabilities (feature stores, observability, governance, automated pipelines) and Infrastructure-as-Code workflows for training, evaluation, and deployment
- Experience creating CI/CD pipelines for ML models and infrastructure, and building MLOps tooling that simplifies workflows for data science teams
- Skilled at monitoring, profiling, and optimizing ML systems for performance, accuracy, latency, and cost efficiency across the ML lifecycle
- Strong collaboration skills: partnering with Data Engineering on data availability and quality, and working with Architecture, Governance, and Security teams to meet enterprise standards
- Ability to provide technical guidance on modeling methods and AI infrastructure best practices