Principal AI/ML Engineer at Luma | JobVerse
Principal AI/ML Engineer
Luma
Remote
Massachusetts, United States of America
Full Time
3 weeks ago
H1B Sponsor
Key skills
AWS
Azure
Cloud
Distributed Systems
Docker
Google Cloud Platform
Kubernetes
Python
PyTorch
TypeScript
Go
AI
ML
Helm
Caching
CI/CD
About this role
Role Overview
Architect, build, and scale the end-to-end ML Ops pipeline, including training, fine-tuning, evaluation, rollout, and monitoring.
Design reliable infrastructure for model deployment, versioning, reproducibility, and orchestration across cloud and on-prem GPU clusters.
Optimize compute usage across distributed systems (Kubernetes, autoscaling, caching, GPU allocation, checkpointing workflows).
Lead the implementation of observability for ML systems, covering drift, performance, throughput, reliability, and cost.
Build automated workflows for dataset curation, labeling, feature pipelines, evaluation, and CI/CD for ML models.
Collaborate with researchers to productionize models and accelerate training/inference pipelines.
Establish ML Ops best practices, internal standards, and cross-team tooling.
Mentor engineers and influence architectural direction across the entire AI platform.
Requirements
Deep hands-on experience designing and operating production ML systems at scale (Staff/Principal-level expected).
Strong background in ML Ops, distributed systems, and cloud infrastructure (AWS, GCP, or Azure).
Proficiency with Python and familiarity with TypeScript or Go for platform integration.
Expertise in ML frameworks and tooling (PyTorch, Transformers, vLLM, LLaMA-Factory, Megatron-LM), plus a practical understanding of CUDA / GPU acceleration.
Strong experience with containerization and orchestration (Docker, Kubernetes, Helm, autoscaling).
Deep understanding of ML lifecycle workflows: training, fine-tuning, evaluation, inference, model registries.
Ability to lead technical strategy, collaborate cross-functionally, and operate in fast-paced environments.
Tech Stack
AWS
Azure
Cloud
Distributed Systems
Docker
Google Cloud Platform
Kubernetes
Python
PyTorch
TypeScript
Go
Benefits
Competitive salary and equity options
Sign-on bonus
Health, dental, and vision insurance
401(k)