Design, enhance, and maintain the Tachyon Predictive Ops (TPOps) self-service MLOps framework, enabling rapid experimentation, training, deployment, and monitoring of AI models.
Build cloud-native MDLC (Model Development Lifecycle) capabilities including model registry, versioning, lineage, and reproducibility.
Develop unified libraries, SDKs, and extensible components that accelerate both predictive and generative AI workflows.
Implement reusable automation patterns for model training, validation, deployment, and governance.
Contribute to the Unified & Managed Predictive AI Platform, spanning on-premises infrastructure, GCP, and upcoming Azure ML integration.
Implement real-time and batch inferencing capabilities supporting instant prediction use cases and scheduled batch pipelines.
Support hybrid AI delivery patterns, including predictive ML, GenAI workflows, agentic systems, and multi-agent orchestration.
Build strategic observability features including drift detection, performance optimization, and open-standards monitoring integrations.
Collaborate with MLOps, platform engineering, and architecture teams to ensure compliance with enterprise governance and operational excellence requirements.
Design and develop scalable APIs and microservices to expose AI capabilities to enterprise applications.
Implement automation and CI/CD patterns enabling consistent deployments across hybrid compute environments.
Develop prompt engineering standards and reusable blueprints for LLM-powered developer tools such as Copilot, Devin.AI, and agentic systems.
Work with data scientists, engineers, and platform teams to integrate AI pipelines and frameworks across the enterprise.
Provide hands-on guidance to junior engineers on modern AI engineering and platform development patterns.
Requirements
4+ years of Specialty Software Engineering experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education.
3+ years of hands-on experience with AI/ML development and modern ML frameworks.
2+ years of strong programming experience in Python and/or Java.
2+ years of experience building APIs, frameworks, automation pipelines, or distributed systems.
2+ years of experience with cloud platforms (GCP, Azure, AWS) or on-prem platforms such as Kubernetes or OpenShift.
Tech Stack
AWS
Azure
Cloud
Distributed Systems
Google Cloud Platform
Java
Kubernetes
Microservices
OpenShift
Python
Benefits
Health benefits
401(k) Plan
Paid time off
Disability benefits
Life insurance, critical illness insurance, and accident insurance