Inframark is seeking a Senior MLOps Engineer to architect and build its production ML infrastructure. The role involves designing and implementing a multi-tenant platform for deploying machine learning models at scale across multiple wastewater utility customers.
Responsibilities:
- Design and implement multi-tenant ML model serving infrastructure that supports customer isolation, monitoring, and cost allocation
- Build CI/CD pipelines for automated model training, testing, validation, and deployment
- Establish data quality frameworks including validation, drift detection, and monitoring at scale
- Create model versioning, A/B testing, and rollback capabilities for production deployments
- Collaborate closely with data scientists to establish workflows that enable independent model deployment while maintaining quality and consistency
- Implement observability and monitoring systems for model performance, data quality, and infrastructure health
- Design and document architectural patterns and best practices for the ML platform
- Optimize infrastructure costs across multiple customer deployments
- Ensure security, compliance, and data isolation requirements are met in the multi-tenant architecture
- Bridge the gap between pilot/proof-of-concept systems and production-ready infrastructure
Requirements:
- 5-8+ years of experience in MLOps, DevOps, or ML infrastructure engineering
- Proven experience architecting and building ML platforms from scratch (0→1), not just maintaining existing systems
- Deep understanding of multi-tenant architecture patterns, including data isolation, security, and cost optimization
- Strong experience with containerization (Docker, Kubernetes) and orchestration for ML workloads
- Hands-on experience with at least one major cloud platform (AWS, Azure, or GCP) for production ML deployment
- Experience designing and implementing CI/CD pipelines for ML models
- Strong knowledge of data quality monitoring, model drift detection, and observability practices
- Proficiency in Python and infrastructure-as-code tools (Terraform, CloudFormation, etc.)
- Experience working with the Python ML stack: PyTorch, scikit-learn, NumPy, and pandas
- Experience working closely with data scientists to enable their productivity and independence
- Excellent communication skills, including the ability to explain architectural decisions and tradeoffs to both technical and business stakeholders
- Experience with time-series data, SCADA systems, or edge computing
- Previous experience scaling ML systems from pilots to hundreds of production deployments
- Familiarity with water/wastewater utility operations or industrial control systems