Fusion Risk Management is a fast-growing, innovative company focused on operational resilience through cloud-based software solutions. The Machine Learning Engineer will design, build, deploy, and operate production-grade machine learning systems, driving improvements in resilience capabilities through intelligent systems.
Responsibilities:
- Design, build, deploy, and maintain production machine learning systems, including predictive models for threat intelligence, escalation timing, and recovery prediction
- Own the end-to-end model lifecycle for flywheel use cases: data ingestion, feature engineering, training, rigorous evaluation, deployment, monitoring, and automated retraining based on customer outcome data
- Build and maintain robust model evaluation frameworks—including offline metrics, A/B testing infrastructure, backtesting against historical outcomes, and calibration analysis—to ensure models improve with each retraining cycle
- Architect scalable ML pipelines with full CI/CD: automated testing of model code and artifacts, validation gates before promotion, staged rollouts, and rollback capabilities
- Own ML Ops and AI Ops practices, including automated model validation, performance monitoring, drift detection, observability dashboards, and governance frameworks
- Maintain and extend the simulation engines (Monte Carlo, Bayesian networks) and optimization engines (linear programming, constraint programming, CP-SAT) so they continue to run reliably in production
- Design ML systems that operate across both managed cloud and customer-hosted (reverse SaaS) environments, with pluggable inference adapters that respect customer governance boundaries
- Refactor and harden existing AI systems to improve scalability, latency, cost efficiency, and fault tolerance
- Build and maintain data pipelines and feature engineering workflows that support reliable and reproducible model training
- Collaborate closely with product and engineering teams to translate resilience use cases into scalable, maintainable ML-powered product capabilities
Requirements:
- Strong software engineering foundation with hands-on experience building and deploying machine learning systems in production environments
- Deep experience with model evaluation methodology—including metric selection, offline/online evaluation, statistical testing, calibration, and understanding when a model is ready for production
- Strong experience with ML Ops tooling and practices: CI/CD pipelines for model code and artifacts, automated testing, model registries, experiment tracking, and reproducible training
- Experience designing and operating feedback-loop or continuous-learning ML systems where production outcomes are used to retrain and improve models over time
- Experience with reinforcement learning, decision systems, simulation modeling, or optimization techniques
- Proficiency in writing clean, maintainable, well-tested code with version control, CI/CD, and observability best practices
- Experience with containerized deployments and orchestration (Docker, Kubernetes, Helm) and with deploying ML services in both cloud and on-premises/VPC environments
- Familiarity with drift detection, model monitoring, alerting, and governance frameworks for production ML
- Experience designing ML architectures, APIs, and services that integrate with enterprise SaaS platforms
- Ability to design modular, extensible ML systems that evolve alongside product requirements
- Familiarity with AI-assisted development tools (e.g., Copilot, Cursor, Claude Code, or similar) and comfort using them to accelerate ML engineering workflows
- Strong communication skills and the ability to explain model behavior, evaluation results, tradeoffs, and architectural decisions to technical and non-technical stakeholders
- Bachelor's or Master's degree in Computer Science, Machine Learning, Artificial Intelligence, Engineering, or a related field
- 3+ years of experience building, deploying, and operating machine learning systems in production environments
- Demonstrated experience with model evaluation, validation, and testing in production ML systems (strongly preferred)
- Experience building CI/CD pipelines for ML—including automated testing, validation gates, and staged deployments (strongly preferred)
- Experience with feedback-loop or continuous-learning ML architectures where models retrain on outcome data (preferred)
- Experience with reinforcement learning, decision intelligence systems, or control systems (preferred)
- Experience with simulation, optimization, constraint programming, or operations research techniques (preferred)
- Experience building ML pipelines in cloud environments (Azure preferred)
- Experience deploying ML systems in hybrid cloud/on-premises environments (nice to have)