Fusion Risk Management is a fast-growing, innovative company focused on operational resilience through cloud-based software solutions. The Machine Learning Engineer will design, build, deploy, and operate production-grade machine learning systems, driving improvements in resilience capabilities through intelligent systems.

Responsibilities:

Design, build, deploy, and maintain production machine learning systems, including predictive models for threat intelligence, escalation timing, and recovery prediction
Own the end-to-end model lifecycle for flywheel use cases: data ingestion, feature engineering, training, rigorous evaluation, deployment, monitoring, and automated retraining based on customer outcome data
Build and maintain robust model evaluation frameworks—including offline metrics, A/B testing infrastructure, backtesting against historical outcomes, and calibration analysis—to ensure models improve with each retraining cycle
Architect scalable ML pipelines with full CI/CD: automated testing of model code and artifacts, validation gates before promotion, staged rollouts, and rollback capabilities
Own ML Ops and AI Ops practices, including automated model validation, performance monitoring, drift detection, observability dashboards, and governance frameworks
Maintain and extend the simulation engines (Monte Carlo, Bayesian networks) and optimization engines (linear programming, constraint programming, CP-SAT) so they continue to run reliably
Design ML systems that operate across both managed cloud and customer-hosted (reverse SaaS) environments, with pluggable inference adapters that respect customer governance boundaries
Refactor and harden existing AI systems to improve scalability, latency, cost efficiency, and fault tolerance
Build and maintain data pipelines and feature engineering workflows that support reliable and reproducible model training
Collaborate closely with product and engineering teams to translate resilience use cases into scalable, maintainable ML-powered product capabilities

Requirements:

Strong software engineering foundation with hands-on experience building and deploying machine learning systems in production environments
Deep experience with model evaluation methodology, including metric selection, offline/online evaluation, statistical testing, and calibration, plus sound judgment about when a model is ready for production
Strong experience with ML Ops tooling and practices: CI/CD pipelines for model code and artifacts, automated testing, model registries, experiment tracking, and reproducible training
Experience designing and operating feedback-loop or continuous-learning ML systems where production outcomes are used to retrain and improve models over time
Experience with reinforcement learning, decision systems, simulation modeling, or optimization techniques
Proficiency in writing clean, maintainable, well-tested code with version control, CI/CD, and observability best practices
Experience with containerized deployment and orchestration (Docker, Kubernetes, Helm), and with deploying ML services in both cloud and on-premises/VPC environments
Familiarity with drift detection, model monitoring, alerting, and governance frameworks for production ML
Experience designing ML architectures, APIs, and services that integrate with enterprise SaaS platforms
Ability to design modular, extensible ML systems that evolve alongside product requirements
Familiarity with AI-assisted development tools (e.g., Copilot, Cursor, Claude Code, or similar) and comfort using them to accelerate ML engineering workflows
Strong communication skills and the ability to explain model behavior, evaluation results, tradeoffs, and architectural decisions to technical and non-technical stakeholders
Bachelor's or Master's degree in Computer Science, Machine Learning, Artificial Intelligence, Engineering, or a related field
3+ years of experience building, deploying, and operating machine learning systems in production environments
Demonstrated experience with model evaluation, validation, and testing in production ML systems (strongly preferred)
Experience building CI/CD pipelines for ML—including automated testing, validation gates, and staged deployments (strongly preferred)
Experience with feedback-loop or continuous-learning ML architectures where models retrain on outcome data (preferred)
Experience with reinforcement learning, decision intelligence systems, or control systems (preferred)
Experience with simulation, optimization, constraint programming, or operations research techniques (preferred)
Experience building ML pipelines in cloud environments (Azure preferred)
Experience deploying ML systems in hybrid cloud/on-premises environments (nice to have)
