Lead the evaluation, approval, rollout, rollback, drift detection, and deprecation of AI services.
Manage models and configurations as code with versioning, test coverage, and audit trails.
Define and implement guardrails for toxicity, PII, policy violations, hallucination control, and grounding.
Conduct red-teaming exercises, develop containment strategies, and report on safety posture.
Align support structures, on-call coverage, escalation paths, and playbooks with Service Management and partners.
Ensure runbooks are current, usable, and effective for BAU and after-hours support.
Collaborate with DataOps on feature and embedding stores, retrieval/RAG patterns, lineage, and SLAs to ensure reliability and cost-efficiency.
Enforce tagging to surface product-level costs, monitor budget variances and anomalies, and drive safe optimisations such as caching, batching, and model sizing.
Maintain up-to-date standards, diagrams, evaluation reports, and runbooks to ensure clean handovers and reduce single points of failure.
Requirements
Proven experience managing the full lifecycle of AI models and agents, from evaluation and approval through rollout, rollback, drift detection, and deprecation.
Skilled in prompt/config/version management as code, RAG patterns, vector/feature stores, and runtime monitoring for latency, quality, and safety.
Hands-on experience with AWS services such as Bedrock and SageMaker (or equivalents), serverless runtimes, event-driven architecture, CI/CD pipelines, observability tools (CloudWatch/OpenTelemetry), and secure connectivity.
Practical knowledge of building and operating agentic experiences using Agentforce, integrating with Salesforce workflows (Service, Sales, Knowledge), and aligning runtime signals (safety, performance, cost) to business outcomes.
Strong understanding of data ingestion, identity resolution, harmonisation, segmentation, activation, and governance, with a focus on how Data Cloud supports AI/RAG use cases and downstream observability.
Proficient in Terraform-first provisioning across AWS, Snowflake, and Salesforce integrations.
Experience with HashiCorp Vault for secrets management, PKI, dynamic credentials, policy-as-code, automated rotation, and audit trails.
Deep knowledge of SLIs/SLOs, tracing, logging, alerting, incident response, post-incident reviews, change automation, rollback strategies, and resilience patterns.
Comfortable partnering with Support, Service Management, and vendors for BAU and after-hours operations.
Tech Stack
AWS
Cloud
Terraform
Vault
Benefits
Fully subsidised Southern Cross health insurance cover for you and your family.
Laptop, unlimited data plan, and a market-leading mobile phone.
Lifestyle leave, giving you the option to purchase an extra week or two of annual leave.
Discounts on One New Zealand products, services and much more!