Responsibilities
Designing and operating scalable AI infrastructure for LLM inference, prompt management, and evaluation pipelines, supporting billions in premium flow.
Building self-service tools, SDKs, and APIs that empower product teams to move from prototype to production 30% faster.
Instrumenting production AI/ML workloads with standardised logging, tracing, and evaluation metrics, increasing observability coverage to 100% of deployed models.
Implementing intelligent routing, caching, and provider optimisation via the LLM gateway, reducing AI compute costs by up to 25%.
Driving adoption of shared platform services (LLM gateway, evaluation frameworks, monitoring) to replace bespoke solutions, so that new AI features build on the platform by default.
Championing developer experience by delivering comprehensive documentation and responsive support, resulting in higher internal customer satisfaction.
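To make the routing and caching responsibility above concrete, here is a minimal sketch of how an LLM gateway can cut compute costs. All names (provider labels, the `Gateway` class, the call interface) are illustrative assumptions, not the actual gateway referenced in this role.

```python
import hashlib

# Illustrative provider tiers (names and behaviour are assumptions):
# a cheap default model and an expensive one reserved for hard requests.
CHEAP, LARGE = "small-model", "large-model"

class Gateway:
    """Toy LLM gateway: cache responses by prompt hash, route by need."""

    def __init__(self, call_provider):
        self._call = call_provider   # function(provider, prompt) -> str
        self._cache = {}             # prompt hash -> cached response
        self.calls = 0               # provider invocations (cache misses)

    def complete(self, prompt, needs_large=False):
        key = hashlib.sha256(prompt.encode()).hexdigest()
        if key in self._cache:
            return self._cache[key]  # cache hit: no provider cost
        # Routing: default to the cheap provider; escalate only on request.
        provider = LARGE if needs_large else CHEAP
        self.calls += 1
        response = self._call(provider, prompt)
        self._cache[key] = response
        return response
```

Repeated identical prompts are served from cache, and only requests flagged as needing the large model pay its price; a production gateway would add TTLs, semantic cache keys, and per-provider health checks.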
Requirements
Built and deployed production AI infrastructure that scaled to support enterprise-grade reliability and observability.
Delivered self-service tools or APIs that enabled multiple product teams to accelerate their AI/ML development cycles.
Implemented evaluation frameworks, A/B testing infrastructure, or monitoring solutions that measured and improved model performance, latency, cost, and quality in production.
Led initiatives to reduce AI compute costs through optimisation strategies such as intelligent routing or caching.
Successfully migrated teams from bespoke AI solutions to shared platform services, driving measurable adoption.
Prioritised and improved developer experience through documentation, support, or workflow enhancements.
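The evaluation and monitoring experience described above might, in its simplest form, look like the following sketch: per-model recording of latency, cost, and a quality score, with summary statistics for comparison. The class and field names are hypothetical.

```python
import statistics
from collections import defaultdict

class EvalTracker:
    """Toy per-model metrics store for latency, cost, and quality."""

    def __init__(self):
        # model name -> list of (latency_ms, cost_usd, quality) tuples
        self._records = defaultdict(list)

    def record(self, model, latency_ms, cost_usd, quality):
        self._records[model].append((latency_ms, cost_usd, quality))

    def summary(self, model):
        latencies, costs, qualities = zip(*self._records[model])
        return {
            "p50_latency_ms": statistics.median(latencies),
            "total_cost_usd": round(sum(costs), 6),
            "mean_quality": statistics.mean(qualities),
        }
```

Comparing `summary()` output across two model variants is the essence of the A/B testing infrastructure mentioned above; a real system would stream these records to a metrics backend rather than hold them in memory.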
Benefits
£5,000 training and conference budget for individual and group development.
25 days of holiday plus 8 bank holidays (33 days total).
Company pension scheme via Penfold.
Mental health support and therapy via Spectrum.life.