Own the AWS architecture underpinning Toqan and Toqan Claw, not in a supporting role but as the person who defines how these systems scale.
Design the cloud infrastructure patterns that let AI workloads run reliably across 10+ operating companies (OpCos).
Set the guardrails that keep the whole AI Platform team building consistently and safely.
Review infrastructure performance dashboards for Toqan and Toqan Claw — checking latency, availability, and cost signals across the OpCos currently live on the platform.
Work alongside a team of 10 engineers who are moving fast and thinking at group scale.
Requirements
Deep hands-on AWS expertise across compute, networking, storage, and managed services — you know which service to reach for and why
Proven track record of scaling infrastructure for AI or ML workloads in production, where reliability and latency are non-negotiable
Strong command of infrastructure-as-code — Terraform, CDK, or equivalent — applied at real scale, not just in greenfield projects
Experience operating in a multi-product or platform-team context, where your decisions ripple across multiple engineering teams and products
Proficiency in Go (5+ years), with a track record of building and operating production backend services; Python is a bonus
Hands-on experience integrating with multiple AI and LLM providers in production — you understand how model capabilities translate into robust, scalable backend systems
Comfortable owning CI/CD pipelines, automated test infrastructure (unit, integration, E2E), and build systems end-to-end
Systems-level thinking — you design for reliability, scalability, and performance from the start, not as an afterthought
Comfortable defining and enforcing infrastructure standards and guardrails — you've set the bar for a team, not just met it
Experience with LLM serving infrastructure — vLLM, Triton, SageMaker, or similar — is a strong plus
Familiarity with Kubernetes and container orchestration at scale is a plus
Experience building and maintaining event-sourced systems is a plus
Direct experience building Model Context Protocol (MCP) servers, or otherwise working with the protocol, is a plus