Kraken is a mission-focused company rooted in crypto values, striving to accelerate the global adoption of crypto. The Senior Software Engineer in AI Infrastructure will design and build the infrastructure layer powering AI agent systems in production, focusing on high-performance services and scalable systems.
Responsibilities:
- Design and build the infrastructure layer powering AI agent systems in production
- Develop high-performance Rust services that handle model inference, orchestration, and execution
- Architect scalable systems capable of supporting millions of users and high request throughput
- Build reliable ML infrastructure and MLOps patterns for model deployment, evaluation, and monitoring
- Define guardrails, observability, and failure handling for agent-driven workflows
- Optimize latency, throughput, and cost across inference and orchestration layers
- Partner closely with the Agent Systems team to translate experimental prototypes into hardened production systems
- Contribute to foundational infrastructure decisions in a high-scale, high-impact environment
Requirements:
- 5+ years of experience building and operating high-scale production systems
- Strong proficiency in Rust and systems-level programming
- Deep understanding of distributed systems, reliability engineering, and performance optimization
- Experience operating services serving millions of users or high-throughput workloads
- Familiarity with ML infrastructure, model serving, or MLOps in production environments
- Experience designing observability, monitoring, and failure recovery systems
- Strong collaboration skills, with experience working across infrastructure and applied engineering teams
- High-ownership mindset in high-stakes production environments
- Experience building infrastructure for agent-based or LLM-powered systems
- Background in high-performance networking, async systems, or low-latency architectures
- Experience with container orchestration and cloud-native infrastructure
- Familiarity with evaluation frameworks and model performance monitoring at scale
- Experience working in fast-moving 0→1 or platform-building teams