ATTIX is a company focused on building a real-time messaging platform that integrates personal AI agents into conversations. They are seeking a Cloud Platform Engineer to develop and manage isolated AI agent containers at scale on AWS, with a focus on Kubernetes orchestration and infrastructure management.
Responsibilities:
- Build a provisioning system that spins up isolated AI agent containers on demand — one per user or one per workspace — with sub-60-second cold start times
- Multi-tenant container orchestration on AWS/Kubernetes (EKS) with hard security isolation between instances (network policies, resource limits, namespace isolation)
- WebSocket routing layer that bridges provisioned agent instances on AWS to Vama’s real-time messaging infrastructure (NATS-based, running on GCP) so agents appear as native chat participants
- Auto-scaling and resource management — spin down idle instances, right-size containers, manage GPU/CPU allocation for LLM inference if we go that route
- Infrastructure-as-code for the entire provisioning stack — Terraform/Pulumi, AWS APIs, EKS operators
- Cross-cloud networking between AWS and Vama’s GCP-hosted chat services
- Cost metering and billing integration per instance — track compute, memory, and API usage per user/workspace
- Health monitoring, auto-recovery, and observability for hundreds to thousands of concurrent agent instances
Requirements:
- 4+ years building platform infrastructure, internal developer platforms, or managed hosting systems
- Deep Kubernetes expertise: you've built custom operators and CRDs and managed multi-tenant clusters, not just deployed apps to K8s
- Container orchestration at the platform level — you understand namespace isolation, network policies, resource quotas, and pod security standards
- AWS infrastructure expertise — EKS, ECS/Fargate, EC2, Lambda, VPC networking, IAM, S3. This is an AWS-first role
- GCP familiarity is a bonus since our chat infrastructure lives there
- Infrastructure-as-code — Terraform, Pulumi, or equivalent. You define infrastructure in code, not click through consoles
- Networking fundamentals — WebSocket routing, load balancing, DNS, TLS termination. You'll be routing real-time traffic between chat services and agent containers
- You ship 3–5x faster using AI coding tools (Claude Code, Cursor, etc.). This is non-negotiable. We will test for this
- Experience building provisioning/orchestration systems for user-facing products (think: Vercel, Railway, Render, DigitalOcean App Platform)
- Go experience — Vama's backend is Go microservices. Doesn't need to be your primary language but you should be comfortable reading and contributing
- Node.js familiarity — our agent runtime is Node.js-based. Understanding its runtime characteristics helps with resource planning
- Cost optimization experience — you've managed cloud bills and know how to keep per-unit costs low at scale
- Experience with NATS, Kafka, or similar event-driven messaging systems