Architect, maintain, and scale containerized AI services deployed to Kubernetes, with emphasis on Azure Kubernetes Service (AKS).
Design orchestration layers that manage model calls, downstream services, retries, rate limits, and failure handling.
Optimize system performance under load, including horizontal scaling, autoscaling policies, resource management, and cost control.
Implement WebSocket or other real-time client communication patterns for interactive AI applications.
Contribute to infrastructure-as-code and CI/CD practices for AI service deployment.
Collaborate with CloudOps, DevOps, and application engineering teams to meet reliability, availability, and operational standards.
Partner with Product and business stakeholders to translate projected traffic, adoption, and growth targets into scalable technical architectures and capacity plans.
Debug production-level issues as needed.
Requirements
3-5 years of software engineering experience with strong fundamentals in object-oriented programming, design patterns, and distributed system design.
Professional experience in Python, C#, Java, or a similar language used in production systems.
Strong hands-on experience with containerization (Docker) and Kubernetes-based orchestration (AKS preferred).
Experience integrating AI/LLM workloads into enterprise-grade distributed systems.
Experience designing APIs and backend systems that support high concurrency and real-time interactions.
Experience designing event-driven architectures using messaging systems (e.g., Azure Service Bus, RabbitMQ, Amazon SQS).