Role Overview
- Design, develop, and maintain infrastructure for AI inference workloads, including GPU scheduling, model deployment pipelines, and data access patterns in on-prem environments
- Build and manage monitoring and observability tools for AI inference platforms, including dashboards, alerts, and runbooks for model health and system performance
- Collaborate with ML engineers and platform teams to design system architecture for AI workloads, integrate inference runtimes, and test performance at scale
Requirements
- Hands-on experience in containerization and container orchestration: Kubernetes, Helm, Docker/CRI-O
- Solid grounding in Linux administration and networking
- Programming and Scripting: Python/Go/Bash
- Infrastructure as Code (IaC) approach: Ansible, Terraform
- Creating CI/CD pipelines: GitLab CI/GitHub Actions
- Experience with Cluster API or another "Kubeception" technology
- Deep experience with Kubernetes CNI, CSI, and Operators
Nice to Have
- Knowledge of Kubernetes-related technologies such as ArgoCD, Helmfile
- Experience with the Prometheus stack
- Experience with other Cloud Native technologies
Tech Stack
- Ansible
- Cloud
- Docker
- Kubernetes
- Linux
- Prometheus
- Python
- Terraform
- Go
Benefits
- Competitive compensation
- Flexible working hours and hybrid or remote options, depending on the role
- Work from anywhere in the world for up to 45 days per year
- Private medical insurance for you and your family*
- Extra paid vacation and sick leave days*
- Support for life’s important moments and celebrations
- Language courses to help you connect and grow
- Modern, welcoming offices with snacks, drinks, and entertainment*
- Team sports and social activities*
*Benefits may vary depending on your location.