Julius AI is an applied AI lab focused on building coding agents that execute millions of lines of code for a vast user base. They are seeking a Mid-to-Senior Software Engineer to design and operate secure, multi-tenant container infrastructure while ensuring reliability, performance, and security across cloud environments.
Responsibilities:
- Design and operate secure, multi‑tenant container infrastructure with fast startup and smart autoscaling
- Ship cloud deployments (Helm/Terraform) with SSO, network controls, and audit logging
- Drive observability (metrics, traces, logs) with clear SLOs; lead incident response
- Optimize images, scheduling, networking, and cost; build fair‑use and rate‑limiting controls
Requirements:
- Production experience with Kubernetes and container internals (Docker/containerd); strong networking fundamentals
- Cloud (AWS/GCP/Azure) and IaC (Terraform/Helm)
- Monitoring/Logging (Prometheus, Grafana, OpenTelemetry, ELK/Vector)
- Security best practices for containerized, multi‑tenant systems
- Sandboxed and microVM runtimes (gVisor/Kata/Firecracker); Cilium/eBPF; GPU scheduling; serverless autoscaling (KEDA/Knative/Karpenter)
- You've built an AI side project and enjoy tinkering with LLMs