Cadre5, founded in 1999 in East Tennessee, provides innovative technical solutions and is seeking a Kubernetes Platform Engineer for the American Science Cloud initiative. This role involves managing the lifecycle of Kubernetes clusters, troubleshooting issues, and implementing security measures to support large-scale computational science efforts.
Responsibilities:
- Manage the full lifecycle of Kubernetes clusters (on-premises K3s/RKE2, GKE, and EKS), including upgrades, security patching, scaling, and capacity planning
- Troubleshoot cluster-level issues including control plane problems, node failures, and resource constraints
- Implement and maintain cluster security hardening based on CIS benchmarks and organizational security policies
- Manage etcd cluster health, backup procedures, and disaster recovery capabilities
- Monitor cluster performance and optimize resource utilization across multi-tenant workloads
- Coordinate with the datacenter operations team on physical infrastructure changes and maintenance windows
- Implement, configure, and maintain Cilium CNI across on-premises and cloud Kubernetes environments
- Design and enforce network policies to achieve secure multi-tenant isolation
- Troubleshoot complex pod networking issues including DNS resolution, service discovery, and connectivity problems
- Configure and maintain BGP peering with physical network infrastructure for on-premises integration
- Work with the network engineering team on firewall rules, VLANs, IPv6 networking, and network architecture
Requirements:
- Typically requires a minimum of 8 years of related experience with a Bachelor's degree; or 6 years and a Master's degree; or equivalent experience
- Demonstrated experience administering Kubernetes on on-premises infrastructure (K3s, RKE2, or similar bare-metal distributions)
- Experience with cloud-managed Kubernetes (GKE and/or EKS)
- Strong understanding of Linux networking fundamentals: iptables/nftables, routing tables, DNS, TCP/IP stack, network troubleshooting
- Experience with GitOps methodologies and tools such as ArgoCD or Flux
- Proficiency in scripting and automation: Bash, Python, Go
- Production experience with Cilium CNI or an equivalent CNI
- Ability to work collaboratively in a team environment and communicate technical concepts clearly
- Understanding of Kubernetes security best practices including Pod Security Standards, RBAC, and secrets management
- GCP (Google Cloud Platform) and/or AWS (Amazon Web Services) cloud platform experience
- Ability to obtain and maintain a Department of Energy 'Q' clearance, which requires U.S. citizenship
- Go programming experience for operator maintenance and platform tooling development
- CKA (Certified Kubernetes Administrator) or CKS (Certified Kubernetes Security Specialist) certification
- Background in BGP routing protocols and network engineering concepts
- IPv6 networking experience
- Infrastructure as Code experience with Terraform or Ansible
- Experience with internal developer platform (IDP) tools such as Backstage or similar
- Experience with service mesh technologies (Istio, Linkerd)
- Strong understanding of code review practices and familiarity with GitHub and GitLab workflows