BigBear.ai is a leading provider of AI-powered decision intelligence solutions for national security, supply chain management, and digital identity. They are seeking a highly skilled and experienced Principal Kubernetes Platform Engineer responsible for the reliability, security, scalability, and operational excellence of their Kubernetes estate across multiple cloud environments.
Responsibilities:
- Own day-to-day and strategic administration of Kubernetes clusters across multiple cloud environments (AKS/EKS/GKE), including Azure Government enclaves where applicable
- Design, build, secure, and operate highly available Kubernetes platform architectures (multi-zone, upgrade-safe, disaster recovery-ready)
- Establish and enforce cluster standards: namespaces/tenancy, RBAC, Pod Security standards, admission control, network segmentation, and workload isolation
- Implement and maintain end-to-end platform security controls: image provenance, vulnerability management, runtime protection, secrets management, and certificate lifecycle
- Build and mature GitOps/CI/CD patterns for Kubernetes (e.g., Flux/Argo), ensuring reliable, repeatable deployments with strong auditability
- Manage Kubernetes lifecycle operations: version upgrades, node pool strategy, capacity planning, add-on management, and cluster hardening
- Define and operate observability for clusters and workloads: logging, metrics, traces, alerting, SLOs/SLIs, and actionable runbooks
- Proactively ensure the highest levels of platform availability and performance; lead root-cause analysis and drive permanent corrective actions
- Maintain security, backup, and redundancy strategies for etcd (where applicable), persistent storage, cluster state, and critical platform services
- Secure and maintain the stack by remediating vulnerabilities (CVEs), misconfigurations, and supply-chain risks; coordinate remediation timelines with stakeholders
- Provide second- and third-level support for Kubernetes and containerized workloads, including incident response participation and on-call support as required
- Partner with application teams to set best practices for containerization, resource requests/limits, health probes, service discovery, ingress, and release safety
- Develop and maintain automation to reduce manual intervention (IaC, policy-as-code, auto-remediation, self-service workflows, and automated compliance evidence)
- Liaise with cloud vendors and internal stakeholders for platform problem resolution and architectural guidance
- Maintain the environment in compliance with FedRAMP High requirements and support regular reporting and audit evidence collection
- Uphold and enforce Ask Sage’s compliance, privacy, and security policies, ensuring adherence to all relevant regulations and standards
- Conduct regular audits of Kubernetes configurations and platform controls; recommend and implement enhancements aligned to benchmarks and risk posture
Requirements:
- Minimum of 7 years of experience in infrastructure/platform engineering, including at least 4 years of deep, hands-on Kubernetes administration in production
- Clearance: TS/SCI required
- Strong knowledge of Kubernetes internals and critical subsystems: scheduling, networking (CNI), DNS, ingress, storage (CSI), RBAC, admission control, and upgrades
- Strong security background in container and Kubernetes hardening (e.g., policy controls, least privilege, network policies, secrets handling, supply chain security)
- Proficiency with Infrastructure-as-Code and automation (e.g., Terraform, Ansible) and scripting (e.g., Bash, Python, Go)
- Experience with observability tooling and operational maturity (monitoring/alerting, incident response, SLOs)
- Familiarity with compliance-driven environments and producing audit-ready evidence (FedRAMP/DoD environments a plus)
- Relevant certifications preferred (one or more): CKA/CKS, Azure Solutions Architect, AWS Solutions Architect, Security+, CISSP
- Demonstrated expertise operating Kubernetes across multiple cloud providers (AKS + EKS and/or GKE); Azure Government experience strongly preferred