Own the design, build, and operation of a modular, API‑first platform that abstracts infrastructure away from simulation software
Lead multi‑cluster Kubernetes, service mesh, security hardening, and observability work to support hundreds of users and tens of thousands of entities across connected and air‑gapped environments
Design and operate multi‑cluster Kubernetes environments (control/data/CI/observability planes) with strong isolation and zero‑trust defaults
Implement a service mesh (e.g., Istio) for mTLS, traffic control, and fine‑grained authorization (AuthorizationPolicy); manage ingress (Gateway/VirtualService) and east‑west policies
Author and maintain platform CRDs and controllers/operators to reconcile developer intent into runtime objects (namespaces/cells, Deployments/Jobs, Services, policies, gateways)
Integrate network policy (eBPF/Cilium), secrets management, RBAC/ABAC, and policy‑driven automation across environments
Stand up observability (metrics/logs/traces) and SLO monitoring; drive reliability (HA, backup/restore, DR)
Support air‑gapped packaging and delivery and a secure software supply chain (images, SBOMs, provenance)
Requirements
12+ years of software engineering experience
5+ years in platform/SRE roles operating production Kubernetes at scale
Strong multi‑cluster and GitOps (Argo CD/Flux) experience
Hands‑on experience with Istio/Envoy, Cilium (NetworkPolicy, eBPF), Ingress/Gateway API, and cluster networking (DNS, L4/L7)
Controller/operator development using Go (Kubebuilder/Operator SDK) or TypeScript‑based frameworks; CRD design/versioning
Observability: Prometheus, OpenTelemetry, Fluent Bit/OpenSearch; incident response and performance tuning