IR Labs is the innovation lab inside Integrated Research focused on turning cutting-edge AI research into products. The Principal DevSecOps Engineer will build the core platform for the company, establish secure automation patterns, and drive operational excellence across teams.
Responsibilities:
- Serve as the founding infrastructure engineer, building the core platform that scales the company and raises the reliability bar
- Establish secure, repeatable IaC/GitOps patterns (Terraform/CloudFormation) and automated delivery (GitHub Actions, Argo CD)
- Partner with teams pre-GA on design reviews, capacity planning, and production readiness
- Define and drive SLIs/SLOs/SLAs and an error-budget culture for services and ops
- Eliminate toil with end-to-end automation across provisioning, config, testing, and operations
- Co-design platforms with ML, backend, and security to safely power AI/ML workloads
- Architect multi-region resilience (backup, DR, and failover), balancing availability, consistency, and cost
- Advance observability and incident excellence; make smart bets on emerging infra tools
- Codify production engineering standards and coach teams toward operational excellence
Requirements:
- 8+ years operating high-availability, fault-tolerant distributed systems with IaC and GitOps
- Strong coding skills in Go, Python, or Rust, plus solid shell scripting; comfortable extending Kubernetes via CRDs
- Deep Kubernetes/EKS expertise; mastery of containerization and service networking
- Hands-on with AWS primitives (VPC, EC2, S3, IAM, RDS) and multi-region traffic/failover
- Observability pro (Prometheus, Grafana, OpenTelemetry, Fluentd, Jaeger) with strong RCA/incident chops
- Security fundamentals: IAM, secrets management, and compliance guardrails (SOC 2/HIPAA/GDPR)
- Experience building secure, self-service platforms (SDKs/APIs/portals, e.g., Backstage/TypeScript)
- Proven SRE practice (SLIs/SLOs, error budgets) and strong testing, code review, and CI/CD habits
- Clear communicator and mentor who thrives in fast-moving environments and collaborates across ML, data, and backend teams