About this role
Role Overview
You'll own the backbone of Onebrief’s COTS deployments: automate, secure, and scale the platform so product teams can move fast without breaking trust.
Building and automating the platform: Design, provision, and manage cloud and on‑prem environments with Terraform and Ansible; operate secure, resilient Kubernetes clusters for multi‑level deployments.
Designing and evolving CI/CD: Reduce lead time and maintain strict gates and environment parity by improving pipelines in GitLab CI/CD, Jenkins, and GitHub Actions.
Upholding reliability, security, and compliance: Partner with SREs to implement monitoring, logging, and alerting (Prometheus, Grafana, ELK/Datadog); embed RMF and STIG controls into automation and day‑to‑day operations.
Operating and optimizing our footprint: Manage AWS/Azure/GCP resources and on‑prem networks; balance cost, performance, and security while unblocking product teams.
Supporting the team, documenting, and mentoring: Triage and resolve complex platform issues; write clear docs for architectures and runbooks; share context and mentor teammates.
Requirements
5+ years in Platform, DevOps, or Site Reliability Engineering with an infrastructure and operations focus.
Proven partner to DevOps/SRE and application teams; collaborates well across functions and shares context openly.
Clear, concise writing; strong documentation habits and async communication.
Infrastructure as Code: Terraform (or CloudFormation), Ansible.
Containers and orchestration: Docker; Kubernetes design, deployment, and operations.
CI/CD: experience building and maintaining pipelines (GitLab CI/CD, Jenkins, GitHub Actions).
Scripting: proficiency with at least one of Python, Go, or Bash.
Cloud and on‑prem: AWS, Azure, or Google Cloud; familiarity with on‑prem virtualization.
Observability: Prometheus, Grafana, ELK stack, or Datadog.
Networking fundamentals: core protocols and secure configurations.
Must have a Secret clearance and be eligible for a TS/SCI clearance.
Tech Stack
Ansible
AWS
Azure
Cloud
Docker
Google Cloud Platform
Grafana
Jenkins
Kubernetes
Prometheus
Python
Terraform
TypeScript
Go
Benefits
Flexible Work Environment: Remote work with flexible hours and unlimited PTO.
Comprehensive Health Coverage: Health, dental, vision, and life insurance.
Retirement Plan: 401(k) plan to secure your future.
Parental Leave: 8 weeks at 100% pay, regardless of state.
Company Retreats: Annual company summit trips.
Home Office Budget: $1,000 per year for home office improvements.