NeuBird AI is scaling rapidly and is seeking a DevOps Engineer to build and maintain deployment pipelines for its product. The role involves owning CI/CD infrastructure and working closely with engineering teams to optimize deployment processes and troubleshoot issues.
Responsibilities:
- Manage and optimize our deployment pipeline for Hawkeye, our AI SRE agent
- Maintain our CI/CD systems using Travis CI and GitHub Actions
- Ensure our Docker containerization strategy scales efficiently
- Manage our Kubernetes clusters running on AWS and Azure that power our production workloads
- Implement and evolve our GitOps workflows using FluxCD to keep deployments consistent, auditable, and recoverable
- Work closely with engineering teams to improve build times and debug deployment failures
- Design deployment strategies that minimize downtime and risk
- Write and maintain CI/CD configurations, integrating automated test stages into the pipeline
- Optimize Docker images for size and security
- Manage Kubernetes manifests and Helm charts
- Ensure our GitOps practices hold up when things go wrong, so deployments stay recoverable in practice
- Troubleshoot failed deployments across environments and trace issues back to their root cause
- Implement fixes that prevent recurrence
- Monitor pipeline performance and implement security scanning in the build process
- Manage infrastructure across AWS and Azure environments
- Automate everything that slows teams down
Requirements:
- 3-5 years of experience building and maintaining CI/CD pipelines and deployment infrastructure, ideally at a SaaS company shipping to production frequently
- Have deep experience with Travis CI and GitHub Actions
- Know Docker inside and out (multi-stage builds, layer optimization, security scanning)
- Have managed production Kubernetes clusters on AWS and Azure
- Understand GitOps principles and have hands-on experience with FluxCD or similar tools like ArgoCD
- Comfortable working with AWS services (EKS, ECR, CloudWatch, IAM) and Azure equivalents (AKS, ACR, Azure Monitor)
- Understand infrastructure as code, whether that's Terraform, Helm, Kustomize, or raw Kubernetes YAML
- Know how to debug when deployments fail: reading logs, checking pod status, tracing network issues, investigating resource constraints
- Have integrated test automation into CI/CD pipelines and know how to balance test coverage with pipeline speed
- Take a pragmatic approach to deciding what to automate and what to handle manually
- Write documentation that actually helps people
- Understand that the best pipeline is the one that developers trust and rarely think about because it just works