NeuBird AI is scaling rapidly and is seeking a DevOps engineer to build and maintain the deployment pipelines for its product. The role involves managing CI/CD infrastructure, optimizing deployment strategies, and collaborating with engineering teams to ensure efficient code deployment.
Responsibilities:
- Manage and optimize our deployment pipeline for Hawkeye, our AI SRE agent
- Maintain our CI/CD systems using Travis CI and GitHub Actions
- Ensure our Docker containerization strategy scales efficiently
- Manage our Kubernetes clusters running on AWS and Azure that power our production workloads
- Implement and evolve our GitOps workflows using FluxCD to keep deployments consistent, auditable, and recoverable
- Work closely with engineering teams to improve build times, debug deployment failures when they happen, and design deployment strategies that minimize downtime and risk
- Write and maintain CI/CD configurations, integrating automated test stages (unit, integration, and smoke tests) into the pipeline
- Optimize Docker images for size and security, manage Kubernetes manifests and Helm charts, and ensure our GitOps practices actually work when things go wrong
- Troubleshoot failed deployments across environments, trace issues back to their root cause (bad config, resource constraints, networking problems), and implement fixes that prevent recurrence
- Monitor pipeline performance, implement security scanning in the build process, manage infrastructure across AWS and Azure environments, and automate everything that slows teams down
Requirements:
- 3-5 years of experience building and maintaining CI/CD pipelines and deployment infrastructure, ideally at a SaaS company shipping to production frequently
- Have deep experience with Travis CI and GitHub Actions
- Know Docker inside and out (multi-stage builds, layer optimization, security scanning)
- Have managed production Kubernetes clusters on AWS and Azure
- Understand GitOps principles and have hands-on experience with FluxCD or similar tools like ArgoCD
- Comfortable working with AWS services (EKS, ECR, CloudWatch, IAM) and Azure equivalents (AKS, ACR, Azure Monitor)
- Understand infrastructure as code, whether that's Terraform, Helm, Kustomize, or raw Kubernetes YAML
- Know how to debug when deployments fail: reading logs, checking pod status, tracing network issues, investigating resource constraints
- Have integrated test automation into CI/CD pipelines and know how to balance test coverage with pipeline speed
- Take a pragmatic approach to deciding what to automate and what to handle manually
- Write documentation that actually helps people
- Understand that the best pipeline is the one developers trust and rarely think about because it just works