Telestream is a leader in the digital video industry, known for its innovative video transcoding and media exchange solutions. The company is seeking a DevOps Engineer with hands-on Kubernetes expertise to enhance the reliability and scalability of its video processing infrastructure, focusing on CI/CD pipelines and cloud operations.
Responsibilities:
- Design, deploy, and administer production Kubernetes clusters, including workload scheduling, namespace management, RBAC, network policies, and cluster upgrades
- Design and maintain continuous integration/deployment pipelines to automate testing and deployment, including Kubernetes-native delivery workflows using Helm and ArgoCD or equivalent
- Track software performance, fix errors, troubleshoot systems, and implement preventative measures to ensure smooth workflows
- Implement and manage infrastructure as code (IaC) using Terraform or CloudFormation
- Optimize cloud resources by implementing cost-effective solutions
- Collaborate with various teams to ensure smooth deployment
- Monitor systems and create new processes based on performance analysis
- Implement security best practices, including automated compliance checks and secure code deployment
Requirements:
- Bachelor's degree in Computer Science, Engineering, or equivalent
- 7+ years of experience in the following:
- Hands-on experience building and maintaining CI/CD pipelines (GitHub Actions, GitLab CI, Jenkins, or equivalent) with direct integration into Kubernetes deployment workflows
- Production-level experience with infrastructure as code (Terraform required; CloudFormation or Pulumi a plus), including managing cloud-hosted Kubernetes clusters (EKS, GKE, or AKS)
- Experience with monitoring, logging, and observability tooling in Kubernetes environments (Prometheus, Grafana, Datadog, ELK/EFK stack, or equivalent); ability to build dashboards and alerts from scratch, not just consume existing ones
- Demonstrated, hands-on Kubernetes experience in production environments: cluster administration, Helm chart authoring and management, RBAC configuration, persistent storage, horizontal/vertical pod autoscaling, and diagnosing and resolving real production failures (CrashLoopBackOff, OOMKilled, networking issues, etc.)
- Strong troubleshooting skills with the ability to diagnose infrastructure and application issues live, under pressure, without reference materials; this is evaluated directly in our interview process
- Excellent communication and collaboration skills
- Proficiency in scripting languages (Python, Go, Bash, or PowerShell); ability to write and own automation scripts, not just modify existing ones