Tenstorrent is leading the industry on cutting-edge AI technology, revolutionizing performance expectations, ease of use, and cost efficiency. As a DevOps Engineer, you will play a critical role in building and maintaining robust development and testing infrastructure, focusing on automation for on-premises data centers.
Responsibilities:
- You will design and implement comprehensive automation frameworks to accelerate software delivery and infrastructure management
- You will manage and scale containerized workloads across hybrid environments using Docker and Kubernetes orchestration
- You will build and maintain robust CI/CD pipelines within GitHub or GitLab to ensure seamless code deployment and testing
- You will develop sophisticated monitoring and alerting systems using tools like Prometheus and Grafana to ensure high system availability
- You will create and manage infrastructure-as-code templates using Terraform and Ansible to maintain consistent testing environments
Requirements:
- You are a seasoned professional who bridges the gap between cloud-based workflows and physical, on-premises data center environments
- You are a code-first engineer who prioritizes building reusable automation scripts in Python, Go, or Bash over manual configuration
- You possess a proactive mindset that focuses on identifying system bottlenecks and infrastructure vulnerabilities before they impact development
- You are a collaborative team member who communicates effectively with cross-functional groups like software engineering and QA
- You are an analytical troubleshooter capable of performing deep-dive root cause analysis on both software pipelines and hardware failures