Terawatt is a leader in delivering large-scale, turnkey charging solutions for autonomous and electric vehicles. As a Staff DevOps Engineer, you will contribute to the development and reliability of Terawatt’s charging network management system and help scale the cloud infrastructure to support organizational growth.
Responsibilities:
- Lead and architect the evolution of our cloud infrastructure using Terraform, building resilient and scalable systems to support business growth
- Maintain Helm charts and deployment patterns that enable teams to manage the lifecycle of their services while adhering to established deployment standards
- Build tooling to enable engineering teams to own the application deployment process through CI/CD pipelines using GitHub Actions
- Promote security best practices across all layers of the stack, including software access, managed workloads, and services running in pre-production and production environments
- Strengthen cloud and network security using industry-standard tools to detect vulnerabilities and anomalies, and help prevent suspicious or malicious activity
- Advance observability practices using frameworks such as OpenTelemetry (OTel) and tools like Grafana Cloud for monitoring and alerting across services and infrastructure
- Develop tooling that supports both local and remote container-based cloud development workflows
- Create and automate simulated production scenarios used for testing during development and for validating production releases
- Implement automation and alerting to maintain security and compliance standards, including SOC 2 controls
- Design and manage infrastructure that supports machine learning model training and deployment, ensuring scalable compute resources for ML workloads
- Partner with the Data team to manage core data infrastructure, including our Databricks data lake and Kafka event streams (Aiven/AWS), while advising on scalable data architecture and infrastructure improvements
- Contribute to building a highly available, web-based depot operations platform in NodeJS that supports the future of EV charging
- Participate in a 24/7 on-call rotation to support the reliability of production systems
Requirements:
- 8+ years of experience building and operating high-availability production software systems, preferably in DevOps or platform engineering teams
- Experience building and maintaining scalable cloud-based infrastructure, including services running in managed Kubernetes (EKS)
- Experience building or maintaining CI/CD pipelines (e.g., GitHub Actions) to support reliable software delivery
- Experience leading or contributing to SRE or DevOps initiatives supporting production cloud platforms
- Experience with observability frameworks and tools (e.g., OpenTelemetry, Grafana, or similar platforms)
- Experience working with managed databases such as PostgreSQL, MongoDB, or similar systems
- Strong communication skills and the ability to collaborate effectively across engineering, product, and infrastructure teams
- Experience working with multi-region AWS infrastructure and Kubernetes (EKS) at scale
- Experience improving security and compliance practices through automation and internal tooling
- Experience implementing or scaling observability standards using OpenTelemetry and tools like Grafana Cloud
- Experience maintaining or scaling data infrastructure, such as Databricks, Kafka (MSK), or similar streaming/data platforms
- Proficiency in Python or NodeJS