Otter.ai is a leading tool for meeting transcription and collaboration that aims to make conversations more valuable. The company is seeking a talented Production Engineer to build and operate large-scale systems, with a focus on automation, performance, and system reliability.
Responsibilities:
- Help automate the continuous integration and testing processes to enable and scale
- Manage and maintain infrastructure
- Own, design, and implement monitoring systems such as Prometheus and Grafana
- Optimize Linux systems for performance, reliability, and security
- Own configuration management process(es) and build product features as appropriate
- Dig into data to find the root cause of problems and strategize with our engineers on solutions
- Participate in the on-call rotation
Requirements:
- 2+ years of experience in SRE, Production Engineering, and/or DevOps
- Expert-level experience architecting, developing, and troubleshooting large-scale systems
- Advanced proficiency with one or more programming languages (e.g., Python, Golang)
- Deep experience with data structures and Linux systems internals (e.g., filesystems, system calls) and administration
- Extensive experience with CI/CD pipelines and infrastructure as code (e.g., Terraform, Ansible)
- Strong familiarity with AWS services (e.g., ECS, S3, ALB, VPC)
- Knowledge of containers and orchestration using Kubernetes
- Experience building production-quality cloud infrastructure that enables reliable and rapid deployment of large-scale systems with effective monitoring and resilient operations
- Proven track record of taking projects from inception to launch
- Bachelor's in Computer Science or Electrical Engineering
- Master's in Computer Science or Electrical Engineering