Motive empowers the people who run physical operations with tools to make their work safer, more productive, and more profitable. As a Staff Site Reliability Engineer on the Platform team, you will design, scale, and manage AWS-backed services for millions of connected IoT devices and ensure high availability and performance.
Responsibilities:
- Collaborate with other engineering and product teams to design and build the infrastructure and services required to deliver new features to customers in a cloud-native and event-driven fashion
- Use and advance our IaC (Terraform) and CM (Helm) code and strategies to support advanced scaling and self-service use by engineering teams
- Identify and remove bottlenecks from production systems across AWS services and our Kubernetes platform
- Ensure 99.99% customer-facing uptime
- Continuously improve the monitoring and alerting capabilities of our platform, enabling us to be proactive instead of reactive
Requirements:
- 8+ years of professional SRE/DevOps experience, with a demonstrated ability to work on high-volume production systems
- Demonstrable systems architecture expertise in solving complex technical problems and implementing company-wide solutions
- Advanced knowledge of AWS services and technologies (ALB/ELB, IAM permissions, DynamoDB, SNS, EKS/Fargate, etc.)
- Experience using infrastructure as code and configuration management (especially Terraform and Helm charts) to design and provision new services
- Knowledge of Python, Bash, or other scripting languages; knowledge of Ruby or Golang is a plus
- High level of ownership and the drive to work with others and see improvements through to production