Lean TECHniques is a company that values freedom and flexibility in the workplace. It is seeking a Site Reliability Engineer to design, build, and maintain reliable infrastructure and systems while improving performance and automating processes.
Responsibilities:
- You’ll join forces with a small, collaborative team to design, build, and maintain highly reliable infrastructure and systems that power our applications
- You’ll work closely with engineers to improve system performance, reliability, and scalability while helping automate infrastructure and deployment processes
- You’ll help build and maintain modern cloud infrastructure using AWS, leveraging tools like Terraform to ensure environments are reproducible and manageable
- You’ll design and maintain CI/CD pipelines using GitHub Actions, enabling teams to ship faster and safer
- You’ll deploy and manage containerized applications in Kubernetes, helping improve reliability, observability, and operational efficiency
- You’ll write scripts and automation using Bash and work comfortably in UNIX-based environments to streamline operations and reduce manual toil
- You’ll participate in incident response and troubleshooting when things break (because sometimes they will), and help implement long-term improvements to prevent them from happening again
- You’ll help improve monitoring, alerting, and system visibility so we can detect and resolve issues quickly
- You’ll continuously look for ways to automate processes, reduce operational overhead, and make systems more resilient
Requirements:
- Experience working as a Site Reliability Engineer, DevOps Engineer, or in a similar role
- Hands-on experience with AWS cloud infrastructure
- Experience managing infrastructure using Terraform
- Experience building or maintaining CI/CD pipelines with GitHub Actions
- Experience running and operating applications in Kubernetes
- Strong experience working in UNIX/Linux environments
- Ability to write automation scripts using Bash
- A mindset focused on automation, reliability, and scalability
- Experience troubleshooting production systems and improving system reliability
- A collaborative attitude and strong communication skills