Ensure services and systems are reliable by focusing on scalability, latency, performance, availability, efficiency, and observability
Develop systems and maintain key components to automate and minimize human labor
Enhance system reliability while decoupling system size from operational toil and complexity
Training and mentoring, both technical and customer-satisfaction oriented
Thought leadership
Develop readable and reusable code by applying standard patterns and using standard libraries
Mentor junior team members on code readability and reusability
Ensure data security, integrity, and quality by adhering to company standards and best practices
Design solutions that fulfill current business requirements and accommodate future enhancements
Guide teams to guarantee systems are reusable and interoperable
Reduce risks to business operations by following best practices and creating clear documentation such as runbooks and operational guides
Collaborate with development teams to define and implement relevant observability metrics, with the goal of enhancing application reliability
Spearhead incident response for issues impacting your track
Leverage new technologies and automation to reduce operational and maintenance costs
Eliminate technical debt, identify scaling bottlenecks, and proactively plan for future growth to keep infrastructure up to date
Use excellent technical judgement, innovation, and execution to prioritize and solve challenging problems
Independently drive business results across multiple teams by either leading high-level architectural decisions or implementing complex components of a project
Set technical strategy and define technical roadmaps with cross-team dependencies for business-impacting projects that require a high level of technical expertise
Requirements
Demonstrated experience in object-oriented programming with Python or Go
Tool Stack: Terraform, AWS, Docker, Vault, ProxySQL, and GitLab
Skills: Automation (General), Infrastructure as Code, Python scripting, Bash shell scripting, and Unix system administration
Five years of experience in an SRE role, including at least two years of verifiable experience working with Terraform on AWS in a production environment
Intermediate-level experience with MySQL or other relational database systems
Logical and analytical thinking for problem-solving
Proven ability to systematically identify patterns and underlying issues in complex situations
Strong skills in identifying opportunities to improve processes, systems, and structures by analyzing and assessing existing process flows, methods, and standards
Experience in delivering clear, well-structured, and meaningful information to a target audience, using appropriate communication mediums and language tailored to the audience
Ability to achieve mutually agreeable solutions by staying adaptable, communicating ideas clearly, and practicing active listening
Tech Stack
AWS
Docker
MySQL
Python
Shell Scripting
Terraform
Unix
Vault
Go
Benefits
Competitive total rewards package
Blog during work hours; take a day off and volunteer for your favorite charity
Work remotely from home with flexibility; there is no daily commute to an office!
Substantial training allowance; participate in professional development days, attend training, become certified
Annual budget to personalize your work environment
Annual wellness budget to make yourself a priority (use it on gym memberships, massages, fitness, and more)