Appspace is a company dedicated to creating better work experiences for people everywhere. They are seeking a Senior Site Reliability Engineer to ensure the reliability, performance, and scalability of their SaaS applications by designing, implementing, and maintaining robust systems and processes.

Responsibilities:

Executing projects that roll out new platform maintenance features, automate tasks, or make other big-picture changes to improve our customers' experience on our Cloud Platform
Deploying new features and releases of our software into Kubernetes via Helm; strong experience with Kubernetes and Helm is a must
Troubleshooting performance issues or errors thrown by the cloud platform or application, and either resolving the underlying cause or forwarding your research to Engineering to address in the product
Mentoring others towards technical and procedural success and providing daily operational support to our DevOps team members
Actioning request tickets from other teams to support their needs and help them prepare for upcoming releases
Monitoring and maintaining our platform's uptime, resiliency, and performance, looking for improvement opportunities, and proactively acting on negative trends before they become issues
Leading, participating in, or executing the incident management process when alerts fire: quickly determining the root cause, resolving the issue, and finding new and creative ways to prevent recurrence
Configuring, monitoring, researching, and evaluating workload performance on both Google Cloud Platform and Microsoft Azure
Working closely with security teams to ensure adherence to security best practices and compliance standards
Collaborating with our Development and Quality Assurance teams to address issues in the product and platform, particularly around recurring problems
Documenting new processes and procedures, or updating existing ones, to share knowledge and improve our standardized approaches to solutions

Requirements:

Must have a passion for lifelong learning
Must communicate well and work effectively with others across different countries and cultures
A strong background in containers, Kubernetes, Helm, Linux, and Python coding, plus some experience with Windows Server and macOS, is a must
Experience with Google Cloud Platform and Microsoft Azure required
Expert-level troubleshooting experience and the ability to reason through a process workflow to identify a fault or odd behavior (e.g., spending time following log trails)
Must be flexible on occasionally attending 'off-hour' meetings (we're a global team supporting a global customer base!)
No travel required for this role
Experience with administering MySQL & MongoDB preferred
Experience with administering message brokering systems like RabbitMQ preferred
Experience with build pipeline tools (e.g., Azure DevOps, Bamboo, Octopus), Git, and the Atlassian suite (Jira, Confluence, Bitbucket)
Experience with monitoring and alerting platforms, especially Stackdriver
Experience with HashiCorp Terraform
Experience with IIS