Appspace is dedicated to creating better work experiences for people everywhere and is seeking a Senior DevOps & Site Reliability Engineer to ensure the reliability, performance, and scalability of its SaaS applications. The role involves building CI/CD frameworks, automating workflows, and collaborating with engineering teams to improve operational efficiency and security compliance.

Responsibilities:

Identifying manual "toil" and replacing it with automated workflows for monitoring, change management, and routine administration of large-scale VM environments, prioritizing automation that delivers a positive ROI
Leading the integration of AI tools for automated code reviews, development frameworks, and predictive log analysis to drive departmental velocity and efficiency
Designing and maintaining "self-service" deployment frameworks and CI/CD pipelines (GitHub Actions, Bamboo) using Infrastructure as Code (Bicep, Terraform)
Evaluating platform components to determine the most cost-effective path: automating the current state or migrating features to modern, shared architectures
Designing and maintaining a comprehensive observability stack across Azure and GCP (metrics, logs, traces) to identify performance bottlenecks and proactively address system defects
Partnering with engineering, security, and operations teams to ensure new features are "born" with reliability, security, and automated delivery in mind, while maintaining adherence to security best practices and compliance standards (SOC2, HIPAA, ISO 27001) and balancing operational excellence with cost efficiency
Investigating complex performance defects by following log trails across web, application, and database tiers (SQL Server, MongoDB, MySQL)
Ensuring all platforms meet security standards (SOC2, HIPAA, ISO 27001) through automated policy enforcement across Azure and GCP

Requirements:

Must have a passion for life-long learning
6+ years in DevOps or SRE roles, with a proven track record of bridging development and operations in complex cloud environments
Extensive experience with Microsoft Azure (IaaS, PaaS, App Services, Networking) and/or Google Cloud Platform (GCP)
Expert-level PowerShell and Python skills. Hands-on experience with Bicep or Terraform is required
Strong background in Windows/Linux Server OS, Kubernetes (AKS/GKE), Helm, and container orchestration
Familiarity with various middleware and PaaS technologies (e.g. Event Hub, Service Bus, CosmosDB, RabbitMQ, MongoDB)
Expert-level troubleshooting and the ability to reason through complex process workflows to identify faults in large-scale platform environments
Experience with Atlassian suite (Jira, Confluence, Bitbucket)
Experience with AI-driven log analysis or automated incident remediation
Knowledge of database tuning (SQL Server, MySQL, MongoDB)
Familiarity with compliance standards (SOC2, HIPAA, GDPR)

Senior DevOps & Site Reliability Engineer - Americas
