Appspace is dedicated to creating better work experiences for people everywhere. The company is seeking a Senior DevOps & Site Reliability Engineer to ensure the reliability, performance, and scalability of its SaaS applications. The role involves building CI/CD frameworks, automating workflows, and collaborating with engineering teams to improve operational efficiency and security compliance.
Responsibilities:
- Identifying manual "toil" and replacing it with automated workflows for monitoring, change management, and routine administration of large-scale VM environments, prioritizing automation that delivers a positive ROI
- Leading the integration of AI tools for automated code reviews, development frameworks, and predictive log analysis to drive departmental velocity and efficiency
- Designing and maintaining "self-service" deployment frameworks and CI/CD pipelines (GitHub Actions, Bamboo) using Infrastructure as Code (Bicep, Terraform)
- Evaluating platform components to determine the most cost-effective path: automating the current state or migrating features to modern, shared architectures
- Designing and maintaining a comprehensive observability stack across Azure and GCP (metrics, logs, traces) to identify performance bottlenecks and proactively address system defects
- Partnering with engineering, security, and operations teams to ensure new features are "born" with reliability, security, and automated delivery in mind, while maintaining adherence to security best practices and compliance standards (SOC2, HIPAA, ISO 27001) and balancing operational excellence with cost efficiency
- Investigating complex performance defects by following log trails across web, application, and database tiers (SQL Server, MongoDB, MySQL)
- Ensuring all platforms meet security standards (SOC2, HIPAA, ISO 27001) through automated policy enforcement across Azure and GCP
Requirements:
- A passion for lifelong learning
- 6+ years in DevOps or SRE roles, with a proven track record of bridging development and operations in complex cloud environments
- Extensive experience with Microsoft Azure (IaaS, PaaS, App Services, Networking) and/or Google Cloud Platform (GCP)
- Expert-level PowerShell and Python skills. Hands-on experience with Bicep or Terraform is required
- Strong background in Windows/Linux Server OS, Kubernetes (AKS/GKE), Helm, and container orchestration
- Familiarity with various middleware and PaaS technologies (e.g., Event Hub, Service Bus, CosmosDB, RabbitMQ, MongoDB)
- Expert-level troubleshooting skills and the ability to reason through complex process workflows to identify faults in large-scale platform environments
- Experience with the Atlassian suite (Jira, Confluence, Bitbucket)
- Experience with AI-driven log analysis or automated incident remediation
- Knowledge of database tuning (SQL Server, MySQL, MongoDB)
- Familiarity with compliance standards (SOC2, HIPAA, GDPR)