Assist with incident response and participate in an on-call rotation as the team grows
Work with SRE to improve system reliability, backups, and disaster recovery practices
Reduce manual operational work through automation and scripting
Help maintain system performance and platform stability as usage scales
Document infrastructure and operational procedures to reduce knowledge silos
Facilitate knowledge transfer and contribute to the growth and development of other engineers
In collaboration with a Security Engineer, ensure systems within your area of ownership meet information security, compliance, and operational standards
Adhere to internal processes for completing work items and deploying code to production
Other responsibilities as assigned
Requirements
B.S. in Computer Science, or equivalent education or relevant experience
4+ years of professional software engineering experience (DevOps or Cloud Engineering) working on production SaaS platforms or distributed systems
Experience operating production systems in a cloud environment (Azure preferred)
Hands-on experience with Kubernetes (AKS or similar)
Infrastructure as Code experience (Terraform preferred)
CI/CD pipeline experience (GitHub Actions or similar)
Strong Linux troubleshooting skills
Experience supporting production applications and diagnosing incidents
Familiarity with monitoring and alerting systems
Ability to work collaboratively with software engineers and assist with deployments
Ability to implement best practices and mentor engineers across multiple teams
Great teamwork and cross-functional collaboration skills
An eagerness to learn and adapt to the needs of a greenfield industry