The University of Washington is seeking a Site Reliability Engineer 3 to join UW Medicine IT Services, a shared services organization that supports all of UW Medicine. The role involves applying software engineering methodologies to enhance infrastructure reliability, supporting web and mobile applications, and leading transitions to cloud environments.
Responsibilities:
- Apply software engineering methodologies to infrastructure and operational systems to reduce toil and increase reliability
- Support day-to-day operations of public-facing web and mobile applications, including uptime management, vulnerability assessment, patching, and lifecycle maintenance
- Lead and support the transition toward container-based infrastructure and modern orchestration platforms
- Support strategic expansion of services into cloud environments where appropriate, balancing cost, security, compliance, and performance
- Participate in security reviews and implement required mitigations in alignment with institutional and regulatory standards
- Design and support highly available cloud-native application deployments
- Implement and maintain comprehensive monitoring and observability frameworks
- Develop and test disaster recovery and business continuity strategies
- Participate in an on-call rotation (approximately one weekend per month) to support critical services
Requirements:
- Bachelor's degree in Computer Science, Information Technology, Business Administration, or a related field, or an equivalent combination of education and experience
- 4+ years of professional experience in systems engineering, infrastructure operations, DevOps, Site Reliability Engineering, or a related field, which must include the following:
  - Supporting Linux-based server environments (e.g., RHEL or similar distributions)
  - Working within at least one major cloud environment (AWS, Azure, or Google Cloud)
  - Building, deploying, or maintaining containerized applications
  - Using version control systems (e.g., Git)
  - Supporting web application infrastructure (e.g., web servers, application stacks)
  - Supporting authentication or single sign-on (SSO) solutions
  - Supporting network or firewall configurations
  - Participating in production support and incident response activities
- Ability to troubleshoot infrastructure and application issues in production environments
- Ability to communicate effectively with technical and non-technical stakeholders
- Ability to manage multiple priorities in a team-based operational environment