Support the infrastructure and operational environment that underpins Scholars Portal applications and services
Contribute to the administration, monitoring, maintenance, and enhancement of primarily on-premises systems and platforms
Focus on service reliability, security, performance, and operational sustainability
Work closely with Systems team members, developers, and other colleagues on server provisioning, configuration management, deployment support, patching, monitoring, backup and recovery, and disaster preparedness
Help maintain and improve robust, high-availability services that respond to evolving technical and community needs
Requirements
Bachelor’s degree in Computer Science or an equivalent combination of education and experience
Minimum of five (5) years of relevant experience in systems administration, infrastructure operations, or platform support
Extensive experience administering and supporting Linux-based production environments
Experience deploying, maintaining, and troubleshooting highly available, resilient, and secure systems
Experience supporting distributed systems and services in a production environment
Strong knowledge of operating system administration, networking, storage, and core infrastructure concepts
Experience with configuration management, automation, and infrastructure management tools (e.g., Ansible, Chef, or equivalent)
Experience administering web servers, reverse proxies, and load balancing technologies (e.g., Apache, Nginx, HAProxy, or equivalent)
Experience with containerization and related platform technologies (e.g., Docker, Kubernetes, or equivalent)
Experience supporting virtualization and infrastructure platforms using open-source technologies (e.g., KVM, OpenStack, or equivalent)
Experience supporting databases and backend services in a production environment, including writing and maintaining SQL scripts; familiarity with technologies such as MySQL, PostgreSQL, Oracle, MongoDB, Redis, RabbitMQ, or Elasticsearch is an asset
Experience with monitoring, logging, and operational support tools used to maintain service reliability and performance
Experience with scripting and automation using languages such as Python, Bash, Go, or equivalent
Experience with identity and access management systems and protocols such as Shibboleth, LDAP, Active Directory, SAML, OAuth 2.0, or OpenID Connect
Familiarity with secure systems administration practices, backup and recovery procedures, patching, and disaster recovery planning
Strong analytical, troubleshooting, and problem-solving skills
Excellent communication and interpersonal skills, with the ability to work effectively with technical and non-technical colleagues
Demonstrated ability to work independently and collaboratively in a team environment, and to contribute to documentation, knowledge sharing, and continuous improvement