Developing and updating the architectural framework for highly complex and confidential university-wide IT systems
Analyzing, troubleshooting and testing highly complex systems
Analyzing operational requirements to implement plans for network and internet presence
Analyzing, recommending and designing internal network solutions to meet client needs
Serving as an expert resource to a group of professionals in the speciality
Requirements
Bachelor’s degree in Computer Science, Software Engineering, or an equivalent combination of education and experience
Six to seven years of progressively responsible experience in systems architecture, infrastructure engineering, or platform operations
Extensive experience designing, implementing, and supporting Linux-based production environments
Strong experience with on-premises infrastructure and open-source virtualization technologies (e.g., KVM, libvirt, Proxmox, OpenStack, or equivalent)
Proven ability to architect, maintain, and secure scalable, highly available infrastructure supporting enterprise, research, or service-oriented applications
Experience with containerization and orchestration technologies (e.g., Docker, Kubernetes), preferably in on-premises environments
Strong knowledge of infrastructure automation, configuration management, and Infrastructure as Code tools (e.g., Ansible, Terraform, Chef, or equivalent)
Experience building or supporting CI/CD pipelines and operational automation tools
Experience supporting databases, backend services, and search platforms in a production environment; familiarity with technologies such as PostgreSQL, MySQL, Solr, and MarkLogic is an asset
Experience with monitoring, logging, and observability tools to support performance, reliability, and incident response
Experience with identity and access management systems (e.g., Shibboleth, LDAP, Active Directory, SAML, OAuth2, OpenID Connect)
Strong knowledge of networking and core systems administration concepts, including DNS, load balancing, TLS certificates, firewalls, storage, and backup and recovery practices
Demonstrated ability to troubleshoot complex systems, improve service reliability, and support resilient distributed environments
Excellent communication and interpersonal skills, with the ability to work effectively with technical and non-technical colleagues
Demonstrated ability to provide technical leadership, mentor team members, and foster knowledge sharing within a systems team
Experience working in a collaborative environment with shared responsibility for service reliability, security, and continuous improvement
Knowledge of academic library systems, digital scholarship infrastructure, or research support environments is considered an asset
Relevant certifications or professional development in Linux systems administration, infrastructure automation, security, or systems architecture are considered an asset