Procom is seeking a Principal Site Reliability Engineer for a 12-month remote contract role. This role involves ensuring the availability, reliability, and performance of the Red Hat Demo Platform’s AI services and OpenShift Virtualization infrastructure.
Responsibilities:
- Design, develop, and implement robust IT infrastructure solutions
- Architect and deploy Model-as-a-Service platforms
- Implement automation and DevOps processes for cloud lifecycle optimization
- Perform architectural planning and management of OpenShift Container Platform environments
- Architect virtualization solutions and design advanced network architectures
- Develop storage strategies and oversee bare-metal infrastructure administration
- Drive automation initiatives using Ansible and Red Hat Advanced Cluster Management
- Establish and optimize CI/CD pipelines and provide technical mentorship
- Work cross-functionally to gather requirements and create architectural documentation
Requirements:
- 8+ years of experience in IT architecture with a focus on infrastructure design
- 5+ years of experience with public cloud, virtualization, and Linux technologies
- 5+ years of experience with Red Hat OpenShift or Kubernetes
- 3+ years of experience with automation frameworks like Ansible or Terraform
- Hands-on experience with bare-metal administration
- Practical experience with CI/CD methodologies
- Excellent interpersonal and presentation skills
- Experience with OpenShift AI and inference systems
- Proven experience with enterprise-grade Software-Defined Storage
- Deep knowledge of SDN principles and advanced networking
- Extensive experience with Red Hat Enterprise Linux administration
- Proficiency in Python or Go for building infrastructure components