Architect and Design Enterprise OpenShift Solutions: Lead the high-level design (HLD) and low-level design (LLD) for multi-tenant Red Hat OpenShift and Kubernetes clusters across on-prem and hybrid cloud environments.
Define the technology stack, standards, and blueprints for deploying AI solutions across global, multi-region public clouds (AWS/Azure/GCP) and diverse on-premises hardware.
Oversee the successful end-to-end rollout of critical services including AI SOC, OpenShift AI, and AI-based Cybersecurity Log Optimization.
Drive Network DevOps Strategy: Define and standardize the automation roadmap using Ansible, Terraform, and Python to achieve "Zero-Touch" infrastructure provisioning and configuration.
Lead Customer & Stakeholder Engagement: Act as the primary technical consultant for global clients, leading design workshops, architecture validation, and executive-level technical reviews.
Integrate Advanced AI Apps, Networking & Security: Collaborate with Pre-sales, AI application developers and engineers, and Firewall Architects to design secure AI agents and use cases and container networking (CNI) models, and to implement Zero-Trust security, service mesh (Istio), and micro-segmentation within OpenShift environments.
Optimize Hybrid Infrastructure: Oversee the seamless integration of OpenShift with physical networking (Cisco ACI, VXLAN) and virtualized platforms (RHEL-V, VMware ESXi).
GPU & Hardware Orchestration: Design and manage hardware acceleration using the NVIDIA GPU Operator and Node Feature Discovery (NFD). Implement Multi-Instance GPU (MIG) and time-slicing to optimize resource utilization across multi-tenant clusters.
Establish CI/CD Governance: Architect robust CI/CD pipelines (Jenkins, GitLab CI, GitHub Actions) for infrastructure and application delivery, ensuring compliance and security are baked into the workflow.
Lead Observability & Reliability: Design comprehensive monitoring and logging architectures using Prometheus, Grafana, and the ELK stack to ensure 99.99% availability of cluster services.
Mentorship & Technical Leadership: Guide and mentor L2/L3 engineers, providing expert-level escalation support and establishing best practices for the DevOps and Network teams.
Innovation & R&D: Evaluate and introduce emerging technologies such as Advanced Cluster Management (ACM), Advanced Cluster Security (ACS), and Cloud-Native Networking (OVN-Kubernetes).
Requirements
Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
10+ years of progressive experience in systems architecture, infrastructure engineering, or network DevOps.
Expert/Architect-level proficiency in OpenShift (RHOS) and Kubernetes architecture in large-scale production environments.
Experience architecting hybrid-cloud OpenShift solutions involving AWS (ROSA), Azure (ARO), or Google Cloud.
Proven track record in Automation & IaC: Mastery of Ansible, Terraform, and Git-based workflows to manage complex infrastructures.
Deep understanding of Linux (RHEL) Internals: Mastery of kernel networking, storage drivers (CSI), and container runtimes (CRI-O).
Strong Network Background: In-depth knowledge of BGP, VXLAN, and EVPN as they apply to connecting containerized workloads to physical data center fabrics.
Experience in "Architecture as Code": Ability to develop and maintain compliance artifacts, design validation reports, and automated documentation.
Excellent Leadership Skills: Demonstrated ability to manage high-stakes customer relationships and lead cross-functional technical teams.