Troubleshoot private cloud deployments based on OpenStack, Kubernetes, Mirantis Container Runtime (MCR), and related cloud technologies; detect, report, and resolve complex issues across the stack.
Provide high-tier support for critical product issues escalated by peers or management, including leading high-severity incident calls and coordinating resolution efforts across multiple teams.
Act as a shift-level technical leader, maintaining awareness of platform health, responding to alerts, and proactively managing emerging issues within customer environments.
Perform cluster upgrades and lifecycle operations as new releases become available, ensuring minimal disruption and clear communication throughout the process.
Communicate promptly, clearly, and thoroughly with customers during incidents, providing accurate status updates and guiding them through troubleshooting and resolution.
Own escalations end-to-end by routing issues to the appropriate teams, including OpenStack, Ceph and storage, networking, hardware, and infrastructure, while maintaining accountability and follow-through.
Lead incident management efforts by structuring outage calls, distinguishing root cause from contributing factors, documenting findings, and driving corrective actions to completion.
Reproduce customer issues in internal lab environments, validate defects, and provide detailed diagnostics and reproduction steps to development teams.
Work closely with engineering teams to review customer issues, suggest improvements, identify potential product defects, and track fixes through delivery.
Requirements
High school diploma or equivalent required; a four-year degree is preferred, or equivalent experience including three or more years in a senior or Tier 3 systems administration role.
Strong experience with OpenStack core services including Nova, Neutron, Cinder, Keystone, and Glance.
Hands-on knowledge of Neutron networking, including OVS and OVN, VLAN and VXLAN, and SDN concepts.
Experience with Cinder storage backends, including Ceph RBD and LVM.
Strong Ceph experience, including troubleshooting cluster health, OSD and MON issues, performance tuning, and recovery operations.
Working knowledge of Kubernetes architecture and troubleshooting containerized workloads.
Advanced Linux system administration skills.
Strong understanding of networking fundamentals.
Excellent written and verbal English communication skills.
Tech Stack
Cloud
Kubernetes
Linux
OpenStack
Benefits
Work with an established Silicon Valley leader in the cloud infrastructure industry.
Collaborate with exceptionally passionate, talented, and engaging colleagues, helping Fortune 500 and Global 2000 customers implement next-generation cloud technologies.
Be a part of cutting-edge, open-source innovation.
Thrive in the high-energy environment of a young company where openness, collaboration, risk-taking, and continuous growth are valued.
Receive a competitive compensation package with a strong benefits plan.