Proactively troubleshoot, detect, document, and resolve issues for private cloud deployments based on OpenStack, Kubernetes, and other cloud technologies.
Reproduce customer issues in a lab, confirm bug reports, and provide detailed information to the development team.
Work closely with development teams: discuss customer issues, suggest improvements, fix product bugs, etc.
Provide engineering support for product issue escalations.
Take ownership of escalated critical customer issues and participate in troubleshooting sessions as needed.
Requirements
Bachelor's degree or equivalent experience.
5+ years of systems or operations engineering experience.
Expert Linux system administration and troubleshooting skills.
System performance profiling abilities (e.g., CPU, memory, disk speed).
Deep understanding of highly available cloud-based infrastructure.
Ability to identify, document, and articulate software defects to multiple stakeholders (management, tenants, developers, etc.).
Expert understanding of networking concepts and protocols.
A working knowledge of Kubernetes operations and the ability to troubleshoot Kubernetes environments.
Experience with shell scripting and automation.
Knowledge of virtualization solutions (libvirt, KVM, VMware).
Knowledge of distributed storage solutions.
Ability to read and understand Python code and Python logs.
Experience using monitoring and log aggregation software to aid in troubleshooting (e.g., Grafana, Kibana, Nagios, Prometheus).
Experience recommending monitoring software and configuring, customizing, and extending monitoring tools.
Experience working with configuration management tools (Puppet, Chef, Salt, Ansible, Helm).
Good understanding of CI/CD workflows (e.g., Git, Jenkins, JFrog Artifactory).
Tech Stack
Ansible
Cassandra
Chef
Cloud
Grafana
Jenkins
Kubernetes
Linux
MySQL
OpenStack
Postgres
Prometheus
Puppet
Python
RabbitMQ
SaltStack
Shell Scripting
VMware
Benefits
Work with an established Silicon Valley leader in the cloud infrastructure industry;
Work with exceptionally passionate, talented and engaging colleagues, helping Fortune 500 and Global 2000 customers implement next-generation cloud technologies;
Be a part of cutting-edge, open-source innovation;
Thrive in the high-energy environment of a young company where openness, collaboration, risk-taking, and continuous growth are valued;
Internship provides you with the opportunity to combine work and education;
Professional development and training;
Attend conferences and working groups;
Modern bright office, centrally located and close to public transportation;
Customized workstation (macOS, Windows, Linux);
Company outings, happy hours, hackathons, and tech talks;
Receive a competitive compensation package with a strong benefits plan.