Crusoe is on a mission to accelerate the abundance of energy and intelligence through sustainable technology. As a Senior Cloud Support Engineer, you'll empower customers to leverage Crusoe Cloud's low-cost GPU compute power, providing exceptional technical support and ensuring seamless utilization of the technology for groundbreaking advancements.

Responsibilities:

Provide exceptional technical support to customers via Zendesk, meeting SLAs and maintaining high CSAT (95%+)
Participate in a 24/7 on-call rotation to ensure timely resolution of critical issues
Diagnose and resolve issues related to VMs, hardware failures, and scaling tests using CLI and internal tools
Manage alert triage, prepare for maintenance windows, and conduct node delivery testing
Work closely with SRE, Networking, and Storage teams from initial triage to root cause analysis (RCA) delivery
Adhere to global team collaboration and handoff processes for ticketing and on-call procedures
Develop onboarding/training materials, knowledge base documentation, and standard operating procedures (SOPs)

Requirements:

Bachelor's degree in IT, Computer Science, Engineering, or a related field, or 4+ years of equivalent technical experience
Strong command-line interface (CLI) skills in Linux environments
Proficiency with Git for code management and collaboration
5+ years of experience in a customer support role, ideally within cloud, storage, or networking environments
Experience with container orchestration (e.g., Kubernetes), workload management (e.g., Slurm), infrastructure as code (e.g., Terraform), and monitoring tools (e.g., Grafana)
Familiarity with other public cloud platforms (e.g., AWS, Azure, GCP)
Excellent communication and customer service skills, including the ability to prioritize competing escalations
Understanding of HPC technologies such as InfiniBand, RDMA, RoCE, and software-defined networking (SDN)
Relevant certifications, such as CKA, CKAD, CKS, KCNA, AWS Machine Learning - Specialty, Data Analytics - Specialty, Solutions Architect - Professional, Developer - Associate, NVIDIA AI Infrastructure and Operations, Generative AI and LLMs, Generative AI Multi-modal, InfiniBand, Linux Foundation IT Associate, System Administrator
Deep understanding of specific cloud platforms and services
Experience with automation tools and scripting languages
Demonstrated ability to analyze complex technical issues and develop effective solutions
Proven ability to mentor, train, and onboard colleagues
A strong interest in contributing to a more sustainable future through technology
