Crusoe is an innovative AI infrastructure company dedicated to revolutionizing energy and computing solutions. As a Senior Cloud Support Engineer, you will provide exceptional technical support to customers, ensuring they can effectively use Crusoe Cloud's GPU computing technology for advancements in AI and other fields. This role involves troubleshooting, collaborating with various teams, and contributing to customer success.
Responsibilities:
- Provide exceptional technical support to customers via Zendesk, meeting SLAs and maintaining high CSAT (95%+)
- Participate in a 24/7 on-call rotation to ensure timely resolution of critical issues
- Diagnose and resolve issues related to VMs, hardware failures, and scaling tests using CLI and internal tools
- Manage alert triage, prepare for maintenance windows, and conduct node delivery testing
- Work closely with SRE, Networking, and Storage teams from initial triage to root cause analysis (RCA) delivery
- Adhere to global team collaboration and handoff processes for ticketing and on-call procedures
- Develop onboarding/training materials, knowledge base documentation, and standard operating procedures (SOPs)
Requirements:
- Bachelor's degree in IT, Computer Science, Engineering, or a related field, or 4+ years of equivalent technical experience
- Strong command-line interface (CLI) skills in Linux environments
- Proficiency with Git for code management and collaboration
- 5+ years of experience in a customer support role, ideally within cloud, storage, or networking environments
- Experience with container orchestration (e.g., Kubernetes), workload management (e.g., Slurm), infrastructure as code (e.g., Terraform), and monitoring tools (e.g., Grafana)
- Familiarity with other public cloud platforms (e.g., AWS, Azure, GCP)
- Excellent communication and customer service skills, including the ability to prioritize competing escalations
- Understanding of HPC technologies such as InfiniBand, RDMA, RoCE, and Software-Defined Networking (SDN)
- Relevant certifications: Kubernetes (CKA, CKAD, CKS, KCNA); AWS (Machine Learning - Specialty, Data Analytics - Specialty, Solutions Architect - Professional, Developer - Associate); NVIDIA (AI Infrastructure and Operations, Generative AI and LLMs, Generative AI Multi-modal, InfiniBand); Linux Foundation (IT Associate, System Administrator)
- Deep understanding of specific cloud platforms and services
- Experience with automation tools and scripting languages
- Demonstrated ability to analyze complex technical issues and develop effective solutions
- Proven ability to mentor, train, and onboard colleagues
- A strong interest in contributing to a more sustainable future through technology