Together AI is a research-driven artificial intelligence company seeking a Customer Support Engineer to help customers build training, fine-tuning, and inference solutions. The role involves resolving complex technical challenges and collaborating across teams to ensure customer satisfaction.
Responsibilities:
- Engage directly with customers to tackle and resolve complex technical challenges involving our cutting-edge Kubernetes GPU clusters; ensure swift and effective solutions every time
- Become a product expert in our GPU Cluster service, serving as the last line of technical defense before issues are escalated to Engineering and Product teams
- Collaborate seamlessly across Engineering, Research, and Product teams to address customer concerns; work with senior leaders both internally and externally to ensure the highest levels of customer satisfaction
- Transform customer insights into action by identifying patterns in support cases and working with Engineering and Go-To-Market teams to drive Together’s roadmap (e.g., future models to support)
- Maintain detailed documentation of system configurations, procedures, troubleshooting guides, and FAQs to facilitate knowledge sharing with the team and customers
- Be flexible in providing support coverage during holidays, nights and weekends as required by business needs to ensure consistent and reliable service for our customers
Requirements:
- 3+ years of experience in a customer-facing technical role with at least 1 year in a support function in AI or supporting a mission-critical API in SaaS
- Strong technical background, with knowledge of AI, ML, GPU technologies and their integration into high-performance computing (HPC) environments
- Familiarity with infrastructure services (e.g., Kubernetes, SLURM), infrastructure as code solutions (e.g., Ansible), high-performance network fabrics, NFS-based storage management, container infrastructure, and scripting and programming languages
- Foundational understanding of the installation, configuration, administration, troubleshooting, and securing of compute clusters
- Strong skills in complex technical problem solving and troubleshooting, with a proactive approach to issue resolution
- Ability to work cross-functionally with teams such as Sales, Engineering, Support, Product and Research to drive customer success
- Strong sense of ownership and willingness to learn new skills to ensure both team and customer success
- Excellent communication and interpersonal skills, with the ability to explain complex technical concepts to non-technical stakeholders
- Ability to operate in dynamic environments, manage multiple projects, and handle frequent context switching and prioritization