WEKA is a pre-IPO, growth-stage company focused on transforming enterprise data infrastructure with AI-native solutions. The Senior Designated Services Engineer will ensure customer success by providing technical expertise, troubleshooting issues, and strengthening relationships with top-tier customers.
Responsibilities:
- Serve as the primary technical liaison between customers and Engineering to address feature gaps, reliability concerns, and documentation improvements
- Troubleshoot and resolve technical issues, escalating to Engineering when necessary
- Provide feedback to Engineering to enhance product supportability, usability, and serviceability
- Support pre-sales engineers, partners, and resellers with technical expertise
- Proactively monitor WEKA systems using remote monitoring tools to identify and address potential issues
- Own, track, and document customer issues via the ticketing system
- Communicate clearly and professionally with customers, partners, and internal teams
- Contribute to knowledge sharing through internal and customer-facing documentation (FAQs, KB articles)
- Manage multiple projects and support cases concurrently
- Advocate for customer concerns and represent WEKA externally
- Develop subject matter expertise in WEKA and customer technologies
- Participate in on-call, follow-the-sun support rotations as needed
- Be available to work alternative hours (nights, weekends, holidays) and to travel regionally or internationally as needed
Requirements:
- 10+ years in customer-facing technical roles, solving complex enterprise infrastructure issues
- Ability to diagnose hardware failures, network congestion, and performance bottlenecks
- L3 or higher support experience with enterprise infrastructure (Linux-based storage, networking, virtualization, cloud, etc.)
- Strong technical troubleshooting skills in multi-platform, distributed environments
- Strong understanding of distributed storage systems
- Expertise in Linux/Unix administration
- Deep understanding of networking (InfiniBand, Ethernet, DPDK, UCX) and cloud computing
- Proficiency in Python and Bash, with experience writing automation scripts for system monitoring and troubleshooting
- Knowledge of data access protocols (POSIX, NFS, S3), log management, and monitoring tools (Prometheus, Grafana)
- Experience with JIRA, Confluence, Slack, and other collaboration tools
- Experience bridging customer support and product development teams
- Familiarity with Kubernetes, containers (including LXC), and cloud platforms (AWS, Azure, OCI, GCP)
- Prior experience managing large-scale HPC clusters
- Strong technical writing skills and a creative approach to problem-solving