Own the end-to-end architecture and delivery for telemetry solutions, including fleet health monitoring, fault remediation, and data visualization at scale
Own the out-of-band (OOB) telemetry solution and validate the telemetry data from each underlying device
Recruit, develop, and motivate a high-performing engineering team focused on platform telemetry, RAS, and observability
Continuously improve software development processes for optimal productivity and quality
Work across teams to ensure seamless integration of telemetry solutions with platform firmware, server architecture, and data center management
Drive product life cycles with QA teams, ensuring robust testing, productization, and delivery
Conduct performance reviews, foster a culture of excellence, and ensure high productivity
Requirements
12+ years of overall relevant experience
5+ years managing systems/platform software teams
BS, MS, or PhD in EE/CS or related field (or equivalent experience)
Strong knowledge of DMTF standards (e.g., PLDM) for OOB telemetry collection
Experience with time series databases (e.g., InfluxDB, Prometheus) and REST APIs (e.g., Redfish)
Deep understanding of server and firmware architecture and optimization for low-latency APIs
Proven track record of delivering scalable server products and telemetry solutions
Experience with SCM (Git, Perforce) and project management tools (Jira)
Hands-on experience with x86/ARM system architecture and coding (C/C++, Python)
Familiarity with Confidential Compute and notification systems
Demonstrated ability to analyze algorithms for time/space complexity and system resource requirements