Build AI-Powered Remediation Systems: Design and implement machine learning models that can identify, diagnose, and automatically resolve system issues detected by the observability platform.
Own the AI/ML Pipeline: Take end-to-end ownership of the AI lifecycle, from data collection and preprocessing to model training, evaluation, and deployment.
Integrate with Observability Stack: Work closely with the core platform team to integrate AI solutions into the existing observability infrastructure (e.g., logs, metrics, traces).
Experiment and Iterate: Rapidly prototype and experiment with different models and approaches (e.g., anomaly detection, root cause analysis, LLM-based insights) to find what works best.
Collaborate Cross-Functionally: Partner with product, backend, and DevOps teams to align AI capabilities with user needs and infrastructure realities.
Set the Technical Direction: As an early technical hire, contribute to foundational architecture decisions and establish best practices for AI/ML within the company.
Ensure Reliability and Scalability: Build systems that perform reliably at scale and integrate safely into production environments.
Stay Ahead of the Curve: Keep up with the latest advancements in AI/ML and observability to help shape the product roadmap.
Requirements
AI product experience: Hands-on experience building AI products (do side projects count?)
Solid software engineering skills: Proficiency in Python and TypeScript.
Systems knowledge: Understanding of observability tools (e.g., Prometheus, OpenTelemetry).
Owner mindset: Comfortable working in a fast-paced, ambiguous environment with limited structure and high ownership.