This individual will be the primary resource for complex issue resolution, serving as the escalation point for the Skill Level 2 technician. Their core responsibilities include:
Triage and Deep Analysis: Leading the initial assessment and root cause analysis for high-priority incidents.
Strategic Planning and Oversight: Contributing to long-term stability planning, process improvement, and comprehensive documentation development.
Knowledge Transfer and Mentorship: Actively mentoring the Skill Level 2 technician and leading scheduled knowledge-sharing sessions.
Usage Monitoring: Monitoring model consumption patterns to identify demand sources and trends.
Skillset: Expertise in CATS (an internal platform running on Kubernetes/K8s), Python, Rust (infrastructure focus), cloud platforms (AWS), Git, and Next.js, together with excellent oral and written communication skills.