General Motors is seeking an Engineering Manager for its AI Cloud and Developer Infrastructure organization, which focuses on enhancing development tools for engineers. The role involves leading a team, defining the technical strategy for observability, and ensuring comprehensive visibility across GM’s AV software stack.

Responsibilities:

Manage and grow a team of engineers, conducting performance reviews, providing coaching, and supporting career development
Define and execute the technical vision and roadmap for the observability platform, ensuring it provides actionable insights into complex systems
Provide technical guidance on instrumentation, logging, metrics, and tracing to ensure comprehensive visibility across GM’s AV software stack
Ensure the team's tools enable rapid detection, debugging, and resolution of unknown or unforeseen system failures to minimize downtime
Work with other engineering teams—such as those developing AI/ML, firmware, and infrastructure—to implement observability practices and improve system reliability
Lead the development of internal tools and data pipelines to collect, analyze, and visualize telemetry data at massive scale
Manage relationships and costs associated with third-party observability software and platforms

Requirements:

Leadership experience: 5+ years of experience leading software or site reliability engineering (SRE) teams and balancing the tradeoff between velocity and reliability
Bachelor's degree in Computer Science or a related field, or equivalent work experience
Observability expertise: Deep understanding of the core observability pillars (logs, metrics, and traces). Experience with technologies like Prometheus, Grafana, OpenTelemetry, and log management systems is crucial
Software architecture: Strong background in designing, developing, and architecting distributed systems, cloud-native applications, and microservices
Programming proficiency: Familiarity with Go, Python, TypeScript, or similar languages, along with software development practices, to inform code reviews and architectural decisions
Cloud infrastructure: Experience with modern cloud offerings like GCP, AWS, or Azure and technologies like CI/CD pipelines, Kubernetes, and Docker
Communication skills: Excellent interpersonal and communication skills to collaborate effectively with diverse teams and stakeholders
Management experience: 3+ years of experience managing software engineering or site reliability engineering (SRE) teams
Experience working with GCP, AWS, or Azure
Familiarity with Kubernetes, Docker, Istio, Terraform, Prometheus, Grafana, TSDBs, and observability pipelines (e.g., for logging, metrics, or tracing)
Skilled in defining and instrumenting SLIs and SLOs
Ownership of, or contributions to, open source projects
Passion for self-driving technology and its potential impact on the world

Engineering Manager, Observability
