CareFirst BlueCross BlueShield is a health insurance company seeking a Manager for their Technology Operations Center. This role involves leading a technical team to enhance service reliability and operational automation for cloud-native applications, while managing incident response and observability processes.
Responsibilities:
- Lead the technical and operational execution of the Technology Operations Center supporting cloud-native applications and platforms
- Define TOC architecture, workflows, and tooling strategy aligned to Azure, Dynatrace, ServiceNow, and AI-enabled operations
- Build and lead a technically skilled team of TOC analysts and engineers
- Establish and measure operational KPIs including availability, MTTD, MTTR, error budgets, and alert quality
- Ensure operational excellence for systems supporting member enrollment, claims, provider services, and digital health platforms
- Own and operate the Major Incident Management (MIM) process using ServiceNow
- Act as Incident Commander for high-severity production incidents impacting member- or provider-facing systems
- Coordinate real-time response across application, Azure infrastructure, SRE, security, and vendor teams
- Ensure clear, technically accurate communication to leadership and business stakeholders
- Act as a subject matter expert, contributing to blameless post-incident reviews, root cause analysis, and corrective action tracking
- Own the Event Management lifecycle leveraging Dynatrace for real-time observability
- Design and maintain monitoring standards across Azure infrastructure (compute, network, storage) and cloud platforms and services (PaaS, containers, APIs), and drive alert optimization, noise reduction, and event correlation
- Partner with engineering teams to embed observability best practices (metrics, logs, traces)
- Improve early detection of service degradation impacting health care workflows
- Serve as technical owner for Dynatrace, including architecture and deployment strategy, configuration standards and best practices, and integration with ServiceNow and other operational tools; leverage Dynatrace AI (Davis) for root cause analysis, anomaly detection, and proactive issue identification
- Drive adoption of Dynatrace across engineering teams to support SRE and DevOps practices
- Manage licensing, capacity planning, and platform optimization
- Ensure ServiceNow effectively supports Incident, Major Incident, Event, and Problem Management processes
- Partner with ITSM, platform, and automation teams to enhance workflows and integrations
- Use data from ServiceNow to drive operational insights and continuous improvement
- Ensure ITIL-aligned processes are optimized for cloud-native and DevOps operating models
- Leverage AI and machine learning capabilities within Dynatrace, ServiceNow, and related platforms to improve signal-to-noise ratio, accelerate root cause identification, enable proactive and predictive operations, and identify opportunities for automation and self-healing in incident response
- Drive adoption of AIOps practices aligned to reliability, scalability, and cost efficiency
Requirements:
- Bachelor's degree in Computer Science, Information Technology, or a related field; in lieu of a Bachelor's degree, an additional 4 years of relevant work experience beyond the required work experience
- 5 years of related professional experience
- 1 year of supervisory experience or demonstrated progressive leadership experience
- Experience in technology operations, TOC/NOC, SRE, or cloud operations
- Strong hands-on experience with Major Incident Management in enterprise environments
- Deep experience with monitoring and observability platforms, preferably Dynatrace
- Strong experience with ServiceNow ITSM processes
- Solid understanding of Microsoft Azure infrastructure and cloud-native architectures
- Experience applying ITIL v4 and SRE practices in production environments
- ITIL Foundation or higher certification
- Dynatrace or ServiceNow certifications
- Experience with AIOps, automation, or self-healing systems
- Background in health care, insurance, or other regulated industries
- Strong technical communication skills with the ability to lead under pressure