Centene Corporation is a diversified national organization focused on improving health outcomes through technology. The company is seeking a Senior Site Reliability Engineer to lead projects that improve platform infrastructure performance and reliability using SRE practices and observability tools.

Responsibilities:

Assists application development teams in creating a Disaster Recovery playbook
Troubleshoots and resolves more complex problems with systems and services and initiates regular deployment of new versions of the systems and their subcomponents
Leads more complex projects focused on building and maintaining observability/monitoring for the application, monitoring key performance indicators, maintaining alerting, and continuously improving visibility
Helps make decisions around periodic system validation and testing, service monitoring, and standing up new services/tools
Uses knowledge and experience to identify strategies that increase system reliability and performance through on-call rotation and process optimization
Identifies and implements necessary manual and automated procedures for improved collaborative response in real time
Leads lower-level engineers in stress, security, and performance testing
Resolves issues that come up through support escalation
Keeps documentation and runbooks up to date to effectively deal with new incidents that might arise
Leads post-incident reviews and documents findings to inform future decision-making
Reviews proposals to optimize the Software Development Life Cycle (SDLC) to boost service reliability and decides which proposals should move forward
Communicates complex topics to development teams to investigate and document issues, and leads the internal team in developing solutions to mitigate them
Performs other duties as assigned
Complies with all policies and standards

Requirements:

A Bachelor's degree in a quantitative or business field (e.g., statistics, mathematics, engineering, computer science)
4 – 6 years of related experience or equivalent experience acquired through accomplishments of applicable knowledge, duties, scope and skill reflective of the level of this position

Key skills:

Disaster Recovery
AWS
SQL
MongoDB