Staff Reliability Engineer at MSD | JobVerse
MSD
Staff Reliability Engineer
Czechia
Full Time
2 hours ago
Visa Sponsorship
Key skills
Cloud
ITSM
SDLC
AI
Performance Optimization
CI/CD
Collaboration
About this role
Role Overview
Partner with application and platform teams to embed reliability into system design, development, and operations
Support implementation and operationalization of Service Level Objectives and reliability indicators
Contribute to improving observability coverage across logs, metrics, traces, and events
Apply reliability patterns such as fault isolation, failover, and recovery mechanisms in collaboration with engineering teams
Participate in and support improvements to the incident lifecycle, including detection, response, root cause analysis, and follow-up actions
Assist in identifying reliability risks and performance bottlenecks and contribute to remediation efforts
Support continuous improvement initiatives focused on reducing incident volume and improving system stability
Apply established enterprise standards for observability, resilience engineering, and Service Level Objectives
Support adoption of reliability practices across teams through hands-on guidance and collaboration
Contribute feedback to help evolve reliability frameworks and tooling
Develop and enhance automation for incident response, monitoring, and operational workflows
Leverage existing platforms (e.g., observability tools, incident management systems) to improve efficiency and visibility
Use AI-enabled capabilities, within defined governance, to support diagnostics and operational workflows where appropriate
Work closely with product, platform, and ITSM teams to align on reliability improvements
Participate in cross-team initiatives focused on improving system resilience and operational maturity
Contribute to knowledge sharing within the reliability engineering community
Requirements
Experience in one or more of the following: system integration, software development, system administration, or operations engineering
Familiarity with the software development life cycle (SDLC) and production support models
Understanding of monitoring, observability, and performance optimization concepts
Experience supporting applications in cloud and/or on-premises environments
Working knowledge of CI/CD pipelines and deployment practices
Basic understanding of incident management and root cause analysis processes
Knowledge of system reliability principles, including availability and performance engineering
Strong problem-solving skills with a focus on continuous improvement
Ability to collaborate effectively across engineering and operations teams
Tech Stack
Cloud
ITSM
SDLC
Benefits
Flexible Work Arrangements