Synopsys Inc. is a leader in chip design and technology innovation, seeking a Principal Site Reliability Engineer to enhance the reliability and performance of its engineering environment. The role involves applying SRE practices, collaborating across teams, and transforming processes into scalable solutions.
Responsibilities:
- Applying SRE practices to identify, monitor, communicate, and resolve issues in the environment, while also collaborating with internal teams and customers on post-mortem analysis to deliver root cause insights
- Following up on reported issues and identifying procedures to prevent similar occurrences
- Reviewing current processes and transforming them into scalable solutions
- Debugging OS and engineering issues within our provided Linux environment
- Collaborating on internal projects across different time zones and teams
- Following up with customers and handing over tasks and issues to team members in other time zones to make efficient use of time-zone coverage
Requirements:
- 10+ years of experience with SRE processes and related skills
- Capability to understand complex engineering implementations and their interdependencies for troubleshooting
- Deep knowledge of Linux distributions (CentOS, RedHat, Ubuntu, SuSE)
- Deep knowledge of virtualization and containerization technologies
- Extensive knowledge of storage solutions, including network storage and associated protocols
- Good experience with network technologies
- Good experience with load-sharing facilities such as LSF and Slurm, and with other workload scheduling technologies
- Good interpersonal, communication and leadership skills