Sectigo is the most innovative provider of certificate lifecycle management, delivering comprehensive solutions that secure identities for large brands. The Site Reliability Engineer will design and implement solutions to enhance the reliability of critical services at Sectigo.

Responsibilities:

Ensure the reliability of our critical products and services by meeting or exceeding SRE objectives
Instantiate and maintain production infrastructure using Infrastructure as Code and Configuration Management tools
Build and maintain proper monitoring of our services by utilizing centralized logging and time series databases
Automate deployments, administration, and monitoring of our services by following CI/CD practices
Work with engineering and information security teams to establish, document, and enhance processes, and to improve the operability and security of our services
Participate in the team on-call rotation (required)
Additional tasks associated with this position may be assigned in response to company initiatives and business needs

Requirements:

Bachelor's degree in information systems, computer science, technology, or a related field is strongly preferred. In lieu of a degree, 2+ years of relevant and/or equivalent experience is acceptable
Minimum of 3 years of software and/or operational experience in building and maintaining internet-facing production environments is required
Strong experience with Linux/Unix systems administration
Knowledge of source control tools (Git preferred)
Experience with Configuration Management and Infrastructure as Code tools (Ansible, Puppet, Terraform preferred)
Good understanding of container technology (Docker, Kubernetes preferred)
Experience with monitoring tools (Prometheus, Grafana, Nagios, or similar) and alerting systems
Experience with non-cloud infrastructure
Experience running a large-scale 24/7 production environment
Experience with distributed data processing, databases, and large-scale file systems is a plus
Strong scripting abilities in Bash and Python
Experience with incident management, troubleshooting, and root cause analysis
Experience in handling postmortems, building incident response plans, and improving incident resolution procedures
Experience running and maintaining real-world build systems (Jenkins, DroneCI, or similar tools)
Demonstrable experience with the entire software life cycle, from Systems Architecture and Systems Design through Implementation, Maintenance, and Operation
Programming experience using HTTP Service APIs
Virtualization experience (VMware, Proxmox, Oracle Linux Virtualization Manager)
Network administration experience is a plus
Exposure to Security and Testing frameworks is a plus
Exposure to compliance-regulated industries such as Finance, Healthcare, or Government is a plus
