Stage 4 Solutions is a global B2B high-tech company seeking a Site Reliability Engineer. The role involves data monitoring, quality assurance, and improving system performance through various technical operations and processes.

Responsibilities:

Perform data monitoring and alerting, data quality assurance, and anomaly detection
Document team processes and policies, including methods of engagement and SLOs
Analyze, design, and implement solutions at the system level to remove bottlenecks and improve edge service performance
Implement monitoring and alerting to improve issue detection and response
Work in a fast-paced environment. Participate in technical operations and rotations in response to performance and reliability issues
Participate in on-call rotations, with responsibility for resolving or escalating incoming events
Maintain and operate a Linux and Kubernetes environment

Requirements:

3+ years of experience working with Unix and Linux systems from kernel to shell and beyond, with experience working with system libraries, file systems, and client-server protocols
Experience reading Python scripts for platform operations
Experience with networking technologies such as TCP/IP, BGP, DNS, etc. in a carrier-grade environment
Experience in developing and operating one or more of the following systems: OpenStack, Kubernetes, Nginx, ipvs, ELK stack, Hadoop, etc.
Bachelor's Degree

Site Reliability Engineer (Remote – Culver City, CA, Mountain View, CA, Seattle, WA)
