BayOne Solutions is seeking a Site Reliability Engineer (Datacenter) to enhance the performance and reliability of their systems. The role involves monitoring data, implementing solutions for system-level improvements, and participating in on-call rotations to address performance issues.
Responsibilities:
- Perform data monitoring and alerting, data quality assurance, and anomaly detection
- Document team processes and policies, including methods of engagement and SLOs
- Analyze, design, and implement solutions at the system level to remove bottlenecks and improve edge service performance
- Implement monitoring and alerting to improve issue detection and response
- Work in a fast-paced environment, participating in technical operations and rotations in response to performance and reliability issues
- Participate in on-call rotations, resolving or escalating incoming events
- Maintain and operate a Linux and Kubernetes environment
Requirements:
- 3+ years of experience working with Unix/Linux systems, from kernel to shell and beyond
- Experience working with system libraries, file systems, and client-server protocols
- Experience reading Python scripts for platform operations
- Experience with networking technologies such as TCP/IP, BGP, and DNS in a carrier-grade environment
- Experience developing and operating one or more of the following systems: OpenStack, Kubernetes, Nginx, IPVS, ELK stack, Hadoop, etc.
- Bachelor's degree or higher in Computer Science or a related field, with at least 2 years of related work experience