PROS, Inc. is the leading offer management provider to the airline industry, helping airlines deliver seamless retail experiences designed to maximize revenue and margin growth. The Site Reliability Engineer II optimizes service performance, participates in reliability improvements, and conducts in-depth SLO and capacity analysis to enhance system reliability and scalability while contributing to automation and self-service tool development.
Responsibilities:
- Monitor service performance, assist in troubleshooting production issues, and learn system architecture
- Monitor service reliability, participate in resolving basic issues, and learn disaster recovery testing procedures
- Understand SLO concepts, monitor and analyze SLO patterns, and assist in implementing SLO visualization and alerting
- Perform basic capacity analysis, identify trends in system capacity, and participate in capacity planning
- Deploy and maintain existing automation tools, create simple scripts, and troubleshoot automation scripts
Requirements:
- 5+ years of experience in enterprise networking, including hands‑on work with routing, switching, firewalls, load balancers, and VPN technologies
- Strong understanding of cloud networking architectures, including VPC/VNet design, peering, private link, and hybrid connectivity models
- Experience with network security technologies, such as security groups, NACLs, firewall policies, WAF, IDS/IPS, and micro‑segmentation
- Proficiency in Layer 2 and Layer 3 networking and core network services, including BGP, OSPF, EIGRP, DNS, DHCP, NAT, and IP addressing/subnetting
- Hands‑on experience with load balancers and ingress technologies, including F5, NGINX, Azure Application Gateway, ALB/NLB, or equivalent
- Strong troubleshooting skills using packet analyzers, flow logs, and network monitoring platforms
- Skilled in analyzing performance trends and identifying optimization opportunities
- Able to collaborate with teams to improve monitoring coverage
- Able to participate in structured reliability testing and analysis
- Able to evaluate system components for resilience
- Able to contribute to reliability-focused design discussions
- Skilled in analyzing trends to inform service improvements
- Able to collaborate with teams to align SLOs with user expectations
- Able to develop moderately complex automation tools
- Skilled in building internal self-service capabilities
- Able to evaluate automation opportunities for operational efficiency
- Skilled in analyzing capacity data to inform scaling decisions
- Able to recommend improvements for resource utilization
- Able to ensure scalability is considered in feature development
- Able to follow predefined procedures to deploy PROS products and third-party applications to cloud environments
- Able to contribute to release management documentation
- Willingness to learn application architecture and the interactions between system components
- Bachelor's Degree in Computer Science, Information Technology, or a related field
- Practical experience with FortiGate firewalls and F5 appliances is highly desirable