SS&C Technologies is a leading financial services and healthcare technology company headquartered in Windsor, Connecticut. The company is seeking a highly skilled Site Reliability Engineer to join its Operations team. The engineer will be responsible for ensuring the availability, performance, scalability, and reliability of its systems and services.
Responsibilities:
- Maintain and improve the availability and performance of production systems
- Define and track SLIs, SLOs, and SLAs to ensure service reliability and user satisfaction
- Implement and manage monitoring, alerting, and observability tools (e.g., Prometheus, Grafana, Datadog, ELK)
- Participate in on-call rotations, respond to incidents, and conduct root cause analyses and postmortems
- Automate repetitive tasks and processes using scripts, configuration management, and Infrastructure as Code (IaC)
- Develop CI/CD pipelines to streamline deployment and operational processes
- Analyze system performance and capacity trends to plan for future growth
- Collaborate with engineering teams to design systems that scale reliably
- Support cloud and/or hybrid infrastructure (AWS, Azure, GCP, VMware, etc.)
- Manage system provisioning, configuration, and patching via tools such as Ansible, Terraform, or Puppet
- Act as a bridge between development and operations teams, championing DevOps and SRE principles
- Contribute to a culture of continuous improvement, reliability, and accountability
Requirements:
- Bachelor's degree in Computer Science, Engineering, or related field (or equivalent experience)
- 3+ years of experience in a Site Reliability Engineering, DevOps, or Systems Engineering role
- Experience administering Linux/Unix and Windows systems, including shell scripting
- Proficiency in at least one programming/scripting language (Python, Go, Bash, etc.)
- Hands-on experience with cloud platforms (AWS, Azure, or GCP)
- Strong knowledge of networking, security, load balancing, and DNS
- Experience with monitoring/logging tools (e.g., Prometheus, Grafana, ELK, Splunk, Datadog)
- Experience with containerization and orchestration tools (e.g., Docker, Kubernetes)
- Familiarity with ITIL processes, including incident, change, and problem management
- Exposure to compliance and security standards (e.g., ISO 27001, SOC 2, HIPAA)
- Experience with large-scale distributed systems and microservices architectures