SS&C Technologies is a leading financial services and healthcare technology company headquartered in Windsor, Connecticut. The company is seeking a Senior Site Reliability Engineer to join its Internal Platform Services team, which is responsible for ensuring the reliability, scalability, and performance of the core services that power SS&C's internal engineering ecosystem.
Responsibilities:
- Ensure reliability, scalability, and performance of services through SLIs/SLOs, capacity planning, and incident response
- Drive automation of infrastructure operations to minimize toil
- Develop and maintain monitoring, alerting, and observability systems that enable proactive issue detection
- Partner with internal engineering teams to define service-level objectives, improve deployment workflows, and align infrastructure with development needs
- Contribute to on-call rotations and incident management, helping ensure high availability of services
- Drive post-incident reviews and blameless retrospectives to improve reliability
- Stay current with emerging technologies and recommend improvements to existing systems and practices
Requirements:
- 3+ years of experience as an SRE, DevOps Engineer, or Infrastructure Engineer
- Solid experience with Kubernetes administration and tooling (e.g., Helm, ArgoCD, Kustomize)
- Strong expertise in cloud platforms (e.g., AWS, GCP, or Azure)
- Experience managing databases in production environments (e.g., backups, replication, tuning)
- Proficiency in programming or scripting (e.g., Go, Python, Bash)
- Deep understanding of CI/CD pipelines and infrastructure automation
- Familiarity with monitoring/observability tools (e.g., Prometheus, Grafana)
- Strong communication skills and ability to collaborate with software engineering teams
- Experience in multi-tenant infrastructure environments
- Exposure to compliance and security best practices in infrastructure environments