Vynca is dedicated to transforming care for individuals with complex needs. The company is seeking a Site Reliability Engineer to build and operate the infrastructure for its healthcare technology platform, with a focus on the reliability, scalability, and security of its systems.

Responsibilities:

Design, provision, and manage AWS infrastructure using Terraform as the source of truth
Operate, maintain, and scale production workloads running on Kubernetes
Package, deploy, and manage applications using Helm and infrastructure automation tools
Build, operate, and improve distributed and event-driven systems, including event sourcing, partitioning, event ordering, replay, and failure recovery mechanisms
Define, monitor, and maintain Service Level Indicators (SLIs), Service Level Objectives (SLOs), and error budgets to balance reliability and engineering velocity
Develop automation for deployment, scaling, monitoring, incident response, and operational workflows to reduce manual effort and improve system resilience
Own platform observability by implementing and maintaining metrics, logging, tracing, monitoring, and alerting solutions
Lead incident response efforts, facilitate blameless postmortems, and drive long-term corrective actions that improve system reliability
Partner with Product and Engineering teams on capacity planning, performance optimization, and resilient system design
Implement and maintain security best practices to support HIPAA, SOC 2, and other compliance requirements
Participate in an on-call rotation and provide operational support for production systems

Requirements:

Three to five (3–5) years of experience in Site Reliability Engineering, DevOps Engineering, Platform Engineering, Cloud Infrastructure Engineering, or similar infrastructure-focused roles, preferably within healthcare, SaaS, or high-growth technology environments
Bachelor's degree in Computer Science, Information Systems, Software Engineering, or a related technical field; equivalent professional experience will also be considered
Strong hands-on experience operating production workloads within AWS environments
Proven experience managing infrastructure as code using Terraform, including module development, state management, and deployment automation
Experience operating and supporting production Kubernetes environments
Hands-on experience deploying and managing applications using Helm
Experience working with distributed systems, event-driven architectures, or event-sourcing platforms, including concepts such as partitioning, event ordering, replay, and fault tolerance
Experience establishing and managing observability practices including monitoring, logging, tracing, alerting, and incident response
Strong understanding of Linux systems administration, networking, cloud architecture, and distributed systems fundamentals
Experience designing, implementing, and maintaining CI/CD pipelines and deployment automation
Strong problem-solving skills with the ability to troubleshoot complex infrastructure and application issues
Excellent written and verbal communication skills with the ability to collaborate effectively across technical and non-technical teams
High level of ownership, accountability, and initiative with a proactive approach to reliability and operational excellence
Ability and willingness to participate in an on-call rotation supporting production systems
Strong programming or scripting experience with Python, Go, or similar languages
Experience with observability platforms such as Prometheus, Grafana, Datadog, CloudWatch, SigNoz, or OpenTelemetry
Experience with GitOps tools such as ArgoCD or Flux
Experience managing databases such as PostgreSQL, MySQL, Redshift, or ClickHouse
Experience implementing secrets management solutions such as AWS Secrets Manager or HashiCorp Vault
Experience supporting healthcare technology platforms or other highly regulated environments
Familiarity with data infrastructure technologies including Snowflake, Redshift, and ETL/ELT pipelines
Experience with database performance tuning and optimization
