Beacon Biosignals is on a mission to revolutionize precision medicine for the brain, and they are seeking a skilled Site Reliability Engineer to join their Platform team. In this role, you will ensure the reliability, availability, and security of the cloud infrastructure that supports large-scale machine learning on biosignal data.
Responsibilities:
- Design and implement infrastructure-as-code solutions that improve the reliability, security, and maintainability of our cloud infrastructure
- Lead and execute major infrastructure initiatives including cluster upgrades, security improvements, and architectural changes
- Develop and maintain CI/CD pipelines that enable teams to deploy safely and efficiently
- Improve observability across our systems through enhanced monitoring, logging, and alerting
- Participate in an on-call rotation and lead incident response efforts when issues arise
- Collaborate with development teams to improve application reliability and performance
- Maintain and enhance our security posture through infrastructure hardening and automation
- Create and maintain documentation for infrastructure, deployment processes, and incident response procedures
Requirements:
- Strong experience with Kubernetes administration, including cluster management, security, and troubleshooting
- Proven track record implementing infrastructure as code using Terraform or similar tools
- Experience building and maintaining CI/CD pipelines, particularly with GitHub Actions, Azure DevOps, or ArgoCD
- Solid understanding of container technologies and build processes, especially Docker
- Strong knowledge of a cloud provider (e.g. AWS), including networking, security, and infrastructure services
- Experience with incident response and on-call responsibilities in a production environment
- Deep experience with Linux systems administration and debugging
- Proficiency in at least one programming language (e.g. Python, Go, or TypeScript)
- Understanding of security and networking concepts such as OAuth2/OIDC, DNS, TLS, and TCP/UDP
- Approximate experience: Bachelor's degree plus 5-8 years in SRE, DevOps, or a similar professional role
- Experience with Azure is a plus
- Familiarity with Windows Server environments is a plus