Cleerly is a healthcare company revolutionizing heart disease diagnosis and treatment. The company is seeking a highly skilled Site Reliability Engineer to ensure the health and integrity of its systems, with a focus on AWS cloud infrastructure and system reliability.

Responsibilities:

Cloud Environment Buildout: Stand up and harden the new Hub cloud environment and deployment pipeline, ensuring reliability, security, and repeatability
Infrastructure Management: Design, develop, and manage cloud infrastructure using AWS services, Terraform (Infrastructure as Code), and Docker containers
System Integrity: Use strong system administration and network engineering skills to ensure the reliability, scalability, and performance of all platform systems
Own Observability & Incidents: Own observability and incident readiness end-to-end, including third-party connectivity patterns, runtime guardrails, and upgrade strategies (canary/rollback), so the platform can scale safely as new AI integrations are added
Drive DevOps Automation: Implement DevOps methodologies and tools, facilitating Continuous Integration (CI), Continuous Delivery (CD), and the automation of infrastructure management tasks
Reduce Toil: Develop and maintain automation tools to proactively reduce manual operational tasks (toil)
Security Maintenance: Maintain system and network security by implementing and enforcing appropriate security measures across the platform

Requirements:

6–10+ years of professional experience running and managing production services on AWS
Deep understanding of AWS fundamentals, including VPC networking, IAM, KMS, security groups, and routing
Expertise with Infrastructure-as-Code (Terraform, CDK, or CloudFormation) and reliable environment replication
Experience operating and managing container platforms (EKS/ECS) and/or scalable managed services
Proven ability to design and automate comprehensive CI/CD pipelines (builds, tests, deploys, and rollbacks)
Deep knowledge of metrics, logs, and traces, along with setting SLOs, configuring robust alerting, and managing structured incident response processes
Practical High Availability (HA) / Disaster Recovery (DR) thinking, including backup strategies, multi-AZ patterns, and conducting failure drills
Strong security-by-default posture, including expertise in secrets handling, key rotation, and the principle of least privilege
Acute performance and cost awareness, including effective use of tagging, budgeting, right-sizing, and autoscaling
Proven ability to partner with engineering and security teams to achieve rapid deployment goals without compromising system reliability
Expertise in the Software Development Life Cycle (SDLC) for Software as a Medical Device (SaMD)
Deep experience operating in regulated environments, including audit logging, strict change control, and comprehensive evidence collection
Working knowledge of essential healthcare data standards, including DICOM (medical imaging) and HL7 (clinical data exchange)
Proven experience developing comprehensive cybersecurity measures and implementing robust data protection and privacy controls across cloud infrastructure
Experience designing and implementing secure connectivity patterns for healthcare customers, including PrivateLink, VPN, and Direct Connect
Expertise in container supply-chain security, including SBOM (Software Bill of Materials), signing, scanning, and runtime policy enforcement
AWS Certified SysOps Administrator – Associate or Professional
Certified Kubernetes Administrator (CKA)
Bachelor's degree in Computer Science, Information Technology, or a related field, or equivalent experience
Proven experience in Site Reliability Engineering, DevOps, or a similar role

Software Engineer III (P3 - 360 Cloud SRE Engineer)