Caris Life Sciences is transforming cancer care and changing lives through precision medicine. The Cloud Engineer will design, implement, and maintain scalable cloud-native infrastructure, focusing on AWS and Kubernetes, while collaborating with cross-functional teams to ensure reliable and secure cloud environments.
Responsibilities:
- Develop and maintain Infrastructure as Code (IaC) using Terraform and reusable Terraform modules, and design, implement, and maintain GitLab CI/CD pipelines, using established patterns and reusable components to support consistent deployments across environments
- Design, deploy, and support Kubernetes environments on AWS EKS, ensuring reliability, scalability, and security of cloud workloads
- Implement and maintain core Kubernetes capabilities such as networking, ingress, storage, and cluster upgrades following established best practices
- Participate in proof-of-concept efforts for new tools and technologies, contributing hands-on implementation and feedback
- Follow and help maintain Kubernetes and cloud infrastructure standards, including upgrade processes, security configurations, and performance tuning
- Collaborate with application and platform teams to support cloud-native application deployments, containerized workloads, and microservices architectures
- Implement and maintain observability solutions using metrics, logs, and alerts to support monitoring, troubleshooting, and operational visibility
- Support cloud security initiatives by implementing access controls, encryption, and compliance requirements according to defined security standards
- Contribute to cloud migration and application modernization efforts by executing migration tasks and validating deployed solutions
- Participate in incident response and on-call rotations, helping diagnose issues, restore services, and improve operational reliability through post-incident learnings
Requirements:
- Bachelor's degree in Computer Science, Information Technology, or a related field
- 6+ years of experience with Linux systems, cloud engineering, and DevOps practices
- Understanding of cloud technologies, such as AWS or GCP
- Ability to port on-premises Python scripts to run in the cloud
- Proficiency in Docker and Kubernetes, with good knowledge of their ecosystems
- Security-focused mindset
- Experience with Terraform and Terraform modules
- Experience with continuous integration and continuous deployment (CI/CD) infrastructure, including GitLab CI/CD
- A proactive approach to spotting problems, areas for improvement, and performance bottlenecks across services and levels of the technical stack
- Comfort working in the Linux shell
- Experience with Git for source control
- Understanding of how enterprise server hardware is set up and how to add devices to the configuration
- Certifications from Amazon Web Services (AWS), Google Cloud Platform (GCP), or another large cloud provider
- Experience with management, monitoring, and reporting (MMR); specific experience with Grafana, Prometheus, or Alertmanager is a bonus
- Experience with configuration management and automation