Riverside Recruiting is a company that develops software for people with disabilities, aiming to make a positive impact in their lives. They are seeking a Senior DevOps Engineer to lead the evolution of their high-performance infrastructure, ensuring systems are secure, cost-optimized, and resilient while managing diverse environments and AI infrastructure.
Responsibilities:
- Design and maintain a mix of VM-based (Alpine/Ubuntu) and containerized (GKE) environments using Terraform and Packer for cloud and on-premise deployments
- Optimize high-performance NFSv3 (Filestore) mounts and manage complex VPC networking, including subnets, firewalls, and secure internal routing
- Manage the lifecycle of Google Cloud Datastore, MySQL, and MongoDB, with a heavy focus on high availability, replica set tuning, and automated backups
- Scale GPU-backed AI infrastructure and manage its lifecycle for high-concurrency ML workloads
- Enforce strict security standards across Alpine, Ubuntu, and Debian systems, including OS-level hardening and IAM least-privilege, and respond to complex security compliance requirements
- Develop and test robust DR strategies and implement comprehensive monitoring to ensure system health
- Proactively optimize GCP spend by rightsizing resources, managing idle costs, and leveraging efficient autoscaling policies
Requirements:
- Must live in Eastern Time Zone or Central Time Zone
- GKE (Standard/Autopilot), Managed Redis, Cloud Storage, Artifact Registry, and expert-level gcloud CLI
- Immutable Patterns: Terraform (state management), Ansible, Packer (HCL), and Managed Instance Groups (MIG)
- Multi-Distro Linux: Bash/Python scripting across Alpine (OpenRC), Ubuntu, and Debian
- Deep understanding of NFS (v3 vs. v4), HTTP, TLS, WebSocket, SSH tunneling (ProxyJump), and load balancing (GFE/URL maps)
- MongoDB (replica sets/WiredTiger), MySQL, and Redis-backed task queues
- Managing Pod Autoscaling (HPA/VPA), Node Pools, Taints, and Tolerations
- Google Professional Cloud DevOps Engineer
- Google Professional Cloud Architect
- Certified Kubernetes Administrator (CKA)
- MongoDB Certified DBA