Planet designs, builds, and operates the largest constellation of imaging satellites in history. The company is seeking a Software Engineer in Platform Operations to help design, build, and operate the core infrastructure for its engineering teams, with a focus on the security, scalability, and reliability of its cloud-native platform.
Responsibilities:
- Design and implement core Infrastructure-as-Code (IaC) solutions to ensure the secure and scalable operation of Planet's services
- Actively work on major platform modernization initiatives, including the full migration from legacy tooling to new solutions
- Manage cloud-based infrastructure services, notably our fleet of Kubernetes clusters, and associated tooling to meet internal needs and support customer-facing service level agreements
- Enhance and maintain observability for key platform services, leveraging Grafana and other tools to establish Service Level Objectives (SLOs) and improve operational readiness
- Implement improvements and features for core systems owned by the team, such as GKE clusters, the public API gateway, and other managed infrastructure solutions
- Collaborate with software engineering teams to refine the developer experience (DevEx) of our managed infrastructure
Requirements:
- 4+ years of experience in a Platform Engineering, System Administration, DevOps, or Site Reliability Engineering (SRE) role
- Deep understanding of Kubernetes, underlying compute systems, and Linux
- Working knowledge of public clouds, particularly Google Cloud Platform (GCP) or Amazon Web Services (AWS)
- Experience with CI/CD tools (e.g., GitLab, ArgoCD), configuration management (e.g., Terraform, Crossplane), and GitOps principles
- An operational mindset and strong troubleshooting skills in complex production environments
- Experience building services in languages such as Go and Python using tools like Git, Docker, and CI/CD workflows
- Experience building services that leverage cloud-based infrastructure and tooling such as AWS or GCP
- Ability to collaborate and clearly communicate designs and decisions verbally and in writing
- Experience in the operational management and development of core platform systems or open-source infrastructure projects
- Experience with maintaining highly available or operationally resilient infrastructure at very large scales or across multiple clouds
- Practical experience with networking and network architectures as they relate to platform infrastructure
- Experience with observability tools and best practices