Epic is the leading digital reading platform for kids ages 12 and under, used by millions of children, families, and educators around the world. The Senior Software Engineer, Infrastructure and Data will ensure the reliability and efficiency of Epic's platform, manage our GCP infrastructure, and optimize our data pipelines.
Responsibilities:
- Build and operate Epic's GCP infrastructure, ensuring high availability, scalability, and cost efficiency
- Manage and improve our Docker and GKE container platform, including workload scheduling, autoscaling, and networking
- Maintain and improve CI/CD pipelines that enable fast, safe delivery across engineering teams
- Monitor and improve system health and performance using New Relic
- Write and maintain Terraform to codify infrastructure across the organization
- Contribute to capacity planning, cost optimization, and architectural reviews
- Champion platform security best practices, including secrets management, IAM policies, and network segmentation
- Build, maintain, and optimize data pipelines and transformations to ensure reliability, performance, and data quality
- Collaborate with data and backend engineers to troubleshoot pipeline and service issues
- Participate in a frequent production on-call rotation; contribute to incident response, post-mortems, and systemic improvements
- Provide guidance to developers on infrastructure concerns and best practices
Requirements:
- Bachelor's degree or higher in Computer Science, Software Engineering, or a related field
- 5+ years of experience in infrastructure, platform, DevOps, or a related engineering role
- Hands-on experience with GCP (GCE, GCS, VPC, IAM, Cloud Monitoring, and related services)
- Experience with Docker and Kubernetes (GKE): containerizing workloads, deploying to GKE, using Helm, and cluster fundamentals
- Experience with CI/CD pipelines (GitHub Actions, ArgoCD, Jenkins, or similar)
- Experience with an observability platform such as New Relic (metrics, logging, alerting, dashboards)
- Proficiency in Terraform for managing infrastructure as code
- Scripting/programming skills in Python, Bash, or similar
- Comfort participating in a frequent production on-call rotation
- Strong problem-solving skills, sense of ownership, and ability to work effectively in evolving systems
- Fluency in English for daily collaboration and technical documentation
- Proficiency in Mandarin Chinese to collaborate effectively with global engineering and business partners
- Experience with BigQuery, dbt, and Dagster for data pipeline development and orchestration
- Familiarity with data warehouse or data pipeline architectures
- Experience in distributed or global engineering teams