Mixpanel is a leading analytics company that helps teams turn data clarity into innovation. They are seeking a Software Engineer for their DevInfra team to enhance the software development lifecycle and improve the deployment of AI tools across engineering teams.
Responsibilities:
- We partner with a wide variety of teams to optimize their software development lifecycle for speed, safety, and reliability. We support teams writing front-end UI code as well as teams maintaining our highly stateful storage systems deep in our stack
- We are responsible for deploying AI tools like Claude Code to our entire engineering team. We are building agents into our platform to do things like augment on-call response and automatically one-shot bugs that come through a team’s triage queue
- We support systems that ingest more than 1 trillion user-generated events every month while keeping end-to-end query latency low. Mixpanel queries typically scan more than 1 quadrillion events over the span of a month. More details in this blog post
- Mixpanel runs entirely on Google Cloud Platform and Google Kubernetes Engine. DevInfra is the overall admin for our cloud environment, so we wear many hats. We are responsible for things like Terraform, cost management, networking, and security best practices
- We are the overall service owner for Kubernetes. Individual teams are responsible for maintaining and monitoring their own clusters and workloads. We are responsible for setting standards for deployment, observability, and developer experience. We facilitate efforts like Kubernetes version upgrades and help teams adopt new Google Kubernetes Engine features
- We are the service owner for our observability pipelines. We have instrumented over 30 million Prometheus metric time series and use OpenTelemetry to ingest over 4 billion distributed tracing spans per month. An example: How We Migrated from StatsD to Prometheus in One Month
- We maintain our GitHub Actions-based CI/CD pipelines: a robust, GitHub-native delivery process that empowers teams to self-serve their needs while ensuring safety and reliability. See this blog post for an example
- We maintain our devbox system that provisions cloud development environments for individual developers
- We procure best-of-breed SaaS tooling for the rest of engineering. We own our relationship with GitHub, Honeycomb, Chronosphere, Sentry, and more
- We run the new engineer onboarding process. If you join us, your first meeting will probably be with us!
Requirements:
- 3+ years of industry experience
- Experience with at least one of: Go, Python, or JavaScript/TypeScript
- Production experience working with Kubernetes
- Production experience managing infrastructure in a major cloud provider such as Google Cloud Platform, Amazon Web Services, or Azure, preferably Google Cloud Platform
- Knowledgeable about coding agents like Claude Code
- Experience with observability solutions like OpenTelemetry, Prometheus, or distributed tracing for application monitoring and performance analysis
- Experience with service mesh
- You write excellent docs
- Proficient with GitHub Actions and the GitHub ecosystem
- Experience building deployment pipelines. We love GitOps!
- Bazel expertise
- Experience with Terraform
- Experience with cloud Identity and Access Management (IAM) systems and knowledgeable about security best practices
- Experience implementing Internal Developer Platforms like Backstage
- Working knowledge of site reliability engineering (SRE) principles such as implementing Service Level Objectives (SLOs)
- You ❤️ open source