Palo Alto Networks is dedicated to protecting the digital way of life and is seeking a Principal Engineer for the Dev Infrastructure team at Chronosphere. The role focuses on building developer tooling that improves developer velocity and reliability. It also carries ownership of the full development lifecycle on modern cloud-native technologies.
Responsibilities:
- Architect & Build: Design and maintain high-scale developer tooling and backend services that improve productivity and reliability across a distributed cloud environment. We operate in a 100% modern, cloud-native ecosystem, and you will work exclusively with ephemeral infrastructure and containerized microservices
- Infrastructure as Code (IaC): Treat infrastructure as a first-class citizen. You will define, deploy, and manage entire environments using declarative IaC (Terraform), ensuring our platform is reproducible and version-controlled
- Drive Systemic Quality: Identify and eliminate systemic bottlenecks in the software development lifecycle (SDLC) through architectural changes or advanced tooling
- Scale & Reliability: Ensure our infrastructure remains resilient under massive traffic loads while optimizing for performance, cost-efficiency, and near-real-time telemetry processing
- Strategic Leadership & Mentorship: Define platform standards and reference architectures that span a 1–3 year horizon, balancing feature velocity with long-term technical debt. Act as the "glue" across teams, consulting on infrastructure best practices and up-leveling the organization through mentorship
Requirements:
- 8+ years of relevant experience with the following:
- Strong experience in at least one backend language (e.g., Go, Java, Python, or Rust). We value fluency and the ability to write modular, testable code over knowing a specific syntax
- Deep Systems Expertise: You go beyond "using" the cloud; you understand how it works
  - Cloud-Native: A solid understanding of cloud-native concepts and experience working with cloud providers like AWS or GCP. You should be comfortable navigating Kubernetes and container-level logic
  - Operating Systems & Compute: Deep knowledge of Linux internals, process management, and resource isolation
  - Networking & Security: Understanding of the OSI model, service meshes, load balancing, and zero-trust security architectures
  - Distributed Systems: Experience building and debugging systems that involve CAP theorem trade-offs, eventual consistency, and distributed tracing
- Reliable Execution: A track record of completing assigned tasks/tickets reliably and estimating work effectively within a sprint. You take ownership of features from local development through to basic testing and delivery
- Analytical Debugging & Quality: The ability to debug your own code efficiently using logs and tests, and to identify edge cases (such as null values or boundary limits) proactively during the design phase
- Collaborative Spirit: Strong communication skills to keep teammates informed, raise blockers early, and contribute meaningfully to code reviews and design discussions
- Continuous Learning: A proactive approach to learning new tools, processes, and libraries. You are open to feedback and use incidents or code reviews as opportunities to up-level your skills
- AI-Native Development: You embrace the future of engineering. Experience or interest in using AI coding assistants (like Cursor or Claude) to improve productivity and automate boilerplate tasks