Saviynt is a leader in identity security, delivering an AI-powered platform for governing and securing access to applications and data. As a Principal Engineer, you will define and drive the reliability strategy for Saviynt's SaaS platform, focusing on designing and maintaining shared infrastructure services and ensuring their high availability and performance.
Responsibilities:
- Define and drive the reliability strategy for our SaaS platform
- Design, build, and maintain the shared infrastructure services and platforms that our product and application teams will depend on
- Create reusable, reliable, and scalable solutions that abstract away complexity, enabling other teams to focus on their core business logic and deliver features faster in a multi-cloud environment
- Design and build core platform components and shared infrastructure services that other development teams will integrate with and leverage to deploy and operate their applications
- Architect, implement, and manage highly available and scalable Kubernetes platforms as a service for internal consumers
- Develop robust, internal-facing tools and automation for infrastructure provisioning and management primarily using Go (Golang)
- Architect and optimize foundational solutions within Cloud environments (AWS, Azure, etc.), focusing on creating reusable patterns and modules for other teams
- Design and implement shared Event-Driven Architecture components and messaging platforms using technologies like Kafka or Google Pub/Sub that product teams can easily utilize
- Develop and maintain robust CI/CD pipelines (e.g., GitLab CI and ArgoCD) as a service, providing standardized and automated deployment workflows for various development teams
- Design and build resilient Distributed Systems components that serve as building blocks for other applications, focusing on reliability, fault tolerance, and performance
- Manage and optimize our shared infrastructure across Multi-Region Cloud Environments, ensuring that platform services are globally available and performant for all consumers
- Establish and enhance centralized Observability and Monitoring platforms and tools that provide self-service insights for consuming teams
- Define and implement clear, well-documented RESTful API designs for the infrastructure services you build, ensuring ease of integration for internal clients
- Implement and manage Service Mesh (e.g., Envoy, Istio) capabilities, providing traffic management, security, and policy enforcement as a shared platform for services
- Design, implement, and optimize highly available Relational Database services or shared data platforms for broad organizational use
- Collaborate closely with product development teams to understand their infrastructure needs and pain points, providing technical guidance and support
- Participate in on-call rotations to support the critical shared infrastructure you build
Requirements:
- 9+ years of experience in an Infrastructure Development, Platform Engineering, or Site Reliability Engineering role, with a strong focus on building tools and services for other engineers
- Deep expertise with Kubernetes in production environments, particularly in providing it as a platform (i.e., single-tenant and multi-tenant deployment architectures)
- Strong programming skills in Go (Golang) and Python, with experience building robust, maintainable backend services and automation
- Extensive hands-on experience with at least one major Cloud Provider (AWS, GCP, or Azure); multi-cloud experience is a strong plus, especially in building abstractions over them
- Proven experience designing and implementing Event-Driven Architecture and message queuing systems (e.g., Kafka, RabbitMQ, NATS) as shared services
- Solid understanding of, and practical experience with, CI/CD pipeline tools (especially GitLab CI), and experience establishing automated delivery processes for other teams
- Demonstrable experience designing and operating Distributed Systems, with an understanding of patterns for creating reliable, shared components
- Familiarity with Multi-Region Cloud Environments and strategies for building globally distributed, highly available platforms
- Proficiency in establishing and utilizing comprehensive Observability and Monitoring platforms (e.g., Prometheus, Grafana, ELK stack, Datadog) for shared infrastructure
- Strong experience with RESTful API design principles and building well-documented, consumable APIs
- Knowledge of Service Mesh concepts and practical experience with solutions like Istio in a platform context
- Hands-on experience with Relational Databases (e.g., MySQL, PostgreSQL), ideally in managing them as a service
- Excellent communication skills and the ability to clearly articulate complex technical concepts to both technical and non-technical audiences
- A strong customer-centric mindset, treating internal development teams as your primary customers
- Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical or military experience, required