Innovaccer is building a secure, modern healthcare cloud infrastructure and a massive data stack. They are looking for an experienced Site Reliability Engineer who will design and architect various domains of SRE, collaborate with different teams, and drive the adoption of SRE best practices while ensuring the security and observability of their systems.
Responsibilities:
- Design and architect various domains of SRE
- Collaborate extensively with different teams to drive initiatives and the adoption of SRE best practices
- Lead a team of young engineers
- Build and automate secure cloud infrastructure (Infrastructure as Code, IaC) across pillars such as cost, reliability, scalability, and performance
- Build the CI/CD stack in collaboration with the Dev and QA/Automation teams, and drive the organization to a new level of (daily/hourly) continuous delivery and deployment
- Work closely with the CISO and Dev team(s) to make security a first-class citizen
- Develop S-CICD (Secure CI/CD): enable security toolchains and deliver vulnerability reports to developers via automation
- Drive the observability charter spanning logs, metrics, mesh, tracing, etc.
- Collaborate closely with the Dev and QA teams to bring initiatives to closure and increase adoption of DevOps practices and toolchains
- Apply strong analytical skills to understand production system metrics, drive change, optimize system utilization and drive cost efficiency
- Autoscale the platform up and down for peak-season scenarios
- Ensure that the platform is secured per the guidelines established by the CISO, e.g., protect against DDoS attacks by implementing a WAF, manage vulnerabilities and patches, and install required security agents
- Lead least-privilege-based RBAC for various production services and toolchains
- Build and execute a Disaster Recovery plan
- Participate as a key stakeholder in Incident Response (IR)
Requirements:
- Bachelor's Degree or equivalent
- Solid experience with at least one cloud (AWS, Azure, or GCP), with an automation focus, is a MUST
- Hands-on experience with Kubernetes and Linux is a MUST
- Programming experience with scripting languages, e.g. Python, is a MUST
- Experience building scalable CI/CD build and deployment architectures and solutions is preferred
- Experience building an observability stack covering logs, metrics, traces, service mesh, and data observability is preferred
- Must be good at writing and structuring documentation for consumption by various dev teams
- Certification is an advantage
- Cloud security experience is a major advantage and a highly preferred skill
- Hands-on experience with a few of these is preferred: Kafka, PostgreSQL, Snowflake, etc.