Create dashboards with Grafana/Prometheus that communicate the metrics for a given product service.
Collaborate with other teams.
Investigate, debug and provide resolution for customer issues.
Ensure the security and reliability of shared infrastructure with the Flexera cloud.
Make reliability a first-class citizen.
Design, develop and deploy new features for Flexera products/platforms, as defined by goals from the SRE organization.
Work with product owners and product engineering teams as needed.
Be part of an on-call rotation for alerts that require engineering expertise to diagnose.
Help carry out root cause analysis for incidents, and design solutions (both software and human processes) that will help to ensure the same problem doesn't happen in the same way again.
Requirements
Bachelor’s degree in Computer Science, Information Technology, or a related field
2+ years of exposure to or hands-on experience with AWS or other cloud services through internships, training, or work experience
Knowledge of Agile software delivery methodologies
Knowledge of managing cloud-based services like AWS or Azure at scale
Experience with DevOps
Knowledge of Docker containers, Kubernetes, EKS, ECS
Knowledge of Terraform, CloudFormation
Knowledge of Linux and a good understanding of its commands
Good networking fundamentals
Good understanding of GitHub for collaboration and change management
Knowledge of AWS services such as EC2, ECS, EKS, S3
Knowledge of databases, preferably MySQL, Amazon RDS, and MongoDB
Understanding of RESTful APIs and other web-based application concepts
Any scripting language experience (Ruby is the current language, but comparable experience in Java, Python, Perl, etc. would suffice)