Flexera is a pioneer in Hybrid ITAM and FinOps, providing award-winning SaaS solutions for technology value optimization. The Site Reliability Engineer (Cloud Enablement) will design and support tools and platforms to enhance developer experience and operational excellence while managing cloud costs effectively.

Responsibilities:

Design, build, advocate for and support the common tools and delivery platform used by Flexera developers
Improve developer experience and operational excellence
Foster collaboration and knowledge sharing across Flexera
Select and roll out supported defaults and standards for CI/CD tooling, Observability, Security and Runtime Environment
Work with teams across several continents, building relationships with our engineers by listening to and understanding their needs, and balancing these with the needs of our business
Research new tools and patterns and continuously measure and evolve our ways of doing things
Drive cloud cost optimization, using a combination of strategies, techniques, best practices and tools to manage and reduce cloud costs

Requirements:

Developer/DevOps/SRE/Platform experience and a strong interest in software delivery and ongoing operation
Rolled out automation, tools, technologies, patterns and guardrails across an organization
Experience working in a globally distributed team
Deep & extensive public cloud (preferably Azure) knowledge & experience
Deep knowledge of containers (Docker) and orchestration (Kubernetes)
Knowledge of tools and patterns around CI/CD (familiar with GitHub Actions, Travis CI, CircleCI or similar)
Observability knowledge (logs, tracing, metrics) and experience with a few of Elastic Stack, AWS X-Ray, Jaeger, Zipkin, Prometheus, Honeycomb or LightStep, as well as enterprise observability tools such as Sumo Logic, New Relic, Datadog etc.
Cloud cost optimization: using automation to keep cloud costs under control and within budget, and enabling individual engineering teams to optimize their own cloud costs
Knowledge of operations, including incident management, immutable infrastructure as code (esp. Terraform or CloudFormation), and problem-solving
Produced robust, well-tested code, preferably in Go (Golang); however, we will also consider Python, JavaScript, Ruby, Java or C# if you are happy to learn Go
Excellent communication skills, including experience in writing good documentation and running workshops
Vendor selection and/or management experience
Agile software delivery methodologies
Experience managing cloud-based services e.g. AWS, Azure at scale
Experience with DevOps
Experience with Docker containers, Kubernetes, EKS, ECS
Infrastructure as code e.g. Terraform, CloudFormation
CI/CD pipelines using Jenkins, Travis CI, TeamCity, pipeline as code
Automation / Configuration Management at scale e.g. Puppet, Chef, Ansible, Salt, Packer etc
Service mesh such as Istio, Consul or similar
Expertise in one or more of the following languages: Python / Go / Java / C# / C / C++
Experience with IaaS and Serverless services from a cloud provider
A strong understanding of TCP/IP and DNS, and experience designing networks
Linux & Windows system administration experience
Experience implementing fault detection and automating fixes
Experience designing scalable services
Experience designing distributed, fault-tolerant systems
A good understanding of SQL and NoSQL databases
A solid understanding of data structures and algorithms
A positive attitude and willingness to learn
Strong conflict resolution competence
Excellent written and verbal communication skills
Detail-oriented: the ideal candidate naturally digs as deep as needed to understand the why
Bachelor's or higher degree in Computer Science, Information Technology, or a related field
At least 4 years of hands-on job experience managing services in a public cloud
At least 2 years of experience working as a senior member of a centralized Cloud enablement / Platform or a similar team
