Flexera is a pioneer in Hybrid ITAM and FinOps, providing award-winning SaaS solutions for technology value optimization. The Site Reliability Engineer (Cloud Enablement) will design and support tools and platforms to enhance developer experience and operational excellence while managing cloud costs effectively.
Responsibilities:
- Design, build, advocate for and support the common tools and delivery platform used by Flexera developers
- Improve developer experience and operational excellence
- Foster collaboration and knowledge sharing across Flexera
- Select and roll out supported defaults and standards for CI/CD tooling, observability, security and runtime environment
- Work with teams across several continents, and build relationships with our engineers by listening to and understanding their needs and balancing them with the needs of our business
- Research new tools and patterns and continuously measure and evolve our ways of doing things
- Drive cloud cost optimization, using a combination of strategies, techniques, best practices and tools to help manage and reduce cloud costs
Requirements:
- Developer/DevOps/SRE/Platform experience and a strong interest in software delivery and ongoing operation
- Rolled out automation, tools, technologies, patterns and guardrails across an organization
- Experience working in a globally distributed team
- Deep and extensive knowledge of, and experience with, public cloud (preferably Azure)
- Deep knowledge of containers (Docker) and orchestration (Kubernetes)
- Knowledge of tools and patterns around CI/CD (familiar with GitHub Actions, Travis CI, CircleCI or similar)
- Observability knowledge (logs, tracing, metrics) and experience with a few of Elastic Stack, X-Ray, Jaeger, Zipkin, Prometheus, Honeycomb or LightStep, as well as enterprise observability tools such as Sumo Logic, New Relic or Datadog
- Cloud cost optimization: using automation to keep cloud costs under control and within budget, and enabling individual engineering teams to optimize their own cloud costs
- Knowledge of operations, including incident management, immutable infrastructure as code (esp. Terraform or CloudFormation), and problem-solving
- Produced robust, well-tested code, preferably in Go; however, we will also consider Python, JavaScript, Ruby, Java or C# if you are happy to learn Go
- Excellent communication skills, including experience in writing good documentation and running workshops
- Vendor selection and/or management experience
- Agile software delivery methodologies
- Experience managing cloud-based services e.g. AWS, Azure at scale
- Experience with DevOps
- Experience with Docker containers, Kubernetes, EKS, ECS
- Infrastructure as code e.g. Terraform, CloudFormation
- CI/CD pipelines using Jenkins, Travis CI, TeamCity, pipeline as code
- Automation / Configuration Management at scale e.g. Puppet, Chef, Ansible, Salt, Packer
- Service mesh such as Istio, Consul or similar
- Expertise in one or more of the following languages: Python / Go / Java / C# / C / C++
- Experience with IaaS and Serverless services from a cloud provider
- A strong understanding of TCP/IP and DNS, and experience designing networks
- Linux & Windows system administration experience
- Experience implementing fault detection and automating fixes
- Experience designing scalable services
- Experience designing distributed, fault-tolerant systems
- A good understanding of SQL and NoSQL databases
- A solid understanding of data structures and algorithms
- A positive attitude and willingness to learn
- Strong conflict resolution competence
- Excellent written and verbal communication skills
- Detail-oriented. The ideal candidate naturally digs as deep as needed to understand the why
- Bachelor's or higher degree in Computer Science, Information Technology, or a related field
- At least 4 years of hands-on job experience managing services in a public cloud
- At least 2 years of experience working as a senior member of a centralized Cloud enablement / Platform or a similar team