Convergenz is seeking a Senior Site Reliability Engineer to work remotely during EST hours. The role involves collaborating with development, infrastructure, and operations teams to ensure the reliability and performance of applications and infrastructure.
Responsibilities:
- Work with program development teams, infrastructure and platform services teams, and traditional operations and maintenance teams
- Combine aspects of software engineering with traditional operations to maintain and improve the reliability, availability, and performance of cloud, infrastructure, and large-scale software systems and services while minimizing downtime and mitigating potential failures
Requirements:
- Bachelor's degree and 10 years of IT experience, or Master's degree and 8 years of experience, or no degree and 14 years of experience
- Eligible to obtain and maintain a government security clearance with the Department of Commerce
- Must have at least 3 years of industry experience in an SRE role
- Must have at least 10 years of software engineering experience, with skills in Angular, Node, Java, Python, etc.
- Must have recent Java and Spring Boot experience
- Must have observability experience
- Knowledge and experience with Agile and DevSecOps methodologies
- Experience with the following software/tools:
  - Source code and binary repository products and techniques (GitHub, GitLab, Bitbucket, Artifactory, Nexus, etc.)
  - Infrastructure and Cloud Management tools such as AWS CloudWatch
  - Log Management and Analysis tools such as Splunk
  - Automation and Configuration Management tools such as Terraform or Puppet
- Knowledge and experience with New Relic and/or other AIOps platforms
- Programming skills in JavaScript, Ruby, and/or Go
- Experience with Nginx, HAProxy, Docker, Kubernetes or similar technologies
- Experience with messaging systems, collaboration software, application-based firewall and proxy server(s), and operating systems
- Experience with Linux and Windows operating systems, along with scripting tools and techniques such as Bash, CSH, KSH, ZSH, etc. and/or PowerShell
- Experience with Monitoring and Alerting tools such as Prometheus, Grafana and Datadog