Distributed Systems, Docker, Grafana, Jenkins, Kafka, Linux, MongoDB, Perl, Prometheus, Python, Spark, Unix, Bash, Dynatrace, AppDynamics, Git, Caching, CI/CD, Remote Work
About this role
Role Overview
Work closely with engineering and development teams to design, build, and maintain systems
Troubleshoot issues across the entire stack: hardware, software, application and network
Identify and drive opportunities to improve automation for our platforms
Proactively identify and address system reliability risks
Represent the RPE organization in design reviews and operational readiness exercises for new and existing services
Participate in the on-call rotation and in periodic conference calls with specialists in other time zones
Requirements
At least 4 years of experience in an SRE role
Background in Computer Science equivalent to a B.Sc.
Automation experience using scripting languages such as Python, Bash, or Perl is particularly valued
Experience with at least one higher-level language is desired
Experience supporting three-tier architectures, including exposure to UNIX and Linux platforms, relational databases such as DB2 or Sybase, and non-relational databases like MongoDB
Experience with source code and binary repositories, build tools, and CI/CD (Git, Artifactory, Jenkins, Docker), and with data streaming technologies like Spark and Kafka
Hands-on experience with enterprise monitoring tools such as Grafana, Prometheus, Dynatrace, and AppDynamics
Awareness of modern software and systems architectures, including load balancing, queueing, caching, distributed systems failure modes, and microservices
Deep understanding of operating-system-level concepts such as processes, memory allocation, and the network stack; understanding of how these affect applications, and the ability to debug the resulting issues