Design, develop, test, and deploy automation tools, scripts, and engineering solutions to improve the stability, performance, and efficiency of production systems.
Identify opportunities to automate manual operational processes and reduce operational overhead.
Support and improve the release and deployment lifecycle of applications, ensuring reliable and controlled production rollouts.
Collaborate with software engineers and infrastructure teams to troubleshoot and resolve system issues.
Contribute to system design discussions, platform management, and capacity planning.
Create and maintain clear technical documentation for automation tools, operational procedures, and reliability improvements.
Provide regular updates on progress and deliverables to engineering stakeholders.
Requirements
At least 1 year of professional software development or reliability engineering experience.
Proficiency in one or more programming or scripting languages, such as Python, C++, Java, or shell.
Strong understanding of Linux operating system internals.
Solid knowledge of networking concepts and troubleshooting.
Experience with modern version control systems such as Git.
Familiarity with monitoring, logging, and CI/CD tools (e.g., Prometheus, Grafana, Splunk, Jenkins, GitLab CI) is highly beneficial.
Ability to work independently, manage priorities effectively, and deliver results with minimal supervision.
Excellent written and verbal communication skills, with the ability to explain technical topics clearly to engineering stakeholders.
Ability to quickly learn new technologies and tools and work across multiple programming languages and frameworks.