Design, code, and deploy software solutions and automation while looking for opportunities to optimize the existing codebase for maintainability and reusability
Create and deploy scalable monitoring systems and end-to-end solutions for a massively growing global infrastructure in collaboration with Software Engineering and Development teams
Monitor applications and services across environments, participate in the on-call rotation, and implement strategies to prevent issues from recurring
Resolve escalated issues and prevent recurring operational overhead by documenting and automating processes while deploying patches, upgrades, and administrative tools
Collaborate with cross-functional teams to recommend integration strategies for platforms and applications and to identify opportunities for process improvement
Requirements
US citizenship is required (due to the nature of assigned customers)
5+ years of industry experience in a 24/7 NOC or Cloud Operations environment
Proficiency with programming or scripting languages such as Python or Bash
Deep understanding of standard networking protocols (HTTP, DNS, TCP/IP, ICMP) and the OSI model
Hands-on experience with monitoring tools (e.g. Nagios, Grafana, Prometheus) and networking concepts such as firewalls and load balancing
Ability and flexibility to work after hours or on weekends for application releases and deployments in a fast-paced environment