Lead daily support operations for Apigee OPDK and Apigee Hybrid
Troubleshoot runtime, policy, routing, and security issues on DataPower appliances
Develop specifications for complex infrastructure systems
Contribute to testing of business, application and technical infrastructure requirements
Implement reliability improvements through Infrastructure-as-Code (IaC) using Terraform, Ansible, and GitOps
Develop automated recovery scripts and tools to reduce manual operational overhead
Review and analyze solutions for cloud security, secrets management and key rotations
Design, code, test, debug and document programs using Agile development practices
Plan and execute version upgrades, patching cycles, infrastructure migrations, and configuration refactoring
Improve proactive alerting to reduce mean time to detect (MTTD) and mean time to recover (MTTR)
Own and resolve P1/P2 high-severity incidents
Direct the daily risk and control flow of operations
Participate in design discussions, architectural reviews, API governance activities, and platform modernization initiatives
Work with CAB (Change Advisory Board) for change planning, approvals, and execution tracking
Contribute to runbooks, SOPs, architectural diagrams, and platform knowledge base assets
Requirements
4+ years of Technology Infrastructure Engineering and Solutions experience
4+ years of experience with observability platforms such as BigPanda, ThousandEyes, Grafana, Prometheus, ELK, Splunk Observability, and AppDynamics
3+ years of experience working with Red Hat Enterprise Linux and Kubernetes
3+ years of experience with Site Reliability Engineering and supporting production-grade systems
3+ years of experience with automation and scripting
4+ years of experience in IT Service Management (ITSM)
Experience with API management platforms such as Apigee or API gateways
Exposure to IBM DataPower or similar enterprise integration tools
Expertise in Ansible Tower
Experience with cloud-native architectures, high-availability systems, and cloud and container technologies such as GCP or Azure
Strong experience working in Agile/Scrum environments
Experience improving system reliability, scalability, and operational efficiency
Experience in project management and stakeholder engagement
Proven experience in leading cross-functional teams