Ensure availability, scalability, reliability and security of cloud platforms and services.
Design, deploy, and govern AI-powered agents (e.g., using Azure Copilot/AWS Bedrock).
Supervise and refine AI-generated Infrastructure as Code (IaC), such as Terraform and Ansible.
Implement AI-based automation solutions for cloud operations.
Manage security and compliance by utilizing AI agents to detect configuration drift and auto-generate compliant updates for IAM, network, and security policies.
Collaborate with application, architecture, AIOps, FinOps and security teams to ensure production readiness.
Lead proofs of concept, implement new cloud services, and adapt quickly to new cloud releases and features to enhance operational capabilities.
Requirements
5 years of experience in public cloud operations (AWS, Azure, GCP).
Deep, demonstrable expertise in designing and operationalizing solutions that leverage AWS Bedrock/agent frameworks and Azure Copilot for cloud operations.
Expertise in Infrastructure as Code (Terraform, CloudFormation), Ansible, and CI/CD pipelines.
Proficiency in scripting languages (Python, Bash).
Expertise in integrating observability platforms (Dynatrace, Prometheus) with AI/ML platforms for predictive analysis and anomaly detection.
Understanding of Site Reliability Engineering (SRE) and operational reliability principles.
Experience with monitoring tools (CloudWatch, Prometheus, Dynatrace, Azure Monitor) and ServiceNow.
Excellent troubleshooting, problem-solving, and communication skills.
Hands-on experience with JBoss, Tomcat, IBM WAS, Oracle, DB2, MSSQL, and Postgres is a plus.
Bachelor’s degree in computer science or another technical discipline.
Technical certifications are a plus.
Tech Stack
Ansible
AWS
Azure
Cloud
Google Cloud Platform
Oracle
Postgres
Prometheus
Python
ServiceNow
Terraform
Benefits
Competitive salary commensurate with experience, qualifications and location.
Bonus opportunity (based on company and individual performance).