Oracle is seeking a DevOps Engineer to bridge development, infrastructure, and operations for modern cloud-native and AI-powered platforms. The role involves defining and implementing DevOps strategies, optimizing infrastructure, and ensuring high availability and performance of systems in production.
Responsibilities:
- Design, implement, and manage scalable, secure, and highly available cloud infrastructure on AWS, Azure, or GCP
- Build and maintain robust CI/CD pipelines for continuous integration, testing, and deployment using tools like Jenkins, GitLab CI, or GitHub Actions
- Develop and manage cloud-native architectures, ensuring efficient resource utilization, scalability, and reliability
- Implement and manage containerized environments using Docker and orchestrate them with Kubernetes
- Establish monitoring, logging, and tracing systems to ensure system reliability and enable real-time diagnostics
- Define and manage infrastructure using tools like Terraform, CloudFormation, or ARM templates
- Optimize infrastructure and deployment pipelines for performance, scalability, and cost-efficiency
- Support deployment and lifecycle management of ML/AI models, including versioning, monitoring, and scaling
Requirements:
- 10+ years of experience in DevOps, infrastructure engineering, or platform engineering roles
- Strong experience with cloud platforms (AWS, Azure, or GCP) and cloud-native architectures
- Expertise in CI/CD tools (Jenkins, GitLab CI, GitHub Actions) and automation practices
- Hands-on experience with containerization (Docker) and orchestration (Kubernetes)
- Strong knowledge of Infrastructure as Code (Terraform, CloudFormation, ARM templates)
- Experience with monitoring and observability tools (Prometheus, Grafana, ELK stack, Datadog, etc.)
- Proficiency in scripting languages (Python, Bash or other shell scripting)
- Experience with microservices architecture and distributed systems
- Strong understanding of networking, security, and system design principles
- Experience with databases and storage systems (SQL, NoSQL, caching systems like Redis)
- Familiarity with MLOps practices and supporting AI/ML workloads in production
- Experience in managing high-availability systems, incident management, and disaster recovery
- Strong understanding of version control systems (Git)
- Ability to collaborate across teams and drive operational excellence