Oracle, a leading company in AI and cloud solutions, is seeking a Senior Cloud DevOps Engineer to manage critical cloud infrastructure deployments and enhance service efficiency. The role involves automating deployment processes, managing incidents, and collaborating with development and operations teams to ensure reliable service delivery for Oracle Analytics Applications.
Responsibilities:
- Develop and maintain automated tools and systems to streamline operations, reduce manual intervention and improve overall service efficiency
- Deploy infrastructure as code (IaC) for provisioning and configuration
- Perform root cause analysis on defects and outages and prevent recurrences
- Monitor system performance, identify potential issues and ensure systems are running efficiently
- Monitor and maintain security measures to protect against threats and ensure compliance
- Write scripts and configurations to automate tasks like building and deploying software
- Ensuring system scalability, security, and high availability
- Test and implement disaster recovery plans to handle major outages
- Monitoring systems to detect potential issues before major incidents are reported in production
- Establish alerting systems to notify relevant teams when issues arise
- Managing changes to infrastructure, ensuring they are implemented safely and reliably
- Championing a culture of continuous improvement and innovation within the DevOps team
- Identify bottlenecks and areas for improvement, and look for ways to optimize workflows and improve customer experience
- Create custom Analytics reports and customize the data used for reporting, so that reports can focus on specific aspects of service resilience
- Monitor and measure customer experience and KPIs, and identify ways to improve overall service resiliency
- Maintaining comprehensive documentation of security practices, procedures and incidents
- Create security controls to mitigate identified risks
- Automating security processes and integrating security tools with pipelines to create and assign Jira tickets to specific stakeholders
- Proactively monitor builds and deployments, troubleshoot issues, and resolve errors
- Automating tasks and processes within the software development lifecycle, such as CI/CD pipelines and infrastructure management
- Support the operations of Oracle Analytics Applications on OCI using Cloud DevOps methodologies, including incident management: analyze T2 metrics and alarms alongside Lumberjack logs, and troubleshoot, repair, and document infrastructure and service issues
- Start / stop / upgrade cloud infrastructure and services using OCI tools
- Participate in 24x7 technical support, offering customers technical assistance in managing the Oracle Analytics Data Intelligence service
- Manage and continuously improve existing Oracle Analytics Apps cloud capabilities and tools, with a focus on OCI tools, processes, and configuration
- Perform daily tasks in accordance with process, compliance and regulatory standards
Requirements:
- Citizen of the United States
- Security Clearance required to work in OCI Gov Cloud
- Participate in on-call rotations to support US Government and Commercial Cloud deployments
- On-call 24x7 rotations scheduled for after business hours or weekends
- Bachelor's or Master's degree in Computer Science or equivalent from a reputed university, with a consistently good academic record
- 4+ years of hands-on experience with cloud platforms, cloud services, and Docker container-based applications
- Experience with cloud platforms: configuration, operations, tools, and processes
- Linux/Unix system administration including system level knowledge of Linux on Cloud Platforms, creating and executing scripts
- Production application deployments across multiple preproduction and production environments on Oracle Cloud Infrastructure
- Operational experience with compute, storage, and networking on a cloud platform such as OCI, AWS, Azure, or GCP
- Understanding of internet networking services such as DNS and HTTP
- Documenting technical procedures and configurations
- Proficiency in scripting languages like Python, shell or Bash to automate tasks
- Creating and maintaining automated CI/CD pipelines for continuous integration and continuous deployment
- Containers and orchestration (Docker, Kubernetes, and docker-compose)
- Assisting with system troubleshooting and problem resolution
- Oracle Database (strong query-writing skills and DB management experience will be a plus)
- Managing security operations, with a good understanding of scanning tools, triaging and resolving vulnerabilities, using security frameworks, and meeting compliance standards
- Excellent scripting skills in Bash and Python
- Ability to multitask, prioritize and manage time efficiently
- Experience with monitoring and logging tools such as Prometheus and Grafana is essential
- Knowledge of security scanners (Parfait, Sonatype, Fortify, Nessus) is desirable
- Good interpersonal skills and communication with all levels of management
- Working with remote, global teams as well as individuals
- Monitoring and supporting OCI Cloud infrastructure services and databases
- Strong problem-solving and troubleshooting abilities
- Excellent communication and collaboration abilities
- Ability to work effectively with development and operations teams