Oracle, a leading company in AI and cloud solutions, is seeking a visionary Principal Software Engineer – Agentic AI to lead the evolution of its observability platform. The role involves architecting scalable solutions that integrate telemetry, LLMs, and autonomous agents to deliver predictive insights and real-time decision support across Oracle's global cloud infrastructure.
Responsibilities:
- Architect and implement agentic AI frameworks for the observability tooling stack on Oracle Cloud Infrastructure
- Design and operationalize Retrieval-Augmented Generation (RAG) pipelines and vector database integrations
- Lead the transformation of observability systems using Grafana, Prometheus, Loki, Tempo, and OpenTelemetry across multi-cloud environments (OCI, AWS, Azure)
- Drive AI/ML initiatives from concept to deployment, including predictive analytics, time-series modeling, and statistical inference
- Collaborate with engineering, product, and executive teams to align technical solutions with business goals
- Deliver roadmaps, architecture diagrams, and executive-level presentations that communicate complex systems with clarity
- Mentor cross-functional teams and contribute to strategic hiring for AI and observability talent
Requirements:
- Deep experience with Large Language Models (LLMs), with hands-on implementation of intelligent workflows on equivalent frameworks
- Proven track record in designing agentic AI systems, multi-agent orchestration, and LLM-based automation for enterprise-grade use cases
- Proficient in Retrieval-Augmented Generation (RAG) architectures, vector database integration, and advanced prompt engineering for scalable, context-aware AI solutions
- Extensive experience with modern observability stacks, including Grafana, Prometheus, Loki, Tempo, and OpenTelemetry
- Skilled in large-scale implementation, service integration, and architecting telemetry pipelines across multi-cloud environments (OCI, AWS, Azure) with a focus on reliability, scalability, and performance
- Proven ability to lead AI/ML initiatives, including prediction, anomaly detection, and operational integration
- Strong proficiency in Python, microservices architecture, containerization (Docker/Kubernetes), API design, CI/CD pipelines, and event-driven systems
- Adept at iterative diagramming and crafting executive-level presentations
- Skilled in stakeholder engagement and cross-functional alignment across engineering, product, and leadership teams
- Bachelor's degree in Computer Science, Engineering, or a related technical field
- 10+ years of experience in cloud architecture, observability platforms, or enterprise AI systems
- 5+ years in technical leadership or principal architect roles
Preferred qualifications:
- Master's or PhD in Computer Science, AI, or Systems Engineering
- Published thought leadership in AI observability or autonomous infrastructure
- Experience mentoring senior engineers and architects, and building high-impact technical teams