Oracle, a leading company in AI and cloud solutions, is seeking a Senior Principal AI/ML Software Engineer. This role focuses on optimizing AI/ML infrastructure and guiding strategic decisions for Oracle Cloud’s AI offerings.
Responsibilities:
- Evaluate, integrate, and optimize state-of-the-art technologies across the stack for latency, throughput, and resource utilization in training and inference workloads
- Guide strategic decisions around Oracle Cloud’s AI infrastructure offerings
- Design and implement scalable orchestration for serving and training AI/ML models, with a focus on model parallelism and performance across the AI/ML stack
- Explore and incorporate contemporary research on generative AI, agents, and inference systems into the LLM software stack
- Lead initiatives in Generative AI systems design, including Retrieval-Augmented Generation (RAG) and LLM fine-tuning
- Design and develop scalable services and tools to support GPU-accelerated AI pipelines, leveraging Kubernetes, Python/Go, and observability frameworks
Requirements:
- Bachelor's, Master's, or Ph.D. in Computer Science, Engineering, Machine Learning, or a related field (or equivalent experience)
- Experience with machine learning and deep learning concepts, algorithms, and models
- Proficiency with orchestration and containerization tools like Kubernetes, Docker, or similar
- Expertise in modern container networking and storage architecture
- Expertise in orchestrating, running, and optimizing large-scale distributed training/inference workloads
- Deep understanding of AI/ML workflows, including data processing, model training, and inference pipelines
- Experience with parallel computing frameworks and paradigms
- Strong programming skills and proficiency in major deep learning frameworks