Oracle is a leading company in AI and cloud solutions, dedicated to empowering innovation. It is seeking a Principal Software Engineer to develop and maintain software toolkits for applied scientists, design cloud-based services, and collaborate with teams to improve model experimentation and deployment.
Responsibilities:
- Develop and maintain robust software toolkits in Python and Java to support applied scientists in building, testing, and deploying machine learning models and agents
- Design, implement, and optimize cloud-based services for running applied science models, with an emphasis on scalability, reliability, and security in Oracle Cloud Infrastructure (OCI)
- Collaborate closely with scientists and engineers to deliver user-friendly APIs, libraries, and documentation enabling effective model experimentation and deployment
- Build and support asynchronous communication patterns (user-to-agent, agent-to-agent, and multimodal) using message queues and data streaming systems
- Use and extend containerization practices with Docker; deploy and orchestrate services via Kubernetes
- Produce well-structured sample code and reference implementations, including basic integration with LLM APIs, that demonstrate toolkit best practices
- Apply strong knowledge of algorithms, data structures, concurrent programming (including asyncio and threading), and distributed systems fundamentals to develop performant, maintainable software
- Incorporate feedback, write comprehensive documentation, and contribute to code reviews to continuously improve quality and usability
- Monitor and instrument solutions for performance, debugging, and reliability in production environments
- Stay current with the latest software engineering and AI toolchain practices, advocating for adoption where appropriate
Requirements:
- 8-12 years of relevant software development experience, with a focus on backend and AI-first applications
- BS/MS in Computer Science or a related field, or equivalent practical experience
- Proficiency in both Python and Java, with experience developing and maintaining production software in both languages
- Solid foundations in software engineering, especially concurrent and distributed systems, data structures, and algorithms
- Professional experience with asynchronous communications (e.g., message queues, pub/sub, data streaming platforms such as Kafka or OCI Streaming)
- Hands-on experience with Docker and deploying containerized applications in Kubernetes environments (strongly preferred)
- Experience developing enabling tools, frameworks, or APIs for applied scientists, data scientists, or machine learning practitioners (highly desirable)
- Working knowledge of AI/LLM APIs and best practices, with the ability to create sample and reference code for scientific users
- Familiarity with Oracle Cloud Infrastructure (OCI) or another cloud platform, and a willingness to specialize in OCI
- Strong communication skills; able to collaborate in a distributed and asynchronous team environment
- Track record of documentation, mentorship, or technical leadership is a plus