DigitalOcean is a cutting-edge technology company focused on simplifying cloud and AI for builders. The Principal Software Engineer will drive the design and operation of the Gradient AI platform, ensuring an innovative agent development experience while providing technical leadership and collaboration across teams.
Responsibilities:
- Drive the design and operation of the Gradient AI platform, focusing on delivering a simple and innovative agent development experience with best-of-breed scale, performance, and predictability
- Drive architectural vision, technical excellence, and innovation across both backend systems and customer-facing interactions
- Design and evolve the architecture for our agent development experience, including code integration, evaluations, observability, tools, and cross-agent interactions
- Drive initiatives to deliver an architecture optimized for scalability, reliability, low latency, and cost efficiency
- Manage and evolve our benchmarking system to continuously raise the bar on our experience
- Roll out new services by taking on a hands-on lead role as required to ensure timely delivery
- Establish and enforce technical standards, coding practices, tooling, and infrastructure guidelines across the AI/ML engineering teams
- Establish best practices for design, testing, deployments, instrumentation, and performance tuning
- Mentor other senior engineers, shaping the team’s culture of architectural rigor and operational excellence
- Work with product managers, stakeholders, and business leaders to translate strategic objectives into scalable technical roadmaps
- Guide customer-facing teams (e.g., consultants, support, sales engineers) to shape AI modernization initiatives via agents
- Lead operational excellence for our agent development platform, establishing mechanisms and processes that scale across the engineering organization while raising the bar
- Oversee availability, performance tuning, failover strategies, capacity planning, and disaster recovery
- Drive AI-driven automation (e.g., IaC, CI/CD pipelines, deployments, monitoring) to optimize operations
- Drive development of internal tooling leveraging agents across the company, contributing towards increased efficiency and quality across all aspects of engineering
- Serve as subject matter expert on new agent development paradigms and lead implementation of mechanisms to productize them
Requirements:
- Hands-on experience designing and operating production-grade AI/ML platforms using the latest GenAI and Agent-development technologies
- 10+ years designing and building applications in the cloud, with 5+ years of experience in AI/ML platforms
- Prior experience as a technical visionary in large-scale, mission-critical projects; ability to align technology strategy with business impact
- Expertise in driving operational excellence via automation and best practices
- Strong written and verbal communication skills, with a track record of mentoring senior and junior engineers and translating complex concepts across engineering and business teams