Domino Data Lab is a company that builds software for AI-driven organizations to develop and operate data science solutions. The Staff Software Engineer will focus on building and enhancing platform features for machine learning workflows, as well as expanding the platform's infrastructure for large language models.
Responsibilities:
- Build and enhance platform features that enable teams to design, test, and deploy multi-agent workflows at scale
- Enhance Domino’s Extensions framework so that customers can build custom modules that extend the platform's features and functionality
- Expand the platform's inference infrastructure to support high-throughput, low-latency serving of large language models, helping customers confidently operationalize LLM applications at enterprise scale
Requirements:
- Hands-on experience developing and managing high-performance back-end systems in distributed computing environments
- Experience working closely with cross-functional teams to integrate systems with front-end interfaces and third-party services
- Experience designing and implementing secure, scalable APIs (e.g., REST, gRPC)
- Experience profiling and optimizing back-end performance, especially in cloud environments or with container technologies like Docker and Kubernetes
- Experience writing unit, integration, and end-to-end tests with robust testing frameworks and setting up CI/CD pipelines
- Familiarity with traditional machine learning model development and AI workflows, including experiment tracking, hyperparameter optimization, model evaluation frameworks, and model artifact management
- Proficiency with cloud providers (AWS, Azure, GCP) and deploying services in these environments
- Expertise in languages such as Python, Java, Scala, or Go
- Experience with frameworks like Apache Spark, Azure ML, or SageMaker is a plus