Domino Data Lab builds software that helps AI-driven organizations operate advanced data science solutions. The Staff Software Engineer will join the Model Development Lifecycle team, focusing on integrating model monitoring, enhancing tagging capabilities, and expanding LLM hosting capabilities to support AI model development and deployment.
Responsibilities:
- Integrate model monitoring to provide a holistic view of deployment health and performance
- Enhance tagging capabilities across Domino entities to improve discoverability and tracking
- Expand LLM hosting capabilities to address customer needs for scale, performance, and logging
- Innovate within our Domino Apps offering by incorporating feature requests from major customers
Requirements:
- Building Scalable Systems: Hands-on experience developing and managing high-performance back-end systems in distributed computing environments
- Collaboration Across Teams: Working closely with cross-functional teams to integrate systems with front-end interfaces and third-party services
- API Development: Designing and implementing secure, scalable APIs (e.g., RESTful APIs, gRPC)
- Performance Optimization: Profiling and optimizing back-end performance, especially in cloud environments or with container technologies like Docker and Kubernetes
- Testing and CI/CD: Using robust testing frameworks (unit, integration, end-to-end) and setting up CI/CD pipelines
- ML Model Deployment: Familiarity with model registries, versioning, and lifecycle management tools like MLflow or Kubeflow (a big plus)
- Distributed Computing: Experience with frameworks like Apache Spark, Azure ML, or SageMaker (a plus)
- Cloud Platforms: Proficiency with cloud providers (AWS, Azure, GCP) and deploying services in these environments
- Back-End Development: Expertise in languages such as Python, Java, Scala, or Go