Domino Data Lab builds software that empowers organizations to operate advanced data science and AI solutions. The Staff/Principal Software Engineer will lead the development of platform capabilities for regulated AI/ML-driven scientific workflows, focusing on creating scalable, reproducible, and compliant systems.
Responsibilities:
- Shape extensible platform interfaces that unlock new regulated workflows: Architect durable APIs, event models, and integration surfaces that enable internal teams, partners, and the world’s largest pharmaceutical organizations to compose entirely new classes of fast, reproducible, inspection-ready scientific workflows rather than merely integrate existing tools
- Redefine reproducibility and system of record as platform primitives: Lead the redesign of the information architecture to better support execution, lineage, and auditability, so that reproducibility and inspection-readiness are built into how scientific work runs across cloud, hybrid, HPC, and external data sources. Set the foundation for a platform where every result is traceable, every execution is reviewable, and compliance is intrinsic rather than bolted on
- Scale compute for the next generation of scientific workloads: Drive architectural and performance improvements across Domino Workspaces and execution environments to support larger datasets, higher concurrency, and compute-intensive workloads, establishing patterns that make hybrid and HPC-backed execution first-class and reliable at scale
Requirements:
- Deep backend and systems expertise in regulated environments: A track record of designing, building, and operating scalable backend platforms for regulated scientific or clinical workflows, such as statistical compute, QC pipelines, or inspection-ready execution
- Architectural leadership and technical judgment: Strong systems thinking and hands-on experience designing robust backend services and APIs that balance usability, scalability, performance, and regulatory rigor across cloud-native and containerized environments. Experience with distributed or HPC-backed compute is a meaningful plus
- Ability to lead through influence and collaboration: Comfort setting technical direction, leading cross-team design discussions, and aligning engineers, product partners, and domain experts (e.g., statisticians, reviewers, operators) around a shared architectural vision
- Fluency in turning regulatory complexity into durable solutions: Proven ability to navigate ambiguity in compliance-driven environments and translate regulatory, QC, and inspection requirements into practical, well-designed platform capabilities
- Strong command of modern backend languages and platforms: Proficiency in one or more backend languages such as Java, Scala, Go, or Python, with experience building long-lived, service-oriented systems using REST and/or gRPC, backed by solid API design practices
- Experience operating production systems in the cloud: Hands-on experience with major cloud platforms (AWS, Azure, or GCP), containerization and orchestration (Docker, Kubernetes), and modern observability and DevOps practices to ensure reliability at scale