Databricks is a data and AI company that provides a platform for organizations to unify and democratize data, analytics, and AI. As a Staff Software Engineer in AI Research Infrastructure, you will develop and run the research stack for AI models, focusing on building services that optimize large-scale training and inference workloads.
Responsibilities:
- Design and implement infrastructure that supports large‑scale experiments, data processing, and model training (e.g., HPC clusters, GPU fleets, or cloud‑based systems)
- Enable researchers to go from idea to large‑scale experiment in minutes, not days, by building powerful abstractions for job submission, scheduling, and monitoring
- Create tooling that improves research developer productivity, such as experiment management systems, CI/testing infrastructure for research code, and workflows that reduce iteration time
- Influence the long‑term roadmap for research computation, shaping how Databricks AI Research trains, evaluates, and ships models to customers
- Serve as a technical mentor and force multiplier for other engineers working on compute, infra, and AI systems