Design, implement, and train discrete and continuous diffusion models for predicting biomolecular structure tokens
Develop and iterate on structure tokenizers, including vector-quantized representations of 3D molecular and protein structure
Build and maintain data processing pipelines for large-scale biomolecular structure datasets
Train models on multi-GPU clusters and manage large-scale training runs
Develop rigorous benchmarking and evaluation workflows; validate against external benchmarks while prioritizing internal discovery-relevant metrics
Collaborate with ML scientists, computational chemists, and drug discovery teams to integrate models into discovery workflows
Communicate results to internal teams and external partners, and present at scientific conferences
Mentor interns and junior team members through code reviews, technical guidance, and best practices (Senior level)
Requirements
PhD in machine learning, computer science, computational chemistry, physics, or a related computational STEM field, or equivalent industry experience demonstrating comparable depth
Strong Python and PyTorch skills, including end-to-end implementation and training of deep learning models
Demonstrated experience in one or more of the following:
3D atomistic or molecular modeling
Vector quantization and learned discrete representations
Diffusion, flow-matching, or related generative modeling in continuous vector spaces