Autodesk is transforming the Architecture, Engineering, and Construction (AEC) industry by embedding advanced AI and foundation models into cloud-native platforms. As a Senior Principal Machine Learning Engineer, you will act as a technical leader for complex ML initiatives and drive the development of large-scale training pipelines while collaborating with cross-functional teams.
Responsibilities:
- Define the long-term technical vision for Generative AI and Foundation Model infrastructure within the AEC Solutions team
- Influence architectural decisions across the broader organization
- Lead the design, development, and delivery of complex ML systems
- Own the full lifecycle from model architecture selection and data strategy to distributed training and production deployment
- Drive the development of large-scale training pipelines
- Collaborate with Research Scientists to translate experimental ideas (custom architectures, novel loss functions) into scalable, performant code
- Architect solutions for distributed training (e.g., FSDP, Megatron-LM, DeepSpeed) on massive compute clusters
- Identify and resolve bottlenecks in data processing and model parallelism to maximize training throughput
- Mentor Principal and Senior engineers, fostering a culture of technical ownership, rigorous experimentation, and best practices
- Act as a technical partner to Product Management and Engineering leadership
- Partner effectively with Data Engineering, Platform, and Research teams to integrate large-scale multimodal AEC data (3D geometry, images, text) into model development workflows
- Establish standards for model evaluation, versioning, monitoring, and MLOps to ensure reproducibility and reliability in a high-stakes production environment
Requirements:
- Master's or PhD in an AI/ML-related field such as Computer Science, Mathematics, Statistics, Physics, or Computational Linguistics
- 10+ years of experience in machine learning, AI, or related fields, with a proven track record of technical leadership and hands-on implementation
- Demonstrated experience mentoring engineers and leading technical projects in cross-functional environments
- Proven history of leading the delivery of large-scale ML systems from conception to production
- Expert-level understanding of deep learning architectures (Transformers, Diffusion models) and modern frameworks (PyTorch is required)
- Hands-on experience with distributed training frameworks and techniques (e.g., PyTorch Distributed, Ray, DeepSpeed, Megatron-LM, CUDA optimization) in HPC or cloud environments (AWS/Azure)
- Strong proficiency in Python, with an emphasis on performance profiling, debugging, and writing robust, maintainable production code
- Excellent ability to translate complex technical concepts into clear insights for executive leadership and cross-functional partners
- Experience with large foundation model training in distributed compute environments
- Experience designing data pipelines for multimodal datasets at terabyte to petabyte scale (e.g., using Spark or Iceberg)
- Experience constructing internal developer platforms for ML, utilizing tools like Kubernetes, Slurm, or Metaflow
- A portfolio demonstrating the successful translation of academic research papers into tangible product features
- Background in AEC, computational geometry, or experience working with 3D data representations (BIM, CAD, meshes, point clouds)