Define the long-term technical vision for Generative AI and Foundation Model infrastructure within the AEC Solutions team
Influence architectural decisions across the broader organization
Lead the design, development, and delivery of complex ML systems
Own the full lifecycle, from model architecture selection and data strategy through distributed training and production deployment
Drive the development of large-scale training pipelines
Collaborate with Research Scientists to translate experimental ideas (custom architectures, novel loss functions) into scalable, performant code
Architect solutions for distributed training (e.g., FSDP, Megatron-LM, DeepSpeed) on large multi-node compute clusters
Identify and resolve bottlenecks in data processing and model parallelism to maximize training throughput
Mentor Principal and Senior engineers, fostering a culture of technical ownership, rigorous experimentation, and best practices
Act as a technical partner to Product Management and Engineering leadership
Partner with Data Engineering, Platform, and Research teams to integrate large-scale multimodal AEC data (3D geometry, images, text) into model development workflows
Establish standards for model evaluation, versioning, monitoring, and MLOps best practices to ensure reproducibility and reliability in a high-stakes production environment
Requirements
Master’s or PhD in Computer Science, Mathematics, Statistics, Physics, Computational Linguistics, or another AI/ML-related discipline
10+ years of experience in machine learning, AI, or related fields, with a proven track record of technical leadership and hands-on implementation
Demonstrated experience mentoring engineers and leading technical projects in cross-functional environments
Proven history of leading the delivery of large-scale ML systems from conception to production
Expert-level understanding of deep learning architectures (Transformers, diffusion models) and modern frameworks (PyTorch required)
Hands-on experience with distributed training frameworks and techniques (e.g., PyTorch Distributed, Ray, DeepSpeed, Megatron-LM, CUDA optimization) in HPC or cloud environments (AWS/Azure)
Strong proficiency in Python, with an emphasis on performance profiling, debugging, and writing robust, maintainable production code
Excellent ability to translate complex technical concepts into clear insights for executive leadership and cross-functional partners