Design, build, and operate GPU-centric AI infrastructure (primarily NVIDIA) in on-premise and cloud environments, with a strong focus on performance, scalability, and efficiency.
Own the architecture and operation of high-performance compute environments for distributed training and optimized model execution.
Optimize compute, memory, and high-performance networking (e.g., InfiniBand, NCCL) to enable large-scale AI workloads in industrial contexts.
Develop and operate infrastructure components such as scheduling and resource management systems (e.g., SLURM, Ray, Run:ai) to ensure efficient utilization of shared GPU resources.
Create and maintain automated, reproducible infrastructure using modern tools (e.g., Docker, Kubernetes, Terraform, Ansible, CI/CD).
Contribute to BMW-specific AI use cases by providing reliable and scalable infrastructure.
Take technical ownership of the AI infrastructure stack, define best practices, and mentor less experienced engineers.
Requirements
University degree in Computer Science, Computer/Electrical Engineering, or a related field
8–10 years of professional industry experience building and operating AI and HPC infrastructure
Solid hands-on experience with GPU systems (particularly NVIDIA), including drivers, CUDA, and performance optimization
Experience with distributed systems and high-performance networking (e.g., InfiniBand, NCCL), as well as cloud environments (AWS, Azure) in addition to on-premise infrastructure
Practical experience with resource scheduling and workload orchestration (e.g., SLURM, Ray, NVIDIA Run:ai)
Extensive experience in infrastructure automation (e.g., Docker, Kubernetes, Terraform, Ansible, CI/CD) and proficiency in Python for infrastructure and system tooling
Experience with training, fine-tuning, or deploying ML models in production, as well as exposure to industrial AI use cases (e.g., simulation, robotics, engineering), is a plus
Tech Stack
Ansible
AWS
Azure
Cloud
Docker
Kubernetes
Python
Ray
Terraform
Benefits
Challenging projects
Wide range of personal and professional development opportunities
Attractive, fair, and performance-based compensation
High job security
Annual special payments such as holiday pay, Christmas bonus, and profit-sharing
Flexible working hours, six weeks of annual leave, and overtime compensation