As a hands-on team lead, you would operate along the following dimensions:
Build and scale a high-performing team capable of tackling complex distributed ML challenges
Own the full employee lifecycle: recruiting, onboarding, performance management, career development, and retention
Empower your team members and help them grow in autonomy and technical expertise
Mentor engineers at all levels, fostering a culture of continuous learning and psychological safety
Create an inclusive environment where diverse perspectives drive innovation
Define and execute technical roadmaps aligned with company objectives and product needs
Lead resource allocation and capacity planning to balance team workload and business priorities
Own FinOps responsibilities: optimize cloud costs, track spending, and ensure efficient resource utilization
Ensure operational readiness through monitoring, incident response protocols, and system reliability practices
Establish and track KPIs for team performance, system efficiency, and system health
Design, develop, and maintain robust large-scale distributed training pipelines and ML infrastructure using cutting-edge technologies
Lead architecture decisions for distributed systems that enable efficient model development at scale
Contribute hands-on to critical technical challenges, including the optimization of training pipelines and infrastructure
Drive technical excellence through code reviews and architectural guidance
Stay at the forefront of distributed training technologies and bring innovation to the team
Partner closely with Product teams to translate business requirements into technical solutions
Collaborate with (senior) Research Scientists to enable scalable model development and experimentation
Work with Platform Engineering to ensure robust infrastructure and tooling
Build strong relationships across engineering teams to drive alignment and knowledge sharing
Communicate technical concepts effectively to both technical and non-technical stakeholders
Requirements
Bachelor's or Master's degree in Computer Science, Engineering, Mathematics, or a related field.
6+ years of software engineering or ML engineering experience, with at least 2 years in a technical leadership or team lead role
Proven track record of building and leading high-performing engineering teams.
Experience guiding projects across the whole Software Development Life Cycle, from requirements through design to implementation, deployment and maintenance.
Deep understanding of fundamental Machine Learning concepts and principles, and familiarity with advanced model optimization techniques (such as distillation, graph optimization, and quantization)
Significant experience with large-scale distributed training systems and frameworks (especially PyTorch and NCCL).
Familiarity with GPUs, distributed systems, parallel computing and scaling laws.
Advanced programming skills in Python; experience with performance-critical languages (C/C++ or CUDA) is a plus
Familiarity with MLOps/DevOps best practices (including CI/CD, Docker, Kubernetes, and observability), cloud platforms (GCP, AWS, or Azure), and infrastructure-as-code
Experience with Linux, version control, and container technologies
Demonstrated ability in resource allocation, capacity planning, and FinOps principles
Excellent problem-solving and data-driven decision-making skills in ambiguous situations
Tech Stack
AWS
Azure
Cloud
Distributed Systems
Docker
Google Cloud Platform
Kubernetes
Linux
Python
PyTorch
SDLC
Benefits
Cutting-edge AI research and development in collaboration with Charité, TU Berlin, and our other partners
Opportunity to shape the technical direction and grow into broader leadership roles
Expand your skills with our yearly Learning & Development budget of 1,000€ (plus 2 L&D days), language classes, and internal development programs
Access to leadership development programs and executive coaching
Flexible working hours and teleworking policy
Enjoy well-deserved time off with 30 paid vacation days per year
A family- and pet-friendly workplace with flexible parental leave options
Choose a subsidized membership for public transport, sports, or well-being
Enjoy our social gatherings, lunches, and off-site events for a fun and inclusive work environment