Develop, maintain, and optimize CI/CD processes (especially GitLab CI) for machine learning systems in the cloud
Create tools and processes for monitoring, logging, and analyzing the performance of deployed AI solutions
Implement infrastructure in multi-cloud environments (primarily GCP, additionally AWS and/or Azure), using microservices, containers (Docker, Kubernetes), and Infrastructure-as-Code with Terraform, Terragrunt, and OpenTofu
Optimize and automate infrastructure scaling using the latest cloud technologies
Collaborate with cross-functional teams to identify and implement MLOps/DevOps requirements
Build and maintain test and production environments on cloud platforms for deploying AI products
Apply cloud-native design patterns, infrastructure best practices, and network concepts to ensure scalable and secure deployments
Requirements
Strong DevOps background; MLOps experience is a plus
Excellent understanding of software engineering best practices, including modularity, maintainability, and SDLC processes; familiarity with principles like DRY and cloud-native design patterns is highly desirable
Hands-on experience with cloud infrastructures (particularly GCP), CI/CD pipelines, containers, Terraform, and applications using CUDA/GPU/TPU
Experience with AWS, Azure, Terragrunt, OpenTofu, and Kubernetes is an advantage
Strong skills in Python, SQL, TensorFlow, and PyTorch
Knowledge of JavaScript and Go is beneficial
Experience with cloud migrations is an additional plus
Strong stakeholder management skills across clients and functional teams
Excellent documentation and communication skills to make technical concepts understandable for non-technical audiences
A team-player mindset, enthusiasm for AI, and a commitment to continuous learning
Fluency in German and English
Tech Stack
AWS
Azure
Cloud
Docker
Google Cloud Platform
JavaScript
Kubernetes
Microservices
Python
PyTorch
SDLC
SQL
TensorFlow
Terraform
Go
Benefits
30 days of annual leave
Days off on Christmas Eve and New Year's Eve
Full coverage of the Germany JobTicket (public transport)
Individual professional development budget per team member for conferences, certifications, or external courses
Opportunities for further training through regular (external) workshops, hackathons, and networking events