Role Overview
Design, implement, and deploy advanced AI capabilities within the OneAI platform.
Shape the end-user experience by designing intuitive workflows for model management, deployment configuration, and job operations.
Streamline the model lifecycle by integrating public repositories (e.g., Hugging Face) for seamless discovery, import, versioning, and deployment.
Bridge the gap between systems engineering and product design to ensure a seamless transition from backend infrastructure to user features.
Integrate cutting-edge AI frameworks and engines, such as vLLM, NVIDIA Dynamo and Unsloth, into a secure and scalable environment.
Leverage OpenNebula to orchestrate high-performance inference and training workloads across diverse cloud and edge environments.
Develop and maintain reliable APIs for compute provisioning and workload scheduling.
Implement GPU-aware operations to ensure optimal resource allocation and hardware utilization.
Build comprehensive observability suites to monitor and track critical metrics, including latency, throughput, utilization, and failure rates.
Establish and refine deployment and workflow strategies to ensure AI workloads remain efficient and stable at scale.
Optimize system architecture to balance high performance with cost efficiency.
Research and integrate emerging AI tools and engines to keep the OneAI platform at the forefront of the industry.
Analyze performance bottlenecks and iterate to improve the efficiency of both training and inference.
Requirements
Bachelor’s or Master’s degree in Computer Science, Information Technology, or Engineering.
3+ years of experience in applied AI, machine learning, or software engineering, with hands-on delivery of AI/ML solutions in production environments.
Demonstrated experience designing and deploying high-performance AI infrastructure, with a specific focus on the scalability and reliability of inference and training workloads.
Proven track record of deploying Large Language Models (LLMs) at scale, with deep knowledge of serving engines (e.g., vLLM) and fine-tuning tools (e.g., Unsloth).
Experience building AI-centric platforms or toolchains that manage the model lifecycle (versioning, deployment, and discovery).
Experience with GPU orchestration, optimizing workloads for cloud, distributed, or large-scale environments, and collaborating with platform or infrastructure teams.
Proficiency in integrating with the Hugging Face ecosystem (Transformers, Hub, Datasets) for model and data management.
Experience implementing monitoring tools to track system-level AI metrics such as token throughput, latency, GPU utilization, and failure rates.
Experience designing and implementing scalable, reliable APIs for compute provisioning and workload scheduling.
Experience working with cloud platforms and containerized environments (e.g., OpenNebula, Kubernetes).
Advanced English level (B2 or higher) is required.
Tech Stack
Cloud
Kubernetes
Benefits
Competitive compensation package and flexible remuneration: Meals, Transport, Nursery/Childcare
Customized workstation (macOS, Windows, Linux)
Private health insurance
Paid time off: Holidays, Personal Time, Sick Time, Parental leave
Afternoon-off working day every Friday and during the summer
Remote company with bright HQ centrally located in Madrid; offices in Boston (USA), Brussels (Belgium) and Brno (Czech Republic); and access to office space near your location when needed.
Healthy work-life balance: We encourage the right to digital disconnection and promote harmony between employees' personal and professional lives
Flexible hiring options: Full Time/Part Time, Employee (Spain/USA) / Contractor (other locations)