Role Overview
What Your Day Might Look Like:
- Make AI production-ready: Design and maintain the infrastructure that takes ML models from experimentation to reliable, scalable deployment.
- Build automated ML pipelines: Create repeatable workflows for training, evaluation, deployment, and retraining — with versioning and reproducibility built in.
- Deploy and serve models: Package models as production services across cloud, on-premise, or hybrid environments, with performance and reliability in mind.
- Monitor what matters: Track model performance, data drift, system health, and production signals to support better retraining and troubleshooting decisions.
- Enable AI teams: Work closely with Data Scientists, AI Engineers, and Software Engineers to improve how models are tested, deployed, and maintained.
- Set the standard: Contribute to best practices around CI/CD, model registry, observability, security, and governance.
- Support GenAI at scale: Help deploy and optimize LLM-based systems, including inference services, GPU usage, and RAG infrastructure where needed.
- Keep systems secure and reliable: Ensure ML deployments follow strong practices around access control, data governance, and operational resilience.
Requirements
Your Superpowers 🚀:
- BSc or MSc in Computer Science, Software Engineering, or a related STEM field.
- 5+ years of experience in MLOps, DevOps, platform engineering, or ML engineering, with exposure to ML systems in production.
- Strong Python skills and good software engineering fundamentals.
- Hands-on experience with ML lifecycle tools such as MLflow, Kubeflow, SageMaker, Vertex AI, Azure ML, or similar.
- Experience deploying models using tools like BentoML, TorchServe, Triton Inference Server, or equivalent serving frameworks.
- Strong experience with Docker, Kubernetes, CI/CD, and production-grade deployment workflows.
- Comfortable working across cloud environments — AWS, Azure, GCP, or hybrid setups.
- Experience with monitoring and observability tools such as Prometheus, Grafana, or similar.
- Understanding of model performance, drift, retraining, reproducibility, and production reliability.
- Strong collaboration skills — able to work across Data Science, Engineering, and client-facing teams.
Bonus Points for:
- Experience with Terraform, Pulumi, or infrastructure-as-code practices.
- Experience with feature stores such as Feast or Tecton.
- Familiarity with data and model versioning tools such as DVC or Delta Lake.
- Experience with Kafka or event-driven ML workflows.
- Hands-on experience serving LLMs in production using vLLM, TGI, Triton, or similar.
- Familiarity with model optimization techniques such as quantization or GPU memory tuning.
- Experience operating RAG infrastructure, vector databases, and embedding pipelines.
- Exposure to LLM evaluation and observability tools such as LangSmith, RAGAS, or custom evaluation frameworks.
Tech Stack
- AWS
- Azure
- Docker
- Google Cloud Platform
- Grafana
- Kafka
- Kubernetes
- Prometheus
- Python
- Terraform
Benefits
Perks on Perks:
- Competitive salary and hybrid work model – come hang out in our Athens office or work remotely from anywhere in the European Economic Area (EU, Switzerland, etc.) or the UK (up to 6 weeks per year).
- Training budget to level up your skills with the top tech partners in the market (Microsoft, AWS, Salesforce, Databricks, etc.) – whether it’s certifications or courses, we’ve got you covered.
- Private insurance, top-tier tech gear, and the chance to work with a stellar crew.