Ollion is an innovative company focused on transforming organizations through cloud technology. It is seeking a Specialist Engineer - Data to design and optimize scalable data pipelines and solutions that enable enterprise organizations to derive value from their data. The role involves collaborating with data scientists and stakeholders to implement modern data engineering practices while ensuring performance and security across cloud platforms.
Responsibilities:
- Design, build, and optimize robust and scalable ETL/ELT data pipelines to ingest and process large volumes of data using Azure and GCP services (e.g., Azure Data Factory, Azure Synapse/Databricks, GCP Dataflow, Cloud Data Fusion, or Cloud Functions)
- Develop and maintain optimized data models (e.g., dimensional, Data Vault) within multi-cloud data warehouse solutions (e.g., Google BigQuery or Azure Synapse Analytics) or data lakes to support BI, reporting, and analytical workloads, ensuring that data structures are well suited for consumption by tools such as Power BI and Looker
- Monitor, troubleshoot, and optimize the performance of data warehouse queries and compute resources (e.g., BigQuery slots, Azure Synapse SQL pools, or Databricks/Dataproc clusters) to ensure cost-efficiency and fast data retrieval
- Collaborate with Data Scientists to design and implement feature stores and pipelines to prepare and serve data for ML model training and inference
- Develop and maintain pipelines that transform unstructured data (text, documents) into embeddings and load them into vector databases (e.g., Azure Cosmos DB, GCP Vertex AI Vector Search, or dedicated vector stores) to support retrieval-augmented generation (RAG) solutions
- Implement workflows (e.g., using Apache Airflow, Google Cloud Composer, or Azure Data Factory/Logic Apps) to automate the end-to-end data lifecycle for AI/ML processes, including data refresh and model retraining
- Work with Data Science teams to containerize, deploy, and manage machine learning models in production environments (e.g., using GCP Vertex AI, Azure Machine Learning, or AKS/GKE)
- Implement robust monitoring and logging solutions for production ML pipelines and models to track performance, data drift, and model decay using Azure Monitor or GCP Cloud Monitoring
- Integrate model training, testing, and deployment into CI/CD pipelines to ensure rapid, reliable, and automated updates to production ML services
- Implement and manage security best practices across Azure, GCP, and Snowflake, including access controls such as role-based access control (RBAC) and IAM policies, as well as data encryption
- Write complex, efficient SQL queries and develop scripts in Python (or other relevant languages such as Scala or Java) for data manipulation, process automation, and pipeline orchestration
- Work closely with data analysts, data scientists, and business stakeholders to understand data requirements and deliver high-quality, actionable data solutions, including setting up data sources and datasets for BI tools like Power BI and Looker
- Create and maintain technical documentation for data models, data flows, and ETL/ELT processes
Requirements:
- Hands-on experience designing and operating data solutions using major Azure services (Blob Storage, Data Factory, Databricks, Synapse) and/or GCP services (Cloud Storage, BigQuery, Dataflow, Pub/Sub, IAM)
- Expertise in SQL and modern ELT methodologies, using tools such as dbt (Data Build Tool) for version-controlled, production-grade data modeling within a modern cloud data warehouse (e.g., BigQuery, Azure Synapse, or Snowflake)
- Experience with BI tools, specifically configuring and optimizing data for use in Power BI or Looker
- Advanced proficiency in Python for complex data transformation, API integrations, and automation scripting
- Proven experience working with Data Scientists to build data and feature pipelines for ML
- Familiarity with ML lifecycle tools and frameworks (e.g., GCP Vertex AI, Azure Machine Learning, Kubeflow, MLflow)
- Understanding of machine learning model deployment, serving, monitoring, and versioning best practices
- Expertise in applying security controls, including encryption, data masking, and role-based access control (RBAC) models, in Azure and GCP data services
- Familiarity with infrastructure as code (IaC) practices using Terraform (preferred for multi-cloud) or cloud-native tooling (Azure Bicep / GCP Deployment Manager) and experience with version control, CI/CD pipelines, and automation tools for cloud data services
- Bachelor's or Master's degree in Computer Science, Engineering, or a quantitative field
- 4+ years of professional experience in a Data Engineering, Software Engineering, or Data Architecture role
- Must hold a professional-level data/AI certification on at least one of the major platforms, such as: Azure: Microsoft Certified: Azure Data Engineer Associate or Azure AI Engineer Associate; GCP: Google Cloud Certified Professional Data Engineer or Professional Machine Learning Engineer