Lead Oritain’s data transformation as Principal Data Engineer: build a robust, scalable platform powering scientific insights, business intelligence, and global supply chain integrity.
Define and own the technical strategy and architecture for our entire data platform, covering ingestion, storage, processing, governance, and consumption.
Design and implement highly scalable, performant, and reliable ETL/ELT data pipelines to handle diverse data sources, including complex scientific datasets and supply chain inputs alongside business information.
Evaluate, recommend, and drive the adoption of new data services and modern data tools to ensure we have a future-proof data ecosystem.
Lead the design of canonical data models for our data warehouse and operational data stores.
Serve as the most senior, hands-on developer, writing high-quality, production-grade code (primarily Python and/or Scala/Spark) to build initial pipelines and core data services.
Architect data security and governance policies, ensuring compliance and best practices around data access, masking, and retention.
Implement robust monitoring, logging, and alerting for all data pipelines and infrastructure, ensuring high data reliability and performance.
Partner closely with the Science teams to understand the structure, complexity, and requirements of raw scientific data, ensuring accurate data translation and ingestion.
Provide technical guidance and mentorship to software engineers on best practices for interacting with and consuming data services.
Requirements
Extensive experience (typically 7+ years) focused on data engineering, including significant time spent in a Principal, Lead, or Architect role defining data strategy from the ground up.
Deep, practical, and architectural experience of the Databricks platform.
Operational experience building and running data workloads within the Microsoft Azure data ecosystem (e.g., Azure Data Factory, Azure Data Lake, Azure Synapse Analytics, Azure SQL/Cosmos DB).
Expert-level proficiency in Python (or Scala) and SQL, with a strong focus on writing clean, tested, and highly performant data processing code.
Proven track record designing and implementing scalable data warehouses/data marts for analytical and operational use cases.
Strong experience with workflow orchestration tools and implementing CI/CD for data pipelines.
Familiarity with Infrastructure as Code (Terraform) and containerisation.
Experience processing scientific, geospatial, or time-series data.
Experience in the governance or compliance sector where data integrity is paramount.
Familiarity with streaming data technologies.
Tech Stack
Azure
ETL
Python
Scala
Spark
SQL
Terraform
Benefits
Paid Leave
35 days (inclusive of public holidays)
Birthday Off
Volunteering Leave Allowance
Enhanced Parental Leave
Life Insurance
Healthcare Cash Plan
Employee Assistance Programme (EAP)
Pension
Monthly Wellbeing Allowance
Breakfast, snacks, Friday lunch & barista coffee machine in the office!
Learning Portal with over 100,000 assets available to support professional development