Design, build, and operate robust, scalable, and secure data pipelines and infrastructure, meeting defined performance, availability, and governance standards across the data stack
Take hands-on ownership of data pipeline development, including batch and streaming workloads, from ingestion through to consumption
Ensure consistently high standards of data quality, reliability, and availability across all data platforms
Enable analytics and AI use cases through well-designed, well-governed data foundations and reusable data patterns
Partner closely with Analytics Engineers and Data Analysts to support the creation of reliable, business-aligned datasets and models
Collaborate with Product, Engineering, and Architecture teams to ensure data solutions align with platform strategy and product needs
Contribute to the evolution of the company’s data architecture in collaboration with the Architecture Review Board, ensuring pipelines and data solutions align with established platform patterns and the company’s AWS-first approach
Maintain and support existing data pipelines, including troubleshooting, bug fixing, and incremental improvements to ensure reliability and performance
Ensure data operations comply with privacy, security, and regulatory requirements, embedding governance and access controls into data pipelines
Monitor, maintain, and continuously improve data pipelines, infrastructure, and platform performance, including observability and alerting
Maintain strong collaboration with Product, Engineering, and Domain Leadership (EHS, ESG, Chemical Safety), contributing to quarterly reporting and ensuring data initiatives remain aligned with company objectives
Requirements
Typically 6+ years of experience in Data Engineering or similar roles
Strong hands-on delivery in cloud-based environments
Experience in SaaS, Health & Safety, ESG, or Chemical Safety domains is a plus
Experience working in Agile environments with DevOps, CI/CD pipelines
Strong delivery mindset with the ability to take data initiatives from design through production
Advanced proficiency in Python and SQL
Strong experience designing and operating ETL / ELT data pipelines at scale
Hands-on experience with AWS-based data platforms, including S3, Glue, Redshift, Athena, and Kinesis
Strong experience implementing Change Data Capture (CDC) patterns using Kafka and/or Amazon Kinesis
Strong experience working with large-scale structured and unstructured data
Experience building and operating streaming and event-driven data pipelines
Experience embedding security, access control, and compliance into data platforms (e.g. GDPR, enterprise data security best practices)
Proven ability to design scalable and efficient data architectures, optimising pipelines and storage
Experience with LLMs and AI-enabled data pipelines, including preparing, governing, and serving data for LLM, RAG, and agentic workflows
Familiarity with agentic development patterns and event-driven architectures that support AI-driven automation and decision-making
Working knowledge of BI and analytics tools such as QuickSight (preferred), Power BI, Tableau, or Looker
Experience with distributed systems and modern data platforms (e.g. Spark, Databricks)