Cambridge Spark is an AI and data science apprenticeship provider, working with clients including the BBC, NHS, Virgin Atlantic, and GSK. The role involves taking ownership of the data infrastructure, shaping the data landscape, and working closely with product engineers and business stakeholders to ensure effective data usage and governance.

Responsibilities:

Own the data estate. Our stack is Postgres, GCP, BigQuery, and Looker. You keep things running reliably today, but you are also the person who decides and drives any changes needed to our architecture
Set the standard for how we work with data. You will establish the patterns, conventions, and governance practices that others build on. That includes how data is classified, how access is controlled, and how requests from the business are evaluated. Our policies are in place, and we need to implement them
Work with product engineers as a peer. Schema decisions in the application database have downstream consequences for everything else. You engage with those conversations early, have opinions about what belongs where, and help build data quality into the source
Change how the business thinks about data. When someone says "I want to see this dataset," your job is not to grant or refuse the request. It is to understand what they are actually trying to solve and find the right path to insight. Over time, you shift the culture: fewer "just add a field" requests, more conversations that start with the business problem
Hold the line on data governance. Cambridge Spark handles sensitive learner and employer data in a regulated environment. You treat compliant data handling as a point of professional pride, enforce least-privilege access as a matter of course, and push back constructively when a proposed approach would expose more than it should
Balance the immediate and the strategic. You can triage what is urgent, make pragmatic fixes, and still keep the longer-term design coherent. Neither horizon gets sacrificed for the other

Requirements:

Postgres: schema design, query optimisation, indexing, and migrations in production
GCP: hands-on experience with core services (Cloud SQL, Cloud Storage, IAM, networking)
BigQuery: data warehousing and mirroring pipelines; cost management and performance tuning
Looker or equivalent BI tooling: able to own a dashboard estate end-to-end, including coaching others to use it well
Data modelling: strong instinct for when to aggregate, when to denormalise, and when not to expose raw data
Data governance: classification frameworks, access controls, audit logging, data lineage; in practice, not just in theory
Judgment, not just experience. You know when the right answer is 'fix the pipeline' and when it is 'the pipeline is fine, the data model is wrong.' You push back on the path of least resistance when it matters
A product-led style. You are not purely analytics or BI. You are close enough to software engineers to understand how application decisions affect data quality downstream, and confident enough to influence those decisions
Working directly with business stakeholders. You help people who arrive with a question get to a better one
Principled attitude to sensitive data. You will say 'we should not be exposing this' when it's right, and you hold that line while still being constructive and helpful to the business
Ownership without prompting. When something is wrong, you notice, understand what it means, and do something about it. You leave things better than you found them
Familiarity with UK GDPR and DPA 2018, particularly in a regulated or learner-facing context
Experience with dbt, Airflow, or comparable pipeline tooling
Experience designing a data contract between an application engineering team and an analytics layer
Background in edtech or another regulated sector

Senior Data Engineer
