Cambridge Spark is an AI and data science apprenticeship provider, working with clients including the BBC, NHS, Virgin Atlantic, and GSK. The role involves taking ownership of the data infrastructure, shaping the data landscape, and working closely with product engineers and business stakeholders to ensure effective data usage and governance.
Responsibilities:
- Own the data estate. Our stack is Postgres, GCP, BigQuery, and Looker. You keep things running reliably today, but you are also the person who decides and drives any changes needed to our architecture
- Set the standard for how we work with data. You will establish the patterns, conventions, and governance practices that others build on. That includes how data is classified, how access is controlled, and how requests from the business are evaluated. Our policies are already written; the work now is to implement them
- Work with product engineers as a peer. Schema decisions in the application database have downstream consequences for everything else. You engage with those conversations early, have opinions about what belongs where, and help build data quality into the source
- Change how the business thinks about data. When someone says "I want to see this dataset," your job is not to grant or refuse the request. It is to understand what they are actually trying to solve and find the right path to insight. Over time, you shift the culture: fewer "just add a field" requests, more conversations that start with the business problem
- Hold the line on data governance. Cambridge Spark handles sensitive learner and employer data in a regulated environment. You treat compliant data handling as a point of professional pride, enforce least-privilege access as a matter of course, and push back constructively when a proposed approach would expose more than it should
- Balance the immediate and the strategic. You can triage what is urgent, make pragmatic fixes, and still keep the longer-term design coherent. Neither horizon gets sacrificed for the other
Requirements:
- Postgres: schema design, query optimisation, indexing, and migrations in production
- GCP: hands-on experience with core services (Cloud SQL, Cloud Storage, IAM, networking)
- BigQuery: data warehousing and mirroring pipelines; cost management and performance tuning
- Looker or equivalent BI tooling: able to own a dashboard estate end-to-end, including coaching others to use it well
- Data modelling: strong instinct for when to aggregate, when to denormalise, and when not to expose raw data
- Data governance: classification frameworks, access controls, audit logging, data lineage; in practice, not just in theory
- Judgment, not just experience. You know when the right answer is 'fix the pipeline' and when it is 'the pipeline is fine, the data model is wrong.' You push back on the path of least resistance when it matters
- A product-led style. You are not purely an analytics or BI specialist: you are close enough to software engineering to understand how application decisions affect data quality downstream, and confident enough to influence those decisions
- Working directly with business stakeholders. You help people who arrive with a question get to a better one
- Principled attitude to sensitive data. You will say 'we should not be exposing this' when it's right, and you hold that line while still being constructive and helpful to the business
- Ownership without prompting. When something is wrong, you notice, understand what it means, and do something about it. You leave things better than you found them
- Familiarity with UK GDPR and DPA 2018, particularly in a regulated or learner-facing context
- Experience with dbt, Airflow, or comparable pipeline tooling
- Experience designing a data contract between an application engineering team and an analytics layer
- Background in edtech or another regulated sector