Data linking

Policy details

Publication date: 16 January 2019
Author: Best Practice and Impact Division at the Office for National Statistics
Approver: Best Practice and Impact Division at the Office for National Statistics
Who this is for: Members of the Government Statistical Service
Contact: gsshelp@statistics.gov.uk

Brief description:

This guidance provides advice on data linking – the process of joining datasets together so we can maximise the use of our data sources.

The government holds an enormous amount of data.  By using it effectively, we provide insight, drive policy change and answer society’s most important questions.  While datasets are useful on their own, bringing them together means that we can take advantage of the combined resource that they offer.

Data linking is the process of joining datasets together so that we can make as much use as possible of the information that they hold.

Data linking matches up data from different datasets.  For this to work effectively, the methods we use to perform the linkage must be ethically sound, secure, robust and trustworthy.
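As a minimal sketch of what "matching up" can mean in practice, the example below links two small datasets on a shared identifier using exact matching. The records, the field names (`person_id`, `employment`, `region`) and the helper `link_on_key` are hypothetical and are not taken from any GSS system. Real linkage often has no clean shared key and needs more sophisticated methods.

```python
# Hypothetical records; field names and values are illustrative only.
survey = [
    {"person_id": "A1", "employment": "employed"},
    {"person_id": "B2", "employment": "unemployed"},
    {"person_id": "C3", "employment": "employed"},
]
admin = [
    {"person_id": "A1", "region": "Wales"},
    {"person_id": "C3", "region": "Scotland"},
    {"person_id": "D4", "region": "London"},
]


def link_on_key(left, right, key):
    """Exact-match linkage: combine records that share the same key value.

    Assumes each key value appears at most once in `right`.
    Returns (linked, unlinked) so unmatched records stay visible.
    """
    right_index = {row[key]: row for row in right}
    linked, unlinked = [], []
    for row in left:
        match = right_index.get(row[key])
        if match is None:
            unlinked.append(row)
        else:
            linked.append({**row, **match})
    return linked, unlinked


linked, unlinked = link_on_key(survey, admin, "person_id")
```

Returning the unlinked records alongside the linked ones makes it possible to report how many records failed to match, which matters when judging the quality of the linked dataset.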

Good data linkage practice is a responsibility that extends across the whole GSS. Collaborative work between departments will help to achieve data linkage strongly underpinned by the Code of Practice for Statistics.

After a recent investigation into the role the UK statistical system can play in providing greater insight to users via data linkage, the Office for Statistics Regulation (OSR) published their report Joining up Data for Better Statistics (2018) (pdf, 1.2MB).

The National Statistician has responded to this report (pdf, 383KB) alongside an additional annex which highlights a data linkage work plan (pdf, 496KB).

In light of the National Statistician’s response, Ed Humpherson, the Director General for Regulation at the OSR, has replied with this letter (pdf, 153KB).

National Statistician’s Data Ethics Committee

Strategies to maintain trust and confidence in government data use are supported through guidelines which promote ethical data sharing. The National Statistician’s Data Ethics Advisory Committee (NSDEC) advises the National Statistician on the use of public data to ensure it is ethical and in the interest of the public good.

National Statistician’s Quality Review

The National Statistician’s Quality Review (NSQR) of privacy and data confidentiality methods helps the GSS to take full advantage of the newest research and innovation in this field, maximising the usefulness of statistics while meeting its legal and ethical obligations to protect data confidentiality.

World-leading experts from across academia and the private sector contributed articles to this NSQR that identify challenges in data linking and suggest steps that the statistical community could take to find solutions to these challenges.

One example is the article entitled Privacy, confidentiality and practicalities in data linkage (pdf, 962KB) from Associate Professor Kerina Jones and Professor David Ford.

Data architecture

The Office for National Statistics (ONS) is developing a data strategy that will be key to putting the correct infrastructure in place, such as data capability, governance and a management framework.

This aims to serve the ONS and support the GSS in the future, balancing the need to extract value from data against the need for appropriate safeguards.

A comprehensive framework that underpins this strategy manages and governs data practices to ensure that data is protected and meets legal obligations. This includes linking and matching practices.

The full high-level data management framework comprises:

  • a set of data principles which define the scope and path of data management, from acquiring or collecting data, through to publication, and a set of security principles which define the foundation of our data protection practices
  • a suite of policies to support the data and security principles. These are statements of intent which describe what will be done to comply with those principles
  • a set of data standards and security procedures and protocols which define how statistical and business activities are carried out

The linking and matching policy is currently being reviewed to reflect the UK Statistics Authority’s systemic review.

Data linking and harmonisation

When multiple datasets are combined through linking, it is very important to use consistent and coherent definitions in data collection wherever possible.

Without this, there is a risk that the linked data will measure the same topic in several different ways.  This can present a confusing picture to users, and might also limit useful analysis because it can be difficult to reconcile such differences.

Harmonisation addresses this challenge by ensuring commonality in the use of definitions, administrative data and in the presentation of outputs.
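To illustrate the idea, here is a minimal sketch of recoding two sources to one shared codelist before they are linked. The codelist `EMPLOYMENT_CODELIST`, the source codes and the helper `harmonise` are all hypothetical and are not official GSS harmonised standards.

```python
# Hypothetical codelist: maps each source-specific code to one shared label.
EMPLOYMENT_CODELIST = {
    "EMP": "employed",
    "1": "employed",
    "in work": "employed",
    "UNEMP": "unemployed",
    "2": "unemployed",
    "out of work": "unemployed",
}


def harmonise(records, field, codelist):
    """Return copies of records with `field` recoded to the shared label.

    Values with no mapping are collected rather than silently dropped,
    so they can be reviewed and added to the codelist if appropriate.
    """
    harmonised, unmapped = [], set()
    for row in records:
        raw = str(row[field]).strip()
        if raw in codelist:
            harmonised.append({**row, field: codelist[raw]})
        else:
            unmapped.add(raw)
    return harmonised, unmapped


# Two illustrative sources that record the same topic in different ways.
survey = [{"id": 1, "status": "in work"}, {"id": 2, "status": "out of work"}]
admin = [{"id": 3, "status": 1}, {"id": 4, "status": "2"}, {"id": 5, "status": "9"}]

survey_h, survey_unmapped = harmonise(survey, "status", EMPLOYMENT_CODELIST)
admin_h, admin_unmapped = harmonise(admin, "status", EMPLOYMENT_CODELIST)
```

After recoding, both sources use the same labels, so the same topic is no longer measured in two different ways in the linked data.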

The GSS Harmonisation team maintains and develops fully approved harmonised principles (harmonised definitions, survey questions, standards for administrative data and standards to be used when presenting outputs).

If you would like to know more about harmonisation, the GSS Harmonisation team can support you in developing and implementing harmonised principles.

The GSS data project

The approach to the GSS data project is to standardise and harmonise data. This means carefully analysing the structure of the datasets, establishing shared codelists, and being very specific about using metadata to describe the datasets.

While there are different ways of approaching this, linked data provides a convenient framework for modelling and publishing data in this way and is well suited to discovering and accessing the data using the web.

Linked data is about using the web to connect related data that wasn’t previously linked, or using the web to lower the barriers to linking data currently linked using other methods.

The fundamental reason for doing this is to make it easier for people to discover and use the datasets that have been published.

Where to access linked data

Researchers can access linked data through a number of research environments across the UK. These include:

  1. The Office for National Statistics Secure Research Service
  2. UK Data Service
  3. HMRC DataLab
  4. Administrative Data Research Centres in Northern Ireland, Scotland and Wales
  5. SAIL DataBank
  6. Administrative Data Research Partnership

Legal gateways

The Digital Economy Act (2017) sets out the criteria for enabling access to data, including linked data, by accrediting processors, projects and researchers and requiring that the highest ethical standards are maintained.

Part 5 of the Digital Economy Act included important new legal powers to provide the UK Statistics Authority (and the ONS as its executive office) with better access to data to support the production of official and National statistics and statistical research; and to provide accredited researchers with better access to de-identified public sector data to support research projects for the public good.

The Statistics and Registration Service Act (2007) establishes a legal gateway, known as the Approved Researcher Scheme, which allows the ONS to grant researchers access to data that cannot be published openly, for statistical research purposes.

Accreditation of individuals and projects

Accessing any data in a secure environment requires both researchers and their project proposal to be accredited.

For instance, to access linked data in the ONS Secure Research Service, individuals should hold ONS Researcher Accreditation and have their research proposal approved by the Research Accreditation Panel, an independent panel made up of representatives from government departments, academia, and the commercial and voluntary sectors.

Protecting Data

Using linked data for statistical purposes that serve the public good requires additional safeguards.

Most secure research environments have developed well-tested principles to ensure that access to data is secure, lawful and ethical.

The ONS and UK Data Service follow an internationally recognised set of principles for using secure data safely, known as the Five Safes Framework. It is built on five protocols:

  1. Safe people – trained and Accredited Researchers, trusted to use data appropriately
  2. Safe projects – data only used for feasible, legal, and ethical research that delivers clear public benefits
  3. Safe settings – access to data only possible using secure technology systems
  4. Safe outputs – all research outputs checked to ensure they cannot identify data subjects
  5. Safe data – researchers can use the appropriate data in a de-identified form
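The guidance does not say how data is de-identified. As one illustrative technique only, the sketch below replaces a direct identifier with a keyed hash (HMAC-SHA256), so that records for the same person can still be linked without exposing the original value. The field name `national_id`, the key and the helper functions are hypothetical. Real de-identification involves far more than this one step, such as handling indirect identifiers and managing keys securely.

```python
import hashlib
import hmac


def pseudonymise(identifier: str, secret_key: bytes) -> str:
    """Return a keyed hash of the identifier (HMAC-SHA256, hex encoded)."""
    # Normalise first so trivial formatting differences do not break linkage.
    normalised = identifier.strip().upper()
    return hmac.new(secret_key, normalised.encode("utf-8"), hashlib.sha256).hexdigest()


def de_identify(records, id_field, secret_key):
    """Drop the direct identifier and add a pseudonymous linkage key instead."""
    out = []
    for row in records:
        cleaned = {k: v for k, v in row.items() if k != id_field}
        cleaned["pseudo_id"] = pseudonymise(row[id_field], secret_key)
        out.append(cleaned)
    return out


# Illustration only: a real key would be generated and stored securely.
SECRET_KEY = b"example-key-for-illustration-only"

records = [
    {"national_id": "qq123456c", "region": "Wales"},
    {"national_id": "QQ123456C ", "region": "Wales"},
]
safe = de_identify(records, "national_id", SECRET_KEY)
```

Because the hash is keyed, someone without the key cannot rebuild the pseudonym from a known identifier, while anyone holding the key can apply the same transformation to another dataset to link it.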

Guidance available on the GSS website

Communicating quality, uncertainty and change

The GSS Methodology Advisory Committee

The GSS Methodology Advisory Committee (GSS MAC) may be able to help by providing free methodological advice.

Other guidance

The Journal of Public Health has published “GUILD: GUidance for Information about Linking Datasets”. This provides direction to ensure that each step in the data linkage process is documented according to a common framework.

The GSS data linkage symposium will take place in London on Wednesday 23 October 2019. This page will be updated as more details on the event become available.

The aims of this symposium are:

  • To facilitate sharing of cross-government data linkage methods and experiences through a series of presentations from GSS colleagues
  • To open up a discussion about how to work together to facilitate data linkage work
  • To signpost people towards resources and training to build GSS capability in data linkage
  • To update delegates on the progress of the National Statistician’s Quality Review on data linkage and learn from academic experts on leading linkage methods.

If you would like to be involved in the symposium or have any queries, please get in touch.

Email: qualitycentre@statistics.gov.uk

Review frequency:

This guidance is reviewed annually.