Coronavirus (COVID-19) harmonisation guidance

Policy details

Publication date: 20 May 2020
Author: Sofi Nickson
Approver: Catherine Davies
Who this is for: Users and producers of statistics
Type: Harmonisation guidance and principles

Development of these questions

The questions presented here have been developed to collect data about the impact of the coronavirus (COVID-19) pandemic in a harmonised way.

They have been adapted from government surveys. This process has involved changing the reference period and removing overly specific response options. Some wording has also been changed to follow established questionnaire design principles and a consistent style guide.

The need for rapid development has meant that these questions have not been cognitively tested. Their development has been based on best practice principles and on iteration against the available evidence.

Although designed in accordance with best practice, these questions are experimental and may change as priorities evolve.


What is harmonisation?

Harmonisation is the process of making statistics and data more comparable, consistent and coherent. Harmonised principles set out how to collect and report statistics to ensure comparability across different data collections in the Government Statistical Service (GSS). Harmonisation produces more useful statistics that give users a greater level of understanding.

For collecting data about the impact of the coronavirus (COVID-19) pandemic, we are proposing a harmonised set of questions. Because they have not been tested, these questions should be considered experimental rather than a full harmonised principle.


What do we mean by the coronavirus?

Coronaviruses are a family of viruses that cause disease in people and animals. They can cause the common cold or more severe diseases, such as COVID-19.

COVID-19 refers to the “coronavirus disease 2019” and is a disease that can affect the lungs and airways. It is caused by a type of coronavirus. This set of harmonised questions relates to COVID-19, and refers to this as “the coronavirus” in line with the Office for National Statistics’ style guide.


Questions and response options (inputs)

The harmonised questions on this topic are designed to collect basic information, for use in the majority of surveys.

The choice of variables is based on priorities identified across government and how appropriate they would be to harmonise. They are not designed to replace questions used in specialist surveys where more detailed analysis is required.

Variable: Diagnosis and symptoms (these two questions are to be used together; see the "Using these questions" section for more information)
Question 1: Have you been officially diagnosed with the coronavirus (COVID-19)?
Response options:
Yes
No
Don't know
Question 2: Since January 2020, have you had coronavirus (COVID-19) symptoms? (Symptoms can include a high temperature or new continuous cough, or both)
Response options:
Yes
No
Don't know

Variable: Worry
Question: How worried, if at all, are you about the coronavirus (COVID-19) pandemic?
Response options:
Extremely worried
Very worried
Somewhat worried
Not very worried
Not at all worried
Don't know

Variable: Key worker status
Question: Due to the coronavirus (COVID-19) pandemic, have you been given "key worker" status?
Response options:
Yes
No
Don't know

Variable: Impacts
Question: Which areas of your life are being affected by the coronavirus (COVID-19) pandemic? Please select all that apply.
Response options:
My health
My work
My education
My household finances
My well-being
My caring responsibilities
My relationships
My access to groceries, medication or essentials
Other (please specify)


Using these questions

Question placement

These questions can either be added to a wider block of questions exploring the coronavirus, or asked on their own.

If the diagnosis questions are used, they should be used together. The output of whether someone "has" or "has not" had the coronavirus is based on combining the responses to both questions.
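The Python sketch below shows one way the two diagnosis answers might be combined, following the output rules in the "Presenting and reporting the data" section. Several details are our own assumptions and are not part of the harmonised questions: the outcome labels, the function names `classify` and `is_asymptomatic_case`, and the decision to leave answers involving "Don't know" unclassified.

```python
from enum import Enum

# Hypothetical outcome labels; the guidance names probable cases,
# probable non-cases and probable asymptomatic cases.
class Outcome(Enum):
    PROBABLE_CASE = "probable case"
    PROBABLE_NON_CASE = "probable non-case"
    UNCLASSIFIED = "unclassified"  # assumption: mixes involving "Don't know"

YES, NO, DONT_KNOW = "Yes", "No", "Don't know"

def classify(diagnosed: str, symptoms: str) -> Outcome:
    """Combine the two diagnosis questions into a single outcome.

    diagnosed: answer to "Have you been officially diagnosed ...?"
    symptoms:  answer to "Since January 2020, have you had ... symptoms?"
    """
    # A "Yes" to either question counts the respondent once as a probable case.
    if YES in (diagnosed, symptoms):
        return Outcome.PROBABLE_CASE
    # A probable non-case answers "No" to both questions.
    if diagnosed == NO and symptoms == NO:
        return Outcome.PROBABLE_NON_CASE
    # Anything else (for example "Don't know" and "No") is left unclassified.
    return Outcome.UNCLASSIFIED

def is_asymptomatic_case(diagnosed: str, symptoms: str) -> bool:
    """'Yes' to question one and 'No' to question two."""
    return diagnosed == YES and symptoms == NO
```

Counting each respondent once through `classify` means that a respondent who answers "Yes" to both questions is not double counted when probable cases are totalled.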

Types of data collection this principle is suitable for

These questions are based on variables used in both interviewer-administered and self-complete survey modes.


Presenting and reporting the data (outputs)

Diagnosis:
Count respondents who answer "yes" to either diagnosis question, counting each respondent only once, to output probable cases of the coronavirus.

Count respondents who answer "no" to both diagnosis questions to output probable non-cases of the coronavirus.

Responses of "yes" to question one and "no" to question two output probable asymptomatic cases.

Worry: The count for each response option outputs levels of self-reported worry.

Key worker status: The count for each response option outputs the number of people who self-report as key workers.

Impacts: The count for each response option outputs the self-reported level of impact in each area of life.
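The counts above could be produced along the lines of this sketch. It assumes that responses are held as plain strings: one answer per respondent for Worry or Key worker status, and a set of ticked areas per respondent for Impacts. The function names are hypothetical.

```python
from collections import Counter
from typing import Iterable

WORRY_OPTIONS = [
    "Extremely worried", "Very worried", "Somewhat worried",
    "Not very worried", "Not at all worried", "Don't know",
]

def tally_single(responses: Iterable[str], options: list[str]) -> dict[str, int]:
    """Count each response option for a single-choice question (e.g. Worry)."""
    counts = Counter(responses)
    # Report every option in questionnaire order, including options nobody chose.
    return {option: counts.get(option, 0) for option in options}

def tally_multi(selections: Iterable[set[str]]) -> Counter:
    """Count each area for the select-all-that-apply Impacts question.

    A respondent adds one to every area they ticked, so the totals
    can exceed the number of respondents.
    """
    counts: Counter = Counter()
    for ticked in selections:
        counts.update(ticked)
    return counts
```

Because Impacts is multi-select, its totals should only be expressed as a share of respondents after dividing each count by the number of people who answered the question.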

Only output responses to "Other (please specify)" once they have been aggregated or coded into domains. Do not publish the free text that respondents provide.
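One way to meet this rule is to code each free-text answer into a domain and keep only the aggregate counts. This sketch uses keyword matching as a simple stand-in for whatever coding frame a survey team agrees. The domains in `CODING_FRAME` and the function names are hypothetical and are not defined by this guidance.

```python
from collections import Counter

# Hypothetical coding frame mapping keywords to output domains.
# A real frame would be agreed in advance and checked by coders.
CODING_FRAME = {
    "travel": "Travel and transport",
    "holiday": "Travel and transport",
    "wedding": "Events and celebrations",
    "church": "Religious practice",
}
UNCODED = "Other (uncoded)"

def code_response(text: str) -> str:
    """Assign one free-text 'other' answer to a domain by keyword match."""
    lowered = text.lower()
    for keyword, domain in CODING_FRAME.items():
        if keyword in lowered:
            return domain
    return UNCODED

def aggregate_other(free_text: list[str]) -> dict[str, int]:
    """Return counts per coded domain only; the free text itself is not kept."""
    return dict(Counter(code_response(t) for t in free_text))
```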

Use on government surveys shows that respondents include both positive and negative impacts when answering this question. Outputs from this question should therefore be reported as areas affected, not areas negatively affected.



Guidance for Devolved Administrations

Different policies across the UK nations may affect the outputs from these questions.

For example, higher numbers of key workers may reflect a broader definition of what a key worker is, and higher levels of diagnosis may result from different testing policies.

In assessing comparability of statistics on the coronavirus, we have found two domains that benefit from extra guidance: key worker status and diagnosis.

Key worker status

Why collect this data?

The key worker status variable is intended to identify which workers' children are still permitted to attend school while school provision is restricted because of the coronavirus.

Terms used

“Key worker” is the most common term used in the UK according to Google data, but the phrase “critical worker” is also sometimes used.

Geographical comparisons of key workers

The UK government has provided a sector-based definition of a key worker, which also includes people "if [their] work is critical to the COVID-19 response".

However, the definition varies slightly across the UK nations, with Scotland and Northern Ireland noting that it is flexible. Because definitions vary, UK-wide data may not capture the same thing in each nation and so may not be geographically comparable.

Self-identification of key workers

Because definitions are flexible, outputs based on occupation or industry may not be comparable to outputs based on self-identification as a key worker. Those who self-identify as key workers are likely to be acting as key workers (for example, going to work) whether or not they meet industry and occupation definitions. This means that, to understand service provision needs, it is most useful to capture the number of people who self-identify as key workers.


Surveys help us estimate cases

Without testing, we cannot know exactly how many people have the coronavirus. This means survey data on the topic is an estimate, and variance is to be expected.
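To show why variance is expected, this sketch estimates a proportion with a 95% confidence interval. The proportion could be, for example, the share of respondents classed as probable cases. The sketch assumes a simple random sample and uses the normal approximation. Most government surveys are weighted or clustered and would normally need design-based variance estimation instead. The function name is hypothetical.

```python
import math

def proportion_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float, float]:
    """Estimate a proportion with a normal-approximation confidence interval.

    Assumes a simple random sample of size n; z = 1.96 gives roughly 95% coverage.
    Returns (estimate, lower bound, upper bound).
    """
    if n <= 0:
        raise ValueError("sample size must be positive")
    p = successes / n
    se = math.sqrt(p * (1 - p) / n)  # standard error under simple random sampling
    # Clamp so the interval stays within [0, 1].
    return p, max(0.0, p - z * se), min(1.0, p + z * se)
```

For instance, 50 probable cases in a hypothetical sample of 1,000 gives an estimate of 5%, with an interval of roughly 3.6% to 6.4%.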

Comparing survey data and test data

Survey questions are unlikely to be comparable to test data except in studies that use both survey and biological data.

One reason is that testing figures will miss cases because tests are only provided to a subset of people.

Another reason is that survey questions relying on self-reported symptoms will miss asymptomatic cases.

This means that testing data is more accurate, but survey data has more representative coverage. Which of these is more appropriate will depend on the situation.


When comparing data on prevalence of the coronavirus, it is important to also understand whether the data is reporting new cases, current cases or cumulative cases.

Cumulative survey data comes from questions that ask whether someone has had the coronavirus at all. This data cannot be compared with data from questions asking whether someone currently has the coronavirus.
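The relationship between cumulative and new cases is sketched below. New cases for each period are the difference between successive cumulative totals. The function name is hypothetical. Current cases cannot be derived this way without knowing how long each case lasts, which is one reason cumulative and current measures should not be compared.

```python
def new_from_cumulative(cumulative: list[int]) -> list[int]:
    """Convert a cumulative case series into new cases per period.

    New cases in the first period equal its cumulative total.
    """
    new_cases = []
    previous = 0
    for total in cumulative:
        # A cumulative count should never fall; a drop suggests a data revision.
        if total < previous:
            raise ValueError("cumulative series must not decrease")
        new_cases.append(total - previous)
        previous = total
    return new_cases
```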

The Department of Health and Social Care and Public Health England have a live tracker for both cumulative and new cases. Because it is based on testing data, which is only available for a specific subset of people, it should not be compared to survey data that aims to achieve a representative sample.

Levels of prevalence will also vary based on levels of testing. As such, when levels of testing are known to vary between samples, this should be noted when comparing outputs.


Further information

To aid harmonisation, we recommend adopting other harmonised principles that may be relevant in data collection during this time, such as:

Before collecting further data on this topic, we also suggest looking at information that has already been published, for example:



We are always interested in hearing from users so we can develop our work. If you use or produce statistics based on this topic, please get in touch:


Review frequency:

This guidance will be reviewed regularly.
