Data quality in the time of coronavirus (COVID-19)

The role of data during the pandemic

The coronavirus (COVID-19) pandemic has fundamentally reshaped society. We’ve faced unprecedented restrictions on our ability to move, work and interact. The effects of COVID-19 on the economy, health outcomes and our personal well-being are stark. The long-term impacts of the pandemic are uncertain, but likely to be with us for many years to come.

Data have been at the core of the COVID-19 response. From daily updates on case numbers, deaths and testing capacity, to national infection surveys, the significance and public profile of data have never been higher. Major policy decisions are driven by access to a rapidly evolving pool of information. Lockdowns, school closures and quarantine restrictions are implemented based on the latest numbers. The need for a ‘real-time’ response demands that data are available at an unparalleled pace.

Underpinning the effective use of data is data quality. At a high level, data quality can be thought of as ‘fitness for purpose’: is this dataset good enough for what I want to use it for? The level of quality required will vary depending on the purpose, and assessing it will often involve the six dimensions of data quality (commonly listed as accuracy, completeness, uniqueness, consistency, timeliness and validity). Data of a higher quality are more useful, and poor-quality data often lead to poor decisions. In the high-stakes context of COVID-19, therefore, it is vital that data are fit for purpose. This can pose challenges when working at such pace.

To understand and learn from these challenges, we’ve formed a cross-government working group. The group involves colleagues from NHS-X, NHS Digital, the Government Data Quality Hub (based in the Office for National Statistics) and Health Data Research UK. We aim to explore the challenges that COVID-19 has raised in relation to data quality, and to promote a data quality ‘culture’ across the health community. We’re doing this by sharing best practice, supporting collaboration and encouraging the adoption of cross-government frameworks and guidance.

Delivering data at pace: lessons learned from the Shielded Patients List (SPL)

In September we held the second webinar in our series, focussing on lessons learned from the Shielded Patients List (SPL). The SPL was developed to protect those most at risk of adverse effects from COVID-19 by advising them to shield from social contact. The list is also shared with partner organisations to provide support such as food parcels and medicine deliveries.

Richard Irvine (Associate Director, NHS Digital) and Kieran Baker (Head of Delivery for Data Services, NHS Digital) gave an overview of their work to develop the SPL. They also highlighted some of the challenges they faced. The list itself is derived from a clinical rule set which identifies those individuals most at risk of complications from COVID-19. Richard and Kieran explained that the first iteration of the methodology used to identify patients who meet the high-risk criteria was developed in just four days following a Prime Ministerial mandate.

This process would ordinarily take months. The team collaborated with clinical leads to determine relevant clinical conditions and to develop the methodology to identify patients suffering from these. Over time, the methodology has been improved iteratively to provide a more nuanced picture of the at-risk cohort. An updated version of the SPL is generated weekly and securely shared with approved organisations for specific purposes.

Richard and Kieran also discussed some of the data quality challenges they faced when developing the SPL. One challenge was the use of an administrative dataset for direct care. An example of this is the potential ‘over-coding’ of cancer patients. Patients who had recovered from cancer many years ago were in some instances having their diagnosis re-added as a co-morbidity to their patient record every time they returned to hospital, even if the cancer had not returned.

This activity was detected as part of the secondary care Commissioning Data Set, resulting in these patients being added to the SPL. Another example given was the over-identification of patients with sickle cell trait. Only those with sickle cell disease should be included on the SPL, but those tested for sickle cell trait were sometimes incorrectly coded as having sickle cell disease.

Richard and Kieran reflected on two approaches they used to identify and address their data quality issues:

Communication and collaboration

Effective communication and collaboration with front-line colleagues were key to successful delivery. The team collaborated with clinical leads to develop the first version of the SPL methodology. They then worked with GPs, hospital consultants and data engineers to improve data quality in future iterations. GPs provided quality checks and advised which patients should be added to or removed from the list. Close collaboration with cross-government legal teams was also vital. This ensured the SPL could be shared safely with all those who needed to access it.

Data visualisation

Analysis and dashboards were used by the team to identify potential data quality issues. The sickle cell disease anomaly was identified using data visualisation. The team spotted unexpectedly high numbers of patients with apparent sickle cell disease at some GP practices, which led them to investigate further.
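To illustrate the kind of check that could sit behind such a dashboard, here is a minimal sketch in Python. It is not the NHS Digital team's method, which was not described in detail. It assumes we have, for each GP practice, a count of patients coded with a condition and the number of registered patients, and it flags practices whose rate is far above the typical rate using a robust (median-based) score. The function name, the threshold and all of the figures are made up for illustration.

```python
from statistics import median


def flag_outlier_practices(counts, threshold=3.5):
    """Return practice IDs whose recorded rate is unusually high.

    counts: dict mapping practice ID -> (patients_with_code, registered_patients).
    threshold: hypothetical cut-off for the modified z-score; 3.5 is a
    commonly used rule of thumb, not a value taken from the SPL work.
    """
    # Rate of the coded condition per registered patient at each practice.
    rates = {p: n / total for p, (n, total) in counts.items() if total > 0}
    if not rates:
        return []

    med = median(rates.values())
    # Median absolute deviation: a spread measure that is not distorted
    # by the very outliers we are trying to find.
    mad = median(abs(r - med) for r in rates.values())
    if mad == 0:
        # No variation to compare against, so nothing can be flagged.
        return []

    flagged = []
    for practice, rate in rates.items():
        score = 0.6745 * (rate - med) / mad  # modified z-score
        if score > threshold:  # only unusually HIGH rates are of interest
            flagged.append(practice)
    return sorted(flagged)


# Made-up example data: practice "E" records far more cases than its peers.
example_counts = {
    "A": (2, 8000),
    "B": (3, 10000),
    "C": (1, 6000),
    "D": (4, 12000),
    "E": (45, 9000),
}
print(flag_outlier_practices(example_counts))  # ['E']
```

A flag like this is only a prompt for investigation. As in the sickle cell example, it takes follow-up with clinical colleagues to establish whether a high rate reflects a coding problem or a genuine feature of the practice population.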

Outcomes from the SPL: the Shielding Behavioural Survey

We then heard from Tim Gibbs (Lead Analyst, Public Service Analysis Team, Office for National Statistics). Tim leads the team responsible for the Shielding Behavioural Survey. The aim of the survey was to understand the behaviours and outcomes for the Clinically Extremely Vulnerable (CEV) population. Data from the survey showed how the CEV population were responding to shielding advice. It also highlighted the impact shielding was having on their mental and physical health. The data showed, for example, that a greater proportion of people in younger age groups reported worsening mental health than of those aged 70 and over.

The Shielding Behavioural Survey drew its sample from the SPL. Tim explained that high-quality data from the SPL were crucial to the success of their work. In particular, the timeliness and relevance of data from the SPL allowed for the rapid production of robust statistics. These fed into the formation of shielding policy for the CEV population. This is an example of how high-quality data can have a direct impact on outcomes for the public.

Want to know more?

You can watch a recording of the webinar and access the presentation slides on the FutureNHS Collaboration Platform. Simply register using your government email address. You’ll then need to ask to join the ‘Data and Analytics Support for COVID-19’ workspace in the ‘How can we help you?’ box. We’ll be holding further events in the coming weeks, too, so look out for these.

You can also contact the health data quality working group using the details below:

Karina Gajewska (NHS-X): karina.gajewska@nhsx.nhs.uk

Andrew Heggs (NHS Digital): andrew.heggs1@nhs.net

James Tucker and Andy Schofield (Government Data Quality Hub, ONS): dqhub@ons.gov.uk

Prof. Neil Sebire (HDR UK): neil.sebire@hdruk.ac.uk

Adam Millward and Courtney Irwin (MetadataWorks): adam@metadataworks.co.uk, courtney@metadataconsulting.co.uk

Andy Schofield
Alexander Amaral-Rogers
Andy works at the Office for National Statistics (ONS) in the Government Data Quality Hub. He works to improve data quality across government and provides training, advice and support to analysts across the Government Statistical Service (GSS). Prior to joining the ONS, he worked as a statistician in three other government departments.