An analyst’s job is never done
“Don’t trust the data. If you’ve found something interesting, something has probably gone wrong!”
Maybe you’ve been there too? It was a key lesson I learnt as a junior researcher. It partly reflected my skills as an analyst at the time – the mistakes could well have been mine! But not entirely.
You see, I was working with cancer registration and deaths data, which on occasion could show odd patterns due to changes in disease classifications, developments in diagnosis or changes in reporting practices. Take a close look and you could spot the step changes when a classification change occurred. Harder to spot might be the impact of a new treatment or screening programme. But sometimes there were errors too – including the very human error of using the wrong population base for rates.
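For readers who like to see this kind of check written down, here is a minimal sketch in Python of two of the checks described above: flagging abrupt year-on-year step changes in a series of counts, and calculating a crude rate so that the population base is stated explicitly. The function names, the 25% threshold and the figures in the usage note are illustrative assumptions, not a method taken from any registry or from OSR guidance.

```python
def flag_step_changes(counts, threshold=0.25):
    """Flag points where a count differs from the previous one by more
    than `threshold` in relative terms.

    `threshold` is an illustrative assumption; a sensible value depends
    on how variable the series normally is.
    Returns a list of (index, relative_change) tuples.
    """
    flagged = []
    for i in range(1, len(counts)):
        previous = counts[i - 1]
        if previous == 0:
            # A relative change from zero is undefined, so skip it.
            continue
        change = (counts[i] - previous) / previous
        if abs(change) > threshold:
            flagged.append((i, change))
    return flagged


def crude_rate(cases, population, per=100_000):
    """Crude rate per `per` people.

    Passing the population in explicitly makes it easier to check that
    the right base (for example, the right age group or area) was used.
    """
    if population <= 0:
        raise ValueError("population must be positive")
    return cases * per / population
```

With made-up counts of `[100, 102, 98, 140, 143]`, only the jump from 98 to 140 is flagged. A flag is a prompt to ask what changed in the classification, the reporting or the code. It is not proof of an error.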
I was reminded of this experience when Sir Ian Diamond, the National Statistician, spoke to the Health and Social Care Select Committee in May (PDF, 258KB). He said:
“One of the things about good statisticians is that they are always just a little sceptical of the data. I was privileged to teach many great people in my life as an academic and I always said, ‘Do not trust the data. Look for errors.’”
Sage advice from an advisor to SAGE!
The thing with quality is that the analyst’s job is never done. It is a moving target. The Quality Assurance of Administrative Data guidance from the Office for Statistics Regulation (OSR) emphasises the importance of understanding where the data come from, and how and why they were collected. But this information isn’t static: systems and policies may alter, and data sources will change as a result.
Being alert for this variation is an ongoing, everyday task. It includes building relationships with others in the data journey, to share insight and understanding about the data and to keep a current view about the data source. As Sir Ian went on to point out in his evidence, it should involve triangulating against other sources of data.
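As a rough illustration of triangulation, the sketch below compares figures from two sources for the periods they have in common and reports any that differ by more than a chosen tolerance. It is a simplified example under stated assumptions: the function name, the 5% tolerance and the dictionary layout (period mapped to a figure) are all hypothetical, and real sources will often need aligning on definitions and coverage before a comparison like this means anything.

```python
def triangulate(primary, comparator, tolerance=0.05):
    """Compare two sources period by period.

    `primary` and `comparator` are dicts mapping a period label to a
    figure (an assumed layout for this sketch). Returns a dict of
    period -> relative difference for periods present in both sources
    where the difference exceeds `tolerance`.
    """
    discrepancies = {}
    # Only periods covered by both sources can be compared.
    for period in sorted(primary.keys() & comparator.keys()):
        reference = comparator[period]
        if reference == 0:
            # A relative difference against zero is undefined.
            continue
        difference = (primary[period] - reference) / reference
        if abs(difference) > tolerance:
            discrepancies[period] = difference
    return discrepancies


# Made-up figures, for illustration only.
admin_source = {"2019": 1000, "2020": 1200, "2021": 1010}
survey_source = {"2019": 990, "2020": 1000, "2021": 1000, "2022": 900}

for period, difference in triangulate(admin_source, survey_source).items():
    print(f"{period}: differs by {difference:.0%}")
```

A discrepancy here does not say which source is right. It tells you where to start the conversation with the people who supply and collect the data.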
OSR recently completed a review of quality assurance in HMRC, at the agency’s invitation. It was a fascinating insight into the operation of the organisation and the challenges it faces. We used a range of questions to help inform our understanding through meetings with analytical teams. They told us that they found the questions helpful and asked if we would share them to help with their own quality assurance. So, we produced an annex in the report with those questions.
We have now reproduced the questions in a guide, as prompts to help all statistics producers think about quality.
We have used these headings:
- Understanding the production process
- Tools used during the production process
- Receiving and understanding input data
- Quality assurance
- Version control and documentation
- Issues with the statistics
The guide also signposts to a wealth of excellent guidance on quality on the GSS website. The GSS Best Practice and Impact (BPI) division supports everyone in the GSS in meeting the quality requirements of the code and improving government statistics.
BPI provides a range of helpful guidance and training:
- Quality Statistics in Government guidance is primarily intended for producers of statistics who need to ensure that their products meet expectations for statistical quality. It is an introduction to quality and brings together the principles of statistical quality with practical advice in one place. You will find helpful information about quality assurance of methods and data, and about how to design processes that are efficient, transparent and reduce the risk of mistakes. Reproducible Analytical Pipelines (RAP) and the benefits of making our analysis reproducible are also discussed. The guidance complements the Quality Statistics in Government training offered by the GSS Quality Centre.
- Communicating quality, uncertainty and change guidance is intended for producers of official statistics who need to communicate information about quality, uncertainty and change effectively. It can be applied to all sources of statistics, including surveys, censuses, administrative and commercial data, as well as estimates derived from a combination of these. There is also a Communicating quality, uncertainty and change training course.
- The Quality Assurance of Administrative Data (QAAD) workshop gives producers an overview of the QAAD toolkit and how to apply it to administrative sources.
- Tips for quality assuring urgent ad-hoc statistical analysis has also been published recently.
Finally, there is a GSS Quality Strategy in place which aims to improve statistical quality across the GSS to produce statistics that serve the public good.
If you use our new quality question guide, let us know how you get on by emailing me at firstname.lastname@example.org – we would welcome hearing about your experiences. We are always on the look-out for good examples of practice that we can feature on the online Code of Practice for Statistics.