Quality statistics in government
Publication date: 2 October 2019
Who this is for: Members of the Government Statistical Service
The responsibility for producing high quality statistics lies with everyone working in the statistical system. Understanding quality, and how it feeds through into the use of our numbers, is fundamental to building trust in government statistics.
This guidance supports statistics producers in meeting the quality requirements of the United Kingdom (UK) Code of Practice for Statistics and Eurostat’s European Statistics Code of Practice, which sets out five dimensions for measuring the quality of statistical outputs. Effective quality assurance is key to anticipating and avoiding problems with statistics produced by the Government Statistical Service (GSS), so the guidance also offers practical advice on designing processes that are efficient, transparent and reduce the risk of mistakes.
I therefore invite you all to make use of this guidance. It will help us to ensure that we are delivering high quality statistics that are trustworthy and inform better decisions.
Why is quality important?
Statistics are used to help make important decisions. To properly inform those decisions, users need to know about the quality of the statistics and how this feeds through into effective use.
As producers of government statistics, we have a unique insight into how and why those statistics were collected, processed and shared. It is our responsibility to manage and report on the quality of statistics, clearly and concisely, so that we support everybody who needs to use them.
Who is this guidance for?
This guidance is primarily intended for producers of statistics who need to ensure that their products meet expectations for statistical quality. The guidance will be useful for new starters as part of their induction. However, many of the topics covered are also relevant for anybody who needs to analyse data and understand their limitations.
What is its aim?
There are lots of tools and guidance about measuring and reporting quality. Our aim here is to bring together the principles of statistical quality with practical advice in one place, and to provide links to more detailed material. We’ve written this guidance to complement the Quality statistics in government training offered by the Government Statistical Service (GSS) Quality Centre. Producers may also want to adapt this guidance to develop internal quality manuals. Some examples are provided later in the guidance.
What does the guidance cover?
We cover the quality expectations of the UK Statistics Authority’s Code of Practice for Statistics and Eurostat’s European Statistics Code of Practice.
We discuss quality assurance of methods and data and how to design processes that are efficient, transparent and reduce the risk of mistakes. We also discuss Reproducible Analytical Pipelines (RAP) and the benefits of making our analysis reproducible.
What is not covered?
We do not cover programme and project quality management. Neither do we aim to cover statistical quality control: the use of statistics to monitor the quality of other products.
While we talk about measures of quality such as confidence intervals, we do not cover the technical calculations needed to produce such measures, as such calculations vary widely across different statistical outputs. Similarly, we will not include technical descriptions of the code required for reliable processes, as these vary between different processing environments. We cover communicating quality, uncertainty and change in separate guidance.
What is statistical quality?
We define the quality of our statistics in terms of their fitness for purpose. We must use our judgement as statistical professionals to relate what we know about data, methods and measures of statistical quality to the known and likely uses of the statistics. Good quality statistics meet user expectations and are fit for purpose.
We do not set absolute standards for metrics such as accuracy and precision in this guidance. What is appropriate to ensure fitness for purpose will depend on context, and is a matter of professional judgement. The Code of Practice states that statistics should not be materially misleading. Lower quality statistics (for example numbers with a wide margin of error) can still meet this criterion if the user is sufficiently aware of their quality and how it feeds through to appropriate use.
As well as measurable output quality, we should consider process quality. If the inputs and processes that contribute to a statistical product are demonstrated to be of a high standard, then we would usually expect the resulting statistics to be of a high standard. If little is known about the inputs and processes, then we cannot assess whether the statistics are fit for purpose and what level of quality they are.
Quality management can be broken down into Quality Assurance: anticipating and avoiding problems; and Quality Control: responding to observed problems. We focus on Quality Assurance in this guidance but include some examples of where observed problems have been addressed.
The Code of Practice is based on three pillars that underpin public confidence in statistics: Trustworthiness, Quality and Value.
The pillars are mutually supporting so, for example, producing high quality statistics will support trustworthiness and the statistics are likely to contribute more value. The code sets out cross-cutting commitments to transparency about processes, methodology and content, coherence between different statistical outputs and collaboration between producers.
The Quality pillar has three principles:
Q1 Suitable data sources: Statistics should be based on the most appropriate data to meet intended uses. The impact of any data limitations for use should be assessed, minimised and explained.
Q2 Sound methods: Producers of statistics and data should use the best available methods and recognised standards, and be open about their decisions.
Q3 Assured quality: Producers of statistics and data should explain clearly how they assure themselves that statistics and data are accurate, reliable, coherent and timely.
These principles are underpinned by eighteen practices. They set out the expected standards, behaviours and activities that will allow the delivery of the pillars and principles.
What does the Office for Statistics Regulation expect?
The Office for Statistics Regulation (OSR) supports statistics that are meaningful and not materially misleading through the application of the code. Users need to be able to verify the statements that the statistics make. They need to know where the underlying data come from, how they were collected, and any resulting risks of bias. They need to know about the methods used to create the statistics from source data. And from this, they need to know the strengths and limitations of the statistics: what they can and cannot be used for.
The OSR website provides case studies showing how departments are implementing the practices that support the quality principles.
For further information, watch a video introducing the quality pillar of the Code of Practice.
The GSS Quality Strategy was launched in June 2019. It is a two year strategy, developed by the GSS Quality Centre in collaboration with the wider GSS, which sets out actions to address key quality challenges and improve statistical quality across the GSS.
The GSS will achieve this through four goals:
- We will all understand the importance of our role in producing high quality statistics.
- We will ensure our data are of sufficient quality and communicate the quality implications to users.
- We will anticipate emerging trends and changes and prepare for them using innovative methods.
- We will implement automated processes to make our analysis reproducible.
Each goal is underpinned by a set of deliverables that build towards achieving that goal. The deliverables are split between the Best Practice and Impact Division in Office for National Statistics (ONS) and the wider GSS. We are all responsible for its success. GSS Heads of Profession and quality champions are responsible for delivering the strategy on behalf of the GSS.
Goal 3 from the GSS Quality Strategy
- 3.1 Produce National Statistician’s Quality Reviews (NSQRs) (Quality Centre)
- 3.2 Facilitate the Methodology Advisory Committee (MAC) and Methodology Advisory Service (MAS) (Quality Centre)
- 3.3 Engage with the development of NSQRs and take on board recommendations (GSS)
- 3.4 Ensure there is space within departments to horizon scan for upcoming issues or opportunities (GSS)
- 3.5 Regularly review publications (GSS)
Measuring the quality of statistical outputs
The European Statistical System’s (ESS) Dimensions of Quality set out criteria for assessing the fitness for purpose of statistical outputs. The dimensions form principles 11 to 15 of the European Statistics Code of Practice:
- Relevance
- Accuracy and Reliability
- Timeliness and Punctuality
- Comparability and Coherence
- Accessibility and Clarity
Each principle is supported by indicators that enable its evaluation and measurement. The ESS Quality Assurance Framework sets out measures that can be implemented by National Statistical Institutes (NSIs) to evaluate each of the indicators.
The five dimensions provide a useful framework for informing users about the quality of statistical outputs. Producers should use the framework as the basis for reporting information on the quality of their statistics.
Information about the quality of statistics should be provided in the commentary of the statistical bulletin and in accompanying Background Quality Reports (BQRs). BQRs assess the quality of statistical outputs against the five dimensions and help our users to understand the strengths and limitations of the statistics. More detail is available in the Communicating quality, uncertainty and change guidance.
How do the ESS dimensions relate to the UK Code of Practice for Statistics?
By following the principles and practices in the UK Code of Practice for Statistics and addressing any concerns that arise, you can collect evidence on statistical process and output quality, and structure that evidence using the dimensions of quality from the European Code.
Relevance is the degree to which statistics meet the current and potential needs of users. Relevance is covered in the UK Code of Practice under principle V1 (Relevance to users).
User needs should be reviewed on a regular basis and should be considered by statistical producers in the production of statistics.
An example is the Department for Transport Search and Rescue Helicopter (SARH) statistics, where a user feedback survey was used to ensure that relevant statistics are being produced.
There is a range of other approaches that can be used to understand and respond to user needs, for example user groups, user events and StatsUserNet.
Guidance on user engagement is available on the GSS website.
The UK Code of Practice also states that statistics producers should consider whether to produce new statistics to meet identified information gaps. An example of this can be seen for the ONS Crime Survey for England and Wales. A set of self-completion questions on adult recollection of abuse have been developed through consultation with stakeholders. These were implemented in the 2018 data collection and will help provide valuable information on what is a sensitive topic (see Improving Crime Statistics for England and Wales for further information).
“Relevance is important because it places users and data at the centre of statistical production. Relevance is assessed by understanding user needs.”
Code of Practice for Statistics, UK Statistics Authority, 2018
Accuracy is the closeness between an estimated result and the (unknown) true value.
The Code of Practice for Statistics states that information on accuracy should be monitored and reported regularly to users. We may be able to estimate some aspects of accuracy from the data. In the road safety statistics publication from the Department for Transport, they quantify the uncertainty around the estimated number of drink drive road deaths.
It is usually impossible to measure non-response bias as the characteristics of those who do not respond to surveys are difficult to ascertain. In this instance, response rates may be used to give an insight into the possible extent of non-response bias in the survey. If possible, you should report on response rates for different groups and advise on likely implications for using the statistics.
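As a hedged illustration of that advice, the sketch below summarises response rates by group as a first indication of possible non-response bias. The record layout, field names and figures are hypothetical, not drawn from any GSS publication.

```python
def response_rates(records, group_key):
    """Return {group: responded / eligible} for a list of sample records."""
    totals, responded = {}, {}
    for rec in records:
        g = rec[group_key]
        totals[g] = totals.get(g, 0) + 1
        responded[g] = responded.get(g, 0) + (1 if rec["responded"] else 0)
    return {g: responded[g] / totals[g] for g in totals}

# Invented sample: three eligible cases in one age band, two in another.
sample = [
    {"age_band": "16-34", "responded": True},
    {"age_band": "16-34", "responded": False},
    {"age_band": "16-34", "responded": False},
    {"age_band": "65+", "responded": True},
    {"age_band": "65+", "responded": True},
]

rates = response_rates(sample, "age_band")
# Markedly different rates across groups suggest a risk of non-response
# bias that should be reported alongside the statistics.
print(rates)
```

Differing rates do not prove bias, but they identify the groups whose likely implications for use should be advised on.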
Reliability is the closeness of early estimates to subsequent estimated values.
This is a specific definition. Other disciplines often consider reliability to refer to the stability of results under repeated runs, for example in the context of an iterative algorithm.
In the Department for Transport road safety statistics publication, two estimates of drink drive road deaths are published. The first is published in February, just over a year after the end of the reference year, and the final estimate is published in August of the same year. For example, the first estimate of the number of drink drive road deaths in 2017 was published in February 2019, with the final estimate following in August 2019. The initial estimate is less accurate than the final one because the data available at the time of initial publication are incomplete.
Scheduled revisions are planned amendments to published statistics to improve quality by incorporating additional data that were unavailable for the initial publication.
Reliability is measured from these scheduled revisions by the closeness of the early estimates to the revised ones. The communicating quality, uncertainty and change guidance provides more information on this topic.
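A minimal sketch of measuring reliability from scheduled revisions follows, using invented figures rather than any published series. The mean absolute revision summarises how close early estimates are to final ones, and the signed mean revision indicates whether early estimates are systematically biased in one direction.

```python
# Hypothetical first (provisional) and final (revised) annual estimates.
first = {2015: 220, 2016: 230, 2017: 250}
final = {2015: 240, 2016: 230, 2017: 260}

# Revision for each reference year: final minus first estimate.
revisions = {year: final[year] - first[year] for year in first}

# Size of typical revision, ignoring direction.
mean_abs_revision = sum(abs(r) for r in revisions.values()) / len(revisions)

# Signed average: a value far from zero suggests systematic bias
# in the early estimates.
mean_revision = sum(revisions.values()) / len(revisions)

print(revisions, mean_abs_revision, mean_revision)
```

Reporting these summaries alongside the statistics helps users judge how much the first estimate is likely to move.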
Timeliness refers to the time gap between the publication date and the reference period for the statistics.
A reference period is the time period for which statistical results are collected or calculated.
An example is the Faster indicators of UK economic activity project being undertaken by the ONS. The aim is to provide faster insight into the state of the UK economy to policymakers and analysts in order to make informed, timely decisions on matters, such as the setting of interest rates, which affect the whole UK.
Punctuality is the time lag between the actual and planned dates of publication for the statistics.
The Code of Practice requires that the release of both regular and ad hoc official statistics should be pre-announced through a 12 month release calendar, with a specific release date given at least four weeks in advance where practicable. NHS England have a 12 month release calendar that enables users to see what they will be publishing over the year.
Changes to pre-announced release dates should be agreed by the Chief Statistician or Head of Profession for Statistics and promptly announced to users with reasons provided for the change.
An example of this is the ONS Migration Statistics quarterly publication (July 2018). These statistics were originally announced for publication on 24 May 2018. The Migration Statistics Team announced the postponement of the publication with the reasons for this delay (due to a transformation change). The statistics were published on 16 July 2018 with a Migration Statistics Transformation Update which communicated clearly to users the changes made since the initial, delayed publication.
Comparability is the degree to which statistics can be compared over time, region or another domain.
The UK Code of Practice states that information about comparability and coherence should be monitored and reported regularly to users. The GSS Harmonisation Strategy sets out how the GSS can work towards greater harmonisation.
The Department for Digital, Culture, Media and Sport Taking Part Survey began in 2005 and its core questions have remained similar since the survey was first launched so that a consistent time series is available for many topics. In addition, the survey enables comparable statistics across the nine regions in England to be produced. However, the survey cannot produce accurate comparisons at local authority level (see Taking Part Quality Indicators Report for further information).
Changes in methods, or improvements in data availability, might cause breaks in time series so that statistics from before and after the change are not comparable. In the NHS Digital alcohol statistics, ongoing improvements in data quality and coverage led to a break in the time series (further details can be found in the Data Quality Statement from the NHS Digital alcohol statistics publication). Data from before 2016 cannot be compared with later data due to a change in the survey question. Part 5 of the publication: drinking behaviours among children gives more information, including an annotated chart.
In many cases, time series breaks cannot be avoided. Users should be notified, and the implications for use made clear. Breaks can have important implications when interpreting and comparing the data through time.
Coherence is the degree to which the statistical processes that generate two or more outputs use the same concepts and harmonised methods.
Coherent processes allow statistics from two sources to be compared and reconciled and increase the value of the statistics. Coherence is a theme that cuts across several of the practices in the Code of Practice for Statistics. Further information on coherence can be found on the OSR website. OSR have also published an insight report on coherence.
An example on coherence can be seen for ONS baby name statistics for England and Wales. Baby names are derived by using the exact spelling of the first name given on the birth certificate. This method is consistent internationally with countries such as Scotland, Northern Ireland, the Netherlands and the United States. This means that ONS figures can be compared with the numbers from these other countries (see Baby names Quality and Methodology Information (QMI) for more information).
Another GSS example on coherence is statistics about homelessness in the UK. Statistics for the four UK countries cannot be compared directly. This is because of operational differences in how homelessness data are collected across England, Wales, Northern Ireland and Scotland. The Homelessness Statistics in Scotland 2018/2019 publication explains the similarities and differences between the Scottish statistics and those from other UK publications.
In 2019, the GSS Harmonisation Team published Harmonisation of Definitions of Homelessness for UK Official Statistics: A Feasibility Report, which will be used as a starting point for improving the harmonisation of homelessness statistics across the UK.
“Producers must demonstrate that they do not simply publish a set of numbers, but that they explain how they relate to other data on the topic, and how they combine with other statistics to better explain the part of the world they describe.”
Code of Practice for Statistics, UK Statistics Authority, 2018
Accessibility is the ease with which users can access the statistics and data. It is also about the format in which data are available and the availability of supporting information.
From 2020, public sector websites will have a legal obligation to meet accessibility standards. Accessible communication formats should be used which work with the most commonly used assistive technologies. Data should also be made available at the highest level of detail that is practicable.
More information on accessibility and the Code of Practice for Statistics is on the OSR website.
Clarity refers to the quality and sufficiency of the commentary, illustrations, accompanying advice and technical details.
Statistics should be presented in a clear, unambiguous way that supports and promotes use by all types of users. The See a Voice project’s article Average Reading Age in the UK reports that, on average, people in the United Kingdom have the reading ability we would normally expect of a nine year old child.
We have a duty to support everybody who needs to use government statistics by providing the information they need in a way that is clear and accessible. Our statistical releases should use plain language. Technical terms, acronyms and definitions should be defined and explained when this is appropriate, to ensure that the statistics can be used effectively.
There is GSS guidance on Writing about statistics that may be of use. The Office for National Statistics have also produced a style guide which aims to help make statistical content more open and understandable.
Trade-offs describe the extent to which the dimensions of quality are balanced against each other and time and cost.
Producers should consider what effect trade-offs will have on overall quality and whether there is a benefit to accepting lower quality in one dimension in order to improve another.
Trade-offs between quality and cost should be considered and appropriately balanced. For example, the sample size of a survey could be doubled, improving accuracy, but this would have very significant cost implications.
The burden placed on those who supply the information that feeds into our statistics should also be considered. Adding questions to a survey may add value to the outputs, but could create an excessive burden for respondents.
There are trade-offs between timeliness, accuracy and reliability. Statistics published soon after collection will be timely, but there may not have been enough time to gather and validate all the data, so they may not be as accurate and reliable as later releases based on more data. This trade-off can be managed by producing scheduled revisions to statistics. For example, the Office for National Statistics produces two estimates of quarterly Gross Domestic Product (GDP) statistics. The first is produced about 40 days after the end of the quarter to which it refers. A second, final estimate is produced approximately 85 days after the end of the quarter (see Introducing a new publication model for GDP for more information). The first estimate is of lower quality than the second, but is very timely. The second estimate is more accurate but less timely.
When reporting on quality it is useful to consider the procedures and policies applied to ensure sound confidentiality, security and transparent practices. This might include how the data are stored securely, data protection considerations and how confidentiality is protected in the outputs.
The Search and Rescue Helicopter statistics Background Quality Report contains some examples of reporting against the quality dimensions and considering trade-offs. Reviewing current methods for producing statistics or using new methods can lead to improvements in the quality dimensions.
The dimensions are dependent on one another, so an improvement in one can lead to a deterioration in another. It is not possible to maximise quality across all five dimensions at the same time.
Producers must decide how to manage trade-offs appropriately for their statistics and should communicate the rationale for these decisions to users. You should include the possible trade-offs in discussions with users. Their needs should be considered before making any decision, as users might have a preference for one dimension over another.
What is quality assurance?
Quality Assurance (QA) is about identifying, anticipating and avoiding the problems that can arise from our data inputs or the methods and processes we use to calculate statistics. Quality assurance should also include assessment of statistics against the five dimensions of quality.
You must ensure that sufficient time and resources are available for quality assurance. This should be proportionate to the likelihood of quality concerns and to the importance of the statistics.
As a statistics producer, it is your role to be curious. Don’t take data at face value. Items that look unusual or inaccurate should be investigated and verified.
The data journey
For effective quality assurance, we need to know how data got from the initial input to a statistical publication. We refer to this as the data journey.
For example, the initial input might be from a survey interview, or self-reported in an application for a service. The data may pass from the collection agency to a local agency and be collated centrally by a department. It might be analysed there and passed on to others, where it feeds into further analysis.
You will need to know about the processes used along this data journey and how quality is checked at each step. Your own quality assurance should be complementary, confirming what has already been established and filling in the gaps.
You should know the exact circumstances in which data were input. What was the question asked? What was on the collection form? If the input process was automated, did automatic checks prevent invalid entries? What were those checks? Be aware of motivations of data inputters: is a particular field crucial to their work or is collecting it a necessary chore to be completed with minimum effort?
An example of a data journey can be found on page 31 of the Ministry of Housing, Communities and Local Government (MHCLG) Social housing lettings in England: April 2017 to March 2018.
Page nine of the Office for Statistics Regulation’s quality assurance toolkit for administrative data provides a risk/profile matrix to guide you to an appropriate level of assurance activities. Similar thinking can be applied to any data source.
The GSS Quality Centre is developing similar toolkits for survey data and data sources from outside government, including ‘big data’.
Quality assuring data
We edit and validate data at different stages of the data journey. This may happen case-by-case (micro-editing), or for aggregated values (macro-editing).
Editing and validation are most effective when carried out as close to the data source as possible, for example while an interviewer is with a survey respondent or while a coder processes an administrative record. In practice, editing and validation continue after contact with the reporting unit concludes.
Checks should anticipate what might go wrong:
- check whether missing values represent an error, a legitimate refusal or a ‘don’t know’
- check any logical or arithmetic relationships in the data
- check whether values fall inside an acceptable or expected range
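These kinds of micro-editing checks can be sketched in code. The record layout below, with fields `employees`, `turnover` and `region`, is entirely hypothetical; the point is that each check is named, so triggered checks can be counted and reported.

```python
# Hypothetical code list for a range/validity check.
VALID_REGIONS = {"North", "South", "East", "West"}

def validate(record):
    """Return a list of triggered check names for one record."""
    triggered = []
    # Missing value: could be an error, a refusal or a 'don't know',
    # so it is flagged for follow-up rather than silently corrected.
    if record.get("turnover") is None:
        triggered.append("missing_turnover")
    # Logical relationship: a business with staff should report turnover.
    elif record.get("employees", 0) > 0 and record["turnover"] <= 0:
        triggered.append("employees_without_turnover")
    # Range check: value should fall inside an acceptable range.
    if not (0 <= record.get("employees", 0) <= 100_000):
        triggered.append("employees_out_of_range")
    # Validity check against the code list.
    if record.get("region") not in VALID_REGIONS:
        triggered.append("invalid_region")
    return triggered

# A clean record triggers nothing; an inconsistent one names its problems.
assert validate({"employees": 5, "turnover": 120, "region": "North"}) == []
print(validate({"employees": 5, "turnover": 0, "region": "Mars"}))
```

Because each check returns a name, the counts of triggered checks can feed the reporting described below, helping to assess the cost-effectiveness of the checking process.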
You should aim for a cost-effective set of edit and validation checks. Focus on changes that could have an impact on your analysis. For example, in surveys of enterprises, small relative errors in the sales of large businesses could have a material impact on your totals. Don’t focus too much corrective effort on details that cannot substantially impact your analysis. When the treatment of a particular error always leads to the same mitigation, apply it automatically.
Such decisions rely on a clear understanding of the use of the data. Your editing and validation processes should report on checks that are triggered. This will help to assess the cost-effectiveness of the checking process and may highlight concerns about falling data quality.
When looking across the data set, you may want to use graphs to examine emerging patterns. Line graphs will show change, and whether that change is typical of recent history and of other subsets of the data. A scatter plot will reveal outliers. The cumulative distribution of a numeric variable will reveal the general shape, range and percentile points, with vertical sections indicating a value shared by several records.
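The patterns these graphs reveal can also be flagged numerically, which is useful in an automated pipeline. The sketch below is illustrative, with invented data and an assumed rule of thumb (values more than five robust spreads from the median are treated as outliers).

```python
from collections import Counter
import statistics

# Invented numeric variable with one clear outlier.
values = [10, 11, 9, 10, 12, 10, 10, 95]

# Scatter-plot-style outlier check: distance from the median, scaled by
# the median absolute deviation (a robust measure of spread).
median = statistics.median(values)
mad = statistics.median(abs(v - median) for v in values)
outliers = [v for v in values if abs(v - median) > 5 * max(mad, 1)]

# 'Vertical sections' in a cumulative distribution correspond to values
# shared by several records, found here by counting repeats.
common = [v for v, n in Counter(values).items() if n >= 3]

print(outliers, common)
```

Automated flags of this kind complement, rather than replace, visual inspection: a graph still shows context that a threshold cannot.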
The Department for Transport (DfT) noticed a large increase in serious injuries reported by police forces, who provide the data, in 2017 compared with provisional 2018 data. This needed further investigation as part of the QA process. Further analysis explained the increase as a change in how the injury information was collected, and it was flagged appropriately in the ‘Reported road casualties Great Britain, main results: 2018’ publication and tables.
Quality assuring methods and processes
Methods and processes transform data and perform calculations as we move along the data journey from raw inputs to final statistics. To ensure quality, you need to be confident that the methods and processes chosen are appropriate and that they have been implemented correctly.
In choosing the right methods, you must take into account the nature of the data source and the steps in the data journey. Do the methods account for any initial sampling and the attrition due to initial and subsequent response? You will need to choose the methods most likely to meet user needs, for example producing statistics at an appropriate level of detail. Are there known issues where the statistics fall short of meeting those needs?
Methods that were once fit for purpose might not always be. Changes in the data source, the data journey or methodological developments can all mean that a new approach is required. You may need to seek advice from outside, perhaps through consultancy, or a process of peer review.
To quality assure new processes, consider full or partial dual running. You might recalculate some or all of the statistics using an alternative software package, or the same package but blinded from the original process. Expect to see detailed logs and scrutinise these. For example, make sure that the system has picked up the correct data files, verify that checks have been completed, and act on any warnings reported. We discuss ways to ensure robust approaches to implementing processes in more detail when we consider reproducible analysis later in the guidance.
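Partial dual running can be sketched as follows. Both implementations of the statistic here are illustrative (a simple weighted mean over invented figures); the essential idea is that the second implementation is written independently of the first, and the two outputs are compared within a stated tolerance.

```python
import math

# Invented input data and design weights.
raw = [12.0, 15.5, 11.2, 13.3]
weights = [1.0, 2.0, 1.0, 1.5]

def weighted_mean_original(values, wts):
    """The 'production' calculation."""
    return sum(v * w for v, w in zip(values, wts)) / sum(wts)

def weighted_mean_dual(values, wts):
    """An independent reimplementation, written blind to the original."""
    total, wsum = 0.0, 0.0
    for v, w in zip(values, wts):
        total += v * w
        wsum += w
    return total / wsum

a = weighted_mean_original(raw, weights)
b = weighted_mean_dual(raw, weights)

# The dual run should agree with production to within a tight tolerance;
# any disagreement is investigated before publication.
assert math.isclose(a, b, rel_tol=1e-12), f"dual run disagrees: {a} vs {b}"
print(a)
```

A failed comparison does not say which implementation is wrong, only that one of them is, which is exactly the prompt for investigation that dual running is meant to provide.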
Aim for transparency in your methods and processes. This is both in the choice of methods—which should be open to detailed scrutiny and described frankly, including any known shortcomings—and in the reporting of the implementation.
“Producers of statistics and data should use the best available methods and recognised standards, and be open about their decisions.”
Code of Practice for Statistics, UK Statistics Authority, 2018
MAC is a free methodological advice service that draws on experts from academia, the private sector, the GSS and other National Statistical Institutes (NSIs). MAS is a free service providing methodological advice and guidance. For more information, contact firstname.lastname@example.org.
Quality assuring (QA) outputs
By drawing on quality management information accumulated in the production chain you can assess statistical outputs against the five quality dimensions, taking into account user needs.
As with raw data, you should explain unusual changes or patterns where one area departs from others. Could you explain these to a policy colleague or the media? What are the possible explanations? Are patterns coherent with what you see in other data sources? Do different, related statistics tell the same story?
Preparing outputs for dissemination can introduce risks, especially if manual processing is needed. For manual transfer of figures from reference tables to text, include extra systematic checks on each transferred number. Try to avoid ‘in flight’ calculations manipulating reference table figures for publication. For example, if your table includes two totals and you want to comment on a relative change between the two, it is more reliable to include that change figure in the reference table rather than calculating it manually. As a rule, make sure that all the figures used in statistical bulletins are included in the reference tables.
Reproducible analysis can reduce these risks.
Consider these steps when quality assuring your statistical outputs
- Check breakdowns sum to totals
- Cross check totals between tables
- Compare with previous years
- Check formulas are linking to correct cells
- Compare with other sources
- Are there any revisions?
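Several of the checks above lend themselves to automation. The sketch below, using hypothetical figures and thresholds, shows how breakdown-to-total and year-on-year checks might be coded so that problems surface before publication.

```python
def check_breakdown_sums(breakdown, total, tol=1e-6):
    """Check that component figures sum to the published total."""
    observed = sum(breakdown.values())
    if abs(observed - total) > tol:
        raise ValueError(f"Breakdown sums to {observed}, expected {total}")

def check_year_on_year(current, previous, max_pct_change=20.0):
    """Flag implausibly large changes against the previous year.
    The 20% threshold is illustrative; set it from your own data."""
    pct = 100 * abs(current - previous) / previous
    if pct > max_pct_change:
        raise ValueError(f"Change of {pct:.1f}% exceeds {max_pct_change}% threshold")

# Illustrative figures
regions = {"North": 410.0, "South": 515.0, "East": 275.0}
check_breakdown_sums(regions, total=1200.0)          # passes
check_year_on_year(current=1200.0, previous=1150.0)  # ~4.3% change, passes
```

Checks that raise an error when they fail cannot be quietly skipped, which makes the QA trail easier to evidence.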
Proofreading steps:
- Cross check figures in publication against tables
- Check explanations of key trends
- Check footnote numbering
- Do hyperlinks work?
- Are charts and tables numbered in the right order?
- Get sign off from your Head of Profession
- Check who has pre-release access
- Make contact with press office
- Check publication on website once published
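The hyperlink check in the proofreading steps can be partly automated. This sketch pulls link targets out of a markdown draft and flags ones that are obviously malformed; a fuller check might also send HTTP requests, but an offline pass like this is a useful first filter. The draft text is illustrative.

```python
import re
from urllib.parse import urlparse

def find_suspect_links(text):
    """Return markdown link targets that lack a valid scheme or host."""
    targets = re.findall(r"\[[^\]]*\]\(([^)]+)\)", text)
    suspect = []
    for target in targets:
        parts = urlparse(target)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            suspect.append(target)
    return suspect

# Illustrative draft: the second link has a misspelt scheme
draft = ("See the [Code of Practice](https://code.statisticsauthority.gov.uk) "
         "and the [guidance](htp://broken.example).")
```

Running `find_suspect_links(draft)` flags the second link, which a manual click-through might miss on a long bulletin.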
Post publication steps:
- Review what went well and what didn’t go so well
- What improvements can you make for next time?
- Engage with users of the publication to get feedback
Some departments have produced internal guidance on QA. Examples are available on the GSS Quality Champions Slack channel.
Roles and responsibilities
For effective quality assurance, you need to be clear on roles and responsibilities throughout the chain of production. It’s not enough to say that an individual will carry out some QA.
You must be clear on the nature and extent of the QA. When this is clear for everyone involved, you can reassure yourself that there are no known gaps and that any duplication of effort is appropriately targeted.
When allocating QA roles, think about what will work best in your team. Give each quality assurance role to the person best placed to do the work effectively, given the time and resources available, and assign detailed, comprehensive QA tasks to those with the time to do them rigorously.
A more senior grade does not necessarily ensure better results. Senior colleagues can take a strategic look at the consequences of your findings and place them in a wider context. They might sign off the statistical bulletin, but would be unlikely to do detailed checks such as ensuring the numbers in the bulletin match the reference tables.
During a QA process, senior leaders such as Deputy Directors should:
- ensure that QA checklists are in place and review them when signing off statistical bulletins, so they can be confident that sufficient QA has been completed.
- check that sufficient quality information has been provided to users in the statistical bulletin and other documents such as Background Quality Reports.
- make certain that members of their team are aware of other requirements set out by the Code of Practice, like adhering to pre-release access.
To coordinate across the team, you should keep records of what has been checked (and how) as well as follow up actions.
Quality assurance checklist example
| Task | Work completed by | Notes | Sign off |
| --- | --- | --- | --- |
| Check new questions with data providers | Provide the name of the person completing each task | Provide any additional notes for completing each task | Provide the name of the person who signed off the work. This should be someone not initially involved with the work. |
| Check unlikely values with data providers | | | |
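A checklist like the one above can also be kept as a simple structured record. The sketch below enforces the sign-off rule stated in the table, that the person signing off must not be the person who completed the work. Names and tasks are illustrative.

```python
def record_check(log, task, completed_by, signed_off_by, notes=""):
    """Append a completed QA task to the log, enforcing independent sign-off."""
    if signed_off_by == completed_by:
        raise ValueError("Sign-off must come from someone not involved in the work")
    log.append({"task": task, "completed_by": completed_by,
                "signed_off_by": signed_off_by, "notes": notes})

qa_log = []
record_check(qa_log, "Check new questions with data providers", "Asha", "Ben")
record_check(qa_log, "Check unlikely values with data providers", "Ben", "Asha")
```

Keeping the log in a structured form also makes it easy to review coverage and follow-up actions after publication.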
Senior leaders should also:
- be advocates for quality. They should plan to ensure delivery of the goals in the GSS quality strategy within their area.
- ensure that staff are aware of the GSS quality guidance and the training offered by the GSS Quality Centre and that everyone understands their role in assuring the quality of the statistics, fostering a culture of continuous improvement.
Why make analysis reproducible?
Producing official statistics can be time-consuming and painstaking, because we need to make sure that our outputs are both accurate and timely.
In a typical manual statistical production process, the large number of steps, and the moves between tools, introduce risk and increase the burden of QA. Manual processes are often time-consuming and frustrating because steps are hard to replicate quickly, and they are prone to error: the input data and the outputs are not connected directly, only through the analyst’s manual intervention.
Reproducible analysis is about opening up the production process so that anybody can follow the steps we took and understand how we got to our published results.
By making our analysis reproducible, mainly through automation, we make it easier for others to quality assure, assess, critique and re-use our methods and results.
In a reproducible workflow (we call this a Reproducible Analytical Pipeline, or RAP) we bring together the code and the data that generate the outputs with version control methods from software engineering. This means we can automate the statistical workflow and provide a full audit trail. RAP lets us be fully open about the decisions we have made, so that others can follow what we did and re-create the steps.
Reproducible analysis supports the requirements of the Code of Practice for Statistics around quality assurance and transparency, as, wherever possible, we share the code we used to build the outputs, along with sample data to allow for proper testing.
Benefits of RAP
- In RAP the production process is coded, making it completely transparent, auditable and verifiable. QA is embedded in the code through logging and automatic testing.
- RAP improves quality due to the reduced risk of errors occurring in the production process. Automation can also improve timeliness, freeing up analyst time to focus on the interpretation of the statistics.
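As a minimal sketch of QA embedded in a pipeline, the stages below log what they do, leaving an audit trail for each run. The stage names, rules and figures are illustrative, not a prescribed RAP design.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline")

def validate(records):
    """Drop records failing basic rules, logging what was removed."""
    clean = [r for r in records if r.get("value") is not None and r["value"] >= 0]
    log.info("validate: kept %d of %d records", len(clean), len(records))
    return clean

def aggregate(records):
    """Sum the validated records, logging the result."""
    total = sum(r["value"] for r in records)
    log.info("aggregate: total = %s", total)
    return total

# Illustrative input: one negative and one missing value are dropped
raw = [{"value": 10.0}, {"value": -1.0}, {"value": None}, {"value": 5.0}]
total = aggregate(validate(raw))
```

Because each stage is a function, it can also be covered by automatic tests, so a change to the pipeline that breaks a rule fails loudly before any output is produced.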
More details about RAP can be found on the Reproducible Analytical Pipelines blog post on the Data in government blog and through the RAP champions network. An example illustrating the benefits of implementing RAP in the Department for Transport is given in the case study section.
Finding the right balance
It is not necessary to automate a statistical workflow end-to-end to use reproducible analysis ideas to improve the timeliness, auditability and quality of statistics. Many departments are taking a pragmatic, partial approach to automation, reducing risk and improving efficiency by starting small and identifying quick wins to reduce the level of manual intervention needed.
Reproducible analysis is not a panacea. It supports effective quality assurance, but does not remove the need for proper analytical oversight of the end-to-end process. Automation is not about implementing and forgetting, or building black boxes that cannot be maintained properly.
Once a RAP workflow is in place it should be regularly reviewed and assured to make certain that it remains fit for purpose. A curious analyst will be able to identify issues that automatic tests may not pick up.
Finally, while RAP can automatically generate standard text, graphs and tables, statistical reports will require an analyst’s expertise to write clear, useful commentary. RAP is very useful for generating the first draft of an output and removing the need for cutting and pasting, and while full automation of short, simple documents is possible, this is not the case yet for in-depth statistical commentary.
Finding this balance is the key to implementing RAP successfully. This balance will vary across statistical products depending on the level of interaction and interpretation needed for the data.
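As an illustration of generating a first draft, a templated sentence can be filled from the reference table so standard wording needs no cutting and pasting; an analyst then edits and extends the draft with interpretation. The wording and figures here are illustrative.

```python
def first_draft(period, total, previous):
    """Generate a standard first-draft sentence from reference-table figures."""
    change = total - previous
    direction = "increased" if change >= 0 else "decreased"
    pct = abs(change) / previous * 100
    return (f"In {period}, the total {direction} by {pct:.1f}% to "
            f"{total:,.0f}, from {previous:,.0f} a year earlier.")

# Illustrative figures
draft = first_draft("2019", total=1260.0, previous=1200.0)
```

The draft always matches the published figures, because both come from the same table; only the commentary around it is written by hand.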
Case study: Department for Transport
The Department for Transport (DfT) has been working hard to implement RAP into their processes. They have set a target of 2020 to have RAP implemented, at least partially, in all of their statistical output production processes. DfT is growing the capability of their statisticians through training courses, the RAP champion network and a Coffee and Coding club. If they identify a piece of work or project that would benefit from automation, the team is supported to get the relevant training. The Coffee and Coding club is an informal setting in which colleagues discuss coding and any challenges they have. These sessions were originally run monthly but were so popular they now run weekly.
The data for Search and Rescue Helicopter statistics are produced by aggregating monthly spreadsheets from the Maritime and Coastguard Agency. These were checked manually, but the checks have now been automated using R, with an R Markdown report including maps, counts and tabulations, and an interactive flexdashboard. This makes it quicker and easier for the person carrying out QA to spot discrepancies. Further information can be found in the Code of Practice for Statistics: V4 case study.
There are many places to obtain support for RAP. There is the RAP champions network run by the GSS Best Practice and Impact Division. This includes a Slack channel for RAP champions and a RAP website for sharing code and ideas. The Government Digital Service has created a free online training course, supported by the RAP companion – a comprehensive guide to implementing RAP.
With the increased use of software tools such as R and Python in producing statistics, producers increasingly need to be able to quality assure the code used to produce their outputs. The GSS Quality Centre is developing new guidance focusing on this.
The UK Code of Practice states that:
T3 Orderly release: Scheduled revisions or unscheduled corrections to the statistics and data should be released as soon as practicable. The changes should be handled transparently in line with a published policy.
We recommend that your department has a revisions policy that sets out how the reporting of revisions is handled. This should be published on the Department’s statistics web page and reviewed annually to ensure it remains up to date.
An example of this is a revisions policy from the Department for Business Energy and Industrial Strategy. More information on revisions policies and good practice in the handling of revisions can be found in the Communicating quality, uncertainty and change guidance.
Quality Assurance of Administrative Data (QAAD)
Many departments publish QAAD reports that set out for users how they have implemented the Office for Statistics Regulation’s Quality Assurance of Administrative Data toolkit. These reports set out how the producer has explored the administrative data source and assured themselves that the data are of sufficient quality to produce statistics. See the Office for National Statistics’ (QAAD) report for forestry activities for more information.
The Code of Practice for Statistics emphasises that producers of statistics must clearly communicate information on the quality of statistics to users. Clear, concise reporting of quality, for example through a good Background Quality Report, enhances the trustworthiness and value of statistics and helps users to decide on suitable uses for the data. Critical information about the quality of statistics and how it affects their use should be included in the commentary and visuals in our statistical releases, with more detail provided in Background Quality Reports.
Producers should make sure that they are involved in the sign-off process for products that use our statistics such as social media, press releases and ministerial briefings. This will ensure that the strengths and limitations of the statistics are considered when developing messages for these products and that messages give an accurate reflection of the story told by the statistics. Further information can be found in the Communicating quality, uncertainty and change guidance.
GSS Quality Strategy action plan
The GSS, supported by the Quality Centre, will implement the GSS Quality Strategy and monitor progress.
Departments have drawn up action plans outlining the steps they will take during the lifetime of the strategy to achieve the GSS deliverables across the four goals. These have been produced by quality champions and Heads of Profession in consultation with GSS members in their departments. A template and example action plan is available on request, please email email@example.com.
Quality champions will work with each Head of Profession to provide biannual updates on the agreed actions to the Quality Centre. This will enable us to measure the progress of the strategy.
The Quality Centre will produce biannual updates for the Statistical Policy and Standards Committee (SPSC) on how the implementation of the strategy is progressing.
This guidance is reviewed every two years.