Benefits to government from Reproducible Analytical Pipelines

This case study was written by Jonah Peter Adaun, Matt Gregory and Matt Upson in 2017.

Reproducible Analytical Pipelines

This case study looks at the benefits of using Reproducible Analytical Pipelines (RAP) in producing official national statistics. Open source tools have been used to reproduce statistics in three departments:

  • Ministry of Justice (MOJ)
  • Department for Digital, Culture, Media and Sport (DCMS)
  • Department for Education (DfE)

We have looked at the process before and after analysts gained software development skills. We have used this to estimate the potential efficiency savings from adopting RAP across government. There are also qualitative benefits, ranging from building capability to use a programming language to making the quality assurance process simpler.

Government Digital Service has previously blogged about RAP and its potential to transform the process of producing official statistics. That work included collaboration with statisticians and data scientists in the departments covered by this case study. The blogs also discuss efforts to engage people across government, such as departmental heads of the statistics profession, with the ideas behind RAP.

We have blogged about this work on the GOV.UK website:

Official statistics production process

Producing official statistics for publications is an important function of many teams across government. Ensuring that statistics are accurate and timely can be a time-consuming and meticulous process. With open source software becoming more widely used, there is now a range of tools and techniques that can reduce production time, whilst maintaining and even improving the quality of publications.

The process for official statistics production in government varies widely across departments. The short description here is not definitive, but processes often have some or all of the following steps:

  1. Data store
  2. Statistical software
  3. Spreadsheet
  4. Word processor
  5. PDF
  6. Publish on GOV.UK

What RAP does

RAP borrows ideas from software development and academia to automate the time-consuming processes of data assembly, verification and integration, the generation of charts and tables, and the setting up and population of statistical reports. There are potential efficiency savings for analysts, freeing them up to focus on interpreting the results: four steps become one through a file format that allows code and prose to be intermingled. This process effectively turns analysis into high quality documents, reports, presentations or dashboards that can be easily reproduced.

Steps for producing a data product using open source programming languages such as R:

  1. Data store
  2. Rmarkdown
  3. Publish on GOV.UK
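
As an illustration of how the middle steps can collapse into one, here is a minimal sketch of a report whose figures are computed from the data rather than typed in by hand. It uses Python's standard library rather than R Markdown, and the data, field names and report wording are all hypothetical; it is not the method used in the case study departments.

```python
import csv
import io
from string import Template

# Hypothetical extract standing in for the data store (invented values).
RAW_DATA = """year,region,count
2016,North,120
2016,South,80
2017,North,150
2017,South,90
"""

# Prose with placeholders that are filled in from the analysis.
REPORT = Template(
    "In $latest there were $total_latest cases, "
    "a change of $change% on $previous."
)


def load_rows(text):
    """Read the extract into a list of dicts with typed values."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({"year": int(row["year"]),
                     "region": row["region"],
                     "count": int(row["count"])})
    return rows


def totals_by_year(rows):
    """Sum the counts for each year."""
    totals = {}
    for row in rows:
        totals[row["year"]] = totals.get(row["year"], 0) + row["count"]
    return totals


def render_report(text):
    """Run the whole pipeline: load, summarise, populate the prose."""
    totals = totals_by_year(load_rows(text))
    previous, latest = sorted(totals)[-2:]
    change = 100 * (totals[latest] - totals[previous]) / totals[previous]
    return REPORT.substitute(
        latest=latest,
        previous=previous,
        total_latest=totals[latest],
        change=f"{change:+.0f}",
    )


if __name__ == "__main__":
    print(render_report(RAW_DATA))
```

Because the sentence is generated from the data, rerunning the script on a new extract updates every figure in the prose at once, which is the property that removes the manual copy-and-paste steps.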

RAP reduces the number of stages in the statistical production process, which saves time and money and also improves analysts' quality assurance of publications. The main savings come from reduced time spent manually quality assuring the process outlined in Figure 1. Instead, RAP enshrines business and statistical knowledge in code and documentation as part of a software package. However, analysts also incur costs in learning an open source language, and these are factored into the estimated benefits of using this method.
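
To make the idea of enshrining knowledge in code concrete, the sketch below turns a few quality assurance rules into an automated check. The rules, field names and data are hypothetical and chosen only for illustration; real publications would have their own checks.

```python
def check_table(rows, published_total):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    seen = set()
    for number, row in enumerate(rows, start=1):
        key = (row["year"], row["region"])
        # Rule 1 (hypothetical): each year/region pair appears only once.
        if key in seen:
            problems.append(f"row {number}: duplicate entry for {key}")
        seen.add(key)
        # Rule 2 (hypothetical): counts can never be negative.
        if row["count"] < 0:
            problems.append(f"row {number}: negative count {row['count']}")
    # Rule 3 (hypothetical): the rows must add up to the headline figure.
    total = sum(row["count"] for row in rows)
    if total != published_total:
        problems.append(f"rows sum to {total}, not {published_total}")
    return problems


good = [
    {"year": 2017, "region": "North", "count": 150},
    {"year": 2017, "region": "South", "count": 90},
]
bad = good + [{"year": 2017, "region": "South", "count": -5}]

print(check_table(good, published_total=240))
for problem in check_table(bad, published_total=240):
    print(problem)
```

Checks written this way run identically every time the publication is produced, so they do not depend on an individual analyst remembering them.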

Method

This case study has looked at the benefits of RAP in several departments: DCMS, MOJ and DfE. We have generalised the average staff time saving per publication (the efficiency gain after RAP skills have been acquired) to other official national statistical publications.

We have monetised the benefits of analysts using RAP by comparing the time it took to produce a publication before and after RAP skills were acquired. This method also includes the time spent building up the capability to use RAP.

We have generalised the average efficiency saving from the case study (£8,800) to approximately 17,000 statistical publications across government. We estimate efficiency savings of between £90m and £149m, depending on the number of official statistical publications that benefit from RAP. There are a further 1,462 upcoming publications that could also benefit from RAP but are yet to be released.

Efficiency savings 1

This benefit is the total estimated efficiency saving across publications, based on the average saving for each publication once RAP skills have been acquired.

We have estimated this using approximately 17,000 statistical publications across government. There are approximately 1,400 further upcoming publications, not yet released, that could also benefit from RAP. We have allowed for a margin of error in our estimates by also showing the totals if only 80% or 60% of publications benefit.

Number of publications RAP benefits    Average annual saving per publication    Total
16,965 (100%)                          £8,800                                   £149 million
13,572 (80%)                           £8,800                                   £119 million
10,179 (60%)                           £8,800                                   £90 million
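
The scenario totals above follow from multiplying the number of publications by the average saving. The short sketch below reproduces that arithmetic; the figures come from this case study, while the function and variable names are ours.

```python
AVERAGE_ANNUAL_SAVING = 8_800   # pounds per publication, from the case study
TOTAL_PUBLICATIONS = 16_965     # publications in the 100% scenario


def scenario_savings(share_percent):
    """Return (publications, total saving in pounds) for a given share."""
    publications = TOTAL_PUBLICATIONS * share_percent // 100
    return publications, publications * AVERAGE_ANNUAL_SAVING


for share in (100, 80, 60):
    publications, total = scenario_savings(share)
    print(f"{share:>3}%  {publications:>6,} publications  "
          f"£{total / 1e6:,.0f} million")
```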

Efficiency savings 2

This is the average annual saving per publication in our case study, again once RAP skills have been acquired. RAP is most beneficial for publications that are released more frequently. One-off official statistics publications might not benefit from RAP because of the time and effort required for an analyst to learn the skills and for the code to be developed.

Average annual saving per publication: £8,800

This is the annual monetised efficiency saving (across four publications: three in the Department for Education and one in the Department for Digital, Culture, Media and Sport) in year two, once the initial year-one costs of learning RAP techniques in R or another programming language (approximately £2,500 in the case studies) have been incurred.
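
As a rough sketch of how the learning cost and the annual saving combine over time, the example below tracks the cumulative net benefit. It assumes, and this is our assumption rather than a finding of the case study, that the £2,500 applies per publication, that year one carries only the learning cost, and that the full £8,800 saving applies in every year from year two onward.

```python
LEARNING_COST = 2_500   # approximate year-one learning cost (case study)
ANNUAL_SAVING = 8_800   # average annual saving per publication (case study)


def cumulative_net_benefit(years):
    """Cumulative net benefit in pounds at the end of each year.

    Assumption (not from the case study): year one carries only the
    learning cost, and the full saving applies from year two onward.
    """
    if years < 1:
        raise ValueError("years must be at least 1")
    net = -LEARNING_COST
    results = [net]
    for _ in range(2, years + 1):
        net += ANNUAL_SAVING
        results.append(net)
    return results


print(cumulative_net_benefit(3))
```

Under these assumptions the learning cost is recovered during year two; a different cost or saving profile would shift that point.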

Wider benefits

Beyond the efficiency savings, other benefits come from building a process that is fully transparent, auditable and verifiable – reducing risk and improving quality. The result has been the delivery of RAP to automate the production of statistical reports across government.

Qualitative benefits of RAP include:

  1. Transfers software development best practice into the statistical domain
  2. Reduced risk of knowledge loss e.g. when people move jobs
  3. More robust and timely analysis compared to traditional methods
  4. Transparent analysis and quality assurance
  5. Builds staff capabilities enabling them to use a programming language
  6. Morale boost for analysts through automating laborious tasks
  7. Allows analysts to rationalise the process and make better use of datasets, speeding up the handling of detailed parliamentary questions and freedom of information requests
  8. A Massive Open Online Course (MOOC) has been built and is freely available online for analysts in government and elsewhere to use; this resource also captures data as students complete the course