Releasing statistics in spreadsheets
|Publication date:||22 January 2020|
|Author:||Good Practice Team|
|Who this is for:||Members of the Government Statistical Service|
Who is this guidance for?
This guidance is for producers of official statistics who need to release data and statistics in tables and spreadsheets that are clear, consistent, and easy to use and re-use.
What is its aim?
The guidance sets out a common approach, based on recognised good practice, for releasing statistics in spreadsheets. It focuses on what to do when releasing data as part of official statistics publications. You should follow the guidance (as far as possible) in your future publications. There is no expectation that already published data should be changed to follow the new guidance.
What does it cover?
The guidance is split into three main sections.
- Part 1 is about users. Users work with our statistics in lots of different ways, so it is unlikely that a single presentation format will work for all. This section talks about different types of user and what to think about when meeting their needs.
- Part 2 is about how tables are released. It is about making sure that data are open and accessible to all.
- Part 3 is about how to format your tables for reference and re-use. It focuses on how data are displayed within spreadsheets. It looks at the information that different users will require and how to prepare worksheets for consistent data use and re-use.
Know your audience
The most important consideration when releasing statistics is to understand what your users need so that they can use and re-use data appropriately. Different users will have different requirements and these may change over time. You should actively seek feedback from your users and respond to their needs. To support use and re-use for the long-term do not add unnecessary complexity to your spreadsheets.
Here are some questions to ask yourself before releasing your statistics in spreadsheets:
|Who uses your statistics?||Do you engage with your users?||Are your statistics open and accessible to everybody?|
|What are the statistics used for? What questions do they answer? What problem is this solving?
How do users access and view the statistics?
What questions do users ask about the statistics?
What information do users need to make use of the statistics?
What tools do your users use?
Are the statistics re-used in other analysis and outputs?
|Do you encourage feedback from your users and can they easily contact you?
Do your statistics meet the needs of your users?
Do your users have new requirements or suggestions about how to present the statistics differently?
Are all of your statistics still needed and used?
What additional information is needed to support your users in their use and re-use of the statistics?
|Are your statistics presented in ways which are simple and convenient for your users?
Can all users access the data (are they available in open formats)?
Are the statistics a 3* or better open data release?
Is there open and accessible supporting information to ensure appropriate use and re-use of the statistics?
How are the data used?
To ensure that your statistics can be used and re-used consistently and effectively it is essential to understand who is using them and how. Changing technology, like the increased use of data science tools, means that users may want data in new formats. Users may be ‘information gatherers’ who want to use the data for reference. Or, your users may want to add your data into their own analysis. Fact checkers and journalists need to access data quickly and easily, so key figures should be clear.
You may need to publish your data in more than one format and/or in several places to meet user needs. The table below explores how users may be using your data:
|Information gathering users||Analysts and technical users|
|These may be policy colleagues, journalists, academics or the general public. Each type of user will have their own requirements. They may use the statistics for reference, validation of ideas or for general information. They will want the data to be presented in accessible, clear and easy to understand ways. They may be more interested in the stories or the trends in the data than the details.||These users want to access data for their own analysis. They often want detailed data in machine-readable and re-usable formats which they can use with programming tools with minimal manual formatting. Consider whether your statistics can be published in new ways to make it easier to use and re-use them with programming tools.|
Good practice example
The Scottish Household Survey (SHS) team used innovative data comics and visuals to engage with new audiences and improve the accessibility and awareness of the survey. An article in Research Matters called ‘Improving accessibility and awareness of the Scottish Household Survey: Using data comics and visuals’ (PDF, 1.3MB) provides an in-depth view of the methods and results.
Release data in open formats
Always publish statistics in formats that your users can access and use. Generally, the simpler you can make your spreadsheets the better. Consider using a flat format where you can as this is usually better for future use. For example if you are using JSON or XML do not use too many nested levels.
In June 2012, the Government Open Data White Paper set out a vision for open data – to make it easier for data publishers to release data in standardised, open formats. The paper suggested using a ranking scheme from 1 to 5 stars (*) to measure the usability of open data.
|*||Make your data available on the web (in any format)|
|**||Make it available as structured data (for example, Microsoft Excel instead of image scan of a table)|
|***||Make it available in an open, non-proprietary format (for example, CSV or XML instead of Microsoft Excel)|
|****||In addition to using open formats, use Uniform Resource Locators (URLs) to identify things using open standards and recommendations from W3C, so that other people can point at your stuff|
|*****||In addition to using open formats and using URLs to identify things, link your data to other people’s data to provide context|
Data releases from the Government Statistical Service should meet the 3* measure as a minimum. Make your data available in an open format. This could be in addition to your usual output format.
Provide context and guidance
Releasing statistics in open formats supports the public use of statistical data from government. It is important to remember that statistics should not be separated from supporting information. Context and caveats are vital to ensure users have enough information to interpret and make effective use of the data. Supplying links to supporting information is usually enough, unless there are critical issues that users must understand to use the data effectively.
Consider using multiple formats for data releases
When releasing your statistics, the priority is to meet user needs. All data should be available in an open format to support users with different platforms and devices. It may be appropriate to provide multiple outputs on different publishing platforms depending on your user needs.
Good practice example
The Ministry of Housing, Communities and Local Government (MHCLG) published data in multiple formats to meet the needs of their users. When they published the English Indices of Deprivation in 2015 they included a bulletin, an infographic and Excel tables on the GOV.UK website. As part of the same release, MHCLG published the data in Comma-Separated Value (CSV) and N-triple formats, supported by an Application Programming Interface (API) on their open data website.
Good practice checklist
Titles and signposting
- Have a clear spreadsheet title including time period and geographical region.
- Use clear titles (with a unique identifier where possible) for individual tables.
- Use standard and consistent title formats.
- Make titles obvious.
- Minimise the number of header rows and use a consistent number over time to aid machine readability.
- Ensure it is clearly indicated if your columns use different units.
Titles and headings will affect data usability. If users cannot find or understand the data this limits their use. An example of using titles and subtitles effectively can be found in the bulletin Search and rescue helicopter annual statistics: year ending March 2019.
- Have links to supporting metadata and methodology documents.
- Make it clear whether your data are provisional or revised.
- Include a summary of the main subject and breakdown categories.
- Make sure the tables in the spreadsheet clearly state the data source.
- Have contact details for responsible statistician and media enquiries.
- Link to a glossary of essential technical terms and acronyms.
- State the date of next publication and next update.
- Have a summary including a) time period, b) regularity of the data, c) type of data, d) geographical region covered, e) units used and f) adjustments made.
- Include information about each table in the spreadsheet.
- Include links to wider data series or supporting publications.
Consistent symbols and codes
- Use nationally recognised classifications and geography codes.
- Use Government Statistical Service standard symbols (PDF, 168KB).
- Link to any supporting information for any classifications or codes used.
- Put symbols in separate columns from the data.
- Put the higher and lower bounds in separate cells when you use confidence intervals.
- Show where a change is statistically significant using * in a separate cell to the data.
- Highlight any comparability issues both across time and with equivalent statistics released elsewhere in the UK.
- Make sure there are no spelling or grammatical mistakes.
- Use sensibly named tabs which open in the correct place on the sheet.
- Do not put spaces at the start or end of worksheet tab names.
- Keep worksheet tab names as consistent as you can between releases, as changing them limits machine readability.
- Consider transposing your data because its easier to read down a list than across a table.
- Format numbers and pay attention to the displayed rounding and precision.
- Set sensible zoom levels.
- Put separate tables on separate worksheets.
- Don’t use macros or zip your data because these are often blocked by organisational policies so some users may not be able to view the data.
- Keep time series or historical data in the same spreadsheet wherever possible.
- Don’t use blank rows, adjust the row heights and column widths instead to create space.
- Ensure underlying values are unrounded.
- Don’t use formatting like colour to represent information unless the same information is also available as a cell value.
- Don’t merge cells or hide columns or rows.
- Consider using an application programming interface (API) to aid further analysis.
- Don’t add summaries like averages in the body of the table as this causes difficulties for machine-reading.
- Put summaries at the edge of the table if you include them – some users do need summaries.
- Put footnotes and annotations in brackets for machine-reading purposes.
- Hard code formulas to avoid accidental errors in use.
- Don’t use indentation to indicate hierarchies of categories like types and sub-types.
Users need enough information to be able to use and re-use the data consistently. All supporting guidance should be easy to find – ideally on the same page as the spreadsheet link.
This information should be available in HTML format. You can also link to other documentation—the important point is that it is signposted and that users can find what they need. Supporting guidance should apply to the dataset as a whole rather than just for individual tables.
Your users will need:
- contact details for the responsible statistician and media enquiries
- date of publication and next update
- glossary of essential technical terms and acronyms
- supporting metadata and methodological documents
- information on the quality of the statistics
- link to wider data series
- link to any supporting publications or bulletins
Conception statistics, England and Wales, April 2019 contains some examples of linking to supporting guidance effectively.
Consistency and standards in spreadsheet use
For users to be able to use and re-use your data, they must be released in consistent formats. You should ensure that titles, information and tables are consistent across your publication portfolios. Avoid changing individual release titles or table structures between releases where possible.
Harmonise your outputs
The GSS harmonisation team is responsible for the design, development and maintenance of common statistics frames and definitions for statistics.
When producing spreadsheets you should:
- use nationally recognised classifications like geography codes whenever possible – this is mandatory in some cases
- include a guide to symbols used (providing a link is usually enough)
- help users to understand changes in classifications – for example, the Geography Code History Database helps users to track changes in area codes
- ensure that codes are put into a separate column from numbers to aid machine readability
Use standards when available
There are government data standards available on the GOV.UK website. You should stay up to date with latest standards and adopt them when it is appropriate to do so. For example, in June 2018 the Government Digital Service (GDS) published a new data standard for use when releasing data in tabular format.
Good practice example
Office for National Statistics (ONS) supported its 2018 Mid-year Population Estimates publication with several spreadsheets and included links to other data resources that could support users in making the best use of the population statistics. All spreadsheets with geographical information used the correct ONS geography codes.
When designing a statistical spreadsheet, consider how to appropriately communicate any uncertainty to your users. Bear in mind that the user may not read detailed documents or they may have copied the spreadsheet to use in another context.
- make clear if there are any potential sources of bias or uncertainty – users need to know how this impacts on their use of the statistics
- avoid the use of colour as colour can cause accessibility issues and add complications for re-use via machine reading
- place the upper and lower bands of confidence intervals in separate cells, next to the number they relate to, when using confidence intervals
95% confidence interval lower limit
95% confidence interval upper limit
- use * formats in separate cells when showing significant levels – be aware that the credibility of assessing statistics using significance levels is currently under debate
|Significant at 0.001 level||***|
|Significant at 0.01 level||**|
|Significant at 0.05 level||*|
Links to the resources referenced in this guidance:
- Statistics Authority: Code of Practice for Statistics
- Cabinet Office: Government Open Data White Paper
- Ministry of Housing, Communities and Local Government (MHCLG): Indices of Deprivation 2015 statistical release
- Ministry of Housing, Communities and Local Government (MHCLG): Indices of Deprivation 2015 open data release
- Office for National Statistics: Estimates of the population for the UK, England and Wales, Scotland and Northern Ireland, mid-2018.
- Department for Transport and Maritime and Coastguard Agency: Search and rescue helicopter statistics, year ending March 2019
- Office for National Statistics: Conception statistics
- Government Digital Service (GDS) : Data standard for tabular data
- Nature: It’s time to talk about ditching statistical significance
- Research Matters, June 2019: Improving accessibility and awareness of the Scottish Household Survey: Using data comics and visuals (June 2019 edition)
Every two years