Working together to make statistical data fit for the web

The Connected Open Government Statistics (COGS) project is a huge data transformation effort. Sure, we could throw a load of people at the task and get it sorted. Yes, that could happen, but it definitely should not happen. Turning this effort into a sustainable practice for all departments is the real goal. That requires cooperation. There is no universe where individual paths lead to a single point of convergence. To work, this must be tackled together.

One of the areas where we’ve started to have influence is “CSV on the Web” (CSV-W). This is a W3C standard for describing tabular data with metadata. Typically, the data lives in CSV files and the CSV-W metadata is captured in an additional .json file that lives alongside the data files it describes.
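To make that pairing concrete, here is a minimal sketch in Python of a CSV file and the CSV-W metadata file that sits next to it. The file name, column names and values are made up for illustration and are not taken from any COGS dataset. The helper functions are our own invention rather than part of any library. The "-metadata.json" suffix follows one of the default locations the standard suggests for finding a file's metadata.

```python
import csv
import json
import tempfile
from pathlib import Path

# Namespace that identifies a JSON document as CSV-W metadata.
CSVW_CONTEXT = "http://www.w3.org/ns/csvw"


def build_csvw_metadata(csv_name, columns):
    """Return a minimal CSV-W description of one CSV file.

    `columns` is a list of (name, title, datatype) tuples.
    (Hypothetical helper, written for this example only.)
    """
    return {
        "@context": CSVW_CONTEXT,
        "url": csv_name,  # the data file this metadata describes
        "tableSchema": {
            "columns": [
                {"name": name, "titles": title, "datatype": datatype}
                for name, title, datatype in columns
            ]
        },
    }


def write_csv_with_metadata(directory, csv_name, columns, rows):
    """Write the CSV and its metadata file side by side in `directory`."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)

    csv_path = directory / csv_name
    with csv_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow([title for _, title, _ in columns])  # header row
        writer.writerows(rows)

    # "<file>.csv-metadata.json" keeps the description next to the data.
    metadata_path = directory / f"{csv_name}-metadata.json"
    metadata = build_csvw_metadata(csv_name, columns)
    metadata_path.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return csv_path, metadata_path


if __name__ == "__main__":
    # Illustrative columns and values only.
    example_columns = [
        ("area", "Area", "string"),
        ("period", "Period", "gYear"),
        ("value", "Value", "integer"),
    ]
    example_rows = [["Area A", "2019", 120], ["Area B", "2019", 95]]

    with tempfile.TemporaryDirectory() as tmp:
        data_file, metadata_file = write_csv_with_metadata(
            tmp, "observations.csv", example_columns, example_rows
        )
        print(data_file.name)
        print(metadata_file.read_text(encoding="utf-8"))
```

The point of the sketch is that the CSV itself stays plain. Everything a consumer needs to interpret it, such as column names and datatypes, travels in the small JSON file beside it.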

In December the Open Standards Board accepted CSV-W as the recommended standard for government data. This is great news for the project, as we adopted this approach early on and would like to think we are leaders in this space.

We want to take advantage of this new recommendation and support the use of CSV-W across the Government Statistical Service (GSS) by developing a blueprint for others to follow. We are creating some worked examples for the Government Data Architect Community (GDAC) and putting these early drafts in front of the Standards Subgroup for review. The aim is to get these ratified by the main GDAC group, then create guidance and a formal standard setting out best practice for producing CSV-W. This will support greater adoption, giving us more consistent data from departments and, in turn, more efficient analysis.

Another test area is with the open data team in the Ministry of Justice. We are looking at how we can augment a Reproducible Analytical Pipeline (RAP) process to align with the CSV-W specification. We have transformed some probation data into CSV-W and the intention is to walk this through with the team. The aim is to understand how to supplement the part of the RAP process that creates the “tidy-ish data” from which the output is produced.

There are two other areas, NHS-Digital and the Sustainable Development Goals team, where we are looking to support their CSV-W goals. We’re creating the very first worked examples of these right now and will want to share them in the very near future.

Together, these collaborations will support better frameworks for departments to follow. We learn more from each other, avoid reinventing the wheel and reduce duplication across our organisations. More importantly, we will have a consistent specification for delivering data and metadata across the GSS – one that works on the web! I’ll write about the outcomes in the near future.

There are many areas where COGS wants to help: bridging existing silos, breaking down organisational boundaries and leading on innovative approaches like CSV-W. We want to be the catalyst for all of us working on this change together.

But our situation doesn’t improve if we stay as we are. We don’t improve our ability to analyse a wider group of data if we only continue making things better in isolation. To build a better community takes more than one person, one project or one department – it takes us all. Our statistics are valuable, but they are worth more when we bring them together.

For further information please contact cogs@statistics.gov.uk.

Darren Barnes
Holly Butcher
Darren is Head of Data Publishing at the Office for National Statistics.