A number on its own without context is not useful
A number on its own without context is not useful.
Without an understanding of what a number measures, it can’t be used in a meaningful way. Add context by describing what it measures, and that data point may become useful for answering a specific question. Include it in a table with other numbers, and the picture improves again. Even this is only a start towards addressing the big questions we face, whether they concern sustainable development, climate, or social or economic change. Our society is complex, and gaining new insights requires taking our table of data points and joining it with sets of data that were never considered when the table was created. The difficulty is that there is a vast range of data, and knowing which data to use is not easy.
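To make the point concrete, here is a minimal sketch in Python of the same progression: a bare number, the number with context, and two tables joined to answer a question neither could answer alone. Everything in it is hypothetical. The `Observation` class, the area names and all the figures are invented for illustration and do not come from any real dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """A number together with the context that makes it usable."""
    value: float
    measure: str   # what the number measures
    unit: str      # the unit it is expressed in
    area: str      # where it applies
    period: str    # when it applies


# Two small tables, imagined as coming from different producers.
# All figures are invented for illustration.
population = [
    Observation(1200.0, "population", "thousand people", "Area A", "2020"),
    Observation(800.0, "population", "thousand people", "Area B", "2020"),
]
emissions = [
    Observation(6000.0, "CO2 emissions", "kilotonnes", "Area A", "2020"),
    Observation(3200.0, "CO2 emissions", "kilotonnes", "Area B", "2020"),
]


def join_on_area_and_period(left, right):
    """Pair observations that share the same area and period."""
    index = {(obs.area, obs.period): obs for obs in right}
    return [
        (obs, index[(obs.area, obs.period)])
        for obs in left
        if (obs.area, obs.period) in index
    ]


def emissions_per_person(pairs):
    """Kilotonnes divided by thousand people gives tonnes per person."""
    return {pop.area: em.value / pop.value for pop, em in pairs}


pairs = join_on_area_and_period(population, emissions)
per_person = emissions_per_person(pairs)
```

The join only works because both tables describe area and period in the same way. When two producers label the same area differently, that shared key does not exist and the analyst has to build it by hand.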
The Open Data Institute suggests that data infrastructure should be like our road networks:
“Good infrastructure is simply there when we need it but, at the moment, too much of our data infrastructure is unreliable, inaccessible or only available if you can pay for access. Data innovators struggle to get hold of data and to work out how they can best use it, while individuals do not feel that they are in control of their data. Data infrastructure should be as easy to use as our road networks.”
Open Data Institute
This is a sound statement and a great ambition. However, when your data infrastructure is unreliable and the road network you’re stuck with is difficult to navigate, you’re in double trouble. The Government Statistical Service, and perhaps all of government data, is in this position. Our data infrastructure is built up around silos: individual organisational systems. The government data landscape is a series of monolithic blocks, and the “road network” is not joined up, is not well signposted, and is hard to traverse, even for the most seasoned analyst.
The Connected Open Government Statistics (COGS) project is looking to break through the boundaries around our data infrastructure silos. We want to build bridges that make the data more accessible and easier to navigate. The project has carried out extensive research over the last two years. We have been delving into why users need our data and getting a better understanding of what they go through to discover and use the data they need, whether it is to solve a problem, answer a question or support a decision.
What we’ve discovered is no light bulb moment for government users. There is no earth-shattering breakthrough, but a simple recognition of the obvious: users find it difficult to understand where the data is, who produces it, and which spreadsheet is the right one. They also have to put in a lot of effort to extract, clean and understand the data before they can use it.
The project aims to build a better data infrastructure across our community through better use of standards, metadata and harmonisation techniques. We want data that is web-ready and not just machine-ready, and metadata that provides context, informs and helps guide users. We are building a platform that can simplify discoverability, usability and interoperability, making it easier for researchers and analysts to work on the right data.
So, imagine if there was a place that provided access to data regardless of who produced it. A hub that dissolves the traditional boundaries that keep our data in silos. An opportunity to discover all the data that exists on a topic, with links to other related data. A space where the information you need to judge whether that data is right for your task is at your fingertips. Imagine!
There is a long way to go before we get to Nirvana. But after two years we understand a lot of the questions and have answers for many of them. We have a structure for the data, we are evolving the metadata and are starting to get the data out there and available. We’re happy to talk more about this and it would be great to hear from the analysis community. Get in touch with email@example.com for more information.