This item is archived. Information presented here may be out of date.

Start with user needs. It makes things better.


If you’ve been following this blog over the past few months, you’ll be aware that we’ve conducted a lot of user research with data users to better understand how people use data and why. This has provided us with a mountain of very actionable and compelling insights which we’ve been able to use to shape the project. Last month we decided to health check our approach by revisiting the alpha’s original research objectives and ensuring all of our activities solely focused on meeting those objectives and answering questions around them.

Research objectives

The project has a very clear set of objectives that the user research (UR) could provide insight around:

  1. Do dataset families work as a concept?
    Is the dataset family concept understood by users and does it work as a suitable method for discovering data?
  2. Does the solution meet our “Jobs To Be Done” (JTBD)?
    Does the solution meet users’ needs and support the completion of documented JTBD?
  3. Are we improving the value of data?
    Is the project adding value to the data that is currently available?

The immediate problem

We had no product to test. Due to the complexities of the data transformation, it quickly became clear that we wouldn’t have a robust enough solution to put in front of users until later in the alpha phase. So, how do you show the dataset families concept without any families to put in front of users? How do you prove the solution meets users’ needs if there is no solution? How can we prove we’ve added value to the data if there is no data to show?

We decided that wasn’t going to be a problem.

One thing agreed at the outset was that the project wouldn’t be about building yet another data tool. With the core focus on building a robust data transformation solution, our focus, as a research team, was to get a better understanding of how data customers might want to access data, and not to immediately assume it would be via a shiny User Interface (UI). We already know a large proportion will access it this way, but there’s a significant number who want to use Application Programming Interfaces (APIs), SPARQL queries, Python, R or File Transfer Protocol (FTP), and don’t want to go via a front-end at all.
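As a rough sketch of what that non-UI access can look like, the snippet below builds the kind of parameterised API request a programmatic user might issue instead of clicking through a front-end. The endpoint, path and parameter names are entirely hypothetical, for illustration only; a real service would publish its own API reference.

```python
from urllib.parse import urlencode, urljoin

# Hypothetical API base URL - purely illustrative, not a real service.
BASE_URL = "https://api.example.gov.uk/"

def build_dataset_query(dataset: str, **filters: str) -> str:
    """Build a REST-style URL for fetching observations from a dataset.

    A programmatic user would pass the resulting URL to a tool like
    curl, requests or an R script rather than browsing a website.
    """
    path = f"datasets/{dataset}/observations"
    query = urlencode(sorted(filters.items()))  # stable parameter order
    return urljoin(BASE_URL, path) + (f"?{query}" if query else "")

# K02000001 is the GSS geography code for the United Kingdom.
url = build_dataset_query("population", geography="K02000001", time="2016")
print(url)
# https://api.example.gov.uk/datasets/population/observations?geography=K02000001&time=2016
```

The same request could just as easily be a SPARQL query against an endpoint, or an FTP download; the point is that the data needs to be addressable without a front-end.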

We knew a lot but needed to get focused

As previously mentioned, we already had a lot of insight gathered during the discovery phase around who accesses the data, why they access the data, their end goals and the challenges they face. We had a big list of the datasets they use and the tools and websites they access it through. We now had to get to the detail and focus on the specific challenges the project would, hopefully, help them address. These included:

  1. Suitability of the data
    What does it look like now? How easy is it to use? What is the quality like? Do they trust it? How do they convince others to trust it and, in turn, their analysis?
  2. Mixing data from multiple sources
    How easy is it to mix data currently? What are the barriers? How do they physically do it? How often? Why?
  3. Accessing the data
    What are the tools like currently? Why do they favour specific tools and websites? What can we learn from these?
  4. Impact of change
    What if the website they used regularly was gone? The single spreadsheet they download monthly replaced by a datacube? What would be the impact on them, their role, their organisation?
  5. Linked data
    Do they know what linked data is? Do they care? Have they used it before? If yes, what was it like and what did it enable them to do? If not, once they grasp the concept, what benefits do they perceive? What are the potential issues and concerns?
  6. Questions from the consortiums
    Who knows more about the data than the people who produce it? We asked the data publishers and data experts what questions they would want answered if they had the opportunity to speak to users.

With those focused subjects we put a plan together, booked our locations around the country and started recruiting.

Our activities

Understanding the now – experience mapping

We ran a number of sessions around the country, asking participants to help us complete an experience map. Guided by John Waterworth’s approach at the Government Digital Service, we tweaked the method to suit our needs – specifically, making the cards more suited to our aims and giving participants the opportunity to walk through more than one experience of gathering data.

We talked to participants for an hour each, filling out cards for every step of the journey they take in getting a specific piece of data. At the end we found ourselves with a stack of journeys, lots of them contrasting in approach and experience. In fact, we have so many we’re still writing them up into a digestible format.

Understanding the now – existing tools

We conducted more sessions, again travelling the length of the country. After covering some of the users’ backgrounds, we had them complete a simple task – “Find the total population of the United Kingdom (UK) for 2016” – across a number of existing third-party websites. Additionally, they could look for a topic specific to their current data needs. The websites we used were:

If we had time we also asked them to complete the task on any other websites they used regularly.

Alongside this, we had them complete a series of tasks on Scotland’s official statistics – a website developed by our partners at Swirrl. The site uses Linked Data and a bespoke version of it is currently being fashioned to help us show our own data in the future.

Participants were given eight minutes per website with no pressure to complete the task in that time. While their opinions and preferences were important, we really wanted to see how they interacted with the websites, the route they took (map first, search, browse), the issues they faced and what they understood.

Understanding the future – Linked Data and dataset families

Unsurprisingly, we took our research on the road again, this time with the aim of setting the scene around Linked Data. We asked users to start with a simple exercise: looking at a page of terms. We gave little context around the terms and asked them to circle any they understood or recognised.

The terms used:

  • API
  • Codelists
  • Concept schemes
  • CSV
  • Data cube
  • Dimension
  • JSON
  • Metadata
  • Excel
  • N-Triples
  • Observation
  • Ontologies
  • OWL
  • Python
  • R
  • SDMX
  • Tidy Data
  • Turtle
  • URIs
  • Variables
  • Vocabularies

The purpose of the exercise was not to make them feel silly, but to gather an (albeit very unscientific) feel for how well known some of the terms associated with data and linked data were. We then discussed these with them. Next, we had them explain their understanding of Linked Data and showed them a statement explaining what it is. We also asked them what they thought a dataset family might mean, to investigate how understandable the dataset family concept is.

We had them complete a task on another website – the Land Registry United Kingdom House Price Index, another site built around Linked Data – and talked through some of the concepts. Armed with this, we spent the rest of the session discussing the pros and cons of our approach to data transformation, along with their concerns and the possible impacts it might have on them. They talked about their experiences of data transformation and, in some cases, first-hand knowledge of transforming data for their own internal and external data tools. Lastly, we focused on the potential benefits the linked data approach might provide.

Understanding the future – new concepts

As I write this, we’re putting together some online activities to support our in-person research. Our friends at Swirrl have put together an interesting (and possibly unique) approach to searching data which we’re itching to take out to users. We’ll keep you posted!

The findings

We’re still in the process of collating all our findings so it’s a little soon to be able to share them in any detail. It’s fair to say we have so much insight it’s hard to know where to begin with the analysis. Once complete, we’ll share it with you and our future research plans. We already plan to do some more in-depth research using our own transformed data and we’ve identified gaps in our insight that we’ll want to address in the next phase.


We’re always looking for willing volunteers to help us with our research. Interested? Please complete this form to volunteer. You not only get to enjoy our excellent company, but we can compensate you for your time as well. In case you’re wondering, we really do go all over the UK (at the last count: Cardiff, Chilton, Edinburgh, Leeds, London, Manchester, Newcastle, Newport, Plymouth, Preston, Ross-on-Wye), so wherever you are, we should be doing something near you at some point.

Jonathan Porton and John Lewis – User researchers
Jonathan and John are both user researchers at the Office for National Statistics.