17/9/2012

What Kinds of Stories Can You Find in Data?

 
This post, by Martin Rosenbaum (BBC), is an excerpt from The Data Journalism Handbook (chapter 5: Understanding Data), which is freely available online under a Creative Commons Attribution-ShareAlike license.
 
 
Data journalism can sometimes give the impression that it is mainly about the presentation of data, such as visualizations that quickly and powerfully convey an understanding of an aspect of the figures, or interactive searchable databases that allow individuals to look up places like their own local street or hospital. All this can be very valuable, but like other forms of journalism, data journalism should also be about stories. So what are the kinds of stories you can find in data? Based on my experience at the BBC, I have drawn up a list, or “typology,” of different kinds of data stories.
 

[Image: data_stories.jpeg, by Flickr user ashley.adcox]

I think it helps to bear the list below in mind, not only when you are analyzing data, but also at the stage before that, when you are collecting it (whether looking for publicly available datasets or compiling freedom of information requests).

Measurement

The simplest story: counting or totaling something. “Local councils across the country spent a total of £x billion on paper clips last year.” But it’s often difficult to know whether that’s a lot or a little. For that, you need context, which can be provided by:

Proportion 
“Last year local councils spent two-thirds of their stationery budget on paper clips.”

Internal comparison
“Local councils spend more on paper clips than on providing meals-on-wheels for the elderly.” 

External comparison
“Council spending on paper clips last year was twice the nation’s overseas aid budget.”
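As a rough illustration of how a raw total gains context, the three comparisons above can be sketched in a few lines of Python. Every figure and name here (paper_clips, stationery_budget, meals_on_wheels, overseas_aid) is invented for the example and is not real council data.

```python
# Hypothetical figures in pounds, invented for illustration only.
paper_clips = 2_000_000_000        # the raw measurement
stationery_budget = 3_000_000_000  # the larger total it belongs to
meals_on_wheels = 1_500_000_000    # another item in the same budgets
overseas_aid = 1_000_000_000       # a figure from outside council budgets


def share(part, whole):
    """Return part as a fraction of whole."""
    return part / whole


def ratio(a, b):
    """Return how many times larger a is than b."""
    return a / b


proportion = share(paper_clips, stationery_budget)  # proportion
internal = ratio(paper_clips, meals_on_wheels)      # internal comparison
external = ratio(paper_clips, overseas_aid)         # external comparison

print(f"Share of stationery budget: {proportion:.0%}")
print(f"Times the meals-on-wheels spend: {internal:.2f}")
print(f"Times the overseas aid budget: {external:.1f}")
```

The point of the sketch is that each context figure needs its own denominator, so those denominators have to be collected alongside the headline total.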

There are also other ways of exploring the data in a contextual or comparative way:

Change over time

“Council spending on paper clips has trebled in the past four years.”
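A claim like this comes from dividing the latest figure by the earliest one. The sketch below, using made-up yearly totals, also derives the equivalent compound annual growth rate, which can be a useful second way of expressing the same change. The dictionary spending_by_year and both function names are assumptions for illustration.

```python
# Hypothetical yearly totals in millions of pounds, invented for illustration.
spending_by_year = {2008: 10.0, 2009: 14.0, 2010: 21.0, 2011: 26.0, 2012: 30.0}


def growth_factor(series, start, end):
    """Return how many times larger the end value is than the start value."""
    return series[end] / series[start]


def annual_growth_rate(series, start, end):
    """Return the compound annual growth rate between two years."""
    years = end - start
    return growth_factor(series, start, end) ** (1 / years) - 1


factor = growth_factor(spending_by_year, 2008, 2012)
rate = annual_growth_rate(spending_by_year, 2008, 2012)

print(f"Spending grew by a factor of {factor:.1f} over four years")
print(f"Equivalent annual growth: {rate:.1%}")
```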

“League tables”

These are often geographical or by institution, and you must make sure the basis for comparison is fair (e.g., taking into account the size of the local population).

“Borsetshire Council spends more on paper clips for each member of staff than any other local authority, at a rate four times the national average.”
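To show why the basis of comparison matters, here is a minimal sketch that ranks councils by spending per member of staff and not by raw totals. The council names other than Borsetshire, all of the numbers, and the choice to compute the national average as total spending divided by total staff are assumptions made for the example.

```python
# Hypothetical councils: (name, paper clip spending in pounds, number of staff).
councils = [
    ("Borsetshire", 80_000, 1_000),
    ("Council B", 90_000, 6_000),
    ("Council C", 30_000, 3_000),
]


def per_head(spending, staff):
    """Return spending per member of staff."""
    return spending / staff


# Rank councils by the per-staff rate, highest first.
league_table = sorted(
    ((name, per_head(spend, staff)) for name, spend, staff in councils),
    key=lambda row: row[1],
    reverse=True,
)

# Assumption: the national average is pooled spending over pooled staff.
total_spend = sum(spend for _, spend, _ in councils)
total_staff = sum(staff for _, _, staff in councils)
national_average = per_head(total_spend, total_staff)

top_name, top_rate = league_table[0]
biggest_raw_spender = max(councils, key=lambda council: council[1])[0]

print(f"Top of the per-staff table: {top_name}")
print(f"Multiple of national average: {top_rate / national_average:.1f}")
print(f"Biggest spender in raw terms: {biggest_raw_spender}")
```

In this invented dataset the council with the largest raw bill is not the one at the top of the per-staff table, which is exactly the distortion a fair basis of comparison is meant to remove.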

Or you can divide the data subjects into groups:

Analysis by categories

“Councils run by the Purple Party spend 50% more on paper clips than those controlled by the Yellow Party.”
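A comparison by category amounts to grouping the records and comparing a summary figure for each group. The sketch below uses the mean of invented per-staff spending figures. The records list and the use of the mean (and not, say, the median) are assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (controlling party, spending per member of staff in pounds).
records = [
    ("Purple", 150), ("Purple", 170), ("Purple", 130),
    ("Yellow", 100), ("Yellow", 90), ("Yellow", 110),
]

# Group the spending figures by party.
by_party = defaultdict(list)
for party, spend in records:
    by_party[party].append(spend)

# Summarize each group with its mean.
averages = {party: mean(values) for party, values in by_party.items()}

# Express the gap as "x% more".
percent_more = (averages["Purple"] / averages["Yellow"] - 1) * 100

print(f"Purple councils spend {percent_more:.0f}% more than Yellow councils")
```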

Or you can relate factors numerically:

Association

“Councils run by politicians who have received donations from stationery companies spend more on paper clips, with spending increasing on average by £100 for each pound donated.”

But, of course, always remember that correlation and causation are not the same thing.
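A statement such as “£100 for each pound donated” is the slope of a straight line fitted through the data. The sketch below fits that line by ordinary least squares and reports the correlation coefficient, using invented figures. The variable names and the choice of a simple linear fit are assumptions, and nothing in the calculation can show that donations cause the spending.

```python
# Hypothetical pairs, invented for illustration:
# donations received (pounds) and paper clip spending (pounds) per council.
donations = [0, 10, 20, 30, 40]
spending = [900, 2200, 3000, 3800, 5100]


def average(values):
    """Return the arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def slope_and_correlation(xs, ys):
    """Return the least-squares slope of ys on xs and Pearson's r."""
    mean_x, mean_y = average(xs), average(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sxx, sxy / (sxx * syy) ** 0.5


slope, r = slope_and_correlation(donations, spending)

# The slope describes an association only; it says nothing about cause.
print(f"Extra spending per pound donated: {slope:.0f}")
print(f"Correlation coefficient: {r:.3f}")
```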

So if you’re investigating paper clip spending, are you also getting the following figures?

  • Total spending to provide context?
  • Geographical/historical/other breakdowns to provide comparative data?
  • The additional data you need to ensure comparisons are fair, such as population size?
  • Other data that might provide interesting analysis to compare or relate the spending to?
