15/11/2012

Swimming in the Data Stream

 

Originally published by The Walkley Foundation. Republished with permission.

 

There used to be an awkward moment whenever another reporter asked me: "What does a data journalist do?" There would be a pause while I searched for another way to explain it. But no matter what I came up with, it never seemed to really answer the question. And the reason, I've finally worked out, is that the distinction is false. All journalism is data journalism.

Every story is made up of data in the form of interviews, statistics, findings, observations and background information. Unfortunately much of that information is in an unstructured format – transcripts, notes and clippings – that can't easily be manipulated.

This is in contrast to structured information, most commonly seen in a spreadsheet. This is information that can be sorted, aggregated and combined with other datasets.

What data journalism does is take advantage of two technological shifts that have opened up that world of structured information for reporters.

The first change has been the ever-growing body of publicly available sets of structured data that journalists can explore, cross-reference, mash together and manipulate.

The federal government's data.gov.au website alone lists about 1100 datasets that various agencies have released.

In many cases the data is in a spreadsheet format that can be explored using the Microsoft Excel program.

Using spreadsheets and structured data will be familiar ground to many business reporters who have been data journalists for years, manipulating aggregated stockmarket information, economic indicators and survey information to produce stories.

The second change has been a steady increase in the number of tools that non-experts can use to explore and visualise the datasets. Tools such as Google Fusion Tables, GeoCommons and Tableau let journalists map out data and use charts to analyse data in ways previously available only to experts.

The overall process of data journalism is similar to what we would normally do as journalists – with a few extra steps along the way.

1. Get the data

This can be as easy as going to a government website like the Australian Bureau of Statistics, tenders.gov.au or BOCSAR (the Bureau of Crime Statistics and Research) and downloading a spreadsheet. Or it can be as difficult as having to manually enter information that has been uploaded to a government website as a PDF of a photocopied page.

A story I worked on with The Australian Financial Review’s accounting reporter, Agnes King, about auditor independence turned on the definition of audit and non-audit work. There was no other way than to go through each annual report and record the data manually. Often being a data journalist really translates into being a data entry journalist.

2. Analyse the data

This is where you "interview" the data to see if it has anything interesting to say. It can involve finding the aggregates of the information, ranking the fields or visualising the data.

We usually use Excel, OpenOffice's Calc or Google's Spreadsheets to find the key features of the data, such as averages and totals, and for charting. If we need more complex visualisations, especially if there is some geographic element involved, we'll use GeoCommons, Google Fusion Tables or Tableau to map out the data.
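For readers who would rather script this step than use a spreadsheet, here is a minimal sketch in Python of the same kind of "interview": totals, an average and a ranking. The suburb names, column names and figures are made up for illustration and are not real data; the helper names (`load_rows`, `summarise`, `rank_by_vacancy`) are likewise hypothetical.

```python
import csv
import io
from statistics import mean

# Made-up figures for illustration only; not real Census data.
SAMPLE_CSV = """suburb,dwellings,unoccupied
Alpha Bay,1000,400
Beta Hill,2000,200
Gamma Flats,500,25
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, converting the counts to int."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({
            "suburb": row["suburb"],
            "dwellings": int(row["dwellings"]),
            "unoccupied": int(row["unoccupied"]),
        })
    return rows


def summarise(rows):
    """Return the totals and the average number of unoccupied dwellings."""
    return {
        "total_dwellings": sum(r["dwellings"] for r in rows),
        "total_unoccupied": sum(r["unoccupied"] for r in rows),
        "mean_unoccupied": mean(r["unoccupied"] for r in rows),
    }


def rank_by_vacancy(rows):
    """Return (suburb, percent unoccupied) pairs, highest percentage first."""
    ranked = [(r["suburb"], 100 * r["unoccupied"] / r["dwellings"]) for r in rows]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


rows = load_rows(SAMPLE_CSV)
print(summarise(rows))
for suburb, pct in rank_by_vacancy(rows):
    print(f"{suburb}: {pct:.1f}%")
```

Ranking by percentage rather than by raw count matters here: a large suburb can have many empty dwellings and still be less vacant, proportionally, than a small coastal town.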

Another story I worked on at the AFR involved looking for the suburbs with the highest percentage of unoccupied properties across the country. The lightbulb moment came when I mapped the data out using Google Fusion Tables. It clearly showed that the most unoccupied areas tended to be those coastal regions popular with sea-changers. It worked like a good tip-off.

A few calls later Ben Hurley, one of our property journalists, had a story that showed that many sea-changers were heading back to the city because the coastal towns were too far away from family and friends and lacked many of the services they needed as they got older.

3. Audit the data

This is the most painful part of the process, but it is critical because one wrong cell or figure can make everything you have produced worthless. I often redo my analysis a few times to make sure the end result tallies up, and talk with internal and external experts about how I went about my calculations to try to find a flaw in the process.
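Part of that checking can be automated. The sketch below, in Python, is one possible approach rather than a description of my own routine: it recomputes a total independently of the figure already reported and flags cells that cannot be right. The rows are made up, the third one contains a deliberate error, and the `audit` function and its checks are assumptions chosen for illustration.

```python
# Made-up rows for illustration only; the third row contains a deliberate error.
ROWS = [
    {"suburb": "Alpha Bay", "dwellings": 1000, "unoccupied": 400},
    {"suburb": "Beta Hill", "dwellings": 2000, "unoccupied": 200},
    {"suburb": "Gamma Flats", "dwellings": 500, "unoccupied": 5000},
]


def audit(rows, reported_total):
    """Return a list of human-readable problems found in the rows."""
    problems = []
    for number, row in enumerate(rows, start=1):
        # A suburb with no dwellings is almost certainly a data entry slip.
        if row["dwellings"] <= 0:
            problems.append(f"row {number} ({row['suburb']}): dwellings is not positive")
        # Unoccupied dwellings cannot be negative or exceed all dwellings.
        if not 0 <= row["unoccupied"] <= row["dwellings"]:
            problems.append(f"row {number} ({row['suburb']}): unoccupied is outside 0..dwellings")
    # Recompute the total from scratch and compare it with the reported figure.
    recomputed = sum(row["unoccupied"] for row in rows)
    if recomputed != reported_total:
        problems.append(f"total mismatch: reported {reported_total}, recomputed {recomputed}")
    return problems


for problem in audit(ROWS, reported_total=650):
    print(problem)
```

A script like this only catches mechanical errors. It is no substitute for walking an expert through the method.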

4. Report

As with any story, there is often no correlation between the amount of effort that goes into getting and analysing the data and the value of the results. Most often, the analysis simply provides a pointer in the direction that the reporter should head. But there are also times when hours, days and weeks of analytical work produce... very little that is newsworthy.

I've carried out many Census data crunches that led to conclusions that were completely obvious or already known. The challenge there was to take a breath, let it go and move on to the next story. Often the information became useful in another context as part of a series of charts building an argument or narrative.

5. Deliver the data

The most straightforward approach is to simply do it the old-fashioned way: write a story, create a graphic and commission a relevant photo.

A recent analysis I did of detailed employment data centred on two charts showing where jobs were being created and where they were being lost. In this case, the charts were the heart of the story.

These days there is also the opportunity to tell stories in a number of more exciting ways. You can provide the primary documents, publish interactive graphics, or publish interactive tools that give the readers the opportunity to explore the underlying information in your story.

This is important because the outlier information – the unusual or unexpected fact – is often the core of the story, while readers are more interested in the data that directly impacts their lives.

A story I did with another AFR writer, Katie Walsh, focused on the suburbs with the highest number of rich singles (the top suburb was Mosman in Sydney, if you must know). We published the full set of data. This provided readers with the critical bit of information – the suburbs nearby with the highest number of rich singles. After all, it's only fair that readers can do their own bit of data journalism as well.

 

 

Image credits: Kate Hudson, The Data Journalism Handbook
