Data journalism’s finest


A study of projects nominated for the Data Journalism Award.

The phenomena of “big data” and an increasingly data-driven society are doubly relevant for journalism. Firstly, they are a topic that needs to be covered so that the related developments and their consequences are made understandable and debatable for the public. Secondly, the “computational turn” has already begun to affect news production and is giving rise to novel ways of identifying stories of public interest and new ways of telling them. What we can see is the emergence of a new journalistic sub-field most often called “computational journalism”, “data(-driven) journalism”, or “DDJ” for short. While it is often presented as “the future of journalism”, we still don’t know much about the current state of DDJ. To shed some light on the status quo we turned towards what may be considered the gold standard in data-driven reporting: pieces of work nominated for the Data Journalism Award (DJA), a prize issued annually by the Global Editors Network. In a classic, standardised content analysis we examined 120 stories nominated in 2013 and 2014. We identified, among other things, the types of topics covered, the kinds and sources of data the stories drew upon, and the visualization strategies and interactive features they used to convey a story. The complete study carried out by researchers at the Hans Bredow Institute for Media Research and the University of Hamburg can be downloaded here.

What struck us first is that this inherently digital reporting style is clearly practised best by natively non-digital media: more than half of the data-driven pieces nominated for a DJA were published on the websites of well-established newspapers (43 %) or magazines (13 %). The second most important players are non-profit organisations for investigative journalism like ProPublica, which account for just over 20 percent of the stories nominated. Online-only publications, broadcasters, and news agencies were represented with only about six, eight and three percent of pieces, respectively.

Data journalism seems to be dominated by political reporting: almost half of the stories we examined covered a political topic. This is interesting because, at first glance, politics appears to be a field more suited to qualitative reporting about opinions and debates than to number crunching. However, a lot of these pieces deal with elections, which tend to generate vast amounts of quantitative data through frequent opinion polls in the run-up to the ballot as well as the final voting figures themselves, as analyzed in this article. Political data-driven stories will also, for example, compile and present the distribution of politicians’ opinions on a particular issue (like this project on U.S. representatives’ points of view on gun control) or check the validity of their arguments based on statistical data (as can be seen on this website that fact-checked the statements of presidential candidates in Costa Rica). Pieces like the latter may go some way to explaining why DDJ is often associated with fulfilling journalism’s watchdog role. Other topics frequently covered, sometimes combined with reporting about the related political debate, derive from areas where data are routinely available. Societal issues such as census results and crime reports account for one third of cases, while health and science subjects, like “Canada’s Pill Problem”, and business and economy matters each account for a fifth of the stories. Sports, along with education and culture, is among the topics covered least often. This is surprising since it is perhaps an area predestined for data journalism due to its traditional predilection for statistics, and one that gave us Nate Silver, one of the best-known and most influential data journalists.

The stories analyzed are mostly based on financial data, geodata or measured values such as aircraft noise, weather data or train speeds, and these numbers are usually collected at a national level. However, DDJ also adapts to smaller scales: data from a regional or even local context is used frequently too.

Most data comes from official institutions like Eurostat or from other non-commercial organizations. Not even one-fifth of stories are based on data generated by the authors themselves. Related to this is the finding that most data sets were publicly available: in only 18 percent of the pieces surveyed did the data have to be requested from the source, through a Freedom of Information request, for instance. This suggests that the link between data journalism and investigative reporting is not as strong as is often assumed.

Nearly all of the stories (85 %) use their data to show differences and similarities between entities, for example people of different genders or different neighbourhoods. Another common purpose of data-driven analyses is to show changes over time.

One of the most distinctive elements of data-driven pieces is the utilization of visualization techniques. Stories in our sample combine, on average, two types of visualization, usually images with maps or simple static charts. More complex visualizations, animations for example, are relatively rare.

Interactive features are a common element used in data journalism pieces as well. More than half of the cases studied allow the user to zoom in and out of a map or provide further information on particular places or on cells in a table when the mouse hovers over them. Just as common is the ability for users to filter data displayed in diagrams, tables and so on using pre-defined categories. However, nearly one-fifth of stories contained no interactive feature at all.

This leads us to conclude that, while data-journalistic pieces do show some similarities, DDJ covers a diversity of topics and takes a number of forms. It is an evolving and flexible reporting style because different types of data, analyses and visualization strategies can be combined, or omitted, when it suits the topic and the story.

However, our sample has particular limitations as it is based on pieces that have a double bias. Firstly, as nominees for a data journalism award they represent a somewhat special group. Secondly, these pieces are based on self-selection, as any data journalist is able to submit their data-driven pieces to be considered for nomination by the organising committee.

Despite these limitations, the sample also has a particular advantage. We can assume that the analyzed cases, as nominees for a DJA, fulfil certain quality criteria and that the awarded pieces in particular are seen by experts in the field as the gold standard and, as such, could influence the future development of the field.

Photo: janneke staaks