The future of data journalism
In this Q&A, Data Journalism Handbook editor Jonathan Gray explains what data journalism is, how the handbook came about, and which data journalism trends to watch.
What exactly is "data journalism"?
Jonathan Gray: Broadly speaking, "data journalism" is a fairly recent term for a set of practices that use data to improve the news. These range from using databases and analytical tools to write better stories and do better investigations, to publishing relevant datasets alongside stories, to using datasets to deliver interactive data visualizations or news apps.
Precisely where one places the emphasis depends on what one thinks is important. This is why in the book we have several sections in the introduction where we've asked leading practitioners, advocates and scholars what data journalism means to them, what makes it distinctive and why they think it is important.
Is the field focused on access to public information rather than data gathered and held by private enterprises?
Jonathan Gray: Information from the public sector is often simply easier to get than information from the private sector, partly because of access to information laws and information reuse -- "open data" -- policies. But there is no reason why public sector data should be intrinsically more interesting to data journalists than private sector data.
We'd love to see more private sector data being opened up for journalists and others to use. OpenCorporates is doing great work to gather basic information about companies around the world into a single open database. Sourcemap is doing interesting work around supply chains and carbon footprints. AMEE is working hard to aggregate environmental and energy data from a wide variety of sources. Perhaps one day we'll see more "carrots and sticks" to support private sector transparency akin to those we see in the public sector.
How did the book come about, and why is there a need for it?
Jonathan Gray: The book was born at MozFest 2011 in London, Mozilla's annual event. The European Journalism Centre and the Open Knowledge Foundation organized a book-sprint session at the event. We had contributions from the BBC, Financial Times, Guardian, New York Times, Wired, Zeit Online and many others. More than 60 contributors drafted more than 20,000 words in under 48 hours. The current version of the book is the result of a further six months of soliciting additional material, shaping, and polishing.
Regarding the need for the book: Quite simply, data can help us to answer questions about the world. While it certainly isn't a panacea, or an objective reflection of the world, data is an increasingly important part of our information landscape. Rather than relying on the analysis of public bodies, public relations agencies, or experts for hire, journalists and their readers should be able to explore, interrogate and critically analyze databases for themselves. The handbook is our attempt to encourage journalists to increase their own data literacy, and hopefully the data literacy of their readers.
Who is the book for? Journalists? Developers? Readers?
Jonathan Gray: As we write in the preface, the book is for "anyone who thinks that they might be interested in becoming a data journalist, or dabbling in data journalism." This might include journalists who want to expand their repertoire and learn a bit more about how to work with data. And it might include developers, designers, or data wranglers who want to turn their hands towards journalism. We have a broad church of contributors, and expect to have a similarly broad church of readers.
The main things we want to do in the book are to give readers a sense of what data journalism is and why it is valuable, to provide some good examples, to offer insight into how newsrooms and journalists from around the world do it, and to explain how to get started.
The first half of the book basically provides inspiration and shows what is possible by looking at what others have done. We hope that on perusing it, readers will think things like "perhaps we could do something like that" or "they've looked at X, but I wonder whether anyone has done anything on Y," or "we could take this even further and do a more detailed investigation into Z." The first three sections are intended to whet our readers' appetites, to grease the wheels of their imagination and to spur them on.
The second half of the book is intended to give readers a sense of how they can get started, and is divided into three sections: getting data, understanding data, and delivering data. Each of these could be a series of books unto itself, so we just want to show them what steps are involved and where they can go to learn more.
At this point, are there any rules or principles of data journalism?
Jonathan Gray: I'm not sure you can have general rules or principles for data journalism any more than you can have general rules or principles for literary composition. But we've made sure to include several sections for more general guidance and advice on some of the core themes, such as anecdotes and "war stories" on getting hold of data, tools of choice for interpreting data, and tips on how to present data to the public.
You could learn how to do some of the things that are covered in the book in a course, such as how to use software tools to analyze and visually represent data. But there are also lots of things that you can't really learn formally in a class, such as how to hire a hacker or how to scrape and crowdsource data. We've tried to capture some of these things in the book.
What data journalism trends or needs are you tracking?
Jonathan Gray: Having recently been a member of the pre-jury for the Data Journalism Awards coordinated by the European Journalism Centre, I noticed that, in addition to broader tools that let you explore and browse through datasets, some of the best applications let you do one simple thing very well. While I believe that media organizations and NGOs have a duty to cite and provide access to their data sources -- just as scholars should -- some of the most effective examples were surprisingly filtered and focused. Rather than big powerful dashboards or sprawling interactives, these mini-apps would just explain or deal with one variable, often in quite a creative or unexpected way. Data journalists can help to guide our attention to one thing, one issue, one question in a big dataset. Discarding all kinds of other potentially interesting material requires discipline, and I think this is where journalists' editorial and narrative skills can really play an important role, as opposed to the "discard nothing" instinct of researchers and information scientists.
I'll briefly mention two other things that I'd really like to see more of: firstly, short or micro-short videos accompanying data-driven news apps or interactives. A voiceover with some narrative showing you trends and pointing to important developments is a really powerful way to pull people in.
Secondly, augmented datasets cross-linked through to stories: for example, taking a public dataset, combining it with other information sources, or crowdsourcing further details, and then linking items through to relevant news stories. This is a great way to give readers the sources behind the headlines and context around the data in one fell swoop.
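As a rough illustration of the augmented-dataset idea described above, the following Python sketch combines a public dataset with a second source and then attaches links to related stories. It is a minimal sketch, not a method from the handbook: the school records, the crowdsourced details, the story index, the `example.org` URLs and the `augment` function are all hypothetical and invented for illustration, and it assumes every source shares a common identifier (`school_id`).

```python
from collections import defaultdict

# Hypothetical public dataset: one record per school (illustrative values only).
public_records = [
    {"school_id": "S1", "name": "Northside Primary", "budget": 1200000},
    {"school_id": "S2", "name": "Riverside High", "budget": 3400000},
]

# Hypothetical second source (e.g. crowdsourced details), keyed by the same ID.
crowdsourced = {
    "S1": {"pupils": 300},
}

# Hypothetical story index: each story lists the records it relates to.
stories = [
    {"url": "https://example.org/news/northside-cuts", "school_ids": ["S1"]},
    {"url": "https://example.org/news/school-budgets", "school_ids": ["S1", "S2"]},
]


def augment(records, extra, story_index):
    """Return copies of records, enriched with extra fields and story links."""
    # Build a lookup from record ID to the URLs of stories that mention it.
    links = defaultdict(list)
    for story in story_index:
        for school_id in story["school_ids"]:
            links[school_id].append(story["url"])

    augmented = []
    for record in records:
        row = dict(record)  # copy, so the original dataset is left untouched
        row.update(extra.get(record["school_id"], {}))
        row["stories"] = list(links.get(record["school_id"], []))
        augmented.append(row)
    return augmented


augmented = augment(public_records, crowdsourced, stories)
for row in augmented:
    print(row["name"], "-", len(row["stories"]), "linked stories")
```

In practice the hard part is usually the shared identifier: real sources rarely agree on one, so matching records across them tends to need manual checking before the links can be trusted.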