Save the Data: Future-proofing data journalism


From interactive visualizations to easy-to-use databases, storytelling with data has revolutionized modern reporting. Some famous examples of this genre include the Panama Papers, “The Color of Debt” by ProPublica, and “Gun Deaths in Your District” by The Guardian. These paradigm-changing investigative projects made issues accessible to the public, brought down corrupt government officials, and prompted international changes in transparency, security, and human rights.

Projects like these are disappearing. Why? Because libraries and newsrooms have no process for archiving these stories. Look for important works of data journalism from a few years ago, like the Guardian’s coverage of the MP expenses scandal, and you’ll see that many of the sites are dysfunctional or have disappeared. Early interactive news projects, such as “Fatal Flights” by The Washington Post or “Blackhawk Down” by The Philadelphia Inquirer, are in shambles. These sites were state of the art when they were first published. “Fatal Flights” was built in Flash, a technology that’s now staring down the end of its life. The RealPlayer videos embedded in “Blackhawk Down” are gone, and good luck finding a RealPlayer codec for an iPhone. Data journalism has become the fragile ephemera of our time.

We want to save data journalism for future generations. That’s why we’re conducting the first survey of the underlying technology used in data journalism. If you are a data journalist, you can help by taking our survey now or spreading the word in your networks. We’ll use the survey results to create new tools for preserving data journalism.

When you think about web archiving, you might first think of the admirable work done by the Internet Archive. So do we! Unfortunately, complex data journalism projects can’t be preserved by the Wayback Machine for a variety of technical reasons. To understand why data journalism disappears over time, it helps to imagine each page of a data journalism project as an object. All data journalism objects sit on a server somewhere, waiting for users to ask to see them in a web browser. The more advanced stuff is custom work that sits on a server outside of the news organization’s main content management system (CMS). Servers, like any other computers, have a lifespan, and in the cloud computing world, someone has to pay the bill every month to keep them on. Servers are not forever: people unplug them or stop paying for the cloud service. Journalism organizations are bought and sold; people change jobs; content management systems update. Digital news history has to be preserved manually, just like IRL historical objects.

The survey is open until December 2017. We’ll be talking about our preliminary results at Computation + Journalism, and we’ll share what we found in a larger context after the survey closes. Ultimately, we plan to make a scholarly archive of data journalism.

If you have created a news application or worked on a data journalism story, please take the survey here.

Image: University of Illinois Library.