NICAR12 presents: Six promising projects that take data journalism to a whole new level
The original version of this post was published by Dan Sinker on his blog on 28 February 2012. This article has been edited and republished with permission.
I spent an intense 23 hours in St. Louis last weekend at the NICAR 2012 conference. For those who don't know, NICAR stands for 'National Institute for Computer-Assisted Reporting,' and, as the slightly antiquated name might suggest, was founded long before the commercial Internet, back in 1989. Traditionally, the organization, which is run by IRE, Investigative Reporters and Editors, has been about helping reporters use computers to comb through data. Over the years, it has become the de facto organization and conference for news app developers. This year, it felt like the journo-coders in attendance took it to another level.
There was an amazing amount of information thrown around at NICAR 2012 and Chrys Wu from Hacks/Hackers did an incredible job capturing much of it. I just want to write about a few projects that really stood out as exemplifying some of the best that the developer community within journalism can do.
The PANDA project officially launched into beta in St. Louis and threw a 'provisioning party' to help people get their data spelunking appliance up and running. The tool, which allows for collaborative searching and sharing of data, promises to unlock data across a newsroom, but it is just as applicable to anyone who has data they want to be able to search across. The tool is being built by Christopher Groskopf, Brian Boyer (who put on a panda suit for the occasion), Joe Germuska, and Ryan Pitts. PANDA is looking for beta testers and collaborators, so check out the demo or grab PANDA on GitHub now.
Image by Jonathan Stray
For sheer 'blow-my-mind' value, it didn’t get bigger than Overview, which makes the process of digging through giant piles of documents significantly easier. Creator Jonathan Stray showed Overview off throughout the conference and helped walk people through the install process to get them up and running. The project, which is super powerful, is still in its early stages (Stray calls it a prototype), but he has already used it to comb through 4,500 pages of reports filed by US security contractors in Iraq. As it gets built out, it’s going to be an amazing tool for many. Stray even offers a great step-by-step guide for installing Overview on your machine.
Tabletop.js is one of those things that you can’t quite believe doesn’t already exist. It’s a simple tool that allows you to painlessly use a public Google Spreadsheet as the back-end for web content. I spent the train ride home from St. Louis playing with it, and it does exactly what it promises. It’s such a simple tool, but it has all kinds of powerful possibilities. It was built by Jonathan Soma at Balance Media with guidance from John Keefe of WNYC. You can download the tool on GitHub.
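To make the 'spreadsheet as back-end' idea concrete, here is a minimal sketch in Python. It is not Tabletop.js (which is a JavaScript library) and it does not use its API; it only illustrates the pattern of treating a sheet's rows as records and rendering them as web content. The CSV text, the column names, and the helpers `load_rows` and `render_list` are all invented for illustration.

```python
import csv
import io
from html import escape

# Made-up sample standing in for the CSV export of a published spreadsheet.
SAMPLE_SHEET_CSV = """name,city,amount
Alice,St. Louis,120
Bob,Chicago,75
"""


def load_rows(csv_text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def render_list(rows):
    """Render the rows as a simple HTML list, escaping each value."""
    items = "\n".join(
        "  <li>{} ({}): {}</li>".format(
            escape(row["name"]), escape(row["city"]), escape(row["amount"])
        )
        for row in rows
    )
    return "<ul>\n" + items + "\n</ul>"


rows = load_rows(SAMPLE_SHEET_CSV)
markup = render_list(rows)
print(markup)
```

The appeal of the pattern is that whoever maintains the spreadsheet can change what the page shows without touching any code.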
The LA Times Datadesk team gave a presentation on why it is important to turn many of their Django applications into flat HTML files before deployment. By not relying on the server to generate pages that do not need to be built dynamically for every user, the Datadesk team avoids a lot of headaches (not to mention expense) and serves up all sorts of web apps as straightforward HTML pages. Django-Bakery, their code for making this happen, is now up on GitHub.
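The 'baking' idea can be sketched with nothing but the Python standard library. This is not Django-Bakery's API, just an illustration of the technique it is built around: render every page once, ahead of time, and write the result to disk as static HTML. The `STORIES` records, the `PAGE` template, and the `bake` helper are hypothetical.

```python
import tempfile
from pathlib import Path
from string import Template

# Hypothetical records standing in for what a Django view would query.
STORIES = [
    {"slug": "budget-2012", "title": "City budget, 2012", "body": "Spending by department."},
    {"slug": "school-scores", "title": "School test scores", "body": "Results by district."},
]

PAGE = Template(
    "<html><head><title>$title</title></head>"
    "<body><h1>$title</h1><p>$body</p></body></html>"
)


def bake(records, build_dir):
    """Render each record once and write it to <build_dir>/<slug>/index.html."""
    written = []
    for record in records:
        target = Path(build_dir) / record["slug"] / "index.html"
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(PAGE.substitute(record), encoding="utf-8")
        written.append(target)
    return written


# Bake into a throwaway directory; a real build would then be uploaded
# to any static file host.
build_dir = tempfile.mkdtemp(prefix="baked-")
baked_files = bake(STORIES, build_dir)
for path in baked_files:
    print(path)
```

Once the files exist, serving them requires no application server or database at request time, which is where the savings described above come from.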
Node Web Scraping
I missed this talk, but when I asked on Twitter for recommendations of great things from NICAR, Al Shaw’s talk on using Node.js for scraping web pages got the most recommendations. And for good reason: his straightforward presentation, which steps through the process, makes Node look like a data scraper’s dream come true.
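For readers new to scraping, the core step looks roughly like the sketch below. It is written in Python rather than Node.js and is not taken from Shaw's talk; it only shows the general technique of walking a page's HTML and pulling a table out as structured records. The sample page, its names and figures, and the `TableScraper` class are all invented for illustration.

```python
from html.parser import HTMLParser

# Invented sample page; a real scraper would fetch this over HTTP.
SAMPLE_PAGE = """
<table>
  <tr><th>Candidate</th><th>Raised</th></tr>
  <tr><td>Smith</td><td>1200</td></tr>
  <tr><td>Jones</td><td>950</td></tr>
</table>
"""


class TableScraper(HTMLParser):
    """Collect the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._in_cell:
            self._row[-1] += data

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append([cell.strip() for cell in self._row])
            self._row = None


scraper = TableScraper()
scraper.feed(SAMPLE_PAGE)

# The first row is the header; pair it with each remaining row.
header, *records = scraper.rows
scraped = [dict(zip(header, record)) for record in records]
print(scraped)
```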
Campaign Finance API
Although it ended up shipping a couple of days after NICAR wrapped up, it’s worth pointing out the amazing work by both Derek Willis at the New York Times and the team at ProPublica in bringing the NYT Campaign Finance API up to near-real-time speed. This kind of work is vital this election season, and it’s truly inspiring to see collaboration between two incredible news organisations. The full documentation of the API is on the NYT Developer Network.
It’s not even March yet and the amount of awesome coming out of the journalism code community is already overwhelming. Let’s keep it going.