The New British Invasion: ScraperWiki hits the Big Apple


At our New York data camp, we set out to liberate data, teach people to liberate data, and find stories in data. About 100 people showed up for the event, and about 40 of them attended the 'Learn to Scrape' sessions.

The hacking was punctuated by talks by Tom Lee of the Sunlight Foundation and Jake Porway of Data Without Borders.



Dan Nguyen scraped Florida mugshots and used face.com's API to analyse each photo and estimate the arrestee's mood.

Michael Keller, Marc Georges et al. related the NYPD stop, question and frisk data to nine mosques referenced in an NYPD report on surveillance in order to see whether there had been unusual changes in stopping activity around these mosques.

The dataset is insanely messy, but they fortunately had access to a relatively clean version that Data Without Borders had developed in November.

They were still going strong after the data camp. Refusing to leave, they moved to a different room after getting kicked out of the data camp space.

Mike Caprio and team cleaned a spreadsheet of 80,000 records from the New York lobbyist website to power a site on New York lobbyists based on the Chicago Lobbyists site. It appears that $120 million was spent in New York on lobbyists in 2011.

I helped one team relate contracts from Open Book New York to data that they had scraped by hand (!) from hand-written forms in order to identify potential conflicts of interest.

I helped another team identify potential stories (outliers) in the NYC Open Data graffiti locations dataset.

Susan McGregor was "clearly hooked" because she liberated lobbyist contract details the next evening instead of watching the Super Bowl.

Technical Awards  

Mike Caprio won *Best Data Liberator* for liberating the Iowa accident reports database.

Michelle Koeth won *Best Creation of an API* for scraping New York, NY hospitals from Medicare Hospital Compare.

Jeremy Baron, from the UN peacekeeping team, won *Best Use of ScraperWiki* for scraping United Nations PDFs. This team also scraped peacekeeping statistics and contributions.

Honorary ScraperWikian  

Susan McGregor was awarded "Honorary ScraperWikian". We haven't decided what that means yet. 


Teaching the Learn to Scrape sessions and working with many of the project teams, I got the impression that we had opened participants to thinking more about how data can be scraped, transformed and analyzed to identify unusual subsets and potential stories. Split-applied-combined, if you will.
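
For readers who have not met the split-apply-combine idea, here is a minimal sketch of it in plain Python. It is not code from the data camp: the records, the `area` and `reports` field names, and the cutoff of two standard deviations are all made up for illustration. It splits rows by area, applies a sum to each group, combines the sums into one table, and then flags any area whose total sits unusually far from the rest.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical records; the field names and numbers are invented for this sketch.
records = [
    {"area": "A", "reports": 4}, {"area": "A", "reports": 6},
    {"area": "B", "reports": 12},
    {"area": "C", "reports": 5}, {"area": "C", "reports": 6},
    {"area": "D", "reports": 9},
    {"area": "E", "reports": 10},
    {"area": "F", "reports": 60}, {"area": "F", "reports": 40},
]


def split_apply_combine(rows, key, value):
    """Split rows by `key`, sum `value` within each group, combine into a dict."""
    groups = defaultdict(list)
    for row in rows:                      # split
        groups[row[key]].append(row[value])
    return {k: sum(v) for k, v in groups.items()}  # apply + combine


def flag_outliers(totals, threshold=2.0):
    """Return groups whose total is more than `threshold` standard deviations from the mean."""
    values = list(totals.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:                        # all groups identical: nothing stands out
        return {}
    return {k: v for k, v in totals.items() if abs(v - mu) / sigma > threshold}


totals = split_apply_combine(records, key="area", value="reports")
outliers = flag_outliers(totals)
print(totals)
print(outliers)
```

A flagged group is only a lead, not a story: the next step is to go back to the raw rows and find out why that subset is unusual. With very few groups a standard-deviation cutoff can never trigger, so a different rule would be needed there.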

Our 'Learn to Scrape' sessions seemed to work as well: several participants who had claimed no knowledge of web scraping before the sessions were writing reasonably complex scrapers by the next afternoon.

What Next?

More data camps are coming up, and several groups plan on continuing to work on their projects. But in the meantime, we now have lots of data for you to analyse!