The New British Invasion: ScraperWiki hits the Big Apple
At our New York data camp, we set out to liberate data, teach people to liberate data, and find stories in data. About 100 people showed up for the event, and about 40 of them attended the 'Learn to Scrape' sessions.
Michael Keller, Marc Georges et al. related the NYPD stop, question and frisk data to nine mosques referenced in an NYPD report on surveillance in order to see whether there had been unusual changes in stopping activity around these mosques.
The dataset is insanely messy, but they fortunately had access to a relatively clean version that Data Without Borders had developed in November.
Mike Caprio and team cleaned a spreadsheet of 80,000 records from the New York lobbyist website to power a site on New York lobbyists based on the Chicago Lobbyists site. It appears that $120 million was spent in New York on lobbyists in 2011.
I helped one team relate contracts from Open Book New York to data that they had scraped by hand (!) from hand-written forms in order to identify potential conflicts of interest.
I helped another team identify potential stories (outliers) in the NYC Open Data graffiti locations dataset.
Susan McGregor was awarded "Honorary ScraperWikian". We haven't decided what that means yet.
Teaching the Learn to Scrape sessions and working with many of the project teams, I got the impression that we had opened participants to thinking more about how data can be scraped, transformed and analyzed to identify unusual subsets and potential stories. Split-applied-combined, if you will.
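For readers who haven't met the split-apply-combine idea, here is a minimal sketch of it in plain Python. Everything in it is an assumption made for illustration: the records are invented (they are not from any of the datasets above), and the function names and the "three times the median" threshold are hypothetical choices, not what any team actually used.

```python
from collections import defaultdict
from statistics import median

# Hypothetical records: (area, number of reports). Invented for illustration only.
records = [
    ("A", 3), ("A", 4),
    ("B", 5), ("B", 4),
    ("C", 40), ("C", 38),
    ("D", 2), ("D", 6),
]

def split_apply_combine(rows):
    """Group rows by key, total each group, and return one summary dict."""
    # Split: bucket the values by their key.
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    # Apply and combine: total each bucket and collect the totals in one dict.
    return {key: sum(values) for key, values in groups.items()}

def flag_outliers(totals, factor=3.0):
    """Return the keys whose total exceeds `factor` times the median total."""
    typical = median(totals.values())
    return sorted(key for key, total in totals.items() if total > factor * typical)

totals = split_apply_combine(records)
outliers = flag_outliers(totals)
print(totals)
print(outliers)
```

The groups that get flagged are not stories in themselves, just places worth a closer look.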
Our 'Learn to Scrape' sessions seemed to work as well; several participants who had claimed no knowledge of web scraping before the sessions were building reasonably complex scrapers by the next afternoon.