11/7/2012

Data Journalism Awards Worthy Mention - Bus Subsidies in Argentina

 

The winners of the 2012 edition of the Data Journalism Awards were announced on 31 May during a ceremony held at the News World Summit in Paris. In this series of posts we will present each of the winning projects and honorable mentions to understand their relevance to the field of data journalism and provide an overview of the tools and methods used by participants. 

The project to be showcased here, Subsidies for the Bus Transportation System, received an honorable mention under the Data-Driven Investigations, national/international category. To see the jury comments click here. 


An interactive visualisation to contextualise numbers and entertain the audience

The aim of the game

This project aimed to kick-start data journalism in Argentina, and at the Argentine daily newspaper La Nacion in particular, by breaking the paradigm that data retrieval amounts to Mission Impossible in a country with no Freedom of Information Act (FOIA) and a general lack of transparency. The team at La Nacion, however, proved that the impossible is possible.

How it all started

In October 2011, the re-elected Argentinean government announced cuts in subsidies for public services (including transportation). This would impact the daily lives of more than 5,800,000 bus passengers. Since February 2012, the national government has tried to transfer the metro and bus system to the Buenos Aires city government, without being clear about the amount of subsidies being transferred.

To get to the bottom of the subsidy transfer issue, La Nacion's Financial Section Editor, Diego Cabot, needed to calculate the amount of subsidies granted to the bus transportation system and to identify the monthly payments per bus company. To do this, Diego turned to the Transportation Department web site, which contained more than 400 PDFs of monthly cash payments made to more than 1,300 companies since 2006. By teaming up with a senior programmer, the team developed an awe-inspiring database with more than 285,000 records, a database which can now be accessed by every Argentinean citizen.


One of the interactive charts created for the project

When strange things happen

During the data checking phase of the process, the team suddenly discovered that PDFs had mysteriously disappeared after being published, even though the same file names and URLs were kept. They also found that some PDFs had been overwritten, this time without totals, which meant that no citizen or journalist could cross-check the totals. The team captured the cases of February, March and April 2011 to document this.

To solve the mystery, the team took this to the 2011 hackathon of Hacks/Hackers in Boston where, with the help of Matt Perry, a “PDF Spy” was created. “Never be fooled by ‘government transparency’ again,” said Perry. The team used the knowledge gathered at the hackathon to rebuild their own version of PDF Spy, which checks for overwritten files and downloads modified ones in case files disappear into thin air.
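The idea behind a tool like PDF Spy can be sketched in a few lines of Python. This is only an illustration of the general technique (fingerprinting each file and comparing against the previous run), not the team's actual code; the function names and the example URLs are hypothetical, and the downloading step is left out so that the sketch stays self-contained.

```python
import hashlib
from typing import Dict, List


def fingerprint(content: bytes) -> str:
    """Return a SHA-256 hex digest that changes whenever the file bytes change."""
    return hashlib.sha256(content).hexdigest()


def compare_snapshots(previous: Dict[str, str],
                      current: Dict[str, bytes]) -> Dict[str, List[str]]:
    """Compare stored fingerprints with freshly downloaded files.

    previous: URL -> fingerprint saved on the last run (hypothetical storage)
    current:  URL -> file bytes fetched on this run
    """
    report: Dict[str, List[str]] = {"new": [], "modified": [], "missing": []}
    for url, content in current.items():
        if url not in previous:
            report["new"].append(url)
        elif previous[url] != fingerprint(content):
            # Same URL, different bytes: the file was silently overwritten.
            report["modified"].append(url)
    for url in previous:
        if url not in current:
            # The file was published before but can no longer be fetched.
            report["missing"].append(url)
    return report


# Hypothetical example: one file is overwritten, one disappears.
last_run = {
    "http://example.org/2011-02.pdf": fingerprint(b"payments with totals"),
    "http://example.org/2011-03.pdf": fingerprint(b"march payments"),
}
this_run = {"http://example.org/2011-02.pdf": b"payments without totals"}
print(compare_snapshots(last_run, this_run))
```

Keeping a copy of every version that shows up as "modified" is what makes it possible to prove later that a published file was changed.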

From A-Z

In a nutshell, the project took a gruelling 13 months and is still a work in progress. It required a thorough knowledge of the bus subsidy system. Other necessary skills included journalism, web scraping, coding, data analysis, manual work (for cross-checking data on a monthly basis), interactive data visualisation, interactive design and, last but not least, teamwork.

What techniques and tools were used?

The team used Visual Basic for Applications, RegEx, Excel macros, Nitro PDF, Tableau Public and the Junar Open Data Platform, which creates visualizations with the Google Charts API and exports to Excel and Google Spreadsheets. ActionScript was used for the interactive Flash visualizations, along with Ruby on Rails and MySQL.
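To give a feel for the role regular expressions can play in a workflow like this, here is a minimal Python sketch that turns text lines, as they might look after converting a PDF to text, into records. The line layout (company name, two or more spaces, an amount written like 1.234.567,89) and the company names are assumptions made for the example; the real PDFs may be laid out quite differently, and the team worked with the tools listed above rather than Python.

```python
import re
from decimal import Decimal

# Assumed layout: company name, at least two spaces, amount such as 1.234.567,89
LINE_RE = re.compile(
    r"^(?P<company>.+?)\s{2,}(?P<amount>\d{1,3}(?:\.\d{3})*,\d{2})$"
)


def parse_amount(text: str) -> Decimal:
    """Convert '1.234.567,89' (dot for thousands, comma for decimals) to a Decimal."""
    return Decimal(text.replace(".", "").replace(",", "."))


def parse_payment_lines(lines):
    """Return (company, amount) pairs for every line that matches the assumed layout."""
    records = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            records.append(
                (match.group("company").strip(), parse_amount(match.group("amount")))
            )
    return records


# Hypothetical sample lines; headers and other non-matching lines are skipped.
sample = [
    "EMPRESA EJEMPLO S.A.    1.234.567,89",
    "PAGOS DEL MES",
    "LINEA 60   950,00",
]
records = parse_payment_lines(sample)
print(records)
print(sum(amount for _, amount in records))
```

Summing the parsed amounts, as in the last line, gives a figure that can be cross-checked against the total printed in the document whenever one is published.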


One of the interactive graphics used to visualise the data

The top four results

  1. Stories – a lot of stories and visualizations were extracted from the datasets
  2. Original and exclusive information - La Nacion was the only media outlet in Argentina that could extract, convert, merge and analyse these datasets
  3. Thousands of comments and page views - one example (of many) includes: 33,400 page views, 3,000 comments and 204 Facebook Likes. See here.
  4. Investigations - some of the investigative possibilities of the subsidies data case are explained in this article.

The challenges 

The challenges included building a database from scratch by merging three different databases and keeping it updated on a monthly basis. Additional challenges consisted of converting and updating the data and producing new information that nobody could use before. The work also involved creating visualizations, stories and an upcoming data app. For more information, refer to “More Data challenges” in the team's blog presentation.
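The monthly update can be pictured as an "insert or overwrite" step keyed on company and month, so that re-importing a corrected file replaces the old figures instead of duplicating them. The sketch below uses Python's built-in sqlite3 module purely as a stand-in for the MySQL database the team used; the table layout, column names and sample figures are assumptions made for the example.

```python
import sqlite3


def open_database() -> sqlite3.Connection:
    """Create an in-memory database with one row per company and month."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE payments (
            company      TEXT    NOT NULL,
            month        TEXT    NOT NULL,   -- e.g. '2011-02'
            amount_cents INTEGER NOT NULL,
            PRIMARY KEY (company, month)
        )
        """
    )
    return conn


def merge_month(conn, month, records):
    """Insert one month's payments, overwriting rows that already exist.

    records is an iterable of (company, amount_cents) pairs.
    """
    conn.executemany(
        "INSERT OR REPLACE INTO payments (company, month, amount_cents) "
        "VALUES (?, ?, ?)",
        [(company, month, amount) for company, amount in records],
    )
    conn.commit()


def monthly_total(conn, month) -> int:
    """Return the sum of all payments recorded for the given month."""
    row = conn.execute(
        "SELECT COALESCE(SUM(amount_cents), 0) FROM payments WHERE month = ?",
        (month,),
    ).fetchone()
    return row[0]


# Hypothetical figures: load a month, then re-import a corrected value.
conn = open_database()
merge_month(conn, "2011-02", [("Empresa A", 100_000), ("Empresa B", 250_000)])
merge_month(conn, "2011-02", [("Empresa A", 150_000)])
print(monthly_total(conn, "2011-02"))
```

Storing amounts as integer cents avoids rounding surprises when totals are recomputed month after month.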

Advice for aspiring data journalists

"Our advice is to learn to ask for help, gather people with different skills and become a problem solver. Check information, not only of the last month but changes which may appear suddenly as time moves on. Never give up, frustrations are normal. There are techniques, technologies and skills that can be used. Ask for help locally, in forums and by joining groups using specific techniques. Be proactive - share your problems in hackathons and with innovative groups that have different skills. Visit your BI department. Think big - dream global but act small and celebrate each step. Reinvent yourself: with these new materials you can be creative. Every step we take is part of something bigger. Love to learn and innovate with passion."

 

The Data Journalism Awards is a Global Editors Network initiative supported by Google and organized in collaboration with the European Journalism Centre. Please visit the Data Journalism Awards website for the full list of winners.
