Giannina Segnini: “Data journalism empowers journalists with better tools to tell better stories”


Originally published by the Global Editors Network (GEN). This article is republished with permission.


Giannina Segnini is the founder and coordinator of the investigative unit of the daily La Nación, where she has worked since 1994. She and her team are currently experimenting with investigative data journalism, with great success.


GEN: First, some background and context about La Nación: how many journalists are there in total in the newsroom? What is the circulation of the print and web editions?

Giannina Segnini: We have a total of 90 journalists in the newsroom.
The print edition's circulation averages 87,900, and we have 120,000 unique visitors on our website.


GEN: What is the size and organisation of your investigative unit? How many journalists? Are they working full time on investigation stories? Do they have a specific background or education?

Giannina Segnini: We are three journalists and two computer science engineers, and all of us work full time on investigative stories. The three journalists have university degrees in journalism and have been trained in computer-assisted reporting and investigative journalism. The engineers both hold university degrees in computer science and software development. One of the engineers is fully dedicated to the data mining process, and the other develops interactive applications to visualize the data.


GEN: What particular techniques do you use during your investigations? What is, for example, the importance of computer assisted journalism or data visualization?

Giannina Segnini: For 16 years, the Investigative Unit has been very successful in uncovering traditional investigative stories, most of them related to high-profile corruption issues. More than fifty criminal cases against politicians, businessmen and officials have originated from the unit's revelations. The Alcatel bribery case (revealed by this unit) triggered judicial investigations in the United States, France and Costa Rica. (The telecommunications company agreed to pay more than $137 million to settle U.S. charges.)

Only in the last year has the Unit started to focus on database journalism and data-driven journalism as its main output. Now, more than 70% of our stories are generated by these techniques.

We are extracting data from the Internet 24 hours a day, gathering bits and pieces from public databases, then cleaning and consolidating them. We then analyze the data to produce information using intelligence-based techniques. This information leads to investigative stories that we report further.

No information is wasted during the process, as every dataset is permanently stored on a server and linked to the rest of the data.

As an example, last December, before the regional elections in Costa Rica, we took the whole list of candidates from all the parties and built a very particular database. We cross-referenced their names with all court criminal records and with the list of people barred from holding public office, as well as with the social security debtors' database.

One week before the elections, we revealed that five of the candidates had been convicted of crimes such as kidnapping, fraud and robbery; another 27 were registered as nominees although they had been legally prohibited from serving as public officials as a result of administrative proceedings against them. We built an interactive map with the group of candidates who had some kind of record. That way, citizens could personally search our database for their local candidates.

Of course we had to manually verify more than 400 records, one by one, checking the original files and asking each candidate for their version of the story. Database journalism doesn’t replace journalism’s best practices; neither does it do away with on-the-street reporting. Data driven journalism empowers reporters with more tools to tell better stories.
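
Editor's illustration: the cross-referencing described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the Unit's actual method: the function names, source labels and sample names are all invented, and it matches on normalized names only. A real project would more likely match on an official ID number, and every hit is a lead to verify by hand, never a finding.

```python
import unicodedata


def normalize(name: str) -> str:
    """Lowercase, strip accents and collapse whitespace so name variants compare equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(without_accents.lower().split())


def cross_reference(candidates, sources):
    """Return {candidate: [labels of the sources where the normalized name appears]}."""
    # Build one set of normalized names per source for fast lookups.
    indexed = {label: {normalize(n) for n in names} for label, names in sources.items()}
    flagged = {}
    for candidate in candidates:
        key = normalize(candidate)
        hits = [label for label, names in indexed.items() if key in names]
        if hits:
            flagged[candidate] = hits
    return flagged


# Invented sample data (hypothetical names and source labels).
candidates = ["María Pérez Solano", "Juan Mora Vargas", "Ana Rojas Castro"]
sources = {
    "criminal_records": ["MARIA PEREZ SOLANO"],
    "barred_from_office": ["Juan  Mora Vargas", "Luis Salas Araya"],
    "social_security_debtors": ["maria perez solano"],
}

flagged = cross_reference(candidates, sources)
for name, hits in flagged.items():
    print(f"{name}: check against {', '.join(hits)}")
```

Exact name matching misses misspellings and wrongly flags people who share a name, which is why the manual check of each record remains necessary.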


GEN: Could you name the software and tools you use to process, modify and analyse the databases?

Giannina Segnini: We are using a mix of open source and licensed software. For cleaning the data we use Google Refine’s algorithms; for Extract, Transform, Load (ETL) we use Talend; and to analyze part of the data we use the i2 intelligence-based software (Analyst’s Notebook and iBase). We are just starting to learn R to analyze big datasets.

To visualize the data we use the Google Visualization API as well as applications developed by our team in JavaScript and ActionScript (Flash).


GEN: Could you give us examples of data driven journalism you admire right now?

Giannina Segnini: Simon Rogers at The Guardian is doing a great job on data-driven journalism. During the recent riots in England, he mapped the suspects' addresses against poverty indicators to see what part poverty played in the riots that spread across the country. Using Fusion Tables, a Google mapping tool, he showed the world, hours after the riots, the correlation between the two variables.


GEN: Would you say that investigative journalism is being revolutionized by the new tools and trends we see around data-driven journalism? Or is it still the same activity it has been since the Watergate investigation: reading and checking thousands of documents to find leads and manipulations?

Giannina Segnini: Database and data-driven journalism enhance credibility and independence.

For the first time in history, investigative journalism has powerful tools to discover complete stories, not just the pieces that some source wants to disclose.

If we continue to work in a filtering-dependence mode, we could become rabbits chasing a carrot. There are many experts who can seduce our sophisticated journalistic sense of smell to their advantage. They know exactly how to entice our instincts to persuade us to start an investigation, take us where they want and achieve what they need.

Database journalism allows reporters to investigate what is literally a whole new universe of unfiltered data. If we know how to ask the right questions, this data will expose unimaginable stories, never before revealed even by our most effective sources.

I believe quality and “gourmet” journalism, empowered by data, is not only the watchdog needed to keep our leaders accountable. It is also the hope for the development of new and successful business models for journalism.


GEN: Is the "open data" trend helping investigative journalism and transparency? Or is it only a communication tool for governments and administrations?

Giannina Segnini: Data, as a capital resource, is also the most precious raw material for investigative journalism nowadays. There are more than 800 exabytes of data available on the web. The problem today is not getting access to data, but using it intelligently. We are still learning to explore the data, and its buried treasures, with new eyes; we are still learning how to ask it the right questions.

The open data trend is not only helping journalists to investigate; it is also enhancing the transparency of the reporting process. Every time we journalists put a complete dataset in the audience's hands, they can see the stitching of our stories, the decisions we made and the hierarchy of concepts we applied.