From zero to hero: How data journalism helped establish the ICIJ as a top investigative newsroom


Over recent years, data has become an indispensable source for journalists and news organisations, providing excellent material for investigative work as well as storytelling. This has led to the emergence of data journalism, which, broadly speaking, uses information science and analytical techniques in conjunction with journalistic workflows to produce compelling stories rooted in data. Despite the relative maturity of data journalism and the growing application of data in editorial workflows, there is still a lot to learn about the systematic, seamless and effective integration of data and computational tools in newsrooms. It is time for a holistic assessment of this emerging field by looking deeply into the ways newsrooms across the world have adopted data in their day-to-day workflows, the formation of their data teams, their best practices for producing high-quality data driven investigative work, their success and failure stories, and emerging training requirements.

The Global Data Journalism study is aimed at bridging this gap. It is a multifaceted project, investigating newsroom practices in terms of their data driven workflows and requirements, as well as the educational and academic aspects of data journalism. This project is being carried out by Bahareh Heravi, from the School of Information and Communication Studies at University College Dublin, and Mirko Lorenz, Innovation Manager of Deutsche Welle and co-founder of Datawrapper. As a part of this study, we are conducting a series of interviews with industry experts in order to learn about current practices in data driven journalism and the discipline’s future directions. The expert interviews are being complemented by a survey on the state of Data Journalism globally, which we encourage our readers to participate in.

For the second interview in our series, following our discussion with Megan Lucero, we talked to Mar Cabra. Mar is the Head of the Data and Research Unit at the International Consortium of Investigative Journalists - or in short, the ICIJ - the organisation known for the Panama Papers, Swiss Leaks and Luxembourg Leaks, amongst other investigative work.

Bahareh: First of all, could you give us a little bit of an introduction to yourself and to your role?

Mar: Absolutely. I am a Spanish Data Journalist and I work for the International Consortium of Investigative Journalists (ICIJ), which is a non-profit organisation, based out of Washington DC in the United States. We are known for being the organisation behind the Panama Papers. My role is the Head of the Data and Research Unit at the ICIJ, which is a multidisciplinary team of around six to seven programmers and data journalists that helps the ICIJ do better investigations through the use of technology and data.

How do you describe your role at the ICIJ?

I am an editor or a project manager. So basically I always describe my role as being the air controller of this very powerful team that uses data and technology to help us do better investigations. My role is the role of a typical editor in a newsroom. I hear the stories that we are working on, I suggest things that we should do from the data perspective, I coordinate the work of the team when they start working on stories and then there is an interactive process throughout the work of the story. I’m also the one that is kicking people’s butts if they don’t do the work on time, or actually reminding them to meet deadlines. I’m making sure everything happens on time and, at the end of the process, fact checking and reviewing what the team is doing for the stories that we produce in the ICIJ.

How would you describe data driven journalism to someone who has never heard the term?

We live in a world that is electronic. Everything goes through a computer, pretty much. So in journalism we have been looking at reality around us, filtering it, and then telling a story about it -- a condensed story about what’s the best of what we have seen -- to the people. So data journalism is just basically one of the areas in journalism where we look at those electronic records, data, rows of information, thousands or millions of rows of information with numbers and data, and we use computers to basically crunch those numbers and filter the best of those lines of data, and condense them to tell a story about what trends we have seen, about what stories we have found that weren’t that obvious. So data journalism is about filtering the electronic reality around us -- the electronic records, especially the data around us -- to tell stories about this new world we live in.

What do you see as the main benefits and added values of data driven journalism?

I think that there is no way to do good journalism today without looking at electronic records, and many of those electronic records end up being lines of data, with figures, with amounts of money, with detailed information about people doing something, or companies doing something. So the main value is that it helps us understand the reality we live in today, in the era of big data. And it helps us tell better stories, thanks to that. I also like the fact that data journalism is a very broad field. So it not only allows us to gather a lot of information and condense data and analyse it and to find stories that would not be so obvious otherwise, it also helps us tell stories in a different way, with much more depth than before, because not only could you publish a story with findings, but you could also publish an interactive database. Or you could do interactive graphics or interactive applications, where the users can explore that data in a more interactive way than before, and in a much more detailed way than before. So it also adds value there in the publication phase, where we can have a much more in-depth connection with the reader, and the reader can have several layers of exploration inside the story.

What is the most important aspect of data driven journalism in your opinion?

One of the best ways to convince editors to do data journalism in the newsrooms is through clicks and through interactive visualisations. In my experience, in Spain for example, I have seen that interactives -- stories that have maps for example -- “the map of” ends up being the top story. In the case of the ICIJ, the interactive applications that we have done are more viewed and more used by our readers than the stories that we produced. So it is true that the way into editors’ heads, or the way of grabbing the attention of readers, many times is interactivity, it’s visualisation. That means that a lot of newsrooms are actually creating interactive graphics teams, or equating data journalism with visualisation or interactive apps. I think that’s good to some extent, but ... I think that data journalism is not just visualisation, or interactivity, it is also to me analysis of complex realities that are hidden in thousands and millions of rows of data, that we can crunch together thanks to technology, and that would allow us to tell better stories, which is what journalism is about. I think we should not forget the analysis part of the data journalism field.

So you would say that maybe the analysis and data visualisation parts are of the same importance?

Yeah, I would say they are both important. I think however that the analysis part is really forgotten in many newsrooms by many editors and that’s why it requires a higher emphasis in my opinion.

What is your opinion about the relevance of data, statistics, visualisations, and coding in newsrooms?

Let me tell you a story. Two and a half years ago the ICIJ did not have a data team. Two and a half years later we are 50% of the ICIJ staff. Every single investigation that we do has a data component. There’s no investigation that we do, it doesn’t matter if it’s a leak or an investigation based on public records, all of them always have a data component. Because there is no way for us to do investigations around systemic issues without data. So as you can see [from] the importance of our team in the organisational structure of ICIJ, data journalism is very very important. And seeing that that’s also happening in other newsrooms, it is the same as a few years back, when a lot of people were looking at the implementation of social media and social media people inside newsrooms and that was like the new thing, and now nobody disputes that there needs to be community managers in newsrooms. Well it is the same thing [with data journalism]. Data journalism’s importance is growing and it’s here to stay and whoever doesn’t have data journalists in their newsroom is going to be lagging behind in a few years.

How do you see the distinction between data, statistics, visualisation and coding impacting newsrooms?

Data skills can be used in many things in the newsroom. Of course analysing analytics and how the traffic is coming, who sees what, you know, SEO techniques and so on, is very important, and a lot of the data skills are going there. But the field I know is not that one, so I can’t comment on that. The field I know is about using data skills to tell better stories, [it is about] the content production. To me it doesn’t really matter where you are coming from. The main goal of anyone in my team is to tell good stories, and to do great investigations. So I have somebody in my team that has a data analysis background, another person in my team is more of a web developer, another person in my team is a data journalist, who now knows how to use data and even started to know how to code. The goal is to really have a good team that can bring an added value to the newsroom and in our case the newsroom is investigative journalists that are very good at their job, but may not be that good at using data and using databases. So I don’t mind where people are coming from, as long as they can add value to the stories. I do think, however, that there is a value in having a coordinated team, that works across the newsroom and helps different desks or different teams or different projects in different ways. I don’t know if I would separate coders here, and journalists here; I actually think we really need integration, and a team coming from different backgrounds and disciplines that can help the newsroom and the newsroom’s needs in a transversal way.

If you would be the editor-in-chief, what would be your first action with respect to data driven news reporting?

That question is very difficult to answer because it depends: you are the editor-in-chief of what publication? What are your needs? How are people getting to you? What are people doing when they access your website? Do they access your website? Do they come to you through Facebook? How engaged are your readers with what you do? What is your beat? I think it is a very general question.

Let’s say if you were the editor of a national newspaper, for example.

So I guess my first action would be to see who is in my team, and what are their skills, and whether there are any data skills and how are they being used. So I guess I would do a survey or a study about the people I already have in my newsroom. I think that sometimes editors-in-chief do not know what the value of their newsroom is, or whether there are any hidden talents. So, I would say that my first action would be to scope out what I already have in the newsroom, whether there are any data skills, whether we have done any data stories, and how they originated, so that I could try to organise some team around that person or foster the skills of that person. I guess after that it would depend on how much budget I have. Because if there is nobody maybe I would need to bring somebody from the outside, but one person can only get so much done. So I always also believe in teams. Whenever I talk about setting up data teams, I have always tried to push for having a team, where there are two or three people. I guess to me the initial team would have to be a Data Journalist, and maybe a programmer with coding background, maybe a web developer or a front-end developer, that would help us tell better stories. So if I had money, and I didn’t have those skills inside the newsroom, I would try to hire this small team, that would spend a lot of time the first year trying to show the newsroom what they do, train others and advocate for more data journalism in the newsroom.

How should we teach data driven journalism to others - younger and older, less experienced and more experienced?

Many ways. I remember when I came back to Spain 5 years ago from the United States where I first learned what Computer Assisted Reporting was - I didn’t know what that was - that’s what they called data journalism in the US. I came back just knowing Excel and trying to find journalists in Spain who were using Excel to tell stories. And I couldn’t find anybody. I found programmers and activists that were very interested in this happening, and journalists using data. So we started just doing sessions in a place called Medialab-Prado in Spain, where we would just show the skills that we knew. I knew Excel so I taught what I knew in Excel. I was not a super expert in Excel, but I knew how to use Excel a little bit, so I shared with others. There were other people who would show how to extract information from PDFs, etc. And those were very informal sessions where people would come and would get the skills for 30 minutes and then the videos would be online. That was very useful to start creating a community. So instead of just going to talks of people saying how they did things, we would teach skills. That was very useful. On the other hand I co-created a Master’s degree with two other journalists in Spain. It is a Master’s degree in Investigative Journalism, Data Reporting and Visualisation. That has been going on for four to five years. That was very helpful because we had the undivided attention of 15 journalists for a year, and we taught them many things, from investigative reporting to data visualisation; we taught them coding, but also Excel and databases. And then we also taught them how to visualise data. So that was very good. So of course having a Master’s degree is also important because that gives a higher level of skills than if you do a one afternoon session, and they can develop projects. Then the good thing out of that Master’s degree was that they then ended up being interns in newsrooms, because that was part of the Master’s degree.
That planted a seed in the newsroom to show why data journalism was important and five years later most of the media organisations in Spain have a data team, mostly formed of former students of this Master’s degree. So I think that having Master’s education in data journalism is very important, and of course teaching that in undergraduate programmes would be very very helpful too. I think that Excel and basic analysis of spreadsheets should be in any journalist’s backpack. Because it could be useful for anything, even if you are not a data journalist full time. So I would say that there are many ways to teach. Of course online training has also been very helpful. The MOOC that was done by the European Journalism Centre on data skills a few years back was very helpful because it helped people around the world to get trained remotely. So I would say that there are many ways to be trained in data journalism and there is no one answer like “this is the way”. However, what is for sure is that universities have to start adding that to their curriculum, because if not they are failing their students.

Being involved with this area of work, is there an observation you made in the last few years that was a surprise to you? For example, were there any unexpected areas of application or unexpected barriers to adoption?

One of the most surprising things to me is how technology can help us cross borders easily and work collaboratively across borders. And that involves data and data journalism in that process. I remember around 3 years ago I was assigned at ICIJ to work on a leak that we had – our first leak of 260 Gigabytes, which ended up being the Offshore Leaks investigation. When I was assigned to work on that project my role was, “hey, here is your hard drive, make searches in that hard drive for journalists around the world, and send them the documents”. Immediately within two weeks I was already overwhelmed because I am not scalable, and [I was thinking] we need to hire more people like me. Back then we started thinking about technology, and how technology could help us do that work in a scalable way. We started thinking about cloud-based platforms that could help us share that data with reporters, so the reporters could log in and make the searches themselves. Three years later with the Panama Papers, the biggest leak in journalism history, we are talking about very sophisticated platforms that we used, so that reporters could search through documents or even explore the data doing social network analysis themselves if they wanted to. So I think that technology is moving very fast and being very accessible and more accessible every time, and I am surprised how fast this is going and how fast this is allowing us to do better work for cross-border reporting, but also for making data journalism more accessible to people that are not that tech-savvy. We had this platform [for the Panama Papers] called Linkurious, which is software by a French start-up, where reporters could just log in, enter a name, and see graphs of people connected to companies offshore and see who is connected to who in a very easy way. In some way they did data journalism.
Of course some other people could actually write queries to interview that data, but it is fascinating how the tools are making data more accessible, and making it accessible to people who are not tech-savvy. So that is surprising how fast this is happening. I feel like a long time has passed, but it is only like 3 years. So I guess that’s one of the most surprising things. Also one of the most exciting things. Because it makes me wonder how much more can technology and algorithms and data analysis techniques help us in the future with our stories. So one of the dreams I have is to be able to apply techniques of recommendation engines in search engines to journalism. So you go into Amazon, and Amazon tells you “hey, maybe you are interested in buying this, maybe you are interested in buying that”, and that is done based on previous searches that you have done, previous purchases that you have done in Amazon. Why don’t we apply that to journalism, so that there is some technology and some algorithms that know what we have done in the past in general, or applied to a project, like the algorithm would know what I have searched in the Panama Papers data, and would tell me “hey, you found the prime minister of Iceland, maybe you are interested in this other name, which seems to be the president of Argentina”. Right? And algorithms would help us find stories that we didn’t know were there. So I think that there is a very exciting future ahead if we keep applying technology to journalism. I don’t even know where we are going, but it is very exciting and I think we should be applying those techniques that are being used in the corporate world for analysing and working with big data to journalism.

If you enjoyed this interview please take part in the 2017 Global Data Journalism survey, and let us know about your experience in this area.