How to watchdog algorithms: Q&A with Nick Diakopoulos


Algorithms are powerful. They can decide who goes to jail, who gets offered employment opportunities, what videos are suitable for children to watch, and more.

But, with such power, what happens when things go wrong? 

This is a question that Nick Diakopoulos thinks journalists need to explore. In his Data Journalism Handbook chapter, The Algorithms Beat, he shows how the traditional watchdog role of journalists can be refocused on holding powerful algorithms to account.

It's an emerging field of reporting, and can be quite technical, so we thought we'd give our newsletter subscribers the opportunity to ask Nick questions on the subject. A roundup of their questions and Nick's answers is below.

Reader question: Watchdog journalists can hold powerful people to account by catching them out doing something wrong. How is reporting on algorithmic power different?

Nick: Watchdogging algorithmic power is conceptually quite similar to watchdogging other powerful actors in society. With algorithms you’re looking out for wrongdoings like discriminatory decisions, individually or societally impactful errors, violations of laws or social norms, or in some cases misuse by the people operating the algorithmic system. What’s different is that algorithms can be highly technical and involve machine learning methods that are themselves difficult to explain. They can also be difficult to gain access to, because they sit behind organisational boundaries and often can’t be compelled through public records requests. And they’re capricious and can change on a dime.

Reader question: How can journalists investigate algorithms when the methodology used to create them is kept secret?

Nick: A growing number of approaches are being developed to investigate algorithms, including reverse engineering, auditing, and crowdsourcing. But the basic idea is that every algorithm has an input and an output. If you have access to the algorithm you can poke the inputs in a million different ways and then look at the outputs to see how the system reacted. You can test things like bias this way, or look for mistakes in the inputs when you know what the output should be. You can also do lower-stakes critique of algorithms this way. One of my favourites in this genre is a story from Katie Notopoulos at BuzzFeed which illustrates a recent change to the Facebook News Feed algorithm, not by reverse engineering it, but just by poking it and telling a story about how it reacted and the problems exposed by that reaction.
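
To make the poke-the-inputs idea concrete, here is a minimal Python sketch of a paired audit. It is an illustration only, not code from Nick or from any real investigation: the function black_box_score is a hypothetical stand-in for a system under audit, and the attribute names (years_experience, postcode) and the built-in penalty are invented for the example. In a real audit you would not see inside the system; you would only call it and record what comes back.

```python
from statistics import mean


def black_box_score(applicant: dict) -> float:
    """Hypothetical system under audit (invented for this sketch).

    It quietly penalises postcode "B", which is the behaviour the
    audit below is meant to surface.
    """
    score = 0.5 + 0.01 * applicant["years_experience"]
    if applicant["postcode"] == "B":
        score -= 0.1
    return score


def paired_audit(system, base_inputs, attribute, values):
    """Vary one attribute while holding every other input fixed.

    Returns the mean output of `system` for each value of `attribute`.
    """
    results = {value: [] for value in values}
    for base in base_inputs:
        for value in values:
            probe = {**base, attribute: value}  # copy with one field changed
            results[value].append(system(probe))
    return {value: mean(outputs) for value, outputs in results.items()}


# Ten otherwise identical probe profiles, differing only in experience.
base_inputs = [{"years_experience": y, "postcode": "A"} for y in range(10)]

averages = paired_audit(black_box_score, base_inputs, "postcode", ["A", "B"])
gap = averages["A"] - averages["B"]
print(averages, gap)
```

Because each pair of probes differs only in the attribute being tested, any consistent gap in the outputs can be attributed to that attribute rather than to other differences between the inputs.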

Reader question: It seems like all the content on Google Alerts these days is automated. Beyond reporting on algorithms, how can journalists investigate content aggregator algorithms so that more human-written stories are picked up?

Nick: Investigating content aggregation algorithms is something we’re really interested in at the Computational Journalism Lab at Northwestern University. We have a chapter just out where we look at biases in the sources surfaced in Google search, “top stories”, and images, particularly with respect to the candidates during the 2016 US elections. In Google’s “top stories” we found there was a high concentration of impressions (44%) to just two outlets: CNN and The New York Times. In general, there’s a lot more auditing that could be done around content aggregation sites to understand their diversity and the impact of personalisation, including Google Alerts as the question alludes to, as well as other algorithmically driven news curators.
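
A concentration figure like the one above is, at its simplest, the share of all recorded impressions that went to the top few sources. The Python sketch below shows that arithmetic under stated assumptions: the impression log and the outlet names are made up for illustration, the function top_k_share is a hypothetical helper, and none of it reproduces the lab's actual data or method.

```python
from collections import Counter


def top_k_share(impressions, k=2):
    """Return (share, names): the fraction of impressions captured by
    the k most frequent sources, and the names of those sources."""
    counts = Counter(impressions)
    total = sum(counts.values())
    top = counts.most_common(k)
    share = sum(n for _, n in top) / total
    return share, [name for name, _ in top]


# Made-up log: one entry per time a source appeared in a results box.
log = ["Outlet A"] * 5 + ["Outlet B"] * 3 + ["Outlet C"] + ["Outlet D"]

share, names = top_k_share(log, k=2)
print(f"Top {len(names)} sources {names} received {share:.0%} of impressions")
```

The same calculation can be repeated per query, per day, or per user profile to see whether the concentration shifts with personalisation.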

Reader question: Dis- and misinformation are widespread phenomena and a method of influence in today's internet environment. We are trying to map attempts to detect and expose ‘fake news’ using artificial intelligence (AI). What is your opinion about the need for journalists to confront dis- and misinformation, and the role of AI in these efforts?

Nick: Combatting dis- and misinformation online needs an all-hands-on-deck approach. That includes strategically using algorithms together with expertly trained people, like journalists, to sort the disinformation from the misinformation and everything in between. I think that journalists need more computational tools to sort out synthesised (i.e. fake or fabricated) videos and other media, but they also need more training and to wise up so they’re not duped by astroturfing and bot-driven manipulation of ideas online. AI is not going to magically solve the problem. But it can inform well-trained journalists so they can work on a larger scale and do more verification and factchecking of online media.

Reader question: To say what is or is not ‘true’ is usually easier than detecting ‘manipulated’ information or ‘unbalanced’ news, which could be a matter of opinion. Do you think AI-related algorithms could eventually be more effective than journalists in the fight against fake news? Or could they also be biased in some direction?

Nick: In general, I do not think AI or algorithms are going to be more effective than journalists in the fight for truth. Algorithms are always limited by what is quantified about the world. And they’re not yet smart enough to ask questions about what they don’t know. Oftentimes not-yet-quantified contextual information is essential to understanding what is true or not, and so algorithms are always going to be limited in how far they can go with debunking information online. The key here is knowing the limits of the algorithms and how to feather that into the expertise of computational journalists so that the hybrid system is more than the sum of its parts.

Reader question: Since data journalists often use algorithms in their reporting, do factcheckers need to investigate these algorithms too?

Nick: Investigative journalists are some of the most careful people I know with their methods, and in their efforts to open-source those methods so that others can inspect them. This kind of algorithmic transparency is important: if we’re to thoroughly trust evidence we should be able to see how that evidence was produced. I believe that factcheckers should always get to the bottom of how a fact or any piece of knowledge is known. If a fact was produced from a bit of code running some machine learning model, they should check that too. Open sourcing models makes it easier to verify them.

Reader question: What level of technical ability do you need to effectively investigate algorithms? Can non-data journalists also investigate them?

Nick: Non-data journalists can absolutely participate in algorithmic accountability reporting! Daniel Trielli and I wrote a piece last year for Columbia Journalism Review that outlines some ways, like looking at who owns an algorithm and how it’s procured, as well as reporting on the scale and magnitude of potential impact. That said, there’s a lot more you can dig into with algorithms if you approach them with the eye of a computational or data journalist. Then you can really start reverse engineering them, or auditing them for bias and discrimination. Those types of studies often benefit from advanced statistical techniques and from knowing what algorithms could do, so that you can orient an investigation towards what might go wrong.


Image: Gene Han.