Giving data soul: Best practices for ethical data journalism


The author and scholar Brené Brown once said, “Stories are just data with a soul.” For centuries, journalists have been affirming this statement by combining data and interviews to create content that informs and inspires change. It is no surprise, then, that data journalism, which transforms numbers into impactful graphics and enlightening narratives, has become a popular medium for journalists.

Growing alongside the burgeoning computational technology industry, data journalism has shown its unique ability to provide a wealth of information on topics that affect society. In fact, when published along with the proper context, data journalism has the power to expose societal disparities in education or aid, while bringing more voices to the forefront – an essential step in understanding the current state of affairs and ways to improve it. By holding a figurative magnifying glass up to the systemic flaws that permeate our society, data journalism can encourage positive change. My research suggests, however, that when published without context and without consideration for ethics, data journalism can cause harm to those in the news through the perpetuation of stereotypes and biases.

Driven by a desire to comprehend the role of data journalism in combating stereotypes and other systemic issues, I began to study the current impact of data journalism, as well as the ethical guidelines and educational models used within the field. Through interviews with data journalists, sociologists and scholars, it became clear that some data journalism is playing a role in the perpetuation of systemic issues and that improved education and communication in this area are necessary if data journalists are to gain the interpretive and critical thinking skills they need to avoid ethical pitfalls.

Based on interviews and the review of more than 20 publications on the topic, I developed a list of best practices and ethical guidelines for data journalists and journalism educators. My findings and suggestions in this regard are just one step towards a solution. I hope that they will not only bring awareness to the issues, but also serve as a catalyst for a larger industry discussion of data journalism ethics and education. The full report can be found here.

Best practices

1. Review the data to uncover inconsistencies and missing pieces

According to the thesaurus, the word data and the word fact are synonymous, a seemingly harmless association that can cause a great deal of trouble for data journalists who equate data with facts. While peer-reviewed scientific studies and ethically sourced data are great starting points for trustworthy information, the journalist is ultimately responsible for reviewing even the most reliable data to uncover inconsistencies, potential missing pieces and, ultimately, what data is factual.

As data journalist Eric Litke pointed out in an interview, “Data cannot be taken at face value, as doing so requires a litany of assumptions.” He continued, “When such data inconsistencies surface, the journalist at a minimum has to reveal this prominently to the reader, but more likely the data should be limited to a group that is internally consistent or new data should be gathered.”

A crucial step towards responsible and ethical data journalism, identifying inconsistencies and missing data minimizes the potential for inaccuracy, while ensuring that the data is representative of all populations. However, doing so can be a challenge in itself, as it requires journalists to know and understand the methodology used to gather the data, including the risks and benefits of the survey methodology. 

“When we talk about surveys there’s a lot of concern about the methods for sampling,” Kathleen Culver, the director of the Center for Journalism Ethics at the University of Wisconsin-Madison, said. “For example, surveys are often set up for landline phones but we’ve discovered that ethnic minorities are more likely to have cell phones, which leaves them out of these surveys.”

Michelle Robinson, a data analyst for Race to Equity in Madison, Wisconsin and a dissertator in the Department of Sociology at the University of Wisconsin-Madison, agreed with Culver, but also stated that there does not seem to be any one methodology that is truly capturing the impact behind the numbers.

“There seems to be a mismatch between racial inequality and the types of methodologies I’ve been trained in,” said Robinson. “I’m very skeptical of the by-design reductionist methods that tend to flatten out an interactive environment.”

Given these inherent issues with data collection, journalists should always consider the methodology used, specifically reflecting on who may have been left out of the data and what impact, systemic or otherwise, that has on the resulting data.

A great example of this challenge occurred in 2014 when the Canadian government began gathering data on unemployment rates in an effort to understand the impact that temporary foreign workers were having on the jobless rates among citizens. The end results showed that the jobless rates in Manitoba, Alberta and Saskatchewan were under the six percent rate considered high unemployment, which suggested that the presence of the temporary foreign workers would not hinder the hiring of citizens in those areas. It turns out, however, that the data used to calculate the jobless rates in these areas did not include information from Aboriginal groups living in those areas. This is a serious gap in the data, considering that Saskatchewan alone hosts more than 30 First Nation reserves.

The misstep is clear: eliminating such a large number of people from the survey resulted in an inaccurate and potentially damaging account of the employment rate in Canada, something that had the possibility of disproportionately affecting a marginalized population. Fortunately, journalists in this case, specifically those from The Globe and Mail, utilized their data interviewing skills to discover that something was missing from the data. This example showcases the power that data has in shaping policy and the equally powerful role journalists can play in dispelling stereotypes and ensuring the accuracy of the data reported.
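For journalists who work with the raw numbers, the question of who is missing from a dataset can be partly checked in code. The following is a minimal sketch, not a method taken from any of the reporting described above: it assumes the journalist has a count of respondents per group and an independent estimate of each group's share of the population (for example, from a census). The function name `coverage_gaps`, the `tolerance` cutoff and all of the figures are hypothetical and chosen only for illustration.

```python
def coverage_gaps(sample_counts, population_shares, tolerance=0.5):
    """Flag groups whose share of the sample is far below their population share.

    sample_counts: dict mapping group name -> number of respondents (hypothetical input)
    population_shares: dict mapping group name -> share of the population (0 to 1)
    tolerance: a group is flagged when its sample share is less than
               tolerance * population share (0.5 is an arbitrary assumption)
    """
    total = sum(sample_counts.values())
    if total == 0:
        raise ValueError("The sample contains no respondents.")

    gaps = {}
    for group, pop_share in population_shares.items():
        # Groups absent from the sample count as zero respondents.
        sample_share = sample_counts.get(group, 0) / total
        if sample_share < pop_share * tolerance:
            gaps[group] = (sample_share, pop_share)
    return gaps


# Invented figures for illustration only; they do not describe any real survey.
sample_counts = {"group_a": 900, "group_b": 100, "group_c": 0}
population_shares = {"group_a": 0.75, "group_b": 0.15, "group_c": 0.10}

gaps = coverage_gaps(sample_counts, population_shares)
for group, (sample_share, pop_share) in gaps.items():
    print(f"{group}: {sample_share:.0%} of sample vs {pop_share:.0%} of population")
```

A check like this only raises a flag. Finding out why a group is missing, and what that does to the result, still requires asking the people who gathered the data about their methodology.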

2. Unpack the concepts

While validating the data is a crucial first step, proper contextualization of the data through systemic reporting is equally essential for ensuring accountability and inclusion. This contextualization requires journalists to unpack the concepts: in other words, to go beyond a basic analysis of the numbers and incorporate interviews with the individuals behind the numbers, as well as information on how systemic issues have affected the statistical results. This systemic reporting is particularly important when it comes to publicly available data on the education gap, housing, unemployment, incarceration rates or other data that can disproportionately affect disadvantaged subgroups and feed into stereotypes.

“I was recently looking at the achievement disparities in education and there were a lot of stories about test scores, particularly in the wake of No Child Left Behind when many districts realized they had giant gaps between white kids and pretty much everyone else,” said Sue Robinson, Ph.D., a professor of journalism at the University of Wisconsin-Madison, who recently completed a book on race and the media. “So, what you see is an article that discusses the fact that there was a gap, but what we need to see is more understanding of the system. For example, what are the schools doing to eliminate the systemic issues and what are the factors that might be contributing to the gap? We need to have a broader conversation about these issues.”

Though structural constraints on time and resources exist for journalists, making this type of reporting difficult, the fact remains that the proper contextualization of data through systemic reporting is essential to avoid ethical missteps and the oversimplification of complex systemic issues.

This best practice certainly puts a lot of pressure on journalists to fight the perpetuation of stereotypes, but the reality is that data and the way journalists report that data have an impact on societal views, policy decisions, rules and regulations.

In fact, a recent study by Stanford, Racial Disparities in Incarceration Increase Acceptance of Punitive Policies, revealed “that exposing people to extreme racial disparities in the prison population heightened their fear of crime and increased acceptance of the very policies that lead to those disparities.”  

As Michelle Robinson said, “People often come to data with their own preconceived notions, so we need to think ethically about how to approach data.”

While it is difficult to predict the preconceived notions or prejudices of every reader, it is important to remember that they exist and that proper context remains essential in dispelling stereotypes.  

3. Beware of the bell curve

Providing context to the data often requires interviews with the people behind the numbers. This may include those who gathered the numbers, as well as those represented by the numbers. While interviews with individuals can strengthen a data journalism story, they can also lead to the misrepresentation of communities and perpetuation of stereotypes, especially if sources represent only the tail ends of the curve. In other words, an interview with only one or two individuals from a community may highlight only the extremes and misrepresent the typical member of the population, the one closer to the median or average.
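When the underlying figures are available, a reporter can check where a potential source sits relative to the rest of the data before building a story around that person. The sketch below is one simple way to do that, under stated assumptions: the income figures are invented, the helper names `percentile_rank` and `in_tail` are hypothetical, and the 10 percent cutoff for what counts as a "tail" is an arbitrary choice, not an established standard.

```python
from statistics import median


def percentile_rank(values, x):
    """Return the percentage of values that are at or below x."""
    return 100 * sum(v <= x for v in values) / len(values)


def in_tail(values, x, tail=10):
    """Return True if x falls in the bottom or top `tail` percent of values."""
    rank = percentile_rank(values, x)
    return rank <= tail or rank >= 100 - tail


# Invented household incomes (in thousands) for a hypothetical community.
incomes = [18, 22, 25, 27, 30, 31, 33, 35, 38, 40,
           42, 45, 48, 52, 55, 60, 66, 75, 90, 150]

print("Median income:", median(incomes))

# Incomes of three hypothetical interviewees.
for source_income in (18, 40, 150):
    label = "tail of the curve" if in_tail(incomes, source_income) else "closer to typical"
    print(f"Interviewee with income {source_income}: {label}")
```

An interviewee who falls in a tail can still be a valuable voice, but the story should make clear that their experience is not the typical one.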

To represent a larger population and avoid the outliers, Michelle Robinson suggests interviewing several members of a community who represent a range of different areas, age groups and financial backgrounds. In other words, showcase what it means to be a member of that data. She also suggests that reporters “embed in a community, build relationships with folks and really diffuse an understanding of what it means to be a member of that community.”

In doing so, experts hope data journalism can avoid past issues, such as those faced during the onset of the mass incarceration movement.

“Journalism was really caught flat-footed with the mass incarceration movement,” said Dr. Culver. “I guarantee that movement was felt in those communities. It was felt economically, it was felt in schools and in relationships, but because largely white newsrooms were not connected to those communities, they weren’t hearing what was happening.”

It is because of cases such as the one listed above that Culver and many other professionals are promoting a call to arms for increased diversity in the newsroom and for journalists to break out of their bubble and engage with individuals and communities outside of their own. Experts agree that these improved community partnerships will help journalists to avoid the oversimplification and misrepresentation of community issues represented in data. 

4. Follow the golden rules of journalism

Although data journalism has its own set of challenges, the traditional rules and ethics of reporting still apply. The Society of Professional Journalists Code of Ethics, which encourages journalists to seek truth and minimize harm to those in the news, is just as relevant to the reporting of data as it is to more traditional forms of journalism. Despite the hurdles presented by changing newsrooms and ever-expanding datasets, an accountable and ethical press is still the best defense against misinformation.

5. Become educated

Of course, the ethical execution of data journalism requires time, resources and journalists trained to interview and report on data. Unfortunately, despite the fantastic efforts of many scholars and data journalism organizations, studies indicate that education in this area is still lacking. According to the recent report, Teaching Data and Computational Journalism by Charles Berret and Cheryl Phillips, only about half of schools are offering education in this area, with just 59 of the 113 institutions they surveyed (roughly 52 percent) hosting one or more data journalism courses. This suggests that scholarship has not yet caught up with the expanding data journalism field.

In addition to increased higher education initiatives, it is also important to consider education for those journalists who have already graduated or who have chosen not to pursue a formal education. In that case, short courses, reading material and other forms of education will be key in promoting an educated and ethical press.

As Dr. Culver said, “Technology is across the board giving us new ways to tell stories and data journalism is an important development, but if it’s not done ethically it’s a huge problem. We need to be teaching common statistical reporting to all journalism students so that they are not dismissing this as an ‘I can’t do this’ scenario.” She continued, “We need to look at, not just how we train journalists, but also how the deadlines are structured and how the job is set up. Right now, it doesn’t encourage systemic reporting.”


Although there are many challenges and ethical considerations to make when reporting on data, this form of journalism is the key to an accountable and aware society. By exposing data and providing context, data journalists not only keep organizations and government accountable, but also help to reveal a wider, more inclusive view of the world’s population. Through additional education initiatives and a broader discussion of the issues, data journalists will play a vital role in this new age of investigative reporting, ultimately creating “data with a soul” that informs and inspires change.

Read Rebekah McBride’s full report here.

Image: amk713.