“But it’s already public, right?”: The ethics of using online data


In this rapidly changing and rich media landscape, there are multiple new and emerging ways in which people communicate their opinions regarding contemporary issues, presenting new opportunities for social research. Social media has become a major platform from which people source, share and discuss news and current affairs. It not only provides new opportunities for researchers; it also presents significant ethical challenges.

These challenges should concern anyone working with online data, as the consequences of not addressing them can be serious. There are economic, as well as moral and legal, arguments for internet research ethics. In their new data ethics book, Gry Hasselbalch and Pernille Tranberg argue that companies and organisations that consider ethics a social responsibility have a competitive edge over those that do not.

Starting out as a social media researcher, I was of the view that the public nature of social media data meant I did not need to concern myself with research ethics; that, much like a published news article, I was free to analyse Facebook pages and public groups.

It was not until I came across Michael Zimmer’s article on the ethics of Facebook research that I began to think seriously about internet research ethics.  This article led me to investigate further, firstly by contacting other social media researchers.  I asked, “is the ‘it’s public’ defence arguable? Do I need ethics approval? Should I consider strategies like paraphrasing of data?”

My inquiries pointed strongly to ethics as a serious and growing issue of concern for internet researchers and others working with online data, but I received no definitive answers to my questions.

Desktop research uncovered guidelines and other resources which quickly expanded my knowledge of the topic.  These guidelines outlined various ethical issues, posed questions and provided case studies.  I wanted something prescriptive, but my research only led to further research questions and conversations with academics.  It felt like I was going around in circles.

During this process my concerns about the legal implications of my research and my reputation as an emerging social media researcher were joined by genuine concerns for the privacy and safety of social media users.  The process led me to consider my values as a feminist researcher and how these, along with other considerations, might inform any decisions I made around ethics. 

Below I present some of the key ethical issues and possible ways around them. This discussion is by no means exhaustive, so this article also provides links to further reading. In addition to the concerns listed below, researchers and data-driven journalists should consider any relevant ethical guidelines, laws and platform policies provided by the social networks themselves.

Is the information public?

One of the biggest concerns with social media data is whether it should be considered public or private. The public accessibility of social media data is used to justify its use for research purposes. However, as with other forms of data collection involving human subjects, this poses ethical concerns. In the end, these issues come down to consent. For example, Facebook members consent to the use of their public data (like posts and comments published on pages and public groups) for research purposes when they sign up to the platform (see Facebook Data Policy, section III), but it is questionable whether informed consent exists given that social media users commonly report that they do not read terms and conditions.

According to the British Psychological Society, the question of whether social media data can be considered public or private depends on whether social media users can reasonably expect their data to be public. It would be reasonable, for example, to expect that those tweeting on Twitter, a public platform, are aware that they are being observed by strangers. Conversely, members of a closed Facebook group may expect some degree of privacy since members must request to join and typically cannot see posts until they have been accepted by an administrator.

In such cases, it is advisable to contact site administrators, or gatekeepers of password-protected sites, for permission to access data, in addition to seeking the consent of individual users to republish their data. However, Leanne Townsend and Claire Wallace of the University of Aberdeen argue in their guide to social media ethics that ethical concerns regarding the reuse of public data need not apply to public figures and organisations whose information is intended to be widely disseminated.

Complicating the British Psychological Society’s guidelines, there is some evidence that social media users feel that, although the substance of their communication may be public, the context in which their data is published implies restrictions on how it should be used by others. In one instance, high school students were angry that their Facebook profile images were used by teachers and law enforcement officers in a school presentation to highlight the dangers of posting private information online.  One student called it a “violation of privacy”.

From this perspective, the publication of even publicly available data can be problematic.  As such, the Association of Internet Researchers argues that binary concepts of public and private do not hold true in digital contexts – an argument that has a number of implications for online research. If we are to place the thoughts, feelings, privacy and safety of users at the centre of our research, which I argue we should as ethical journalists and academic researchers, then we need to consider both the public nature of the data and the context in which it has been communicated.

Re-identification of data

Data anonymisation is one way around the ethical challenges presented by privacy concerns.  But even if data is anonymised in research outputs by removing identifying information such as names, there is a risk that publicly accessible data can be re-identified, raising ethical concerns around privacy, the protection of sensitive information and the vulnerability of users. 

Re-identification of data has implications for the republishing of social media quotes in online news stories, conference papers and academic journals, and for data sharing and reuse, if consent has not been obtained. By simply searching for the quote, the user may be found. Re-identification can also occur by revealing the names of Facebook pages and public groups. To combat these risks, the names of users and groups may need to be anonymised, in addition to the aggregation or paraphrasing of posts and comments.
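
As a minimal sketch of the name-anonymisation step described above, the Python below replaces user and group names with keyed pseudonyms and drops the verbatim text, which is the part a search engine could match. The record fields ("author", "group", "text") and the function names are hypothetical, chosen for illustration only, and do not reflect the output of any particular extraction tool. Pseudonymising names on its own does not prevent re-identification through quoted text, so it complements paraphrasing or aggregation rather than replacing them.

```python
import hashlib
import hmac


def pseudonymise(name: str, secret_key: bytes, prefix: str) -> str:
    """Map a real name to a stable label that is hard to reverse without the key."""
    digest = hmac.new(secret_key, name.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{prefix}_{digest[:10]}"


def strip_identifiers(records: list, secret_key: bytes) -> list:
    """Return copies of the records with names pseudonymised and verbatim text removed.

    Each record is assumed (hypothetically) to be a dict with the keys
    "author", "group" and "text".
    """
    cleaned = []
    for record in records:
        cleaned.append({
            "author": pseudonymise(record["author"], secret_key, "user"),
            "group": pseudonymise(record["group"], secret_key, "group"),
            # Keep only a non-identifying summary of the text, not the quote itself.
            "word_count": len(record["text"].split()),
        })
    return cleaned


# Example with made-up data; the key should be stored separately from the dataset.
sample = [{"author": "Jane Citizen", "group": "Example Group", "text": "An example comment"}]
print(strip_identifiers(sample, secret_key=b"keep-this-key-private"))
```

Because the same name always maps to the same label under a given key, activity can still be counted per user or per group without the real names appearing in the working dataset.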

Sensitive information and ‘vulnerable’ users

If social media users are re-identified through their data, they may become exposed to a range of risks even if their data can be considered public information.

Ethical concerns are compounded when dealing with sensitive data and vulnerable individuals and groups online.  Vulnerable groups include young persons under the age of 18 years and those with intellectual disabilities, among others.  It is difficult to know whether participants belong to vulnerable groups.  Personal profiles provide limited information about the identity of users and people frequently shield their true identities online. 

Sensitive data - for example, controversial political opinions or discussion about illegal activities - can be excluded from qualitative analysis, but it can be difficult to predict the harm that might arise from a user becoming re-identified through her data.  The nature of these risks might range from embarrassment to more serious harms for users, such as reputational damage and loss of employment.

Addressing the ethical challenges of social media research

Informed consent

As with traditional research methods, social media researchers can obtain the informed consent of users to avoid ethical issues relating to re-identification and privacy.  Ethics guidelines and consultation with researchers have suggested strategies for obtaining consent, but seeking informed consent online can be particularly challenging. For example, researchers cannot guarantee that participants have completely read and understood information about the research.

Furthermore, it can be impractical and ethically questionable to contact individual participants when working with online data. For example, I am using Netvizz, a tool developed by Bernhard Rieder of the University of Amsterdam, to extract large quantities of anonymised Facebook data published more than two years ago. Contacting each user represented in a dataset of that size and age would be impractical.

It could be argued that consent is only required when using verbatim quotes, given the ease of re-identifying this data through search engines. However, since it is difficult to determine whether social media users are part of a vulnerable group, researchers cannot determine if greater protection should be provided. Leanne Townsend and Claire Wallace argue that data suspected to originate with vulnerable persons can be eliminated from the research, yet vulnerability can still be difficult to ascertain.


Anonymisation techniques, such as paraphrasing, may offer an alternative to seeking the informed consent of users, but they too have limitations.  It is important to consider that some users may prefer acknowledgment to anonymisation.  From this perspective, informed consent is preferable to paraphrasing data.  Also, meaning can be lost in paraphrasing.

Leanne Townsend and Claire Wallace suggest paraphrasing data published in research outputs if there is a risk of harm to participants and consent has not been obtained.  They also recommend putting in place steps to ensure the paraphrased data does not lead interested parties to the online profiles of users.  Similarly, Annette Markham makes a case for ‘fabrication’ as an ethical and creative method for qualitative researchers and others using online data where vulnerability or potential harm is not easily determined.  Fabrication is the “creative reproduction of information gathered in the field” for the purpose of protecting the privacy of research participants. It includes a range of practices, such as creating composite accounts of persons, events or interactions (for example, online forum interactions).

There do not appear to be many social media studies that utilise paraphrasing or fabrication, so I abandoned this idea in favour of changing my research methods and aims. In academia, fabrication is an “oft-feared practice” because it contravenes traditional (positivist) methods of representation. As Annette Markham points out, this is despite the influence of feminist, post-structural and post-positivist thinkers who have “long and persuasively held that the entire enterprise of social inquiry is one of invention with more or less degrees of systematicity, integrity, and rigor”.

One researcher I spoke with suggested gated access to the original database as a solution to concerns of data integrity and rigour.  Access to the original quotes could be granted to editors and peer reviewers so as to verify that the paraphrasing retains the original meaning.  As a further measure, gated access could be provided following publication to those who can prove a reasonable need for the data and good reputation for maintaining the privacy of users. 

Data aggregation

Aggregation, or the gathering and expressing of information in summary form, such as through data visualisation, offers another approach to data anonymisation.
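
To illustrate what reporting in summary form can look like in practice, the sketch below counts how many comments fall under each coded topic and reports only the totals. The function name, the topic labels and the `min_count` threshold are assumptions made for this example. Folding small categories into "other" is a common disclosure-control precaution that I have added here; it is not something the guidelines discussed in this article prescribe.

```python
from collections import Counter


def aggregate_topics(topic_labels, min_count=5):
    """Summarise coded comments as counts per topic.

    Topics with fewer than `min_count` comments are merged into "other",
    so that a very small category cannot single out a handful of users.
    """
    counts = Counter(topic_labels)
    summary = {}
    merged = 0
    for topic, n in counts.items():
        if n >= min_count:
            summary[topic] = n
        else:
            merged += n
    if merged:
        # Add to any existing "other" category rather than overwriting it.
        summary["other"] = summary.get("other", 0) + merged
    return summary


# Example with made-up topic codes, one label per comment.
labels = ["health"] * 6 + ["housing"] * 2 + ["transport"]
print(aggregate_topics(labels))
```

Only the counts leave the working dataset, so individual posts and comments never need to appear in published outputs.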

Informed by a feminist perspective of ethics, my research takes a cautious approach to the ethical challenges of qualitative research. As such, in my upcoming research I will conduct a computer-assisted content analysis (as opposed to qualitative analysis, which I originally planned to do) of my data, which includes the posts and comments of Facebook users. The names of groups that may be considered ‘vulnerable’ will also be anonymised to further ensure the privacy and safety of participants.

I will take a less cautious approach with the data of public figures and organisations and will qualitatively analyse the posts of these participants, using verbatim quotes as examples in my published research. 

Having only begun to consider ethics after I devised my research plan, I found myself having to adapt my research aims and methods in order to proceed ethically as a feminist researcher.  Going through this process, I learnt that ethics cannot be ‘tacked on’ to social media research - it must be core to all your data decisions.  That is, the particular challenges of doing online research require researchers to think about ethics from the outset of the project and throughout the entire research process.  Part of this process involves reflection on the values and beliefs, or paradigm, underpinning the project, in conjunction with the myriad of factors that inform the ethical use of online data.

Image: Sebastian Sikora.