A primer on political bots - part two: Is it a Bot? It’s complicated!


In the first post of our three-part series, we offered a brief history of political bots. Texifter’s research on bots began with a group effort to create Bot or Not: A Briefing Book. This second post presents our preliminary findings.

We interact daily with human-enhanced bots, vendor-enhanced humans, full-scale autonomous bots, would-be hobbyist bots, social post scheduling services, and humans who behave like bots. This variety makes identifying bot accounts a tedious process. As automation and bot-like behavior increasingly shape our experiences on social networks, the line between human and bot is becoming more fluid. An account that at first looks like an obvious bot may, on closer inspection, turn out not to be one.

Compounding the challenge, not all bots are malicious; many offer real utility, providing real-time information about traffic, weather, and news. Our project, for instance, focused on tweets about cryptocurrency, and our datasets contained numerous bots that function as news aggregators, sharing tweets from organizations or actors who discuss cryptocurrency or use related hashtags. Any analysis must therefore examine a bot’s intent and purpose thoroughly.

We began the project adhering closely to the initial coding scheme detailed in our briefing book, assuming we would be able to determine whether an account was a bot. The coding scheme was initially based on a number of “how to identify a bot online” guides from DFRLab, the MIT Technology Review, How-To Geek, The New York Times, botcheck.me, and Mashable. Yet by the end of the research, we had reached the sobering realization that identifying bots is an incredibly complicated and laborious process.

Our findings

DiscoverText collects tweets along with their metadata. For this project, we collected tweets and profiles containing terms such as “cryptocurrency,” “bitcoin,” and “altcoin,” then sifted through them to identify bots attempting to influence discourse around cryptocurrency. Certain characteristics may indicate whether an account is a bot or what we refer to as an “actual person.” For example, bots are less likely to:

  • post real-time photos of events
  • post from mobile devices (e.g. “Twitter for iPhone”)
  • display human-like idiosyncrasies in their posts (e.g. humorous or sarcastic hashtags)
  • engage in conversations with other users (e.g. asking and fielding questions)
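The signals in this list can be tallied per account. The sketch below is a minimal illustration, not the coding scheme we used: the `Tweet` record, its fields, and the `MOBILE_CLIENTS` labels are hypothetical, and human idiosyncrasies such as sarcastic hashtags are left to human coders because they resist simple counting.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Tweet:
    # Hypothetical, simplified tweet record; real exports carry many more fields.
    text: str
    source: str      # client label, e.g. "Twitter for iPhone"
    has_photo: bool  # True if the tweet carries an attached photo
    is_reply: bool   # True if the tweet replies to another user


# Assumed set of mobile client labels; extend as needed for a given dataset.
MOBILE_CLIENTS = ("Twitter for iPhone", "Twitter for Android", "Twitter for iPad")


def human_signal_rates(tweets: List[Tweet]) -> dict:
    """Share of an account's tweets showing each countable human-like signal."""
    n = len(tweets)
    if n == 0:
        return {"photo": 0.0, "mobile": 0.0, "reply": 0.0}
    return {
        "photo": sum(t.has_photo for t in tweets) / n,
        "mobile": sum(t.source in MOBILE_CLIENTS for t in tweets) / n,
        "reply": sum(t.is_reply for t in tweets) / n,
    }
```

Low rates across all three signals are a hint rather than proof; many legitimate accounts post from desktop tools and rarely reply.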

Some of our initial findings match what an average user might suspect of bot behavior. In the image below, the account “sunneversets100” has posted 203,000 tweets despite existing for only about a year. This bot also follows only four other users yet has a large number of followers, many of whom may not be in the country the account claims to be in. This pattern is typical of bot accounts, which also tend to lack geolocation information.
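The figures in the screenshot translate into a posting rate that is hard to square with a single person. The arithmetic below assumes “about a year” means 365 days.

```python
MINUTES_PER_DAY = 24 * 60


def tweets_per_day(total_tweets: int, account_age_days: float) -> float:
    """Average number of tweets per day over the account's lifetime."""
    return total_tweets / account_age_days


# "sunneversets100": 203,000 tweets over roughly a year (assumed to be 365 days).
rate = tweets_per_day(203_000, 365)
gap_minutes = MINUTES_PER_DAY / rate
print(f"{rate:.0f} tweets per day, one every {gap_minutes:.1f} minutes")
# prints "556 tweets per day, one every 2.6 minutes"
```

That pace, sustained day and night for a year, is what makes this account an obvious case.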

Image: An Obvious Bot - High Volume of Tweets, credit @DFRLab.

Confusion and disagreement arose in our analysis when accounts appeared to be run by multiple people (e.g., the accounts of large organizations), or when they used services such as SocialFlow, Dlvr.it, or HootSuite, which help users gain followers through tweet scheduling and other forms of content creation and delivery. These services automate parts of the tweeting process so that tweets follow a consistent pattern and posting schedule, which blurs whether the account should be considered automated or human-run. However, once it became clear that such a service had been used, it was easy to determine that the account was run by a human using the tool to enhance their online presence. Such accounts did not fit our definition of “bot” for the project, although they did provoke debate about what counts as a “bot” quality online.
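One way to quantify a “consistent posting schedule” is the coefficient of variation (standard deviation divided by mean) of the gaps between consecutive posts. The sketch below is an illustrative assumption rather than a method from our coding scheme, and the function name `interval_cv` is hypothetical. A value near zero points to scheduled posting, but it cannot by itself tell a scheduling service apart from a bot, which is exactly the ambiguity described above.

```python
from datetime import datetime, timedelta
from statistics import mean, pstdev
from typing import List


def interval_cv(timestamps: List[datetime]) -> float:
    """Coefficient of variation of the gaps between consecutive posts.

    Values near 0 mean clockwork-like posting; human posting is usually burstier.
    """
    ts = sorted(timestamps)
    gaps = [(b - a).total_seconds() for a, b in zip(ts, ts[1:])]
    if len(gaps) < 2:
        raise ValueError("need at least three timestamps")
    avg = mean(gaps)
    return pstdev(gaps) / avg if avg > 0 else 0.0


if __name__ == "__main__":
    start = datetime(2018, 1, 1)
    scheduled = [start + timedelta(minutes=30 * i) for i in range(10)]
    print(interval_cv(scheduled))  # a post exactly every 30 minutes gives 0.0
```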

Bots are also more likely to exhibit a consistent linguistic texture: identifiable patterns of punctuation and language, and an overall hyperfocus on one type of content. However, Jenna Abrams, a fake Twitter account that passed as a real individual, exhibited behaviors contrary to these characteristics. The account used a complex algorithm that allowed it to reply to others, and it also had some human intervention “helping” it appear authentic. Social media bot accounts in the purest sense tend not to share the typically human habit of interacting with other users, which made Jenna unique in that respect.
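Linguistic texture can be approximated with simple concentration measures: how often an account’s most frequent hashtag appears, and how often its tweets share an identical punctuation “skeleton” (the punctuation left after stripping letters, digits, and spaces), as template-driven posts tend to. The functions below are a hypothetical sketch; as the Jenna Abrams case shows, an account with human help can evade both.

```python
import re
import string
from collections import Counter
from typing import List

HASHTAG = re.compile(r"#\w+")


def top_hashtag_share(texts: List[str]) -> float:
    """Fraction of tweets that contain the account's most frequent hashtag."""
    per_tweet = [{h.lower() for h in HASHTAG.findall(t)} for t in texts]
    counts = Counter(tag for tags in per_tweet for tag in tags)
    if not counts:
        return 0.0
    top_tag, _ = counts.most_common(1)[0]
    return sum(top_tag in tags for tags in per_tweet) / len(texts)


def punctuation_template_share(texts: List[str]) -> float:
    """Fraction of tweets whose punctuation skeleton matches the most common one."""
    if not texts:
        return 0.0
    skeletons = ["".join(ch for ch in t if ch in string.punctuation) for t in texts]
    return Counter(skeletons).most_common(1)[0][1] / len(texts)


if __name__ == "__main__":
    sample = ["New #bitcoin price: 100!", "New #bitcoin price: 101!", "lol no idea #altcoin?"]
    print(top_hashtag_share(sample), punctuation_template_share(sample))
```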

Other linguistic elements also help distinguish bots from humans. For example, humans and bots tend to use emojis, hashtags, images, and rhetorical devices such as humor and sarcasm differently. Empirical research into how the semantic and visual communicative practices of humans and bots differ may help pin down the specifics of this linguistic divide.
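A starting point for the empirical comparison suggested here is to compute per-account usage rates and compare their averages between accounts that coders labeled as bots and as humans. The emoji pattern below covers only common pictograph blocks and is an approximation, the function names are hypothetical, and humor and sarcasm are not captured by counts like these.

```python
import re
from statistics import mean
from typing import Dict, List

# Rough emoji pattern covering common pictograph blocks; not a complete emoji definition.
EMOJI = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")
HASHTAG = re.compile(r"#\w+")


def usage_profile(texts: List[str]) -> Dict[str, float]:
    """Average emojis and hashtags per tweet for one account."""
    n = max(len(texts), 1)
    return {
        "emoji_per_tweet": sum(len(EMOJI.findall(t)) for t in texts) / n,
        "hashtag_per_tweet": sum(len(HASHTAG.findall(t)) for t in texts) / n,
    }


def group_means(profiles: List[Dict[str, float]]) -> Dict[str, float]:
    """Mean of each feature across a group of hand-labeled accounts."""
    if not profiles:
        raise ValueError("need at least one profile")
    return {key: mean(p[key] for p in profiles) for key in profiles[0]}
```

Comparing `group_means` for the bot-labeled group against the human-labeled group shows where usage diverges, though differences in means say nothing about any single account.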

Overall, we found that one of the most useful ways to identify a bot was to examine the metadata collected with each tweet in the DiscoverText dataset. As mentioned above, there are some intuitive signs that an account is a bot (the ratio of followers to accounts followed, no profile photograph, fake followers, a robotic consistency in tweet content), but the metadata proved more useful for complex accounts.
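Several of these indicators can be read directly from tweet metadata. This sketch assumes records follow the Twitter API v1.1 tweet object layout (a nested `user` object with `followers_count`, `friends_count`, `statuses_count`, `created_at`, and `default_profile_image`, plus top-level `coordinates` and `place`); DiscoverText exports may name or structure these fields differently, and `metadata_indicators` is a hypothetical helper. The record in the usage example is invented for illustration.

```python
import json
from datetime import datetime, timezone

# Twitter API v1.1 timestamp format, e.g. "Wed Oct 10 20:19:24 +0000 2018".
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def metadata_indicators(tweet_json: str, now: datetime) -> dict:
    """Extract simple bot indicators from one tweet record; `now` must be timezone-aware."""
    tweet = json.loads(tweet_json)
    user = tweet["user"]
    created = datetime.strptime(user["created_at"], CREATED_AT_FORMAT)
    age_days = max((now - created).days, 1)  # avoid dividing by zero for new accounts
    return {
        # v1.1 reports the number of accounts followed as "friends_count".
        "followers_per_following": user["followers_count"] / max(user["friends_count"], 1),
        "default_profile_image": bool(user.get("default_profile_image", False)),
        "tweets_per_day": user["statuses_count"] / age_days,
        "has_geo": tweet.get("coordinates") is not None or tweet.get("place") is not None,
    }


if __name__ == "__main__":
    record = json.dumps({
        "user": {"created_at": "Mon Jan 01 00:00:00 +0000 2018",
                 "followers_count": 1000, "friends_count": 4,
                 "statuses_count": 36500, "default_profile_image": True},
        "coordinates": None, "place": None,
    })
    print(metadata_indicators(record, datetime(2019, 1, 1, tzinfo=timezone.utc)))
```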

Shifting to the metadata showed that bots are not as easy to spot as many assume. Even so, relying solely on either an account’s content or its metadata will not definitively answer whether it is a bot, so neither should serve as the sole means of assessing automation.
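Because neither content nor metadata is decisive on its own, one option is to combine the two and send uncertain cases to human coders rather than force a label. In this sketch, `content_score` and `metadata_score` are hypothetical bot-likelihood scores in [0, 1] from whatever measures a team uses; the equal weighting, the 0.3/0.7 thresholds, and the 0.5 disagreement cutoff are illustrative assumptions, not values from our study.

```python
def triage(content_score: float, metadata_score: float,
           low: float = 0.3, high: float = 0.7) -> str:
    """Combine two bot-likelihood scores in [0, 1] into a coding decision.

    Strong disagreement between the signals, or a middling average,
    is routed to human coders instead of being auto-labeled.
    """
    for score in (content_score, metadata_score):
        if not 0.0 <= score <= 1.0:
            raise ValueError("scores must lie in [0, 1]")
    if abs(content_score - metadata_score) > 0.5:
        return "manual review"
    combined = (content_score + metadata_score) / 2  # equal weighting (assumed)
    if combined >= high:
        return "likely automated"
    if combined <= low:
        return "likely human"
    return "manual review"
```

The middle band is deliberately wide: accounts like those using scheduling services would often land there, which is where human adjudication earns its cost.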

Image: Bots will often use fake photos and have scrambled usernames.

In conclusion, we believe the issues and points of contention that emerged from our coding and adjudication sessions offer a sound starting point for a much more rigorous discussion of how to identify and label automated accounts. We hope that our briefing book, along with further research on this topic, may help minimize the influence of bots on online political discourse and public opinion. The final post in our series will explore potential solutions to some of these issues and outline directions for future research.

Read Part 1 here.

Image: Jenn and Tony Bot.