Illuminating 2016: Helping journalists cover social media in the presidential campaign


Alongside stump speeches, campaign events, and TV ads, political reporters must also cover what is happening on social media. Covering it increases the transparency and accountability of the campaigns and offers a way to take the pulse of the electorate. But the sheer volume of information that flows through social media makes it challenging to report accurately and comprehensively.

To combat this information overload, a team of faculty and students at Syracuse University has created Illuminating 2016, a computational journalism project to empower journalists covering the 2016 presidential campaign.

The project aims to help journalists by providing a usable yet comprehensive summary of the content and sentiment of campaign messages, one that goes beyond counting likes or retweets.


The Illuminating 2016 project has been collecting Facebook and Twitter messages from the official campaign accounts of all of the major-party presidential primary candidates through each platform's application programming interface (API). Data are collected and analyzed in real time, and the Illuminating 2016 website refreshes once an hour with these updates.
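The record-keeping involved in this kind of collection can be sketched as follows. This is not the project's code: the output field names and the sample payload are illustrative assumptions loosely based on Twitter's v1.1 tweet format.

```python
# A minimal sketch of normalizing a collected message into a flat record
# before analysis. Field names (candidate, platform, text, ...) are
# assumptions for illustration, not Illuminating 2016's actual schema.

def normalize_tweet(raw, candidate):
    """Map a raw Twitter API payload (as a dict) to a flat record."""
    return {
        "candidate": candidate,
        "platform": "twitter",
        "message_id": raw["id_str"],
        "text": raw["text"],
        "created_at": raw["created_at"],
        "retweets": raw.get("retweet_count", 0),
    }

# A hand-written sample payload stands in for a live API response here.
sample = {
    "id_str": "123456789",
    "text": "Join us tonight at 7pm!",
    "created_at": "Mon Oct 03 20:00:00 +0000 2016",
    "retweet_count": 42,
}
record = normalize_tweet(sample, "Example Candidate")
```

In a real pipeline, a function like this would run on every message returned by each API poll, so that Facebook posts and tweets end up in one uniform table.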

The project is built on a system that automatically classifies each message into a category based on what the message is trying to do: urge people to act, change their opinions through persuasion, inform them about an activity or event, honor or mourn people or mark holidays, or hold a conversation with members of the public.
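As a toy illustration of that idea, a classifier can score each category from features in the text and pick the best match. The category names below come from the project, but the keyword lists and scoring logic are invented for this sketch; the real system uses a trained statistical model, not keyword matching.

```python
# Toy category scoring by keyword features. The keyword lists are
# invented for the demo and are NOT the project's actual features.
CATEGORY_KEYWORDS = {
    "call-to-action": ["donate", "volunteer", "vote", "join", "rsvp"],
    "persuasive": ["wrong", "failed", "best", "plan", "record"],
    "informative": ["tonight", "event", "rally", "speech", "watch"],
    "ceremonial": ["thank", "honor", "remember", "congratulations"],
    "conversational": ["@", "thanks for", "great question"],
}

def score_categories(text):
    """Count how many of each category's keywords appear in the text."""
    text = text.lower()
    return {cat: sum(kw in text for kw in kws)
            for cat, kws in CATEGORY_KEYWORDS.items()}

def classify(text):
    """Assign the single category with the strongest score."""
    scores = score_categories(text)
    return max(scores, key=scores.get)
```

A message like "Donate now and volunteer to vote!" would score highest on call-to-action under this scheme.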

Data currently presented on Illuminating 2016 is accurately categorized approximately 75% of the time. For some categories, such as call-to-action and the persuasive messages (advocacy, attack), accuracy reaches 85%. For the call-to-action sub-categories (digital engagement, media and debate appearances, giving money, and voting), accuracy is about 80%. For the image, issue, and endorsement sub-categories, accuracy is 76%. For the informative and conversational categories, accuracy is 70%. For the ceremonial category, accuracy is lower, at around 40%: there are far fewer of these messages, and they often express a wider range of features, making them harder to classify.


Image: Machine prediction performance for Types of Strategic Messages on Twitter and Facebook. Source: Illuminating 2016.

Some messages may fit multiple categories. For example, a message can be both a strategic message and a call-to-action. Currently, each such message is assigned to one of those two categories based on the strength of the features that distinguish them. In the future, we hope to enable messages to receive multiple categories.
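The difference between the current single-label behavior and a possible multi-label extension can be sketched with made-up per-category scores (the numbers and the threshold are assumptions for illustration, not values used by the project):

```python
# Hypothetical classifier confidence scores for one message.
scores = {"persuasive": 0.62, "call-to-action": 0.55, "informative": 0.08}

# Current behavior described in the post: keep only the strongest category.
single_label = max(scores, key=scores.get)

# A possible multi-label extension: keep every category above a threshold.
THRESHOLD = 0.5  # assumed cutoff for the sketch
multi_label = [cat for cat, s in scores.items() if s >= THRESHOLD]
```

Under the single-label rule this message would be reported only as persuasive, even though its call-to-action score is nearly as strong.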

The project's data is also available for download in CSV format.
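For readers who download the CSV export, it can be read with Python's standard csv module. The column names below are illustrative assumptions; check the header row of the actual download for the real schema.

```python
import csv
import io

# An inline sample stands in for a downloaded file; the columns
# (candidate, platform, category, text) are assumed for the sketch.
sample_csv = """candidate,platform,category,text
Clinton,twitter,attack,Example attack tweet
Trump,facebook,advocacy,Example advocacy post
"""

# DictReader yields one dict per row, keyed by the header row.
rows = list(csv.DictReader(io.StringIO(sample_csv)))

# Filtering by category then becomes a simple comprehension.
attacks = [r for r in rows if r["category"] == "attack"]
```

To read a saved download instead, replace `io.StringIO(sample_csv)` with `open("export.csv", newline="")`.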


Data is presented through interactive visualizations that allow website visitors to derive their own insights. Visitors can filter data by platform (Twitter or Facebook), candidate, time frame, and message category. For each filter setting, the site generates a graph of total message activity, along with the most retweeted tweets and most liked Facebook posts. These are then further disaggregated by candidate.


Image: Total attack message activity by Clinton and Trump.


Image: Top retweeted and liked attack messages.


Image: Attack message data disaggregated by candidate.

Explore the Illuminating 2016 project here.

Extracts of this post and images are published under CC BY 4.0.