14/5/2017

Using Twitter to identify and visualize Republican and Loyalist communities in Belfast

 

Belfast is the capital city of Northern Ireland, where two opposing groups, Republicans and Loyalists, have been in conflict for the better part of the past century. In simple terms, Republicans want Northern Ireland to be a part of Ireland, while Loyalists want Northern Ireland to remain a part of the United Kingdom. One sentence is a poor summary, so if you aren’t already familiar with the conflict, take a look at this article. It’s already well known that religion is a good indicator of whether an individual is likely to be Republican (Catholics) or Loyalist (Protestants). In the rest of this post, we’ll explore whether an individual’s social media behavior can also be an indicator of these political tendencies. I’ll show that by looking at who follows whom, we can identify Republican and Loyalist communities on Twitter.

Here’s a quick overview of what you’ll see in the post. I’ll begin with data collection and analysis. Then, I’ll pull it all together and show the results of this methodology for community detection. To finish, I’ll discuss potential broader applications of the method.

The data

Before diving into the data collection methodology, I’d like to say thank you to Dr. Kevin Scannell for gathering the relationships between Belfast Twitter accounts (more on that in a moment). The main goals of collecting the dataset were to a) gather all the Twitter accounts in Belfast, and b) keep a record of the connections between accounts (that is, which accounts are following whom). To start, a number of Belfast-based seed accounts were selected. These seed accounts all have a large number of followers and were chosen to try to cover as many Belfast Twitter users as possible.

Following the selection of the seed accounts, all Twitter users who indicated their location as Belfast and followed any of the seed accounts were collected. Then, the followers of those followers who indicated their location as Belfast were gathered, and so on. In fact, as I’m writing this post, the program gathering all of this data is only three generations deep counting the seed accounts themselves, meaning that only the seed accounts, their followers, and the followers of their followers are present in the analysis.
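The collection process described above is essentially a breadth-first crawl outward from the seed accounts. Here is a minimal sketch of that idea, not the actual collection program. The function name crawl_followers and the two callbacks (get_followers and get_location) are hypothetical stand-ins for whatever Twitter API calls were really used, and the toy dictionaries are made-up data.

```python
from collections import deque


def crawl_followers(seeds, get_followers, get_location, max_depth=2, city="belfast"):
    """Breadth-first crawl outward from seed accounts.

    get_followers(account) -> iterable of follower ids  (hypothetical callback)
    get_location(account)  -> profile location string    (hypothetical callback)
    Returns (accounts, edges), where each edge is a (follower, followed) pair.
    """
    accounts = set(seeds)
    edges = set()
    queue = deque((seed, 0) for seed in seeds)
    while queue:
        account, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not expand accounts beyond the depth limit
        for follower in get_followers(account):
            if city not in get_location(follower).lower():
                continue  # keep only accounts that list the city in their profile
            edges.add((follower, account))
            if follower not in accounts:
                accounts.add(follower)
                queue.append((follower, depth + 1))
    return accounts, edges


# Toy data standing in for the Twitter API (entirely made up).
toy_followers = {"seed": ["a", "b"], "a": ["c"], "b": ["a"], "c": ["d"]}
toy_locations = {"a": "Belfast, NI", "b": "Dublin", "c": "belfast", "d": "Belfast"}

accounts, edges = crawl_followers(
    ["seed"],
    get_followers=lambda acc: toy_followers.get(acc, []),
    get_location=lambda acc: toy_locations.get(acc, ""),
)
```

With max_depth=2 the crawl stops at the followers of the followers of the seeds, which mirrors the depth described above. Matching on the substring "belfast" is a simplification; real profile locations are free text and messier than this.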

So, at the end of all this, what are we left with? The dataset includes which Belfast accounts are following whom, and some basic information associated with those accounts, such as outgoing tweets. In total, we have roughly 24,000 active Twitter accounts, which we think represent a high percentage of all Twitter users who list Belfast in their profile.

So what?

From all this data, we can build a graph consisting of two main components: nodes, which represent some piece of information, and edges, which represent connections between those pieces of information. Envision this graph as a network of roads in a city. Think of the locations where roads intersect as nodes, and the roads that lead to those intersections as edges. In our graph, each Twitter account is a node, and each "follows" relationship between two accounts is an edge. Our graph is also an example of a “directed graph”, meaning that if a Twitter user is following another account, the other account doesn’t necessarily follow that Twitter user back. If we return to our road network analogy, this would be akin to the road network having both one-way and two-way streets.
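To make the directed graph concrete, here is a small sketch using only plain Python dictionaries. The helper names (build_follow_graph, is_mutual) and the three example accounts are hypothetical; the real analysis may well have used a graph library instead.

```python
from collections import defaultdict


def build_follow_graph(edges):
    """Build adjacency maps from (follower, followed) pairs."""
    following = defaultdict(set)  # account -> accounts it follows (outgoing edges)
    followers = defaultdict(set)  # account -> accounts following it (incoming edges)
    for follower, followed in edges:
        following[follower].add(followed)
        followers[followed].add(follower)
    return following, followers


def is_mutual(following, a, b):
    """True when a and b follow each other (a 'two-way street')."""
    return b in following.get(a, set()) and a in following.get(b, set())


# Made-up example: alice and bob follow each other, carol follows alice only.
example_edges = [("alice", "bob"), ("bob", "alice"), ("carol", "alice")]
following, followers = build_follow_graph(example_edges)
```

In road-network terms, the alice/bob pair is a two-way street, while carol to alice is one-way.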

Now that we have our graph, we can do some neat things with it. We’re trying to identify Republican and Loyalist communities in Belfast, so we use graph clustering. In broad terms, clustering is a machine learning technique that is used to group similar things together. In this case, we’re trying to group together similar Twitter users. I use the Smart Local Moving (SLM) algorithm to do this. I won’t get into the weeds, but if you feel so inclined you can read more here.
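For a rough feel of what the clustering is optimizing: SLM is a modularity-based method, meaning it searches for a grouping of nodes that scores well on a quantity called modularity (many edges inside groups, few between them, relative to chance). The sketch below only computes that score for a given grouping; it is not the SLM algorithm itself. It also treats edges as undirected and unweighted, which is a simplifying assumption on my part, and the function name and toy graph are made up.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Modularity Q of a partition, treating edges as undirected and unweighted.

    Q = sum over communities c of  L_c / m - (d_c / (2 * m)) ** 2
    where m is the number of edges, L_c the number of edges inside c,
    and d_c the total degree of the nodes in c.
    """
    # Collapse direction and duplicates; drop self-loops.
    undirected = {frozenset(e) for e in edges if e[0] != e[1]}
    m = len(undirected)
    if m == 0:
        return 0.0
    internal = defaultdict(int)
    degree_sum = defaultdict(int)
    for edge in undirected:
        u, v = tuple(edge)
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Toy graph: two triangles joined by a single edge.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"),
             ("d", "e"), ("e", "f"), ("d", "f"),
             ("c", "d")]
two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {node: 0 for node in two_groups}

q_two = modularity(toy_edges, two_groups)
q_one = modularity(toy_edges, one_group)
```

Splitting the toy graph into its two triangles scores higher than lumping everything together, which is the kind of preference a modularity-based algorithm exploits when it moves nodes between clusters.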

The results!

After running the clustering algorithm, we end up with eight clusters. The next step is to see if any of these clusters represent the Republican or Loyalist communities we’re looking for. I attacked this problem by looking at word usage within clusters and tweet geolocation data (more on geolocation later).

First, I looked at the word frequencies in the tweets coming out of the eight clusters. I began by looking at intercluster word frequencies, that is, how often a word was used by Twitter accounts inside a cluster versus how often that word was used by Twitter accounts outside that cluster. Using this method, I identified two clusters that I think represent Loyalist and Republican communities in Belfast. The analysis I did was more in depth, but here are a few of the intercluster word frequencies to give a quick snapshot of the process. One of the eight clusters had words like “orange” and “protestant” show up 2.3 and 7.1 times more often, respectively, inside the cluster than outside the cluster. I label this cluster as the potential Loyalist cluster. In another cluster that I hypothesize is the Republican cluster, various Irish-language words, such as “Béal Feirste” and “agus”, often appeared over 15 times more frequently within the cluster than in the other clusters.
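The inside-versus-outside comparison can be sketched in a few lines. This is an illustration of the idea rather than the exact analysis: normalizing each count by the total number of words is my assumption about how to make the two groups comparable, the function names are hypothetical, and the example tweets are invented.

```python
import re
from collections import Counter


def word_counts(tweets):
    """Count lowercase word tokens across a list of tweet texts."""
    counts = Counter()
    for text in tweets:
        counts.update(re.findall(r"\w+", text.lower()))
    return counts


def relative_frequency(word, tweets):
    """Share of all word tokens in `tweets` that are `word`."""
    counts = word_counts(tweets)
    total = sum(counts.values())
    return counts[word] / total if total else 0.0


def inside_outside_ratio(word, inside_tweets, outside_tweets):
    """How many times more often `word` appears inside a cluster than outside it."""
    inside = relative_frequency(word, inside_tweets)
    outside = relative_frequency(word, outside_tweets)
    if outside == 0:
        # Word never appears outside the cluster: the ratio is unbounded.
        return float("inf") if inside > 0 else 0.0
    return inside / outside


# Invented tweets, for illustration only.
inside_tweets = ["agus welcome", "agus match today"]
outside_tweets = ["match today agus", "great match on today", "see you there"]

ratio = inside_outside_ratio("agus", inside_tweets, outside_tweets)
```

A ratio well above 1 flags a word as characteristic of the cluster. On a real dataset you would probably also want to ignore words that appear only a handful of times, since their ratios are noisy.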

Labeling these clusters based on word usage is all well and good, but we can also look at where tweets from the different clusters were sent from. We use these tweet geolocations to validate our proposed clusters.  

Approximately 5% of Twitter users have a geolocation feature enabled in their settings. This means that whenever they send a tweet, the location from which the tweet is sent is included in the form of a latitude and longitude coordinate. I took these geolocated tweets from both the suspected Republican and Loyalist clusters and plotted them on a map of Belfast.
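As a rough sketch of the idea behind the density view, geolocated tweets can be binned into a coarse grid and counted per cell. This is not the plotting code used for the figures. The bounding box is only an approximate box around Belfast that I picked for illustration, and the cell size, function names, and sample points are all made up.

```python
from collections import Counter

# Approximate bounding box around Belfast (an assumption for illustration only).
LAT_MIN, LAT_MAX = 54.53, 54.67
LON_MIN, LON_MAX = -6.05, -5.80


def grid_density(points, cell_size=0.01):
    """Count (latitude, longitude) points per grid cell inside the bounding box."""
    density = Counter()
    for lat, lon in points:
        if not (LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX):
            continue  # ignore tweets sent from outside the box
        row = int((lat - LAT_MIN) // cell_size)
        col = int((lon - LON_MIN) // cell_size)
        density[(row, col)] += 1
    return density


def shared_cells(density_a, density_b):
    """Grid cells in which both clusters sent at least one tweet."""
    return set(density_a) & set(density_b)


# Made-up coordinates standing in for geolocated tweets from two clusters.
cluster_a_points = [(54.595, -5.925), (54.596, -5.926), (53.35, -6.26)]
cluster_b_points = [(54.615, -5.955)]

density_a = grid_density(cluster_a_points)
density_b = grid_density(cluster_b_points)
overlap = shared_cells(density_a, density_b)
```

Cells with high counts are where a heat map would be hottest, and the set of shared cells gives a crude measure of how much two clusters overlap geographically.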

Below, you see a map of Belfast obtained from Prospect Magazine showing Catholic and Protestant areas of the city, and by extension Republican and Loyalist areas (since religion is a good indicator of whether one is a Republican or Loyalist). You’ll also find a map showing the locations of tweets sent by the Republican and Loyalist clusters. The Republican tweets are green triangles, and the Loyalist tweets are orange squares. Each point is slightly transparent, so in areas where the points appear darker, more tweets were sent from that location. Additionally, the figure is overlaid with a heat map to show which areas had the highest density of tweets.

To my eye, these two figures look remarkably similar. Many of the tweets sent from the Republican cluster match up with Catholic areas, while tweets sent from the Loyalist cluster match up with the Protestant areas of the city. This, I think, confirms that we have correctly identified Republican and Loyalist communities. Also, and before I say this I want to make it clear that I am neither a sociologist nor a psychologist, but it’s interesting that there don’t appear to be any Republicans going into areas of core Loyalist support and vice versa.

Additional directions

So, having shown how Twitter can identify the Republican and Loyalist communities of Belfast, I think the method is also extensible to other cities with ideologically opposed groups living within them. This could be especially useful in places where there is not already an established indicator (like religion in the case of Belfast) by which communities can be identified. Additionally, there’s no reason why the methodology can’t be applied to an area just to see what kind of groups you can find. Earlier, I neglected to mention the other six clusters that the SLM algorithm identified, but word frequencies gave a good indication of what these clusters might be. Many were groups that I wasn’t necessarily expecting to find beforehand. I won’t go through all the groups, but to give you an idea, one of the clusters appeared to be a group of hockey fans (‘hockey’, ‘Giants’) while another looked like it was a group of artsy individuals (‘theatre’, ‘museum’) who, based on geolocation data, spend most of their time in downtown Belfast.

Takeaways

  1. Groups of Twitter users can be identified just by using tweets and connections between Twitter users (which is really cool!).
  2. Geolocation data suggests there isn’t a whole lot of geographic overlap between Republican and Loyalist communities.
  3. This methodology can be applied to locations outside Belfast.

For those interested, all the libraries and code used on this project are open source and available at the following locations:

 

 
