18/12/2017

Stranger Things: Analyzing scripts to understand emotion

 

As a graduate student in a statistical field, I quickly realized that people who don’t work with data often have one of two responses to the word statistics: “Oh, I hated that class!” or “You must really like math!”

After a few years, I’ve gotten the sense from these conversations that while many people believe statistics and data science are important, most think the techniques are too dense to consider useful for the questions or curiosities they come across in their own lives.

Two methods I’ve tried recently to chip away at this opinion are 1) using data to investigate questions that non-statisticians may find interesting, and 2) presenting data in appealing and easily digestible visualizations.

A recent project I worked on is a good example of both efforts. In this project, I used publicly available scripts for the popular sci-fi show Stranger Things to attempt to understand how emotion is used by the show’s creators, and how that emotion may impact the viewers’ experience. Along the way, I learned a great deal about how to obtain, analyze, and communicate data effectively.

1. Data is out there, but it can be messy

The rise of the internet and the rapid growth of methods for analyzing huge datasets often make it seem like any question can be answered with enough web searching and the right tools. The truth is often a little more complicated. I typically find that while lots of data can be found for free online, you often have to make some sacrifices.

When I set out to analyze the scripts of Stranger Things, I envisioned a rich dataset with all of the show’s lines, labeled with the character who was speaking them and the timestamp of when they occurred. I was hoping to look at the speaking time for each character, the characters’ individual emotional profiles in each episode, and maybe even patterns of conversations between any two characters. Instead, this was all I could find:

I made peace with the lack of speaker information, but was still determined to look into how emotion progressed over the course of episodes. Luckily, statistics finds a way. I strung the words in each episode out into long vectors, and assigned each word a sentiment value based on a publicly available dictionary. Voila, I had my time streams of emotion. Unfortunately, Netflix wasn’t kind enough to spell out all of the characters’ emotions in their dialogue, nor were the characters predictable enough to produce smooth transitions from one emotion to another. This led me to my next realization:
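For readers who want to see the word-scoring step concretely, here is a minimal Python sketch. It is an illustration rather than the code used in the project: the tiny lexicon is a made-up stand-in for a published sentiment dictionary, and the function name `sentiment_stream` is hypothetical.

```python
import re

# Hypothetical toy lexicon (assumption). A real analysis would load a
# published sentiment dictionary instead of hard-coding a few words.
TOY_LEXICON = {
    "happy": 1, "love": 1, "chuckles": 1,
    "scared": -1, "gasps": -1, "dead": -1,
}

def sentiment_stream(script_text, lexicon):
    """Return the sentiment values of the scored words, in script order.

    Words that do not appear in the lexicon are skipped, so the stream
    is shorter than the script itself.
    """
    words = re.findall(r"[a-z']+", script_text.lower())
    return [lexicon[w] for w in words if w in lexicon]

# Three of the eight words are in the toy lexicon: gasps, love, scared.
stream = sentiment_stream("Mike gasps. I love this! Are you scared?", TOY_LEXICON)
```

Because only words with a known score survive, the result is a sequence ordered by position in the script rather than by clock time, which is why the raw stream jumps around so much from one word to the next.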

2. Don’t be afraid of semi-arbitrary decisions

The use of public data, unlike well-curated private datasets, often requires that the analyst make many (sometimes subjective) decisions along the path to results. With episode vectors that included non-dialogue information (“Joyce chuckles”, “Mike gasps”) and rapid emotional transitions from word to word, two such decisions arose almost immediately.

First, do I include words that weren’t spoken? I decided that, yes, words like chuckle and gasp contain relevant information. Second, how do I smooth out the trajectories so they’re more interpretable? I decided to use the sliding window approach (visualized below), and chose a window width of 40 words. Why 40? Well, episodes typically had between 350 and 400 words with a known sentiment score, so using around 10% of the episode’s data seemed like a good middle ground between using too little data and getting drastic fluctuations, and using too much data and ignoring important small-scale changes.
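A sliding window of this kind can be sketched in a few lines of Python. This is one plausible version under stated assumptions, not necessarily the exact smoother used in the project: it takes a plain unweighted mean, moves one word at a time, and does no padding at the edges. The function name `sliding_mean` is hypothetical.

```python
def sliding_mean(values, width=40):
    """Average each run of `width` consecutive values.

    No padding is applied, so the output has len(values) - width + 1
    points. The default width of 40 matches the choice described in the
    text; the unweighted mean and one-word step are assumptions.
    """
    if width < 1 or width > len(values):
        raise ValueError("width must be between 1 and len(values)")
    window_sum = sum(values[:width])
    means = [window_sum / width]
    for i in range(width, len(values)):
        # Slide right by one: add the entering value, drop the leaving one.
        window_sum += values[i] - values[i - width]
        means.append(window_sum / width)
    return means

# Tiny example with a width of 2 so the result is easy to check by hand.
smoothed = sliding_mean([1, 1, -1, -1, 1, 1], width=2)
# smoothed == [1.0, 0.0, -1.0, 0.0, 1.0]
```

With 350 to 400 scored words per episode and a width of 40, this leaves a little over 300 smoothed points per episode, each summarizing roughly a tenth of the episode's scored words.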

Decisions like these can be risky, so depending on the situation there are generally two good options for how to handle them. For important analyses, it’s best to try out a few options to make sure that your choice doesn’t drastically change your results. If it does, it’s probably a sign that the decision should be made in a less arbitrary fashion. For less important analyses (say, a weekend project about a TV show), it’s best to at least make your decisions known to your readers. In either context, being up front about these decisions can open the door to new ideas and techniques. 

In my post, I did mention the window width I chose, which became the topic of an interesting phone conversation with one reader. I forgot to include my decision about non-dialogue cues, but another careful reader was curious enough to ask. In both cases, readers’ awareness of these decisions led to valuable discussions and ideas for future research.

3. Ugly figures obscure interesting results

A lot of statisticians and data scientists are highly skilled at data visualization, but many don’t bother taking the time to make the visuals attractive until the analyses are complete and the final product is being created. With the cleaned emotion data in hand, that was my misguided approach. My first look at the data was a spaghetti plot with all seventeen of the emotional trajectories stacked on top of each other. After straining to look at each one, I concluded that they all seemed to move around, but not in any notable patterns. I quickly colored them by season to see if the two seasons tended to show different episode trajectories, but patterns from both seemed mostly random.

Luckily, before moving on I decided that even if the trajectories themselves weren’t very informative, readers would probably want the chance to look at them. Because the stacked trajectories were hard to look at, I created an animation that highlighted one episode at a time.

After watching the figure loop through the episodes a few times, I started to notice that there actually seemed to be a few common patterns in the trajectories. Having already created a network of episodes based on their pairwise similarities, I applied a network-based clustering technique to the data, and found three highly interpretable groups of episodes: one group that started on a high note and ended on a low note, one group that fluctuated around neutral, and one group that started on a low note and ended on a high note.

This became one of the most interesting findings, and it would have been missed if I hadn’t taken the time to produce a visually appealing representation of my data early in the analysis process. Though there is a time and place for quick scatterplots and histograms, it’s usually important to remember that the better the figure, the easier it will be to understand what your data are trying to say. Along those lines:

4. Let your data tell their own story (but be careful)

When I set out on this project, I was envisioning a study of how episodes’ emotional characteristics were related to the way they were critically received. I chose to use the AV Club’s ratings, since it was the most well-recognized outlet that published a separate grade for each episode. Unfortunately, it had yet to publish its final two reviews at the time, meaning I had a few days of waiting before I could run those tests. While I waited, I created an animation of the network structure, watched the episodes bounce around, and noticed what seemed to be surprisingly large jumps across the network from episode to episode.

This apparent dissimilarity between adjacent episodes became the inspiration for an analysis showing that adjacent episodes were significantly less similar to each other than you would expect to occur randomly. That result then led me to hypothesize that shifting the emotional structure of episodes might be a writing or directing strategy to keep viewers engaged. In the end, the emotional trajectories showed no clear relationship with the AV Club ratings, but the ‘keeping viewers hooked with shifting trajectories’ finding quickly became a favorite among readers, even though I didn’t begin my analyses with that test in mind.
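One common way to ask whether adjacent episodes are less similar than chance would predict is a permutation test, sketched below in Python. This is a sketch under assumptions, not a reconstruction of the project's analysis: it uses Pearson correlation as the similarity measure, assumes every trajectory has already been resampled to the same length, and runs on made-up toy trajectories. All function names are hypothetical.

```python
import random
from statistics import mean

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

def adjacent_similarity(trajectories, order):
    """Mean similarity between consecutive episodes in the given order."""
    return mean(
        pearson(trajectories[a], trajectories[b])
        for a, b in zip(order, order[1:])
    )

def permutation_p_value(trajectories, n_shuffles=2000, seed=0):
    """Share of shuffled orders at least as dissimilar as the aired order."""
    rng = random.Random(seed)
    order = list(range(len(trajectories)))
    observed = adjacent_similarity(trajectories, order)
    hits = 0
    for _ in range(n_shuffles):
        shuffled = order[:]
        rng.shuffle(shuffled)
        if adjacent_similarity(trajectories, shuffled) <= observed:
            hits += 1
    # Add one to both counts so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_shuffles + 1)

# Toy data (assumption): eight episodes that alternate rising and falling.
rising, falling = [0, 1, 2, 3], [3, 2, 1, 0]
toy_trajectories = [rising, falling] * 4
observed, p_value = permutation_p_value(toy_trajectories)
```

In the toy data every adjacent pair is perfectly anti-correlated, so very few shuffled orders are as dissimilar as the original one and the p-value comes out small. Real trajectories would be far noisier, and the choice of similarity measure is itself one of the semi-arbitrary decisions discussed earlier.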

However, there is a reason I put ‘but be careful’ in the heading to this section. Namely, it’s never a good idea to try any test that seems promising simply because other analyses haven’t panned out, as there is a fine but crucial line between letting the data tell their story and demanding that the data tell stories that don’t exist. If the data don’t seem like they have a story to tell, don’t keep digging for positive results. Instead, think about where your assumptions might have been flawed, and write up your negative results. You never know what role those negative results might play in inspiring someone else down the road. But perhaps most importantly:

5. Remember your audience

Looking back on this project, I have to admit that in spite of my conscious effort to make my work accessible to a wide audience, I don’t think I was very successful. While the work was popular among data scientists and statisticians, close friends in non-statistical fields admitted that they were not able to easily follow my analyses, and the post was generally viewed as being fairly technical.

If I could do it again, there are a few things I would do differently. First, I would take more time to consider which population I was hoping to reach with the work, and to carefully evaluate what is and is not interesting (or even readable) to someone in that group. Second, while not always possible, I would aim to create a separate piece describing my methods in greater detail, freeing up space in the main post for more conceptual explanations and interpretations.

Overall, this project was a fantastic learning experience, and a great way to meet and share ideas with programmers, data scientists, and Stranger Things fans. If along the way I was able to inspire more non-statisticians to consider the relevance of statistics in their daily lives, or to introduce data analysts to new methods they can use for their own research, then it was well worth the time.

About the author

Jordan Dworkin is a biostatistics PhD student at the University of Pennsylvania. His academic work focuses on computational techniques for medical image analysis, but in his spare time he enjoys applying statistics to politics and pop culture.

Explore the Stranger Things project here.
