18/8/2016

The Most Timeless Songs of All-Time: A data-driven approach to music journalism

 

Until recently, it was impossible to measure the popularity of older music. Billboard charts and album sales only tell us about a song’s popularity at the time of its release. On a journey to prove, with data, that ‘No Diggity’ by Blackstreet is timeless, Matt Daniels, from Polygraph, created The Most Timeless Songs of All-Time.

To find out more about The Most Timeless Songs of All-Time, Enrico Bertini and Moritz Stefaner from Data Stories interviewed Matt Daniels about the project and how he developed it.

Data Stories: Can you briefly introduce this project and tell us how you generated the idea, and then how you realized the project?

Matt Daniels: Yeah, this was actually the first project on Polygraph, and the Timeless project was actually the impetus for the site. The story behind that project was that, in September of 2015, I was walking down the street listening to, I think, "Back That Thing Up" by Juvenile -- and I was like, this is a good song, but I wonder if my children's children will hear this song and think about it as fondly as I think about songs from the '50s, like those by Frank Sinatra or Etta James. Will they think it's really absurd that this was a thing in the early 2000s? So from there I thought it would be really interesting to see what's still popular today from maybe 10, 15 or 20 years ago, as a way to start predicting what is standing the test of time. And will there be an instance where, in 2030 or 2040, people will look at our music and it will be just as much in the zeitgeist as, like, a Frank Sinatra is today? So that was kind of the spirit of the project, and then the way it manifested into the article was getting Spotify data for a full year. We used 2014 data, since this was published in late 2015, and looked at what is the most popular music today from the '90s, the '80s, the '70s, '60s, et cetera, on Spotify, which was a really interesting measure when you consider that music from the '50s is now 65 years old. We are probably at a point where it has reached an equilibrium where, yes, our children and our children's children will probably listen to the same '60s music as we did. And, yeah, that was the spirit of the project. So, you can go to the site, look at what is the most popular song from the '90s, and also see where "Back That Thing Up" stands.


Image: Matt Daniels at Polygraph.

You have one chart, called "What's Remembered from the 90s", where the songs are placed in a graph and each song is represented by one dot, with the face of the singer behind it. And you can see on the far right, there is Kurt Cobain with the most popular song ever - more than 50 million plays. And what I really like about the analysis that you made and how you comment on this is that there is a difference between songs that are still popular today after many years, and how they scored back then in Billboard and similar charts, right? The most popular songs back then are not necessarily those that are still popular today.

I think that's the most interesting part from a story perspective. Ignoring the visuals for a second, there's definitely a tension in that past popularity does not necessarily correlate highly with present-day popularity, and the implication of that is you can take "Single Ladies" by Beyoncé, or a Taylor Swift song, and say, okay, that's so culturally pervasive, we can't imagine a world where your grandchildren would not know that song. But there are also plenty of examples from the '50s and '60s and '70s where you had effectively the "Single Ladies" of their day, the number one songs that were so pervasive and charted at number one for so long that you couldn't imagine a world where people in 2016 would not be listening to them, but there are actually plenty of examples of exactly that. Then, actually, the inverse is also true. You have underground songs, kind of like a Lana Del Rey song from today, that surely is popular, but isn't Taylor Swift-Beyoncé popular, and they have actually grown significantly in popularity over time, far outpacing the popularity that they had when they were released. So, a good example of this is Etta James' "At Last," which actually charted on Billboard; however, it did not chart very well and has, for some reason or another, slowly grown in popularity over time. And, I think, it's now a popular wedding song. So those are the types of interesting trends to find in the charts, and because they're coded, you can look for the songs and look for the trends and see things that you would never have been able to see on your own.


Image: Matt Daniels at Polygraph.

And you used the Spotify API for the current plays, right? Is that an API that gives you a lot of access to the low-level data, or did you have to do a lot of tricks to extract the relevant information? How is it working with that data source?

Spotify only releases lifetime plays for the top 10 songs for each artist. So this was done via a private data dump from one of their data partners, who is now owned by a different company. I had access to the data via them, and once the project was finished we had to go to Spotify and say, hey, we made this thing with your data, are you cool with us publishing it on the internet? And they could be like absolutely not and shut it down, or they could be like this is great. They fortunately said this is great, and we went public with it.

How do you decide on what visualizations to use, how to design the page itself, and the narrative structure?

A lot has changed over the past nine months in terms of designing these articles. With the one that we're talking about, generally I try to avoid any initial analysis. I have an idea of what the data looks like, but I don't really know what it says until I make the chart, which is a little bit counterintuitive. I think most people from a data visualization perspective will do the SQL queries, do the analysis in Excel, run the models, run the regressions, and then have an answer, and then try to visualize the analysis and say, okay, here's the thing that I think is interesting in the data. They have an insight and they try to represent that insight via some chart. I don't do any of that, and it's very problematic in many ways because I don't know what I'm trying to visualize. Instead of trying to visualize the insight, I'm actually trying to visualize the data and the story. So I avoid any SQL queries or Excel analysis. I don't know what the data says until I actually see the D3 visualization in the browser. For example, I wanted to visualize the top '90s songs, but I didn't know what would be number one. I mean, I had an idea. I peeked, but I didn't know what would be number two and number three until the chart was made. And I was, like, oh, this is interesting. Look how far number one is from number three. Or I would make the chart and it would be, maybe, a boring table, and I'd be, like, oh, here's what's number one and number three, but I don't see anything interesting in this, so I would try another chart until I really saw something interesting. So, really, I keep trying different visualizations until I see answers to the initial question, which was, in this case, what is still popular today from past decades?

And will you write the text and do the sequencing of the charts afterwards, basically, when you have a good idea of what seems to be interesting, and what seems to be a good way to present the topic?

Yeah. The first chart is really the answer to the question, 'cause you expect nobody's gonna scroll past the first chart, which is generally always true. And then the narrative structure is more of a necessity. It's a burden, in my mind. And this is a very divisive thing that I do as well - I'm trying to kill all prose in my work. Which is weird, because you need prose to explain the story. But actually, I think it's a little bit of a crutch. It means that I need the prose to explain what the visuals say, rather than the visuals to explain what the visuals say.

But you can also give background, or talk about causality behind, you know, the plain surface information that everybody sees, right? I mean, I think that can be quite valuable.

Yeah, you're right. In the narrative I went through exactly what I've talked about with you, which is that you have songs that were effectively the "Single Ladies" of the '50s, that aren't popular today. And it's really hard to say that in a chart, and really easy to say in prose; however, I am trying to get to a point where I can carry that story with just visuals and as few sentences as possible. From a work standpoint, it is making that first chart and making sure it answers that initial question. And then fleshing out the nuances in the question, such as that disparity between historic and present-day popularity, in further charts. And then obviously some prose to connect everything together.

I just want to briefly mention that you seem to make the charts so the first one typically is the one that tries to answer the narrow question that you started with, but then as you progress, you give more freedom to the reader to explore some aspects on their own, right? I think this has been called something like the martini glass structure.

Yes, absolutely. That is a thing I've done on every project, which is not starting with the whole data set. If I were to start off with a chart that is just purely about the present-day popularity of older music - I think the whole data set was tens of thousands of songs - it would just be an overwhelming visualization. People would walk into it and say, this is too complex, I don't know what it says. So I purposefully narrowed the data set just to the '90s, and I've done this on almost every project, just to get, almost like an amuse-bouche for the article. This is an article about whether "No Diggity" is still popular today, relative to "Smells Like Teen Spirit." And that, I think, builds the mental model to then go look at 50,000 songs' play counts by year, and then also their historic Billboard data, which is, again, hundreds of thousands of data points, but once they have that mental model built with that small chart, it's a lot easier to process. So, you're absolutely right – that is a visual trick I've tried to employ on almost all the articles.
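As a rough illustration of the narrowing step described in this answer, the following Python sketch filters a song list down to a single decade and ranks it by present-day plays. It is only a sketch under assumptions: the `Song` type, the `decade_of` and `top_songs_for_decade` helpers, and every title and play count are hypothetical placeholders, not Polygraph's code or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Song:
    """One row of a hypothetical play-count data set (all values are placeholders)."""
    title: str
    release_year: int
    plays: int  # present-day play count, invented for illustration


def decade_of(year: int) -> int:
    """Map a release year to the first year of its decade, e.g. 1996 -> 1990."""
    return year - year % 10


def top_songs_for_decade(songs, decade, limit=10):
    """Keep only songs released in `decade`, ordered by plays, highest first."""
    in_decade = [s for s in songs if decade_of(s.release_year) == decade]
    return sorted(in_decade, key=lambda s: s.plays, reverse=True)[:limit]


# Placeholder data: titles and numbers are made up, not real measurements.
songs = [
    Song("Song A", 1991, 500),
    Song("Song B", 1996, 300),
    Song("Song C", 1985, 900),
    Song("Song D", 1999, 700),
]

# Start with the small, focused view (one decade) before showing everything.
top_90s = top_songs_for_decade(songs, 1990, limit=2)
for song in top_90s:
    print(song.title, song.plays)
```

Starting from a small slice like this mirrors the idea of giving readers one focused chart first, then opening up the full data set for exploration.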

Visit The Most Timeless Songs of All-Time project here.

This piece is an edited version of an interview that was originally broadcast on the Data Stories podcast by Enrico Bertini and Moritz Stefaner. Listen to the full podcast here.
