Untangling Tennis: A visual and data analytic exploration of success in tennis


The life of a professional athlete is not a smooth ride: it is full of ups and downs, life-changing victories and crushing defeats, serious injuries and awe-inspiring recoveries. It is also glamorous: athletes are cherished, admired and often criticized as celebrities. The data-driven project Untangling Tennis looks at both sides of this sport, and asks whether there is a relationship between success and popularity.


To find out more about Untangling Tennis, Enrico Bertini and Moritz Stefaner from Data Stories interviewed Kim Albrecht, the project’s creative direction and data visualization lead.

Data Stories: Can you first describe roughly what the project is about? And, also, if you could let us know how it all started?

Kim Albrecht: For sure. So the project is a part of a bigger project in the lab, which is about success and trying to use data to understand success. So we had to come up with some kind of definition, how to frame success. And in our eyes there are two components to this. One is the performance - so you need to do something ‘good’; you need to do a good podcast, you need to be very good at writing, you need to do a great data visualization.

But this also needs to be reflected by society. So there needs to be some kind of component to reflect that; if you're like an amazing writer, but nobody knows about this, then you're not really successful.

So it's these two components and how they play together to create this kind of success.

In the Untangling Tennis project, it's the first time we actually looked at success in sport, and it started I think like two years before I came - so it's a very long project, it has been going on for like three years. And when I came in, they already had the data and the analysis. My colleague, Burcu Yucesoy - she is a physicist, it's her project, it's her and my project - had already done a lot of analysis. And she actually created a model. So that was the point when I joined the team.

And she already had a few hypotheses or findings? Was your role more to take these ready insights and illustrate them, or was your role also to help with the exploratory data analysis and finding out what can be said with the data?

It was purely exploratory. The idea she had was for visualizations that were on a very high level, so like scatter plots of all the players. And my role when I came in was actually first of all to find outliers, to find things that don’t match -- where is the model? So this model is predicting Wikipedia page views, based on your performance data. She used different measurements of performance to predict the page views you're going to have tomorrow. So that’s the idea. And I joined the team to find people where this was not fitting. And then we used the visualizations to understand why this is not fitting. So that was my big role in this project.
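The outlier search described above can be illustrated with a small sketch. It is not the project's actual model: it assumes a single hypothetical performance score per player, a plain least-squares line fitted to the logarithm of page views, and invented sample data. A player is flagged when the gap between actual and predicted popularity is large compared with the typical gap.

```python
import math
from statistics import mean, pstdev

# Invented sample data for illustration only (not from the project).
sample_players = [
    {"name": "Player A", "performance": 1, "page_views": 100},
    {"name": "Player B", "performance": 2, "page_views": 1_000},
    {"name": "Player C", "performance": 3, "page_views": 10_000},
    {"name": "Player D", "performance": 4, "page_views": 100_000},
    {"name": "Player E", "performance": 5, "page_views": 1_000_000},
    # Modest performance, very high popularity: the kind of case to look for.
    {"name": "Player F", "performance": 2, "page_views": 1_000_000},
]


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x with a single predictor."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return a, b


def find_outliers(players, threshold=1.5):
    """Return names of players whose log page views deviate from the fitted
    line by more than `threshold` standard deviations of the residuals.
    The threshold value is an arbitrary choice for this sketch."""
    xs = [p["performance"] for p in players]
    ys = [math.log10(p["page_views"]) for p in players]
    a, b = fit_line(xs, ys)
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    spread = pstdev(residuals)
    return [
        p["name"]
        for p, r in zip(players, residuals)
        if abs(r) > threshold * spread
    ]


print(find_outliers(sample_players))
```

With this invented data, only the player whose popularity far exceeds what the fitted line predicts is flagged. Players found this way are the ones worth examining in more detailed charts.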


Image: Kim Albrecht for Untangling Tennis.

You already said that you started by creating some exploratory charts to find outliers. But what I really like about the project is that it’s very well polished and beautiful and insightful. So can you describe how you went from, say, one single scatter plot to the final product?

Yeah, so I mean it's a big project; all the projects here in the lab run for a long time. László, the project’s coordinator, aims to have projects which run like two to three years. And so that’s a lot of time, and we did a lot of charts; I’ve got like a hundred different things that we did and we threw out. One way this came about was we created a botshot kind of thing where we compared the model and the actual page views.


Image: Outliers - As in anything, there are exceptions. Some players seem to be much more popular than their performance indicates. Here we see a few examples of players whose fame surpasses that of others with the same best rank. Kim Albrecht for Untangling Tennis.

And then it was like oh okay, but this is from 2009 to 2015, aggregated data, so what if we look at it per year? And then we created these little charts. And then it was like okay, now we’ve got this visualization set, show us how it goes by year. But what about a closer view? And so on. So the design process was very much like doing a simple visualization and then adding more and more data into it.

What were the lessons you learned, like when you published I guess you had some ideas of what would be the most successful part, or what people will click on. Is this what happened or were there surprises in terms of the reception? Which were the parts that people referred to the most, like what’s the bigger picture here in terms of reception?

The reception was not that good, I have to say.

It was very interesting. And I am trying to figure out what the problem is, so what’s going on here, and I think it's not really something that you can easily digest; so you have to go there and you have to spend some time to understand what’s going on; and there are a lot of charts that interact and that play around.

But that’s a typical problem in science communication, you always have something way too complicated and somehow you need to think about “Ah, what do we do?”.

Yeah exactly. And we’ve got a lot of projects coming up or going on, so I am trying to figure out how we can make this – I would love to give people access to the entire project, I mean this is a reflection on the entire project – also accessible. I think what we need is this first “aha” moment which might be missing a little bit. So we tried with this long page and trying to tell the story and then also have this exploratory view tool that you can use. But we didn’t nail it completely in this project.

The outcome of the project is a little bit: yes, you need to perform well to have a lot of popularity, so that’s also like, okay. We found some outliers, and there are some interesting people in there who are behaving very differently and there are these things you can do to record this stuff; so, for example, if you have a long match, if you hit the hardest ball, if you do something very crazy on the court, then that gives you a lot of popularity, which is not reflected in the actual performance data. So there are some interesting things, but it's not like this big splash you find there.

Visit the Untangling Tennis project here.

This piece is an edited version of an interview that was originally broadcast on the Data Stories podcast by Enrico Bertini and Moritz Stefaner. Listen to the full podcast here.

Image: russellstreet.