28/4/2018

20 Years 20 Titles: A data analysis of Roger Federer’s career

 

By Angelo Zehr, SRF

Roger Federer is probably one of the most famous Swiss people – and definitely one of the most loved. When we realised his 20th anniversary as a professional tennis player was coming up, we wanted to add something nerdy to the discussion. That’s why we developed and published the interactive story 20 Years 20 Titles when he won the Australian Open in January this year.

Those who watch tennis know: every millimeter of a match is measured by sensors and cameras, analyzed and stored. So it shouldn’t be very difficult to get access to this treasure trove of data, right? Unfortunately, it is. The Association of Tennis Professionals (ATP), which organizes the world ranking, does not offer any machine-readable data, let alone a public API.

Fortunately, they do offer quite a lot of information on their website that can be scraped. And that's exactly what a talented programmer based in Serbia, Mileta Cekovic, did when he created Ultimate Tennis Statistics. Mileta helped me set up the project on my local (Windows!) machine, from which I could then export the PostgreSQL database. On my Mac, I used Postgres.app and pgAdmin to import the database backup.

When you google "tennis data", you immediately run into Jeff Sackmann and his great GitHub repo packed with a lot of interesting CSV tables. Ultimate Tennis Statistics already uses some of Jeff Sackmann's data. For other data, and for those who are too lazy (including me) to download the data directly from him, where it is split across multiple tables, you can use Stephanie Kovalchik's repo deuce. This repo is optimized for use in R, our statistical programming language of choice.

Preprocessing

With the dplyr package, it was easy to connect to the Postgres database and query the data in its tables. For the data analysis, we relied heavily on Hadley Wickham's tidyverse packages for R, especially ggplot2.

As a public broadcaster, we see it as our duty to publish not only the result of our work, but also the source code that got us there. If you’re interested in reproducing the preprocessing part of this project in R, you can find it on our GitHub page, alongside our other projects.

Translations

We wanted to publish the story in multiple languages at the same time. To accomplish this, we didn’t hardcode any label names or texts, recording them in a Google spreadsheet instead.

One important note: for this workflow to work, you need to publish the spreadsheet to the web. Otherwise, the download script cannot access the data. The disadvantage of this setup is that the spreadsheet is visible to anyone with the link, but so far this hasn’t been a problem for us, since the URL is almost impossible to guess. In a small Makefile, we set up the download of the translation sheets using the npm package gsheets, passing it the spreadsheet’s hash and a destination.

With this setup, you just need to type “make gsheets” on the command line to create the four JSON files and store them in the correct folder.
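To illustrate the idea behind the per-language JSON files, here is a minimal sketch of turning spreadsheet rows into one dictionary per language. It is not taken from our Makefile or from the gsheets package: the row shape (a `key` column plus one column per language), the sample texts, the language codes and the `rowsToDictionaries` helper are all assumptions made for this example.

```javascript
// Hypothetical row shape: one row per text snippet, one column per language.
const rows = [
  { key: "greeting", de: "Hallo", en: "Hello" },
  { key: "farewell", de: "Tschüss" }, // English translation still missing
];

// Build { de: { greeting: ... }, en: { greeting: ... } } from the rows.
function rowsToDictionaries(rows, languages) {
  const dictionaries = {};
  for (const lang of languages) {
    dictionaries[lang] = {};
    for (const row of rows) {
      // Fall back to the key itself so missing translations are easy to spot.
      dictionaries[lang][row.key] = row[lang] ?? row.key;
    }
  }
  return dictionaries;
}

const dictionaries = rowsToDictionaries(rows, ["de", "en"]);
// Each entry could then be written to its own JSON file.
console.log(JSON.stringify(dictionaries.en, null, 2));
```

Falling back to the key is a design choice for this sketch: an untranslated label shows up visibly on the page instead of rendering as an empty string.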

Frontend

To handle template strings on the frontend, we used translate.js and stored its translate function in our React single-page application (in the context, to be precise). Using the React (hash) router, we were able to link to a specific language in the URL: for example, https://srf.ch/federer/#/it goes to the Italian page.
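The two ideas in this paragraph, filling placeholders in translated strings and picking the language from the URL hash, can be sketched in plain JavaScript. This is a stand-in written for illustration, not the translate.js or React Router API: `createTranslate`, `languageFromHash`, the `{name}` placeholder syntax and the sample dictionary are all hypothetical.

```javascript
// Hypothetical dictionary for one language, e.g. loaded from a JSON file.
const italian = { greeting: "Ciao {name}" };

// Return a function that looks up a key and fills in {placeholders}.
function createTranslate(dictionary) {
  return function t(key, values = {}) {
    const template = dictionary[key];
    // Unknown keys are returned as-is so gaps are visible on the page.
    if (template === undefined) return key;
    return template.replace(/\{(\w+)\}/g, (match, name) =>
      name in values ? String(values[name]) : match
    );
  };
}

// Read the language code from a hash such as "#/it"; fall back to a default.
function languageFromHash(hash, supported, fallback) {
  const code = hash.replace(/^#\/?/, "").split("/")[0];
  return supported.includes(code) ? code : fallback;
}

const t = createTranslate(italian);
console.log(languageFromHash("#/it", ["de", "it"], "de"), t("greeting", { name: "Roger" }));
```

In a React application, a function like `t` would typically be created once per language and handed down to the components, which is the role the context plays in our setup.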

For a good responsive and interactive user experience, we needed to re-implement the charts using d3. To get a clear vision of how we wanted the charts to look in the end, we exported the ggplots from R as SVG or PDF files and imported them into Sketch. In Sketch, we styled the data points and axes to fit our concept.

To make the charts responsive, we built a small React component called ChartWrapper around react-measure. This made it really easy to work with a CSS grid layout and, for example, place a chart on eight of the twelve columns. The library measures the width of the available space and passes it down to the chart as a property. We also decided not to hard-code the height, instead calculating it from an aspect ratio. In some cases it makes sense to use a taller aspect ratio on mobile devices, so that the charts don’t become too short. You can read the technical implementation of this, including how the component is used, in the full code of the component.
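The height calculation itself is simple enough to sketch without React. The following is an illustration only, not the actual ChartWrapper code: the `chartDimensions` helper, the 600 pixel breakpoint and the default ratios are assumptions, and the ratio is defined here as width divided by height.

```javascript
// Hypothetical breakpoint below which a chart counts as "mobile".
const MOBILE_BREAKPOINT = 600;

// Derive the chart height from the measured width and an aspect ratio
// (width divided by height). A smaller ratio means a taller chart.
function chartDimensions(width, { desktopRatio = 16 / 9, mobileRatio = 1 } = {}) {
  const ratio = width < MOBILE_BREAKPOINT ? mobileRatio : desktopRatio;
  return { width, height: Math.round(width / ratio) };
}

console.log(chartDimensions(960)); // wide container, landscape chart
console.log(chartDimensions(320)); // narrow container, square chart
```

A wrapper component would call a function like this whenever the measured width changes and pass both values to the chart as properties.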

Annotation layer

When building responsive charts, another important aspect is getting the annotations right. While it's cool to have small arrows with descriptions on a desktop, these do not fit on a small screen. That's why we decided to replace them with small numbered labels and move their descriptions to the bottom of the charts.

You can read all the details in the full implementation of the React components. In short: on mobile, the arrows are replaced by numbers in the SVG chart, and the texts are shown at the bottom of the chart (in the regular HTML document flow) based on standard CSS media queries. If this is too manual for you, there are also some JavaScript libraries out there to help you with this.
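As a rough sketch of the numbering logic, the function below turns a list of annotations into chart markers plus a list of footnotes. It simplifies our approach by switching on a measured width instead of CSS media queries, and the `layoutAnnotations` helper, the annotation shape and the breakpoint are all hypothetical.

```javascript
// Hypothetical annotation shape: a position in the chart plus a description.
const annotations = [
  { x: 10, y: 40, text: "First note" },
  { x: 80, y: 15, text: "Second note" },
];

function layoutAnnotations(annotations, width, breakpoint = 600) {
  const compact = width < breakpoint;
  // In the chart: numbers on small screens, arrows with full text otherwise.
  const markers = annotations.map((a, i) => ({
    x: a.x,
    y: a.y,
    label: compact ? String(i + 1) : a.text,
    showArrow: !compact,
  }));
  // Below the chart: numbered descriptions, only needed on small screens.
  const footnotes = compact
    ? annotations.map((a, i) => `${i + 1}. ${a.text}`)
    : [];
  return { markers, footnotes };
}

console.log(layoutAnnotations(annotations, 320).footnotes);
```

The markers would be drawn inside the SVG, while the footnotes would be rendered as ordinary HTML below it.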

If you have a question about some other part of the piece that has not been covered in this article, or if you’re a data journalist yourself and have found a different (maybe even better?) approach to some of the things we tackled here, please comment or write to me. You can find me as @angelozehr on Twitter.

About the author

Angelo Zehr, 28, studied Multimedia Productions and is now one of three data journalists at Swiss Radio and Television SRF.

Explore the project here.

Image: Justin Smith.
