A data visualization that is invariant to the data


Kaiser Fung explains how design decisions can make or break your data visualization.

This map appeared in Princeton Alumni Weekly:


Here is another map I created:


If you think they look basically the same, you've got the point. Now look at the data on the maps. The original map displays the proportion of graduates who ended up in different regions of the country. The second map displays the proportion of land mass in different regions of the country.

The point is that this visual design is not self-sufficient. If you cover up the data printed on the map, there is nothing else to see. Further, if you swap in other data series (anything at all), nothing on the map changes. Yes, this map is invariant to the data!

This means the only way to read this map is to read the data directly.
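To make the invariance concrete, here is a minimal Python sketch. It is not how the original map was produced. The function names, the fill color, and the region names and values are all hypothetical placeholders, not the figures printed on either map. The sketch models a map whose only data-dependent element is the printed label, then checks that two different data series yield the same picture once the labels are covered up.

```python
# Hypothetical fill color: every region gets the same one, regardless of data.
FIXED_FILL = "#c8102e"


def render_labeled_map(values):
    """Describe each region as a shape whose only data-driven part is its label."""
    return [
        {"region": region, "fill": FIXED_FILL, "label": f"{value:.0%}"}
        for region, value in values.items()
    ]


def without_labels(shapes):
    """Cover up the printed numbers, keeping only the visual encoding."""
    return [{k: v for k, v in shape.items() if k != "label"} for shape in shapes]


# Placeholder data series (made up for illustration only).
graduates = {"Region A": 0.40, "Region B": 0.25, "Region C": 0.05}
land_mass = {"Region A": 0.05, "Region B": 0.20, "Region C": 0.45}

# True when swapping the data changes nothing but the printed text.
is_invariant = without_labels(render_labeled_map(graduates)) == without_labels(
    render_labeled_map(land_mass)
)
```

If `is_invariant` comes out true for any two data series you try, the design is carrying no information beyond the printed numbers.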

Maps also have another issue. The larger land areas draw the most attention. However, the sizes of the regions run inversely to the data being depicted: the smaller the values, the larger the areas on the map. This scatter plot shows the proportion of graduates (the data) versus the proportion of land mass:


One quick fix is to use a continuous color scale. In this way, the colors encode the data. For example:


The dark color now draws attention to itself.
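For readers who want to try the continuous color scale themselves, the following is a rough Python sketch, assuming a simple linear interpolation between a light and a dark color. The two endpoint colors, the function names, and the region values are my own placeholders, not the palette or the data of the map above.

```python
def lerp_color(light, dark, t):
    """Linearly interpolate between two RGB tuples, with t in [0, 1]."""
    return tuple(round(l + (d - l) * t) for l, d in zip(light, dark))


def continuous_scale(values, light=(255, 245, 235), dark=(127, 39, 4)):
    """Map each region's value to a hex color: larger value, darker color."""
    lo, hi = min(values.values()), max(values.values())
    span = (hi - lo) or 1.0  # avoid dividing by zero when all values are equal
    colors = {}
    for region, value in values.items():
        t = (value - lo) / span  # rescale to [0, 1]
        r, g, b = lerp_color(light, dark, t)
        colors[region] = f"#{r:02x}{g:02x}{b:02x}"
    return colors


# Placeholder shares (made up for illustration only).
shares = {"Region A": 0.40, "Region B": 0.25, "Region C": 0.05}
colors = continuous_scale(shares)
```

With this encoding, changing the data changes the colors, so the map is no longer invariant to the data. Interpolating in RGB is the simplest choice, though not necessarily the most perceptually even one.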

Of course, one should think twice before using a map.

One note of curiosity: Given Princeton's proximity to NYC, it is not surprising that NYC is the most popular destination for Princeton graduates. Strangely enough, a move from Princeton to New York counts as out of region because of the way the regions are defined. New Jersey is lumped with Pennsylvania, Maryland, Virginia, etc. into the Mid-Atlantic region, while New York is considered Northeast.

This article was originally published by Junk Charts, republished here under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License. Read the original article here.