29/1/2013

Data-Driven Documents, Defined

By Scott Murray, author of Interactive Data Visualization for the Web and Assistant Professor of Design at the University of San Francisco.

D3, or d3.js, is a magical chunk of JavaScript code that helps you express data visually on the web. D3’s full name is Data-Driven Documents, and it’s so named because it connects data values to document elements, thereby “driving” the document from data. For example, consider a typical scatterplot, with little circles placed in 2D space along the x and y axes. Think of these dots as being “driven” by some underlying data: they are “pushed” into place, based on their corresponding values.

D3 is exploding in popularity right now, partly because it works with any modern browser, including mobile Android and iOS devices, but also because it is extremely powerful. That said, it does take some time to learn and master; if you’re a journalist on deadline, D3 may not be the best tool for that chart that’s due today. For common visual forms, you may prefer Datawrapper, which is designed to generate visuals quickly. D3, on the other hand, is best for more complex visuals, or more complex interactive designs.

D3’s flexibility turns out to be its greatest strength. D3 is great for journalists because it doesn’t limit you to a specific visual form. You’re free to explore beyond bar charts and invent new means of visual storytelling. The New York Times has been gradually phasing out Flash in favor of D3 and other JavaScript-based interactive pieces, even hiring Mike Bostock, the primary author of D3, who is now on the Times’s graphics team. The result has been a series of groundbreaking interactive graphics, including the recent 512 Paths to the White House, by Mike Bostock and Shan Carter.

[Image: 512 Paths to the White House]

In this brief tutorial, I’ll show you how to make a very simple visualization with D3. This will illustrate some of the basic concepts behind D3 without going into too much detail.

This tutorial assumes a bit of prior knowledge of HTML, CSS, and JavaScript. If you’re brand new to these ideas, you may want to start with these beginner-level tutorials or with my forthcoming book, Interactive Data Visualization for the Web.

Making a Dot Chart

Every visualization begins with data. Your data set could be whatever you want. For our purposes, I’ll use an array of pairs of values — that is, an array of arrays, in which each sub-array contains two values.

//Define data
var dataset = [
    [ 25, 95 ],
    [ 15, 92 ],
    [ 37, 77 ],
    [ 25, 36 ],
    [ 34, 46 ],
    [ 31, 83 ],
    [ 25, 66 ],
    [ 65, 35 ],
    [ 58, 55 ],
    [ 68, 31 ]
];

I want to make a dot chart in which these values are represented by dots placed in two-dimensional space. The first value in each sub-array will be the dot’s x value, and the second value will be its y value.
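
In plain JavaScript, that x/y convention amounts to reading position 0 and position 1 of each pair. This is only a sketch; the names xs and ys are illustrative and are not part of the tutorial’s code.

```javascript
// The tutorial's dataset: each sub-array is one [x, y] pair
var dataset = [
    [ 25, 95 ], [ 15, 92 ], [ 37, 77 ], [ 25, 36 ], [ 34, 46 ],
    [ 31, 83 ], [ 25, 66 ], [ 65, 35 ], [ 58, 55 ], [ 68, 31 ]
];

// Position 0 of each pair is the x value, position 1 is the y value
// (xs and ys are illustrative names, not used elsewhere in the tutorial)
var xs = dataset.map(function(d) { return d[0]; });
var ys = dataset.map(function(d) { return d[1]; });

console.log(xs.join(", ")); // 25, 15, 37, 25, 34, 31, 25, 65, 58, 68
console.log(ys.join(", ")); // 95, 92, 77, 36, 46, 83, 66, 35, 55, 31
```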

Next, it’s helpful to set up some variables we’ll reference later.

//Define variables for size of chart
var width = 500;
var height = 300;

Then we use our first bit of D3 code to create a new SVG element. SVG is a great image format because it’s written as plain text (just like HTML) and, when rendered as part of a web page, its elements exist in the DOM (also just like HTML).

//Create a new SVG element
var svg = d3.select("body").append("svg")
            .attr("width", width)
            .attr("height", height);

This chain of code selects the page’s body and appends a new svg element to it. Because append() returns a new selection containing just the appended element, the two attr() statements we tack on through method chaining set attributes on the svg element, not on the body. The result of the whole chain, that svg selection, is stored in the variable svg for later use. So now the svg element’s width and height are set to, well, width and height.
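
Method chaining works because each of these methods hands back a selection you can keep calling methods on. The toy object below, MiniSelection, is a hypothetical and heavily simplified stand-in (it is not part of D3 and not how D3 is implemented); it mimics only one behavior of attr(): store a value, then return the same object.

```javascript
// Hypothetical toy stand-in for a D3 selection, NOT D3's real implementation.
// It only shows why chaining works: attr() returns the same object.
function MiniSelection(tagName) {
    this.tagName = tagName;
    this.attributes = {};
}

// Set one attribute, then return the selection so another call can be chained
MiniSelection.prototype.attr = function(name, value) {
    this.attributes[name] = value;
    return this;
};

var width = 500;
var height = 300;

// Each attr() call operates on, and returns, the same selection
var svg = new MiniSelection("svg")
    .attr("width", width)
    .attr("height", height);

console.log(svg.attributes.width, svg.attributes.height); // 500 300
```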

Next, let’s add a simple background to this SVG image, so we can see how big it is.

//Create a single rect as the background
svg.append("rect")
   .attr("x", 0)
   .attr("y", 0)
   .attr("width", width)
   .attr("height", height);

This takes the svg selection from earlier and appends a rect inside it. As you may have guessed, a rect is a rectangle. Rectangles need x, y, width, and height values, so we specify those with attr() statements. Here is our 500-by-300-pixel rectangle:

[Image: the 500-by-300-pixel background rectangle]

I know, beautiful, right? (One step at a time…)
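
Because SVG is plain text, it can help to picture the markup that the svg and rect snippets produce. The helper below, toMarkup, is hypothetical: D3 creates DOM nodes rather than strings, so this only approximates what you would see in the browser’s element inspector.

```javascript
// Hypothetical helper: turn a tag name, an attribute object, and
// optional child markup into an SVG/XML string
function toMarkup(tagName, attributes, children) {
    var attrText = Object.keys(attributes).map(function(name) {
        return " " + name + "=\"" + attributes[name] + "\"";
    }).join("");
    return "<" + tagName + attrText + ">" + (children || "") + "</" + tagName + ">";
}

var width = 500;
var height = 300;

// The background rect, with the same four attributes set in the tutorial
var rect = toMarkup("rect", { x: 0, y: 0, width: width, height: height });

// The svg element that contains it
var svgMarkup = toMarkup("svg", { width: width, height: height }, rect);

console.log(svgMarkup);
// <svg width="500" height="300"><rect x="0" y="0" width="500" height="300"></rect></svg>
```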

Finally, by referencing the original data, we create, position, and style one circle for each value in the dataset array.

//Create one circle element for each value pair in dataset
svg.selectAll("circle")
    .data(dataset)
    .enter()
    .append("circle")
    .attr("cx", function(d) {
        return d[0];
    })
    .attr("cy", function(d) {
        return d[1];
    })
    .attr("r", 3);

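This looks a bit weird, so let’s walk through it. First, selectAll("circle") selects all the circles in the SVG, which don’t exist yet, so we get back an empty selection. Then data(dataset) binds our ten data values to that selection. Because there are more values than circles, D3 creates an empty placeholder for each value that has no element. enter() returns those placeholders, and append() creates a new circle for each one.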
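
At its simplest, the enter step is bookkeeping: compare the number of data values with the number of existing elements, and keep the leftover values. Here is a plain-JavaScript sketch of that idea. enterData is a hypothetical helper that joins by index only; D3’s actual join does more (for example, it can also match data to elements by key).

```javascript
// Hypothetical helper: given the elements already on the page and a data
// array, return the data values that have no element yet (joined by index)
function enterData(existingElements, data) {
    return data.slice(existingElements.length);
}

var dataset = [
    [ 25, 95 ], [ 15, 92 ], [ 37, 77 ], [ 25, 36 ], [ 34, 46 ],
    [ 31, 83 ], [ 25, 66 ], [ 65, 35 ], [ 58, 55 ], [ 68, 31 ]
];

// No circles exist yet, so every data value ends up in the "enter" group
var existingCircles = [];
var entering = enterData(existingCircles, dataset);

// One new circle object per entering value, each remembering its datum
var circles = entering.map(function(d) {
    return { tagName: "circle", datum: d };
});

console.log(circles.length); // 10
```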

Lastly, we set the attributes of each of those new circles. In this case, we set the center x (cx) and center y (cy) values to the first and second sub-array values, respectively. Notice that we can access the data values specific to a given element by using an anonymous function:

    .attr("cx", function(d) {
        return d[0];
    })

D3 passes the current datum into the function as d. That means that, for each circle, d references the individual sub-array from our original dataset array. So, for the first circle we draw, d is [25, 95]. For the second circle, d is [15, 92], and so on.

Within the anonymous function, we use d[0] to grab the first value in that sub-array (position 0) and d[1] to grab the second value (position 1).
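
To see what those anonymous functions return, you can call them yourself on each datum. In this sketch, cxAccessor and cyAccessor are hypothetical names for the same functions passed to attr(); D3 calls such functions once per element, with the datum as the first argument and the element’s index as the second.

```javascript
var dataset = [
    [ 25, 95 ], [ 15, 92 ], [ 37, 77 ], [ 25, 36 ], [ 34, 46 ],
    [ 31, 83 ], [ 25, 66 ], [ 65, 35 ], [ 58, 55 ], [ 68, 31 ]
];

// The same functions handed to .attr("cx", ...) and .attr("cy", ...)
var cxAccessor = function(d) { return d[0]; };
var cyAccessor = function(d) { return d[1]; };

// Mimic D3 calling each accessor once per datum, in order
var positions = dataset.map(function(d, i) {
    return { index: i, cx: cxAccessor(d), cy: cyAccessor(d), r: 3 };
});

console.log(positions[0].cx + ", " + positions[0].cy); // 25, 95
console.log(positions[1].cx + ", " + positions[1].cy); // 15, 92
```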

[Image: the dots plotted at their raw data values]

Aha, dots! We have liftoff!

But the raw x/y values don’t make for good pixel values: every value in our dataset is below 100, while the SVG is 500 pixels wide and 300 pixels tall, so all our dots are scrunched up in the top-left corner. Fortunately, we can use D3’s scales to map the original values to something more conducive to display.
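
A quick bit of arithmetic shows how little of the image the unscaled dots can reach. This plain-JavaScript sketch assumes the dataset, width, and height defined earlier in the tutorial.

```javascript
// Same data and chart size as in the tutorial
var dataset = [
    [ 25, 95 ], [ 15, 92 ], [ 37, 77 ], [ 25, 36 ], [ 34, 46 ],
    [ 31, 83 ], [ 25, 66 ], [ 65, 35 ], [ 58, 55 ], [ 68, 31 ]
];
var width = 500;
var height = 300;

// Largest raw x and y values, which are currently used directly as pixels
var maxX = Math.max.apply(null, dataset.map(function(d) { return d[0]; }));
var maxY = Math.max.apply(null, dataset.map(function(d) { return d[1]; }));

// Fraction of the SVG's width and height the unscaled dots can reach
var usedWidth = maxX / width;   // 68 / 500
var usedHeight = maxY / height; // 95 / 300

console.log(usedWidth.toFixed(3), usedHeight.toFixed(3)); // 0.136 0.317
```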

A basic D3 scale looks like this:

var scale = d3.scale.linear()
                     .domain([0, 10])
                     .range([0, 500]);

Scales are function generators: calling d3.scale.linear() gives you back a new function. So now scale is a custom-built function that accepts input values in the domain of zero to ten and maps them linearly onto output values in the range of zero to 500. So scale(0) will return 0, scale(5) will return 250, and scale(10) will return 500.
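
Under the hood, a linear scale is linear interpolation. The function below is a hypothetical, simplified re-creation (linearScale is not a D3 function, and real D3 scales add features such as inversion and tick generation): it finds a value’s relative position within the domain and returns the same relative position within the range. If you use D3 version 4 or later, the built-in equivalent is d3.scaleLinear() rather than d3.scale.linear().

```javascript
// Hypothetical re-creation of a linear scale: returns a function that maps
// the domain [d0, d1] onto the range [r0, r1]
function linearScale(domain, range) {
    var d0 = domain[0], d1 = domain[1];
    var r0 = range[0], r1 = range[1];
    return function(value) {
        var t = (value - d0) / (d1 - d0); // relative position within the domain
        return r0 + t * (r1 - r0);        // same relative position within the range
    };
}

var scale = linearScale([0, 10], [0, 500]);

console.log(scale(0));  // 0
console.log(scale(5));  // 250
console.log(scale(10)); // 500
// Values outside the domain are extrapolated, not clamped
console.log(scale(20)); // 1000
```

D3’s linear scales likewise extrapolate by default; clamping is an opt-in setting.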

Now we can create a separate scale for each axis. And instead of using hard-coded values, we can derive each domain from the data and each range from the chart’s dimensions.

//Define scales
var xScale = d3.scale.linear()
                     .domain([
                        d3.min(dataset, function(d) { return d[0]; }),
                        d3.max(dataset, function(d) { return d[0]; })
                     ])
                     .range([0, width]);

var yScale = d3.scale.linear()
                     .domain([
                        d3.min(dataset, function(d) { return d[1]; }),
                        d3.max(dataset, function(d) { return d[1]; })
                     ])
                     .range([0, height]);

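d3.min() and d3.max() may look scary, but they are simply quick ways of calculating the minimum and maximum values of an array. The second argument is an accessor function that tells D3 which value to use from each sub-array: d[0] for x and d[1] for y. So for xScale, we set the input domain to the smallest and largest x values (15 and 68). We do the same for yScale, but with y values (31 and 95).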
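
For this dataset, the domains can be checked without D3. The sketch below uses plain Math.min and Math.max over mapped arrays as a stand-in for d3.min() and d3.max() with an accessor; minBy and maxBy are hypothetical names. One difference worth knowing: D3’s versions skip missing (undefined) values, which this sketch does not.

```javascript
var dataset = [
    [ 25, 95 ], [ 15, 92 ], [ 37, 77 ], [ 25, 36 ], [ 34, 46 ],
    [ 31, 83 ], [ 25, 66 ], [ 65, 35 ], [ 58, 55 ], [ 68, 31 ]
];

// Hypothetical plain-JavaScript stand-ins for d3.min/d3.max with an accessor
function minBy(data, accessor) {
    return Math.min.apply(null, data.map(accessor));
}
function maxBy(data, accessor) {
    return Math.max.apply(null, data.map(accessor));
}

var xDomain = [ minBy(dataset, function(d) { return d[0]; }),
                maxBy(dataset, function(d) { return d[0]; }) ];
var yDomain = [ minBy(dataset, function(d) { return d[1]; }),
                maxBy(dataset, function(d) { return d[1]; }) ];

console.log(xDomain.join(" to ")); // 15 to 68
console.log(yDomain.join(" to ")); // 31 to 95
```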

Lastly, when setting cx and cy, we wrap those data values in the new scale functions.

    .attr("cx", function(d) {
        return xScale(d[0]);    // <-- Wrapped!
    })
    .attr("cy", function(d) {
        return yScale(d[1]);    // <-- Wrapped!
    })

[Image: the dots positioned with xScale and yScale]

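Looking good, but the dots with the minimum or maximum values are getting cut off: the scales place them exactly on the edges of the SVG image, so part of each circle falls outside it. Let’s add a little padding around the inside of our SVG image. I’ll make a new variable to store this arbitrary amount of padding: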

var padding = 50;

…and then I’ll rewrite how we specify the ranges to include that padding:

var xScale = d3.scale.linear()
                     .domain([
                        d3.min(dataset, function(d) { return d[0]; }),
                        d3.max(dataset, function(d) { return d[0]; })
                     ])
                     .range([padding, width - padding]);

var yScale = d3.scale.linear()
                     .domain([
                        d3.min(dataset, function(d) { return d[1]; }),
                        d3.max(dataset, function(d) { return d[1]; })
                     ])
                     .range([padding, height - padding]);
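
To check where the padded scales put the extreme dots, here is the same arithmetic with a hypothetical linearScale helper standing in for d3.scale.linear() (it reproduces only the basic interpolation). The domains are the data extents worked out earlier.

```javascript
// Hypothetical stand-in for d3.scale.linear(): maps domain onto range linearly
function linearScale(domain, range) {
    return function(value) {
        var t = (value - domain[0]) / (domain[1] - domain[0]);
        return range[0] + t * (range[1] - range[0]);
    };
}

var width = 500;
var height = 300;
var padding = 50;

// Domains from the tutorial's data: x runs 15 to 68, y runs 31 to 95
var xScale = linearScale([15, 68], [padding, width - padding]);
var yScale = linearScale([31, 95], [padding, height - padding]);

console.log(xScale(15), xScale(68)); // 50 450
console.log(yScale(31), yScale(95)); // 50 250
```

Every dot now sits at least 50 pixels from each edge. Note that SVG’s y coordinate grows downward, so with this range the largest y value (95) is drawn nearest the bottom; swapping the y range to [height - padding, padding] would put it at the top instead, as on a conventional chart.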

[Image: the scaled dots with 50 pixels of padding]

Perfect! Notice how the dots have all been pushed in from the edges toward the center, yet their relative positions remain the same! That’s the power of scales — now you can change the width and height, adjust the padding, or even load in completely different data values, and the entire visualization will adjust accordingly!
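
The claim that the chart adjusts while relative positions stay the same can be checked numerically: build the x scale for two widths and compare each dot’s relative position within the padded area. This sketch again uses a hypothetical linearScale helper rather than D3; the 800-pixel width is an arbitrary example value, and the domain 15 to 68 is the x extent of the tutorial’s dataset.

```javascript
// Hypothetical stand-in for d3.scale.linear()
function linearScale(domain, range) {
    return function(value) {
        var t = (value - domain[0]) / (domain[1] - domain[0]);
        return range[0] + t * (range[1] - range[0]);
    };
}

var xValues = [25, 15, 37, 25, 34, 31, 25, 65, 58, 68];
var padding = 50;

// Relative position (0 = left padded edge, 1 = right padded edge)
// of every x value for a chart of the given width
function relativePositions(width) {
    var xScale = linearScale([15, 68], [padding, width - padding]);
    return xValues.map(function(x) {
        return (xScale(x) - padding) / (width - 2 * padding);
    });
}

var at500 = relativePositions(500);
var at800 = relativePositions(800);

// Pixel positions differ, but relative positions match (up to rounding)
var maxDifference = Math.max.apply(null, at500.map(function(p, i) {
    return Math.abs(p - at800[i]);
}));
console.log(maxDifference < 1e-12); // true
```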

Note: The code that accompanies this tutorial is available on GitHub. You can also download all the files at once as a ZIP here.
