14/6/2013

Hate Spreadsheets Formulas? Meet Drag and Drop Data Analysis Tool ‘QueryTree’

 

By Daniel Thompson, software developer working on QueryTree and CEO of D4 Software.

During the course of this decade, the amount of digital information being stored by the human race will increase 50 fold. Every aspect of life now leaves some form of digital trail and this trend has implications for all areas of life, not least for those of us who seek to hold governments and large institutions to account. The job is changing; the stories are now hidden in increasingly large sets of data and to find them we must roll up our sleeves and start digging through these datasets.

Does that mean that all journalists need to sign up to Code Academy, take an evening course in programming or finally get around to memorising the parameters for Excel's VLOOKUP function? After all, while tools like Tableau or Statwing are really good at visualizing single sets of data, the data you need for your story is probably spread across multiple files, mixed in with all sorts of stuff you're not interested in and released separately for each year by each local authority. And the tools for people who want to stitch all that together and find the interesting numbers usually involve a bit of programming. Excel has its formulas and OpenRefine has its expression language. But for people who just don't think in that way, the tools for manipulating data are a little thin on the ground.

Until now that is.

Having spent most of my career as a programmer seeing people wrestling with spreadsheets and taking days to do things I knew the computer could do in seconds, I started to wonder if there was a more visual and interactive way to work with data. It was that pondering which eventually led to QueryTree, a drag and drop tool for working with data.

My company created QueryTree to be a simple and visual environment in which tools that manipulate data can all work together to give the user a lot of power and flexibility. QueryTree makes it easier to apply a series of changes to a set of data by representing each step in the process as an icon (referred to as a “tool”) on a graphical design surface. Each icon or tool represents a function such as sorting, filtering, selecting and grouping. Each tool takes in some data, or gives out some data, or both. Each tool has some settings and displays its results in the result area on the bottom half of the screen. And by sticking to those simple rules, each tool can work with all the other tools, but still stay very focused on the one job that it does. As a user, you can connect a number of tools together in a chain to achieve quite complicated results, or just learn the one or two tools that you need and ignore the rest. 

For example, say you were investigating violent crime but the only data available is a CSV containing details of all reported crimes for a particular year. You'll want to filter out the non-violent crimes, then group them up by type, counting the total number of incidents for each, and then sort from highest to lowest before exporting into another tool. Each step of that process would be represented in QueryTree as a separate tool, each one taking the output from the previous step and changing it in some way. So, our example would look like this:

DDJ_-_What_Is_QueryTree_-_Image1.png

To build this, a user drags and drops each tool into place from the toolbar. Each tool does one relatively easy to understand job like loading from a file, filtering or grouping. Clicking on each tool in the chain displays the data set at that point in the process in the results area.

However, investigations into data are not usually that linear. During the course of your digging you may want to take a look at all the violent crimes on a map to see if anything jumps out at you, or merge several other datasets together. You may want to try plotting various graphs to get a sense of what the data contains. In fact, your QueryTree worksheet would probably look more like this:

DDJ_-_What_Is_QueryTree_-_Image2.png

Other tools tend to only present one version of the data at any one time. If you sort your table in Excel, then your table is now sorted. This is a problem because going back and forth, comparing different views, or just explaining to your editor what you've done, requires you to keep a record of each step. With QueryTree not only is each tool simple and code free, but each tool also creates a separate version of the data, and the path the data has taken is laid out for all to see. 

QueryTree is free to try and can be accessed at querytreeapp.com. Over the coming months we'll be working on more tools for loading in different types of data, for cleaning data and for creating more impressive visualisations. If you'd like to hear about new features and updates you can follow us on Twitter at @QueryTreeApp, or sign up to our newsletter.

Comments