8/11/2011

5 tips for getting started in data journalism

Originally published by Troy Thibodeaux on Poynter.org on 6 October 2011. This article is republished with permission.

Data journalist. Computer-assisted reporter. Newsroom developer. Journo-geek. If those of us who work in the field aren’t quite sure what to call ourselves, it’s little wonder that sometimes even the people who work beside us are puzzled by what we do. Part of the confusion (and one reason for all the competing labels) lies in the sheer variety of tasks that can fall under this heading. We may be fairly sure that some jobs lie within the boundaries of data journalism, but we’d be hard-pressed to say what can’t be jumbled into this baggy monster of a field.

In its current state, data journalism describes neither a beat nor a particular medium (unlike photojournalism or video journalism), but rather an overlapping set of competencies drawn from disparate fields. We have the statistical methods of social scientists, the mapping tools of GIS, the visualization arts of statistics and graphic design, and a host of skills that have their own job descriptions and promotion tracks among computer scientists: Web development, general-purpose programming, database administration, systems engineering, data mining (even, I hear, cryptography). And the ends of these efforts vary as widely as their means: from the more traditional text CAR story to the interactive graphic or app; from newsroom tools built for reporters to multi-faceted websites in which the reporting becomes the data.

It’s difficult, finally, to define what data journalism is precisely because it’s difficult to say what data is. After all, anything countable can count as data. Anything that a computer processes is data. So, on some level, all journalism today is data journalism (certainly it’s all “Computer Assisted”). Real data journalism comes down to a couple of predilections: a tendency to look for what is categorizable, quantifiable and comparable in any news topic and a conviction that technology, properly applied to these aspects, can tell us something about the story that is both worth knowing and unknowable in any other way.

So, it’s a field brimming with promise but vaguely defined, which is part of what makes it so exciting. On a near-daily basis, I find myself faced with the task of learning something new and putting it into practice immediately. And that aspect is, for me, the single greatest thing about working in journalism in general: we get paid in large part to figure things out. This trait among journalists — the willingness to launch ourselves headlong into an alien world with the expectation of emerging with more than a conversational understanding of its inner workings — gives us the moxie or naivete to try things that a programmer with a clearer job description might simply wave away with a “not my job.”

But this lack of defined parameters can also lead to a bit of confusion for someone wanting to get started in the field. Should you start by learning a programming language? Which one? Is it OK if your stats knowledge is rusty or non-existent? What should you know about mapping? I’ve laid out five tips below that should start you thinking. In a future post, I’ll concentrate on the tools you’ll need.

Be mercenary.

Completists may believe you have to be able to build a computer from a bag of wire and lights and write your blog posts in binary before you’re ready to call yourself a coder. Sure, there is value in expansive knowledge, and we’re all trying to gain a deeper understanding of the technology we use. But we also have a clear goal: we’re storytellers, through word or pixel, and the story won’t wait for us to finish our self-imposed curriculum. So, pick up what’s at hand, learn what you need to get to the next step in your project and get to something real as soon as possible.

I’ve seen many well-intentioned efforts to “learn programming” be pushed aside by real-world obligations. So, make learning to code a real-world obligation. Ask yourself whether there is a task you do routinely (and mindlessly) that you could automate. Is there a data set locked in a website that you would love to scrape into a handy spreadsheet? Once you’ve identified the task, then the outline of your research is clear: What do I need to know to get this job done? And for now, don’t worry about anything that doesn’t move you toward that goal.
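
To make the scraping example concrete, here is a minimal sketch of what "a data set locked in a website" turned into "a handy spreadsheet" might look like, using only Python's standard library. Everything in it is an assumption made for illustration: the sample HTML, the county names and counts, and the names TableScraper and table_to_csv are hypothetical, and a real page would likely need extra handling for nested tables, merged cells or markup that isn't a table at all.

```python
import csv
import io
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # row being built, or None when outside a <tr>
        self._cell = None  # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_to_csv(html):
    """Return the table rows found in `html` as CSV text."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


# Hypothetical page fragment; the figures are made up for illustration.
SAMPLE_HTML = """
<table>
  <tr><th>County</th><th>Inspections</th></tr>
  <tr><td>Adams</td><td>12</td></tr>
  <tr><td>Baker</td><td>7</td></tr>
</table>
"""

if __name__ == "__main__":
    print(table_to_csv(SAMPLE_HTML))
```

The sketch deliberately works on a string rather than fetching a live page, so the only thing to learn at this step is how the table becomes rows. Downloading the page and saving the result to a .csv file are separate steps to pick up when the project calls for them.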

Read the full article on Poynter.org.

Image credits: The Library of Congress
