Beyond code: What you learn going from journalist to data journalist


After learning programming at Columbia Journalism School, Gianna-Carina Grün makes a case for starting a European Data Journalism School.

The Lede Program is a postgrad course that aims to give journalists the skills they need to work in data journalism. There's a 14-week summer intensive and an optional fall continuation. The summer course costs approximately US$14,000.

It's mind-blowing what you can learn over the course of three and a half months: Python with pandas and matplotlib, databases with PostgreSQL and psql, scraping the web via APIs or with BeautifulSoup and Selenium, letting servers do our (cron) jobs, building web apps with Flask as well as Twitter bots, analyzing large amounts of text, and visualizing data in pretty charts with a bit of interactivity on top.
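To give a flavor of the pandas side of that list, here is a minimal sketch of the kind of analysis we practiced. The dataset is made up purely for illustration; it is not from any course assignment:

```python
import pandas as pd

# Hypothetical dataset: city populations (made-up numbers, for illustration only)
df = pd.DataFrame({
    "city": ["Berlin", "Hamburg", "Munich"],
    "population": [3_700_000, 1_900_000, 1_500_000],
})

# Typical first steps in any data story: sort, pick, aggregate
largest = df.sort_values("population", ascending=False).iloc[0]
total = df["population"].sum()

print(largest["city"])  # Berlin
print(total)            # 7100000
```

A handful of lines like these, scaled up to real datasets with thousands of rows, is where most day-to-day data journalism work happens.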

Apart from those very specific skills, we also learnt:

How you spend your time as a programmer: Copy Pasting!

Learning a programming language was, to me, just like learning any other language, with grammar rules and vocabulary. However, learning to code perhaps requires a higher frustration tolerance than learning French, for example. We thought it was a joke when the Lede's director Jonathan Soma told us we'd spend only 10% of our time actually writing code. Another 10% we'd spend trying to fix it, by putting brackets around things or changing curly braces to parentheses. The remaining 80% you'll spend googling for solutions and end up copy-pasting code bits from Stack Overflow. It's so much of a thing that people have come up with fake book covers about this phenomenon.

Data are not neutral.

Most of the time, we take data as a neutral information source -- it's objective numbers, after all. But as our instructor, programmer Allison Parrish, pointed out, data is always gathered with an intention, and often by a person (who naturally comes with biases they might not be aware of). Depending on its resolution, data captures only a part of reality. To at least be aware of that incomplete depiction of reality, you should always ask yourself who produced the data and for what purpose, why it was digitized, whose labor produced it, and what biases might be built into the system that holds it.

Anscombe's quartet: When doing statistical analysis, always make a viz

That correlation is not causation is pretty well known. But even if you understand the idea of correlation and the meaning of a correlation coefficient, you can be fooled. Anscombe's quartet shows how: it consists of four datasets that share nearly identical summary statistics -- the same means, variances, and correlation coefficient -- yet look completely different when plotted. So you might see a correlation in the numbers, but your assumption of what the data looks like may be completely wrong. That's why you should always visualize your data before drawing conclusions.
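You can check this yourself with a few lines of standard-library Python. The numbers below are Anscombe's original 1973 values; the Pearson coefficient is computed by hand so nothing beyond the `statistics` module is needed:

```python
from statistics import mean, pstdev

# Anscombe's quartet (Anscombe, 1973): four datasets with near-identical
# summary statistics but wildly different shapes when plotted.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = [
    (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    ([8] * 7 + [19] + [8] * 3,
     [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
]

def pearson(xs, ys):
    """Pearson correlation coefficient, computed by hand."""
    mx, my = mean(xs), mean(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / len(xs)
    return cov / (pstdev(xs) * pstdev(ys))

correlations = [pearson(xs, ys) for xs, ys in quartet]
for r in correlations:
    print(round(r, 2))  # all four datasets print 0.82
```

The summary statistics agree to three decimal places; only a scatter plot reveals that one dataset is a curve, one has a single extreme outlier, and so on.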


Image: GNU License.

Pile of trash: When there's no story to tell

After acquiring the necessary coding skills, you start your quest for suitable datasets to apply them to. You'll find yourself digging through data sources and sets for hours, only to discard them because there's nothing surprising or revealing in them. You'll be constantly adding to your own pile of data trash, and it will grow way too high over time.

If you want to keep this pile as small as possible, always go into dataset research with at least a rough story idea in mind: challenge common assumptions, cross-check facts, or investigate geographical distribution patterns.

Cookies for the monster!

Most of us certainly dream of the big investigative stories. However, they don't go well with a day-to-day newsroom schedule: they take time to get the data, to work through it, to find story angles, and to do the reporting.

That's why investigative reporter Giannina Segnini, Director of the Data Journalism Concentration in Columbia Journalism School's Master of Science program, advised us to always have some cookies for the monster: easy-to-realize -- but still fancy -- projects you can satisfy your editor or desk manager with.

Image: via GIPHY

I'm out of the black box -- with an idea!

When we started the program, we were told it's like a black box: you go in, the Lede Program somehow changes you, and you come out as a different self. The most apparent change is that at some point you start thinking in code structure. And of course, you acquire a lot of skills that add another perspective to your journalistic work. In my case, it also sparked an idea:

I'm grateful that I had the chance to take part in this program. But I strongly believe that selecting students via high tuition fees is not a good filter.

Data journalism is too important as a tool, and visuals are too powerful as a media type, for these skills to only be learnt by a very specific subset of journalists (read: the ones who can afford it financially, especially in light of the decline in pay for journalists).

Starting a European Data-J-School

And I think there's a way to offer such a program at a lower cost here in Europe: find a sponsor, find partners, and run it in a city that's not among the most expensive in the world.

After all, we have enough expertise here in Europe that no one should have to switch continents to learn the latest skills in this field.

It's not that you can't learn data journalism in Europe at all: there are university classes and workshops offered by journalism associations. But for (young) professionals there are hardly any offerings that go beyond a weekend or so. As much as I enjoyed learning the twists and tweaks of Excel in one of them, I now know how much else there is to learn and how much more you can do with those skills.

That's why I'd like to start a European Data Journalism School.

What do you think? Would you be interested in taking part in such a program, as a teacher or as a student? How much would you be willing to pay? What would you like to learn? Would you prefer a seminar en bloc or a schedule based on short modules? As you can tell, I'd be super interested in your thoughts!

Image: Markus Spiske.