The Iraq War by the Numbers: Extracting the conflict’s staggering costs


By Michael Li, The Data Incubator

Data journalism is a fast-evolving field, with an ever-growing number of open data sources and a constantly changing set of data tools. At The Data Incubator, we provide data science education, whether through corporate data science training or our free data science fellowship, which helps candidates with master’s degrees and PhDs transition into data science careers. Many of our fellows work on projects that leverage data for journalism. Take, for example, Jeiran Jahani’s project, The Iraq War by the Numbers, which harnesses data from documents on the Iraq War. In her own words:

The beginning of the twenty-first century is marked by a war that killed 151,000 to over one million humans and cost over six trillion dollars: the Iraq War. This war is important not only for its staggering costs (both human and financial) but also for its publicly available and well-documented daily records from 2004 to 2010.

These documents provide a very high spatial and temporal resolution view of the conflict. For example, I extracted from these government memos the number of violent events per day in each county. Then, using latent factor analysis techniques, e.g. non-negative matrix factorization, I was able to identify the three principal war zones. Interestingly, these principal conflict zones were areas populated by the three main ethno-religious groups in Iraq. Moreover, adaptive Bayesian smoothing approaches revealed statistically significant jumps in the underlying temporal trends within each cluster, the so-called spike alert days. These spike alert days coincided with well-documented changes in the way the war was handled. Although the algorithms used to analyze the war memos were blind to the historical, geographical, and political context of the conflict, they were able to shed light on decisions that exacerbated it.
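To make the factorization step concrete, here is a minimal sketch of non-negative matrix factorization applied to a days-by-counties matrix of event counts. It is not Jeiran’s code: the data is synthetic, the `nmf` helper and all variable names are hypothetical, and the update rule (Lee-Seung multiplicative updates) is just one common way to fit the model. The Bayesian smoothing step is not shown.

```python
import numpy as np

def nmf(V, rank, n_iter=500, seed=0, eps=1e-9):
    """Approximate non-negative V (days x counties) as W @ H with
    W (days x rank) and H (rank x counties), using Lee-Seung
    multiplicative updates for the squared-error objective."""
    rng = np.random.default_rng(seed)
    n_rows, n_cols = V.shape
    W = rng.random((n_rows, rank)) + eps
    H = rng.random((rank, n_cols)) + eps
    for _ in range(n_iter):
        # Each update keeps the factors non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Synthetic stand-in for the real records (an assumption for illustration):
# 12 counties in 3 zones, where counties in a zone share a daily intensity.
rng = np.random.default_rng(42)
n_days, n_counties, n_zones = 200, 12, 3
true_zone = np.repeat(np.arange(n_zones), n_counties // n_zones)
intensity = rng.gamma(shape=2.0, scale=3.0, size=(n_days, n_zones))
counts = rng.poisson(intensity[:, true_zone]).astype(float)

W, H = nmf(counts, rank=n_zones)

# Rescale so each column of W sums to 1; the loadings in H then become
# comparable across components without changing the product W @ H.
scale = W.sum(axis=0)
W, H = W / scale, H * scale[:, None]

# Assign each county to the latent factor on which it loads most heavily.
zone_of_county = H.argmax(axis=0)
print(zone_of_county)
```

Each row of `H` describes how strongly every county participates in one latent "zone", and each column of `W` gives that zone’s activity over time, which is the kind of per-cluster time series a spike-detection step could then examine.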


Image: The Iraq War by the Numbers.

Or watch Jeiran explain it herself:

If you’re interested in learning how to conduct a similar analysis, Jeiran has offered the following advice:

Get started with tools of the trade before you apply. Take up a mini project, use an open data set ... familiarize yourself with databases and database query languages, e.g. PostgreSQL. Start your GitHub account by committing [a] mini project.

Check out The Data Incubator’s blog for more information on free open data sources and free data science tools to help you get started on your next data journalism project.