Are you sure about those numbers? An application of Bayesian methods to MSF child mortality rates


New research reveals how advanced statistical methods can mitigate data shortcomings.

In July 2012, the BBC reported that about eight children were dying each day in South Sudan's Yida refugee camp. The Guardian and The New York Times reported similar figures, and all of these reports relied heavily on Médecins sans Frontières (MSF) data indicating that child mortality rates were double the emergency threshold. The problem? MSF's statistics came from a single household-based survey whose 95% confidence interval (CI) ran from 0.55 to 7.39 deaths/10,000 children/day, a margin of error that Heudtlass et al. argue is wide enough for chance findings to be misleadingly framed as conclusive evidence.


Image: MSF's 2012 household-based survey data

However, this is not to say that MSF intentionally produced misleading statistics. Collecting data costs money and, in a humanitarian system that is largely considered "broke", organizations are sometimes forced to derive insights with little data.

Yet without a large enough dataset, random outliers can look like trends and skew the results. For example, rolling a six on a die two out of three times does not mean that the chance of rolling a six is two in three; three rolls simply do not provide enough data to estimate that probability accurately.
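The dice example can be made concrete with a short Python sketch (an illustration added here, not part of Heudtlass et al.'s analysis). It computes the exact chance that a fair die shows a six at least twice in three rolls, then repeats small and large batches of simulated rolls to show how much the observed share of sixes swings when the sample is small. The function names are hypothetical.

```python
import random
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """Binomial probability of at least k successes in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def six_frequency(n_rolls: int, rng: random.Random) -> float:
    """Share of n_rolls simulated fair-die rolls that come up six."""
    sixes = sum(1 for _ in range(n_rolls) if rng.randint(1, 6) == 6)
    return sixes / n_rolls


# A fair die shows two or more sixes in three rolls 16/216 of the time (about 7%),
# so a "two out of three" run is unusual but far from impossible.
print(round(prob_at_least(2, 3, 1 / 6), 3))  # 0.074

# Repeat each experiment five times: small samples scatter widely around 1/6
# (about 0.167), while large samples settle close to it.
rng = random.Random(2012)  # fixed seed so the run is reproducible
for n in (3, 30, 3000):
    estimates = [round(six_frequency(n, rng), 2) for _ in range(5)]
    print(f"{n:>5} rolls: {estimates}")
```

With only three rolls, the estimated share can only be 0, 1/3, 2/3 or 1, none of which equals the true 1/6; the same coarseness affects any rate estimated from a small survey.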

Heudtlass et al.'s recent analysis of the MSF child mortality statistics suggests that Bayesian methods could help reduce the margins of error that arise with small datasets.

In essence, Bayesian data analysis does not rely on the new data alone. Researchers first express what is already known or expected about a quantity, such as a death rate, as a prior probability distribution. They then use Bayes' theorem to combine this prior with the likelihood of the observed data, producing a posterior distribution that represents the updated belief, which can be revised again as more data becomes available.
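The prior-to-posterior logic can be sketched with the textbook gamma-Poisson conjugate model, chosen here purely for illustration because its update rule reduces to two additions; it is not necessarily the model Heudtlass et al. used. Deaths are treated as Poisson counts over child-days of exposure, and the belief about the death rate (in deaths/10,000 children/day) is a gamma distribution. The class name `GammaBelief`, the prior and all death counts below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GammaBelief:
    """Gamma(shape, rate) belief about a Poisson death rate.

    The death rate is in deaths per 10,000 child-days, so exposure is
    counted in units of 10,000 child-days. The gamma "rate" parameter is
    an inverse scale, not the death rate itself.
    """

    shape: float
    rate: float

    @property
    def mean(self) -> float:
        return self.shape / self.rate

    def update(self, deaths: int, exposure: float) -> "GammaBelief":
        # Conjugate update: gamma prior x Poisson likelihood -> gamma posterior.
        return GammaBelief(self.shape + deaths, self.rate + exposure)


# Hypothetical prior: expects about 1 death/10,000 children/day, weighted
# like 2 units (20,000 child-days) of earlier observation.
prior = GammaBelief(shape=2.0, rate=2.0)

# Hypothetical survey: 12 deaths over 30,000 child-days (crude rate 4.0).
posterior = prior.update(deaths=12, exposure=3.0)
print(posterior.mean)  # 2.8

# More hypothetical data arrives: the posterior becomes the new prior.
posterior2 = posterior.update(deaths=5, exposure=2.0)
print(round(posterior2.mean, 3))  # 2.714
```

Updating twice in a row gives the same posterior as a single update with all the data pooled, which is what allows the belief to keep being revised as more data comes in.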

Using the South Sudan example, the researchers showed that a Bayesian analysis indicates child mortality rates were likely lower than MSF estimated.

"A Bayesian analysis of the death rate in the Yida camp then yields an estimate of 1.85 deaths/10,000 children/day. This death rate is still elevated and of concern but far below MSF’s estimate of 3.98. It also signifies that a skeptical audience, based on their own expectations and the data that MSF has provided, would not be convinced that the internationally agreed emergency threshold of 2.1 deaths/10,000 children/day has been crossed," write Heudtlass et al.
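The quoted passage turns on a question a posterior distribution can answer directly: how probable is it that the death rate exceeds the 2.1 deaths/10,000 children/day emergency threshold? The sketch below again assumes a gamma-Poisson model with hypothetical priors and hypothetical survey counts (not MSF's data or the paper's prior), and feeds the same data to a weakly informative prior and to a skeptical prior centered on 1 death/10,000 children/day. For integer gamma shapes the tail probability has a closed form, so only the standard library is needed; `prob_rate_exceeds` is a hypothetical helper name.

```python
import math


def prob_rate_exceeds(shape: int, rate: float, threshold: float) -> float:
    """P(X > threshold) for X ~ Gamma(shape, rate) with a positive integer shape.

    Uses the Erlang identity: P(X > x) = P(Poisson(rate * x) < shape).
    """
    lam = rate * threshold
    return sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(shape))


THRESHOLD = 2.1  # emergency threshold, deaths/10,000 children/day

# Hypothetical survey: 12 deaths over 30,000 child-days (3.0 units of 10,000 child-days).
deaths, exposure = 12, 3.0

# Hypothetical priors as (shape, rate) pairs; the gamma "rate" is an inverse scale.
priors = {
    "weak": (1, 0.1),  # very diffuse, worth only 1,000 child-days of past data
    "skeptical": (10, 10.0),  # mean 1.0, worth 100,000 child-days of past data
}

for name, (a, b) in priors.items():
    post_shape, post_rate = a + deaths, b + exposure  # conjugate update
    mean = post_shape / post_rate
    p_above = prob_rate_exceeds(post_shape, post_rate, THRESHOLD)
    print(f"{name:>9} prior: posterior mean {mean:.2f}, P(rate > {THRESHOLD}) = {p_above:.2f}")
```

With the same counts, the weak prior leaves the posterior close to the crude survey rate of 4.0 and makes crossing the threshold very likely, while the skeptical prior pulls the estimate below 2.1 and leaves only a modest probability that the threshold has been crossed. This mirrors the qualitative pattern in the quotation, but the numbers here are illustrative only.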


Image: Comparison between MSF's mortality rate and the Bayesian version

Read Heudtlass et al.'s full research report here.


Photo: United to End Genocide