Death by clinical data: A model that accumulates medical knowledge for more comprehensive reporting


When you see a doctor, the doctor treats you according to their knowledge gained through studies, using experience gained during practice, and according to established guidelines. So, doctors have a cognitive model in their head that allows them to predict what may happen to you. What if we could "copy" that model into a computer?

Computers are increasingly involved in medical practice and public health reporting. In fact, the role of a doctor has changed significantly since computers were introduced into the medical system. Much of a doctor's work today involves interaction with Electronic Medical Records (EMR). The abundance of such data helps to generate predictive models.

In the past, these models were generated by many groups and published multiple times in the form of risk equations. Those equations predict the probability of an event, such as death or a heart attack, for a person of a specific age, cholesterol level and other characteristics. They have been introduced to medical practitioners as decision support tools that inform their cognitive models.
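To make the idea of a risk equation concrete, here is a minimal sketch in Python. It assumes a simple logistic form, and the intercept, coefficients and the two example people are hypothetical values chosen for illustration. They are not taken from Framingham, UKPDS or any other published equation.

```python
import math

# Hypothetical values for illustration only; NOT from any published risk equation.
INTERCEPT = -9.0
COEFFICIENTS = {"age": 0.07, "total_cholesterol": 0.4, "smoker": 0.6}


def event_probability(person):
    """Return the modelled probability of an event for one person (logistic form)."""
    score = INTERCEPT + sum(
        COEFFICIENTS[name] * person[name] for name in COEFFICIENTS
    )
    return 1.0 / (1.0 + math.exp(-score))


# Two hypothetical people described by the characteristics the equation uses.
younger = {"age": 45, "total_cholesterol": 5.0, "smoker": 0}
older_smoker = {"age": 65, "total_cholesterol": 6.5, "smoker": 1}

print(f"younger: {event_probability(younger):.3f}")
print(f"older smoker: {event_probability(older_smoker):.3f}")
```

With these made-up coefficients, the older smoker with higher cholesterol receives the higher predicted probability. A published equation works in the same spirit, but with coefficients estimated from clinical study data.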

However, copying the cognitive model of one practitioner will not be enough; after all, doctors differ in their practices and there are many situations where doctors operate with a number of unknowns in mind. Even disease models differ since they have different purposes and are based on different clinical trial data.

Nevertheless, the knowledge accumulated amongst practitioners has improved medical practice significantly over decades and centuries. So, to capitalise and report on this knowledge, is it possible to “accumulate” data on these cognitive models? Or at least accumulate the computational risk equations?

The Reference Model is a modest step in that direction, focusing on long-term predictions. To help make health data more accessible for reporting, it accumulates two types of data: 1) disease models and 2) clinical trial reports.

The first type of data consists of risk equations, such as the Framingham Risk Score or the UKPDS, which many groups regularly publish after analysing clinical study data. Typically, such medical data cannot be shared for security and privacy reasons, yet the equations that are extracted from the data can be published. So, accumulating risk equations allows us to merge and extract information that would be restricted otherwise.

The second type of data accumulated by the model consists of clinical trial reports published in medical literature. These reports expose trial data in the form of baseline population distribution and clinical outcome data.  This information is used for validating the models. 

The Reference Model implemented the idea of validating multiple models against multiple populations in 2012. The first implementation was similar to the score board of a sports league, where the models competed to achieve better fitness results within populations. This "score board" was called a fitness matrix, and its output was displayed visually as a colour-coded matrix, as demonstrated below for 2012 results or in the following video:


This fitness Matrix allows you to compare the behaviour of multiple models against multiple populations. But the Reference Model is not a single model. Instead, it is a league of models that compete amongst themselves to extract the best fitness results from clinical trial data. In a sense, this process is similar to asking the opinions of multiple specialists about a disease and figuring out who is the best specialist for particular conditions.
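The "league" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the model names, population names and event rates are hypothetical, and fitness is taken here to be the absolute gap between predicted and observed outcome, which may differ from the measure the Reference Model actually uses.

```python
# Hypothetical observed and predicted event rates (illustrative numbers only).
observed = {"Population 1": 0.12, "Population 2": 0.08, "Population 3": 0.20}
predicted = {
    "Model A": {"Population 1": 0.10, "Population 2": 0.09, "Population 3": 0.26},
    "Model B": {"Population 1": 0.15, "Population 2": 0.05, "Population 3": 0.21},
}


def fitness_matrix(predicted, observed):
    """Absolute gap between predicted and observed outcome; lower is a better fit."""
    return {
        model: {pop: abs(rates[pop] - observed[pop]) for pop in observed}
        for model, rates in predicted.items()
    }


def best_model_per_population(matrix):
    """For each population, pick the model with the smallest gap."""
    populations = next(iter(matrix.values())).keys()
    return {pop: min(matrix, key=lambda m: matrix[m][pop]) for pop in populations}


matrix = fitness_matrix(predicted, observed)
for model, row in matrix.items():
    print(model, {pop: round(gap, 3) for pop, gap in row.items()})
print(best_model_per_population(matrix))
```

Each row of the matrix is one model and each column is one population, so reading down a column answers the question of which "specialist" fits that population best.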

However, choosing the best expert may not be the best strategy in all cases.  Sometimes, we might want to merge the ideas of all experts and build a model that accommodates the combined prediction of a team.

The Reference Model has recently advanced to have this capability. Instead of having to choose the best model between A and B, it now allows you to find the most accurate combination of A and B that can be validated against the accumulated clinical data. The computational mechanism that conducts this analysis is called an assumption engine, since it allows the modeler to "throw" assumptions at it while it makes sense of the information. It works by comparing data from tested models and rejecting incompatible models. Considering that medical models are mere assumptions on how reality will behave in the future, the assumption engine is a powerful tool.
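As a rough illustration of combining two models rather than choosing between them, the sketch below blends the predictions of models A and B with a single mixing weight and searches for the weight that best matches the observed outcomes. The numbers are hypothetical, and the grid search over a squared-error measure is a simple stand-in chosen for this example. It is not a description of how the assumption engine itself works.

```python
# Hypothetical predictions from two models and observed outcomes for three cohorts.
observed = [0.12, 0.08, 0.20]
model_a = [0.10, 0.09, 0.26]
model_b = [0.15, 0.05, 0.21]


def combined_error(weight_a):
    """Sum of squared gaps when predictions are blended as w*A + (1 - w)*B."""
    return sum(
        (weight_a * a + (1.0 - weight_a) * b - obs) ** 2
        for a, b, obs in zip(model_a, model_b, observed)
    )


# Try mixing weights from 0.00 (only model B) to 1.00 (only model A).
weights = [i / 100 for i in range(101)]
best_weight = min(weights, key=combined_error)

print("best weight for model A:", best_weight)
print("error with model A only:", combined_error(1.0))
print("error with model B only:", combined_error(0.0))
print("error with the blend:", combined_error(best_weight))
```

Because the search includes the weights 0 and 1, the blend found this way can never fit these cohorts worse than either model on its own.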

The Reference Model requires a lot of competition to deduce the best model. Fortunately, computing power is cheap these days and High Performance Computing (HPC) techniques allow such a model to exist. In fact, it is possible to run those computations in the cloud. The Reference Model uses the MIcro Simulation Tool (MIST) to run those simulations. MIST is free software, available here. As free software, MIST helps to merge data in a credible, reproducible manner, which can be tested and verified by other reporters or researchers.

For health reporters, this technology provides a view of data that is often restricted and not shared. By accumulating facts, providing methods to verify assumptions and cross-reference information, and merging these into a single story, the model can help journalists extract and report on stories found in large or overwhelming sets of clinical data without needing access to the raw data. As such, this technology creates a reference for journalists to compare clinical datasets, while also accumulating a repository of knowledge on these cases without causing information overload.

But this technology is just an initial step towards accumulating known data in order to guide our perception of disease progression.  The Reference Model currently holds information for diabetic populations from 47 cohorts from 9 studies. This is already more data than a human can easily remember, and it keeps on growing. The question is: when will accumulated data and models reach a point where predictions become better than the cognitive model of a physician?

If this point is reached, the advantages of copying a computerised model to multiple computers and devices may impact future healthcare delivery and public health reporting. Since the model’s predictions are long term and focused on predicting events, it is far from an actual virtual doctor, yet it is a small step in that direction.

After all, with advances in machine learning and the greater accumulation of medical data, we should expect the medical profession to change drastically during this century. It is not unlikely that in 50 years a medical device will monitor our health and we will use models to maintain our fitness.

Image: NASA.