On Data, Leadership and Confusion Matrices
In a data-driven business, managers do not stop making decisions – but decisions are no longer about what to do, but what models to use to decide what to do.
By Espen Andersen, associate professor at BI Norwegian Business School, and leader of the Business Models work package in CityZEN.
The world is becoming increasingly data-driven. Being data-driven means making decisions based on data, not intuition. This is a problem for many executives, since they are used to making decisions based on intuition – what they believe – and not what facts (or at least data) say.
However, being data-driven can produce significant financial results. Finn.no, for example, has a data-driven management and decision-making process that delivers 1,000 updates a week and an EBITDA margin of 37%, year after year.
But which decisions remain for management in a company where the data, experiments, and models decide what is to be done?
The answer is that they will mainly determine the criteria for how models and data should be used.
Here is an example from a medical situation: Suppose we want to use a machine learning model to determine whether a mole contains cancer cells or not. We have lots of recorded observations of moles (length, width, color, regularity, shape, etc.) and other data on the patients from previous studies, and we also know for these observations whether it was actually cancer or not. We take this data, give it to a data analyst and ask him or her to create a model or algorithm. The analyst comes up with a model that correctly predicts 90% of cases.
And now it is up to management: Is this model good enough and should it be used?
At first glance, it seems simple: Is the model better than what is being done now? If a doctor examining a patient finds the right diagnosis in just 80% of cases, you use the algorithm. It is well known that simple algorithms are often more accurate than expert assessments. That is why the assessment of whether your newborn child should be placed in an incubator or not is based on a simple point system (APGAR) that anyone can use, as opposed to an obstetrician or nurse taking a look at the child and saying what they think based on their experience.
But things are more complicated than that. All models make mistakes, but what kind of mistakes do they make? To find out, you create what is called a confusion matrix. A confusion matrix compares reality with the model's predictions and can, for example, look like this:
                      Predicted: not cancer    Predicted: cancer
Actual: not cancer    80                       8
Actual: cancer        2                        10
This matrix shows a model that is 90% accurate: it correctly identified 80 cases that were not cancer and 10 that were. But it was wrong in ten cases: two where it was cancer but the model predicted non-cancer (false negatives), and eight where it predicted cancer but it was not (false positives).
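The arithmetic behind the 90% figure can be checked with a few lines of Python. This is only a minimal sketch: the four counts come from the example above, while the variable names are chosen here for illustration.

```python
# Counts taken from the example in the text (100 cases in total).
true_negatives = 80   # not cancer, and the model said not cancer
true_positives = 10   # cancer, and the model said cancer
false_negatives = 2   # cancer, but the model said not cancer
false_positives = 8   # not cancer, but the model said cancer

total = true_negatives + true_positives + false_negatives + false_positives

# Accuracy is the share of all cases the model got right.
accuracy = (true_positives + true_negatives) / total
# The error rate is the share of all cases it got wrong.
error_rate = (false_positives + false_negatives) / total

print(f"total cases: {total}")
print(f"accuracy:    {accuracy:.0%}")
print(f"error rate:  {error_rate:.0%}")
```

Note that accuracy lumps both kinds of error together: the two missed cancers and the eight false alarms count equally.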
If a doctor is right in only 80% of cases, this model looks better, so you should use it.
But wait a second: not all mistakes are created equal. It is obviously much less serious to have a false positive (you think a patient has cancer but are wrong) than a false negative (it is cancer, but you do not find it). In the first case, the patient has an operation on a mole that turns out to be harmless. In the latter, the cancer can metastasize and cause a much more serious illness.
But machine learning models can be adjusted. Suppose you find another model (or change the current one a little, for example by lowering the threshold for classifying something as cancer) and get a model that makes a different mix of errors.
This model is 82% accurate, which is worse than the first model, but it finds all the cancer cases. This is probably a much better model to use than the first one.
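To see the two models side by side, here is a small Python sketch. The first model's counts come from the text. The second model's counts are an assumption: if it is evaluated on the same 100 cases (12 cancer, 88 not cancer), finds every cancer case, and is 82% accurate, it must have 12 true positives, 0 false negatives, 70 true negatives and 18 false positives. The `ConfusionMatrix` class is a hypothetical helper written for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Hypothetical helper holding the four cells of a confusion matrix."""
    tn: int  # true negatives
    fp: int  # false positives
    fn: int  # false negatives
    tp: int  # true positives

    @property
    def total(self) -> int:
        return self.tn + self.fp + self.fn + self.tp

    def accuracy(self) -> float:
        """Share of all cases the model classified correctly."""
        return (self.tp + self.tn) / self.total

    def sensitivity(self) -> float:
        """Share of the actual cancer cases that the model caught."""
        return self.tp / (self.tp + self.fn)


# Counts stated in the text for the first model.
first_model = ConfusionMatrix(tn=80, fp=8, fn=2, tp=10)

# Assumption: same 100 cases, all 12 cancers found, 82 correct in total,
# so 82 - 12 = 70 true negatives and 88 - 70 = 18 false positives.
second_model = ConfusionMatrix(tn=70, fp=18, fn=0, tp=12)

for name, model in [("first", first_model), ("second", second_model)]:
    print(f"{name}: accuracy {model.accuracy():.0%}, "
          f"sensitivity {model.sensitivity():.0%}")
```

Under these assumptions the first model catches 10 of 12 cancers (about 83%), while the second catches all of them at the price of ten more false alarms.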
To be able to choose the right model, you must know the consequences: both what the model should be measured against and what will happen when the model is wrong (which it certainly will be). That requires you to decide on the relationship between types of error, a price tag if you will. In fact, you have to put a price on the entire model: what does it cost (or save) when it gets something right, and what does it cost when it gets something wrong?
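A price tag of this kind can be sketched in Python. Everything numeric about the prices below is hypothetical: the cost of a false positive is set to 1 unit and the cost of a false negative to 50 units purely for illustration, and deciding the real values is exactly the managerial decision described above. The error counts for the first model (8 false positives, 2 false negatives) come from the example; those for the second model (18 false positives, 0 false negatives) are an assumption that follows if it is evaluated on the same 100 cases.

```python
def total_error_cost(false_positives: int, false_negatives: int,
                     cost_fp: float, cost_fn: float) -> float:
    """Total cost of a model's mistakes, given a price per type of error."""
    return false_positives * cost_fp + false_negatives * cost_fn


# Hypothetical prices, not taken from the article.
COST_FALSE_POSITIVE = 1.0    # e.g. an unnecessary operation
COST_FALSE_NEGATIVE = 50.0   # e.g. a cancer that goes untreated

# First model: error counts stated in the text.
first_cost = total_error_cost(8, 2, COST_FALSE_POSITIVE, COST_FALSE_NEGATIVE)

# Second model: counts assumed from "82% accurate, finds all cancer cases"
# on the same 100 cases.
second_cost = total_error_cost(18, 0, COST_FALSE_POSITIVE, COST_FALSE_NEGATIVE)

print(f"first model:  {first_cost:.0f} cost units")
print(f"second model: {second_cost:.0f} cost units")
```

With these counts, the second model is cheaper whenever a false negative is priced at more than five times a false positive, since 18 × cost_fp < 8 × cost_fp + 2 × cost_fn reduces to cost_fn > 5 × cost_fp. The value of correct predictions could be added to the function in the same way.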
Finding models that work is a job for specialists and technologists, in this case data analysts. Putting a value on consequences is a managerial responsibility you cannot run away from, and a prerequisite for creating the data-driven business.
This article was originally published in Norwegian at digi.no.