I have to confess that I did not know about the “crunchiness” vs. “sogginess” distinction laid out by the British journalist Nico Colchester until now. That is unfortunate, since it captures my idea of variance-centric thinking quite well.
The premise behind variance-centric thinking is that the world is soggier than we’d like to believe. Consider the following scenario, where some input into a system generates a distribution of outcomes.
X -> [Black Box] -> Y
We want to learn how this “black box” works by studying the relationship between X and Y. While we’d like to believe that a neat “functional” relationship exists between them, i.e. that every value of X is associated with exactly one value of Y, this is rarely true. Every value of X is associated with a range of values of Y, a probability distribution, if you will. Instead of a neat and “comfortable” formulaic universe where pushing a red button produces a can of Coke with certainty, we might get a can of Coke with only 75% probability, a ball of yarn with 15%, and a bowl of worms with 10%. Even if you do exactly the same thing again, you will probably get a different outcome, and, to turn a well-known saying on its head, you’d be crazy only if you expected the same thing to happen each time.
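The red-button scenario can be sketched as a tiny simulation. This is purely illustrative: the outcome names, the `press_red_button` function, and the random seed are all hypothetical, with the probabilities taken from the example above.

```python
import random
from collections import Counter

# Hypothetical "red button": identical input, a distribution of outcomes.
# Probabilities follow the example in the text: 75% / 15% / 10%.
OUTCOMES = ["can of Coke", "ball of yarn", "bowl of worms"]
WEIGHTS = [0.75, 0.15, 0.10]

def press_red_button(rng: random.Random) -> str:
    """One press of the button: the same action, a random outcome."""
    return rng.choices(OUTCOMES, weights=WEIGHTS, k=1)[0]

# Press the button 20 times in exactly the same way.
rng = random.Random(42)  # seeded only for reproducibility of the sketch
presses = [press_red_button(rng) for _ in range(20)]

# Identical inputs, varying outputs: the "function" is a distribution.
print(Counter(presses))
```

Running this a few times with different seeds drives the point home: the input never changes, but the tally of outcomes does.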
An article in Computer Weekly from a few years ago arrived, it seems, at a similar conclusion: even if you have huge piles of data, the insights they offer sometimes carry such large variance that you still don’t know what will happen when you push the red button. All that data tells you is that you don’t really know what you will get (not exactly, at any rate) if you do press it. The data about red buttons, if you will, is not sufficiently crunchy: the relationship between input and output is not clearly and neatly defined. Perhaps it is better to seek out crunchier data, where you know what you will get if you do exactly the same thing over and over again.
The problem with the search for crunchier data is that, often, crunchy data does not exist. Even if some crunchy patterns exist at some point in time (e.g. Christmas rallies in stock markets), human strategic behavior, such as arbitrage, quickly wipes them out once they become known. Besides, sometimes you have to deal with the problems you have, not the problems you wish you had, and red buttons may be all you have to deal with.
Or, in other words, you need to understand how crunchy or soggy your data is. This is where variance-centric thinking comes in. Variance tells you how soggy your data is, and, once you figure that out, how to deal with it. Maybe you do want to learn the data’s mean, if the data is crunchy enough, but with the proviso that, depending on its crunchiness, the mean may not be good enough. If the data is very soggy, the mean may not be worth knowing.
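One way to see why a mean can be worth little on its own is to compare two made-up data sets with the same mean but very different spreads. The numbers below are invented for illustration; “crunchy” and “soggy” here just mean low and high variance.

```python
import statistics

# Two hypothetical series of measurements from "pressing the same button".
# Both have a mean of 10, but very different spreads.
crunchy = [10.0, 10.1, 9.9, 10.0, 10.05, 9.95]   # tight around the mean
soggy = [2.0, 25.0, -7.0, 40.0, 10.0, -10.0]     # all over the place

for name, data in [("crunchy", crunchy), ("soggy", soggy)]:
    mean = statistics.mean(data)
    sd = statistics.stdev(data)
    print(f"{name}: mean = {mean:.2f}, stdev = {sd:.2f}")
```

Both series report the same mean, but in the soggy one the spread dwarfs it: knowing “the answer is about 10” tells you almost nothing about the next press of the button. That, in a nutshell, is why the variance has to be read alongside the mean.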