We Don’t Need to Understand. We Can Predict.

I write this in reaction to an observation made by Chris Arnade:

“The bigger issue is the shift towards arrogance of many policy folks who have come to believe they “explain & predict” via data”

In most cases, there is no real “explanation.”  They simply point to the data and say that something is “true” because the data says so.  In this view, the data is self-explanatory, sometimes coupled with a few naively simple platitudes.

I’d like to claim with confidence that this is fundamentally wrong, but I can’t.  I know the history of data science and pattern recognition too well to say so.  Google Translate was the first seriously successful attempt at machine translation precisely because it did not bother to “explain”: its algorithm did not try to learn the nuances of grammar and linguistic logic for the languages it translates between, but simply absorbed the common patterns seen in a massive volume of documents presumed to say the same thing in different languages.  It then applies the patterns it recognized in the data to other contexts, in the classic “data science” fashion.  No theory.  No “model.”  Just the patterns in the data.  And it works better than anything that tries to presuppose a “structure.”  It will fail in certain highly nuanced circumstances, but it is good enough in almost any workaday situation.  Most of the time, after all, you are not trying to write love letters via Google Translate in a language you do not know.

This makes me wonder if the problem with data journalism/science/wonkism is not the overreliance on the data, which is often implied in the criticisms thereof, but the attempt at “explanations.”  Data is what it is.  Its collection is probably biased, but what it records is, for the most part, “true.”  It just so happens that it is rarely the complete and full “truth.”  If the data shows that the average height of Belgians is 183cm, then that is the truth, as far as the data shows.  But it tells us nothing as to why the Belgians are 183cm tall on average, or, indeed, if the Belgians are even “tall.”  (Yes, it tells us that the average Belgian is taller than the average person around the world.  But is the average Belgian taller than the average NBA player?  An average Dinka tribesman?  An average Martian?  And, regardless of whether any of these is true or false, what would that mean?)

Trying to force the limited data into a context constructed by one’s agenda and calling it an “explanation” is a dangerous thing, especially if the “explainer” believes that the use of data, that is, of facts, suffices to make the explanation equally “factual.”  There are many ways to interpret the facts, even the same facts.  It is not important whether my interpretation is “right” or the other is “wrong.”  It is far more important that we are on the same ground as to why I’ve interpreted the facts as I did and why the other person interpreted them as they did.  That way, we can at least understand where they come from and how they think, and likewise, they can understand why we think as we do.  Maybe we can learn from each other.  Maybe we can agree to disagree, if their assumptions and premises are no less and no more valid than ours.  The bottom line is that the hows and whys, the explanation, are not a simple one-way street with clearly defined right and wrong answers.  The universe is complicated, after all.  The elephant is not just a spear, a rope, or a wall.  It is none of them, and all of them: the point of “explanation” is not to browbeat the other into believing that the elephant is a spear and not a wall, but to understand how it can be all and none at the same time.


