I’ve been harping for a while on the problems with concepts whose meanings are poorly defined: “ideology,” for example, and more broadly “translation.” The genius of DW-Nominate and Google Translate is that they produce “good enough” measures of these concepts without learning anything about the underpinnings of the concepts themselves. Google Translate does not know what English or Japanese are, and DW-Nominate does not know what liberal or conservative mean, but the former will spit out an approximate translation and the latter will pop out numbers, both based on statistical parsing of the data.

The problem is that their outputs give an appearance of precision that overstates how precise the underlying definitions are, which, paradoxically, defeats the whole purpose of using these agnostic (and, quite frankly, dumb) methodologies in the first place. The concepts are too complex and too shrouded in uncertainty to be measured meaningfully; in other words, a lot of the uncertainty is fundamental, inherent in the process itself. You can measure with the utmost precision and generate numbers (or translations), but if the concepts are themselves uncertain, the apparent precision of the measurements is meaningless.

This is not just a “mechanical” problem that can be solved with more data and better algorithms; it is a problem with the underlying data itself. Consider two populations with different height distributions: one has a mean of 165cm and a standard deviation of 10cm; the other has a mean of 162cm and a standard deviation of 20cm. What are the odds that a randomly chosen person from the second population is taller than a randomly chosen person from the first? The short answer is: a lot higher than one might think. If the heights are normally distributed and the two draws are independent, the difference between the heights (second minus first) has a mean of -3cm and a variance of 100 + 400 = 500, or a standard deviation of about 22.4cm. The area to the right of 0 under this distribution is about 45%. The average person of the first population is DEFINITELY taller than the average person of the second population. The larger the sample you have, the more certain you will be of this truth. But no matter how big the sample is, it won’t change the fact that a random person from the second population will be taller than a random person from the first with about 45% probability. Appreciating this requires being aware of the underlying distributions and the *nature* of the data, not just being able to sic fancy algorithms on very big and messy data. Sometimes, the messiness of the data is a useful clue to its nature, not something to be wiped away with fancy computer tricks.
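For anyone who wants to check the arithmetic, here is a minimal sketch in Python. It assumes exactly what the example assumes (normal heights, independent draws) and nothing more; the function names are made up for illustration. It computes two different things: the chance that a random person from the second population is taller than a random person from the first, and the chance that a sample of size n from each population gets the ordering of the *averages* right.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def prob_individual_second_taller(mean1, sd1, mean2, sd2):
    """P(X2 > X1) for independent normal heights X1 and X2."""
    diff_mean = mean2 - mean1                   # -3 in the example
    diff_sd = (sd1 ** 2 + sd2 ** 2) ** 0.5      # sqrt(500), about 22.4
    # Area to the right of 0 under the distribution of X2 - X1.
    return 1.0 - STD_NORMAL.cdf((0.0 - diff_mean) / diff_sd)

def prob_sample_means_ordered(mean1, sd1, mean2, sd2, n):
    """P(sample mean of pop. 1 > sample mean of pop. 2), n draws from each."""
    standard_error = ((sd1 ** 2 + sd2 ** 2) / n) ** 0.5
    return STD_NORMAL.cdf((mean1 - mean2) / standard_error)

p_individual = prob_individual_second_taller(165, 10, 162, 20)
print(f"P(random person from pop. 2 is taller) = {p_individual:.3f}")

for n in (10, 100, 1000, 10000):
    p_means = prob_sample_means_ordered(165, 10, 162, 20, n)
    print(f"n = {n:>5}: P(averages come out in the right order) = {p_means:.4f}")
```

The second number marches toward certainty as n grows. The first does not depend on n at all, which is the whole point.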

The defenders of DW-Nominate or Google Translate will point out that, in a lot of applications, they are pretty good. But that is itself an interesting question. One can’t expect any algorithm that disdains the truth, quite literally in these cases, to be 100% accurate. That they make appropriate predictions in many applications raises the question: when aren’t they good at predictions? Is DW-Nominate good at distinguishing liberal and conservative Republicans? (It isn’t, by the way. One remarkable change in recent years is that, if W-Nominate (DW-Nominate is a bit more demanding to sic on data quickly) is applied to just a subset of the House, Republicans only or Democrats only, the ordering it returns is quite different from the one you obtain by siccing it on the full House, and the difference has been growing since the 1990s. If there were such a thing as “liberalism” and “conservatism” that transcends parties, the orderings ought to be nearly identical.) Is Google Translate good at translating love letters and subtle literature? (Apparently it is not, and given its nature there is no good reason to expect that it would be.)

The problem is that peddling statistics involves subtle lies that latch on to people’s usual inability to make proper sense of statistics and probability. In the height example, more and more data and better and better algorithms will indeed show with greater and greater conviction that the average person of the first population really, really, really, truly IS taller than the average person of the second population. But we never deal with average persons; we deal with individuals, and for that we need a different framework. The same logic applies to making sense of politics: in a typical election, we get more or less “average” voters. Their variance is relatively small. We can develop algorithms, reasonably insightful ones, that show how much taller or shorter different groups are. Not so in “strange” elections, when high-variance voters show up. Our usual categories might still work on average, but you will still get the person from the allegedly shorter population being taller than a person from the allegedly taller population far more often than you’d expect. The algorithm (and the data it analyzes) isn’t wrong. It is just answering the wrong question. The trouble with the obsession with meaningless jargon is that it obscures what questions are being asked and what these techniques are really “measuring.” For that, you need real “science” and an understanding thereof.
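To put rough numbers on “far more often than you’d expect,” here is a small sketch that stays with the height analogy rather than pretending to model voters. It keeps the 3cm gap in averages and the 10cm spread of the taller group, and varies only the spread of the shorter group. The function name, the 10cm margin, and the list of spreads are all hypothetical choices for illustration; normality and independence are assumed as before.

```python
from statistics import NormalDist

def prob_shorter_group_wins(mean_gap, sd_tall, sd_short, margin=0.0):
    """P(X_short - X_tall > margin) for independent normal heights.

    mean_gap is (mean of taller group) - (mean of shorter group).
    """
    diff_sd = (sd_tall ** 2 + sd_short ** 2) ** 0.5
    # X_short - X_tall has mean -mean_gap and standard deviation diff_sd.
    z = (margin + mean_gap) / diff_sd
    return 1.0 - NormalDist().cdf(z)

# Same 3cm gap in averages throughout; only the shorter group's spread changes.
for sd_short in (5, 10, 20, 40):
    p_any = prob_shorter_group_wins(3, 10, sd_short)
    p_big = prob_shorter_group_wins(3, 10, sd_short, margin=10)
    print(f"sd = {sd_short:>2}cm: P(taller) = {p_any:.2f}, "
          f"P(taller by more than 10cm) = {p_big:.2f}")
```

The averages never move, so any method that only reports averages will say the same thing in every row. What changes with the variance is how often an individual comparison goes the “wrong” way, and by how much.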