There is something interesting about where the data journalist types were led astray in the 2016 primaries: it's not (as much) the data. Rather, it's the theory about politics with which many of them began their coverage of the primaries, especially the now-infamous book The Party Decides. In fact, while I have not examined the data systematically, a quick glance through the "polls plus" versus "polls only" forecasts by fivethirtyeight.com provides a potentially quantifiable clue to this problem: polls plus, which took into account institutional effects (e.g., endorsements), seems to have been systematically less accurate than polls only, which agnostically took the polling data as the guideline.
This should be good news for "science," but it will come as a black eye instead. We have systematic, quantitative data that clues us in on where our previous theory, the one on which The Party Decides was grounded, went wrong. This gives us the opportunity to rethink, refine, and update the theory, which is exactly how "science" is supposed to work hand in hand with data. Instead, however, it will be taken as evidence that the old theory, mistakenly accepted as "political science," was just wrong and provided misleading clues to the future. Rather than an opportunity to rethink, refine, and update, it will be used as a means to further denigrate the social sciences, already under attack as it is.
The title of this post was poached from an essay by Stephen Jay Gould, who, in turn, took it from a Charlie Chan movie. The context in which Gould used the quote was the use of data by racial "science" and eugenics in the early 20th century, no insignificant target, since racial science and eugenics drew the interest of some of the best classical statisticians of the period, and much of their most important work took place in this context. Gould argued that, because these people were so enamored with their theories, they found increasingly creative ways to fudge the data and analyses when the data did not fit their worldview; thus, the fog of theory obscured the insights shown by the data. Of course, this is true in every empirical context: the old-fashioned baseball people were not stupid. They had good reasons to subscribe to their "theories" about baseball, backed by some data, but without a systematic understanding of data analysis, they had little or no opportunity to rethink, refine, and update their theories, until many of those theories were just barely better than random chance. Some of the more ardent sabermetricians were, I think, truly barbarians at the gates who neither knew nor cared much about baseball, just data, and concocted whatever looked fine statistically; as a baseball fan, I am happy to see that they did not win out. Their attitude, however, has still prevailed in some aspects of sabermetrics and data analysis in general, as the current popularity of "data science" attests.
There is a substantial amount of blame to be borne by the social scientists in this failure, too: they did not approach the problems scientifically; with all the pressure to get things "right," there was little room for the more leisurely approach of analyzing the "lessons learned." This is unfortunate: the success of the US Navy during World War 2, in 1943 and beyond, came from thorough "lessons learned" research after the disastrous battles of 1942. Failures help us learn. Throw away the failures, and we learn nothing. Unfortunately, if failure is used as a criterion for dismissal and punishment, we will throw away all our failures and learn nothing.