The Follies of “Data Science,” Nate Silver Edition

I think this attempt at self-justification by Nate Silver regarding the failure of his prediction models during the primary race, with respect to Trump (and, though he does not say so often enough, Sanders), is a lot of bunkum.

The term “punditry,” like old-fashioned baseball conventional wisdom, has come to be a bad word among those who fancy themselves data-enlightened.  Yet punditry is a perfectly reasonable approach to understanding problems where live data is rare:  in effect, all it represents is “substantive expertise and knowledge” substituting for data that does not exist.  As per the saying attributed to Marvin Minsky, an uninformative prior (i.e. the uniform prior) is still a prior.  By assuming one, you are tearing down the giants whose shoulders you might be standing on and throwing yourself down to ground zero.
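To make the point about priors concrete, here is a minimal sketch of a Beta-Binomial update in Python.  Everything in it is hypothetical and chosen only for illustration:  the three-trial data set, the Beta(2, 18) “expert” prior, and the function name are my assumptions, not anything taken from Silver’s models.  It shows only that with very little data, the choice of prior (uniform or informed) largely determines the estimate.

```python
def posterior_mean(prior_a, prior_b, successes, failures):
    """Posterior mean of a success rate under a Beta(prior_a, prior_b) prior
    after observing binomial data (standard conjugate update)."""
    return (prior_a + successes) / (prior_a + prior_b + successes + failures)


# Hypothetical sparse data: 1 success in 3 trials.
successes, failures = 1, 2

# Uniform prior, Beta(1, 1): the "uninformative" choice, which is still a prior.
uniform_estimate = posterior_mean(1, 1, successes, failures)

# Hypothetical informed prior, Beta(2, 18): expertise saying the rate is near 10%.
informed_estimate = posterior_mean(2, 18, successes, failures)

print(f"uniform prior:  {uniform_estimate:.3f}")
print(f"informed prior: {informed_estimate:.3f}")
```

With only three observations the two estimates differ substantially; as the data grow, both converge toward the observed frequency and the prior matters less.  Whether the informed prior is any good is, of course, the whole question.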

The ultimate target of Silver’s blame, of course, is not really the usual “punditry,” but a body of political science theorizing, encapsulated in The Party Decides.  But this is, and has always been, a dodgy theory.  It rests on the assumption that party leaders can, ultimately, always rig the game so that their opponents are frustrated.  The anti-establishment tone of the campaigns in both parties, especially the current wave of discontent swamping the Democratic Party, demands that the moving parts behind this theory be given more serious thought, rather than the theory simply being taken as a crude “predictive” tool.  It should have been obvious from the beginning that, while the resources of the establishment may be sufficient to forestall revolts most of the time, they may not be enough to stop them all the time.  It is therefore desirable to identify the warning signs that the institutions are about to fall apart, which they did, to an extent, for both parties in 2016.

One analogy might be drawn to civil engineering (and indeed, to most old-fashioned engineering problems):  steel bridges don’t collapse very often, but nobody believes that steel bridges are uncollapsible just because they rarely collapse.  Engineers know that a bridge, depending on its materials and construction techniques, will be able to tolerate stress from a certain range of weights for some time under a certain set of circumstances.  While very little data may be obtained from fully constructed bridges, the building materials and techniques are subject to rigorous tests, and these tests, using the understanding of other sciences, are translated into reasonably reliable estimates of how much stress an actual bridge can tolerate.  Simply by recognizing that the party establishment may not be all-powerful, that given enough weight and the right circumstances it might just collapse, more thought could have been given to this old-fashioned engineering-like approach, if only the practitioners had been sufficiently skeptical, constructively speaking (pun intended), to begin with.

This is, I think, a problem with a lot of “data science” approaches.  I have ranted often that data science, predicated on using data to generate predictions, is not really a science.  It occurs to me that it is not really an engineering discipline either, at least in the old sense, as engineering too rests on the premise that all “theories” are ultimately wrong if pushed to their limits (i.e. the “theory” of a steel bridge ALWAYS collapses under the “extreme data” of very heavy weights).  More and more, it strikes me as a modern-day cult, closer to astrology than astronomy, in that it too rests on generating “prophecies,” if only based on a more rigorous examination of the past.

PS.  I think I was being a bit unfair to Silver, who does, after all, recognize the value of poor predictions as learning opportunities in the scientific process.  The criticism applies less to Silver in particular than to the “data science” mindset on the whole, which values predictive power above an understanding of the process that generates the data.  As the popularizer of this mentality in “journalism,” however, Silver does deserve a fair share of the blame, particularly since the rest of his essay tilts more toward disparaging theory building as mere punditry than toward how best to integrate theory building and data analysis, as good “science” should.

I don’t think “punditry” (or old-fashioned baseball know-how) is necessarily a bad thing.  What it lacks is a means of evaluating how wrong it is.  Statistics provides such means and, as such, transforms magic into science.  As a twist on Clarke’s quote goes, any sufficiently analyzed magic is indistinguishable from science.  Any “science,” on the other hand, that is not analyzed might as well be magic and superstition, hence my contention about astrology above.  Unfortunately, humans are magical creatures who’d rather “believe in” fairy tales than analyze them, and both punditry and “data science” approaches try to exploit this proclivity.

PPS.  I can’t help but wonder if the data journalism folks will compound this mistake with another one, for exactly the opposite/same reason.  Polls until recently have shown Trump losing badly to the Democrat, whether it is Clinton or Sanders.  As I’ve been harping on at this site, this is largely due to Trump’s weakness among partisan Republican voters who are both educated and affluent, and who are the least likely to cross party lines on election day.  I don’t see Trump losing many, if any meaningful proportion, of these voters now that the Republican race has been sorted out.  Those are precisely the poll numbers that should be discounted, according to political science theories.  Of course, having once been burned by the “punditry,” pure data types will insist on ignoring it, I imagine, to their peril.

