The Question of Variance.

Jeff Greenfield has an interesting, if rather predictable, article in Politico about how Democratic pollsters are concerned, in spite of the apparent advantage that Clinton has in numbers.

The problem, of course, is exactly that of variance: not of “standard error,” but of “standard deviation.” The former, a staple of pollsters and others who use statistics without thinking about its foundations in probability theory, simply tells us how precisely we have located the mean. With a large enough sample, we will know where the mean is with near certainty (technically, full certainty requires an infinite sample, but that is only a mathematical technicality). No matter what the sample size, however, the standard deviation will not go down: it is an intrinsic characteristic of the probability distribution itself.
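
To make the distinction concrete, here is a minimal simulation sketch. The normal distribution, the mean of 50, the standard deviation of 10, and the function name `sd_and_se` are all assumptions chosen for illustration, not anything taken from actual polling data. It draws samples of increasing size and reports the sample standard deviation next to the standard error of the mean.

```python
import random
import statistics


def sd_and_se(n, mu=50.0, sigma=10.0, seed=0):
    """Draw n values from a normal(mu, sigma) and return (sample SD, standard error).

    mu, sigma and seed are illustrative assumptions, not real polling figures.
    """
    rng = random.Random(seed)
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    sd = statistics.stdev(sample)   # spread of individual draws
    se = sd / n ** 0.5              # uncertainty about where the mean is
    return sd, se


for n in (100, 10_000, 250_000):
    sd, se = sd_and_se(n)
    print(f"n={n:>7}  standard deviation={sd:6.2f}  standard error={se:7.4f}")
```

As the sample grows, the standard error heads toward zero while the standard deviation stays near 10: more data pins down the mean but does nothing to narrow the spread of individual draws.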

One distribution can have a definitely lower mean than another. A large enough data set will confirm, over and over again, that one mean is definitely smaller than the other. BUT a random draw from the distribution with the smaller mean can still be larger than a random draw from the distribution with the larger mean. Moreover, in a game of war (the card game), where the outcome depends on random draws from each distribution, a distribution with a smaller mean and a larger variance does better against an opponent with a larger mean than a distribution with the same smaller mean and a low variance. You will never have a 50% probability of winning, since your mean is smaller than the other side’s, but the probability that you will win is higher with the smaller mean, higher variance distribution!
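
A small worked sketch of this point, under loud assumptions: both sides are modeled as independent normal distributions, and the means (48 and 52) and standard deviations (2 and 10) are made-up numbers, as is the function name `upset_probability`. Under those assumptions the difference of the two draws is itself normal, so the chance that the lower-mean side comes out ahead has a closed form.

```python
from statistics import NormalDist


def upset_probability(mean_low, sd_low, mean_high, sd_high):
    """P(draw from the lower-mean side > draw from the higher-mean side).

    Assumes independent normal draws, so the difference is
    normal(mean_low - mean_high, sqrt(sd_low**2 + sd_high**2)).
    """
    diff = NormalDist(mean_low - mean_high, (sd_low ** 2 + sd_high ** 2) ** 0.5)
    return 1.0 - diff.cdf(0.0)


# Hypothetical numbers: the trailing side has mean 48, the leading side mean 52 (SD 2).
low_variance = upset_probability(48, 2, 52, 2)
high_variance = upset_probability(48, 10, 52, 2)
print(f"trailing side, SD  2: {low_variance:.3f}")
print(f"trailing side, SD 10: {high_variance:.3f}")
```

With these invented figures, the low-variance underdog wins roughly 8% of the time and the high-variance underdog roughly 35% of the time. Neither reaches 50%, but the gap between them is large even though the means are identical.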

This is the reality that the veteran pollsters are feeling: Clinton will probably win, as per the significant difference in the means, but the variance of Trump’s support is such that the difference in the means is not the near guarantee it would be against a low variance distribution like Romney’s. Somehow, I get the sense that many of the “data science” types covering politics are not grasping this (not being able to distinguish standard deviation from standard error is a common problem I have found among those who learned “statistics” by formulas and examples, not the math under the hood). They live under the illusion that the more precisely they can estimate the mean, which they can with large enough data, the more safely they can disregard the variance, which they seem to consider no more than a nuisance getting in the way of knowing the “truth,” which can be neatly wrapped up in a set of means. That’s not how probability works. That’s not how the reality, where the data comes from, works either.

