One misleading notion that the use of statistics has put into people’s heads is that there is some kind of “truth” that statistics can help us reach, if only we can cut through the “noise” that interferes with the search. I think this is a very dangerous idea.
Most of statistics concerns itself with conditional means of a population (as represented by a sample). If we say that the average Hungarian is, say, 171cm tall, this is not the “truth.” The “truth” is the height of each individual Hungarian, whether he or she is 195cm tall or 152cm tall. All that the summary statistics of Hungarian height data tell us is that the “truth” consists of some distribution centered around 171cm, not that “the truth” is 171cm, obscured by all the Hungarians who are not approximately 171cm tall. In other words, we need to know whether the “truth” that we seek concerns the mean or the distribution around it (i.e. the variance and higher-order moments). Sometimes the mean is a reasonable approximation of the truth (e.g. what percentage of voters might vote for HRC over Trump?), but sometimes it isn’t (e.g. what is the average Trump voter like?).
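To make the point concrete, here is a minimal sketch in Python. The two populations are entirely made up for illustration (they are not real Hungarian height data), and the helper name `share_near_mean` is my own. Both populations have a mean of roughly 171cm, but in only one of them does the mean describe a typical individual.

```python
import random
import statistics

def share_near_mean(values, width):
    """Fraction of values lying within `width` of the sample mean."""
    center = statistics.mean(values)
    return sum(abs(v - center) <= width for v in values) / len(values)

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical population A: heights clustered tightly around 171cm.
clustered = [rng.gauss(171, 3) for _ in range(10_000)]

# Hypothetical population B: two groups, one around 160cm and one around 182cm.
# Its mean is also close to 171cm, but almost nobody is actually that tall.
split = ([rng.gauss(160, 3) for _ in range(5_000)]
         + [rng.gauss(182, 3) for _ in range(5_000)])

for name, heights in [("clustered", clustered), ("split", split)]:
    print(
        name,
        "mean:", round(statistics.mean(heights), 1),
        "stdev:", round(statistics.pstdev(heights), 1),
        "share within 5cm of mean:", round(share_near_mean(heights, 5), 3),
    )
```

The means come out nearly identical, while the share of people within 5cm of the mean differs enormously between the two. Reporting only “171cm” would hide exactly the part of the “truth” that distinguishes them.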
Of course, even when the mean is the meaningful quantity given the truth that we seek, the noise can still be informative. That 45% of the voters might support HRC over Trump is meaningful only if people don’t change their minds. If the support from a given population subset changes wildly from poll to poll, it may be methodological folly (a product of small sample size coupled with sampling bias in each poll) or, perhaps, something actually meaningful about the unsettled and ambiguous state of mind among a certain group of voters. In other words, rather than something that interferes with the discernment of an immutable truth, the noise can easily be a clue to how variable the truth itself is. Which one is it? That is where things get interesting, and it deserves much additional thought.
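One rough way to start on that question is to compare how much poll results actually vary with how much they would vary from random sampling alone. The sketch below is only an illustration under assumptions of my own: the polls are simulated, every poll has the same sample size, and the function names `simulate_poll` and `dispersion_ratio` are hypothetical. For a simple random sample of size n, the sampling variance of a share p is p(1 - p)/n, so a ratio of observed to expected variance near 1 is consistent with pure sampling noise, and a ratio well above 1 suggests something more is going on.

```python
import random
import statistics

def simulate_poll(rng, true_support, n):
    """Share of n simulated respondents who back the candidate."""
    return sum(rng.random() < true_support for _ in range(n)) / n

def dispersion_ratio(shares, n):
    """Observed poll-to-poll variance divided by the variance expected
    from simple random sampling alone, p * (1 - p) / n."""
    p = statistics.mean(shares)
    expected_variance = p * (1 - p) / n
    return statistics.variance(shares) / expected_variance

rng = random.Random(1)   # fixed seed so the sketch is reproducible
n_polls, n = 40, 500     # assumed number of polls and respondents per poll

# Scenario 1 (assumed): a settled group whose true support stays at 45%.
stable = [simulate_poll(rng, 0.45, n) for _ in range(n_polls)]

# Scenario 2 (assumed): an unsettled group whose true support drifts
# anywhere between 35% and 55% from one poll to the next.
unsettled = [simulate_poll(rng, rng.uniform(0.35, 0.55), n)
             for _ in range(n_polls)]

stable_ratio = dispersion_ratio(stable, n)
unsettled_ratio = dispersion_ratio(unsettled, n)
print("stable group ratio:", round(stable_ratio, 2))
print("unsettled group ratio:", round(unsettled_ratio, 2))
```

A large ratio does not by itself say which explanation is right. Sampling bias that differs from poll to poll would inflate the ratio just as genuinely shifting opinion would, so telling the two apart still requires looking at how each poll was conducted.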