This article on NYT Upshot used to be titled “A favorable poll for Donald Trump has a problem,” I think. Whatever the case, it has since been retitled to say “may have a problem,” which I think is wise.
This addresses something that’s been bugging me, albeit more generally than the USC polls: what do you do with polling outliers?
To be entirely honest, I think the USC polls are doing something innovative and insightful. The point raised by Gelman et al in this article is that accounting for the party affiliation of voters, which pollsters seem very averse to doing, is in fact quite a sound thing to do, given how stable party affiliation and vote choice are in today’s electoral environment. Indeed, the way the USC folk constructed their panel essentially achieves through research design what Gelman et al did in their study through post-stratification: creating a sample that incorporates a reasonable mix of Republican partisans and Democratic partisans.
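To make the mechanics concrete, here is a minimal sketch of post-stratification weighting by party identification. All of the population shares and sample counts below are invented for illustration; they are not from the USC panel or the Gelman et al study:

```python
# Hypothetical illustration of post-stratification weighting: bring a
# sample's party mix in line with an assumed population mix.

# Assumed population shares of party identification (invented numbers).
population = {"Democrat": 0.33, "Republican": 0.29, "Independent": 0.38}

# Raw counts in a hypothetical sample of 1,000 respondents.
sample_counts = {"Democrat": 400, "Republican": 250, "Independent": 350}
n = sum(sample_counts.values())

# Each group's weight is (population share) / (sample share), so
# overrepresented groups are weighted down and underrepresented ones up.
weights = {g: population[g] / (sample_counts[g] / n) for g in population}

for g, w in weights.items():
    print(f"{g}: weight {w:.3f}")
```

By construction, the weighted sample then matches the assumed population mix: overrepresented groups (Democrats here, at 40% of the sample versus an assumed 33% of the population) get weights below 1, and underrepresented groups get weights above 1.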
I am not so sure that the overreporting problem, where people overreport having voted for the winner, is as big as the NYT folk make it out to be: in the end, past vote choice is itself a proxy for the party affiliation of the voter. Past vote choice ought to be closely correlated with party affiliation today, with Republicans recalling (probably correctly) voting for Romney and Democrats recalling voting for Obama. Unless an unexpected number of Republicans who remembered voting for Obama turned up in the course of sampling, which would be a dead giveaway of the kind of winner-favoring overreporting that could cause problems, I would not be so quick to invoke overreporting as a serious source of bias.
Now, the problem that could arise from the type of weighting the USC folk have done is that, if Obama supporters were overrepresented in the sample to begin with, weighting them down to reflect the overall partisan balance would leave the genuine Obama voters even more underrepresented than they should be. It is in this sense that the USC poll might be overstating the support for Trump: by undersampling, in effect, the latent Democratic voters and, relatively speaking, oversampling Republican voters. But this is not necessarily a fatal flaw once you have the full data: if someone does say that he or she voted for Obama in 2012, whether it is true or not, that is a valuable piece of information. It can, in turn, be used to pose other interesting questions that, if somewhat indirectly, illuminate what’s going on in this election. Just what kind of voters say that they voted for Obama but would support Trump, for example? How does this square with the actual electorate that Obama did have in 2012? Assuming that their answers are honest, what does this imply about the potential “swingability” of a given slice of the electorate?
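The distortion described above can be put in toy numbers. Everything here is invented for illustration except the rough 51% Obama share of the 2012 vote; the 58% reported share is a hypothetical level of overreporting, and the sketch assumes, for simplicity, that every genuine Obama voter also reports an Obama vote:

```python
# Hypothetical illustration: if respondents overreport having voted for
# the 2012 winner, weighting self-reports down to the true vote share
# also down-weights the genuine Obama voters in the sample.

true_obama_share = 0.51   # Obama's rough share of the 2012 vote
reported_obama   = 0.58   # assumed: share of the sample *claiming* an Obama vote
genuine_obama    = 0.51   # assumed: share who actually voted for Obama

# Every self-reported Obama voter gets the same down-weight.
w = true_obama_share / reported_obama

# Weighted share of *genuine* Obama voters after weighting
# (they all claim an Obama vote, so they all receive weight w).
effective_genuine = genuine_obama * w

print(f"down-weight applied to reported Obama voters: {w:.3f}")
print(f"genuine Obama voters' weighted share: {effective_genuine:.3f}")
```

Under these invented numbers, the genuine Obama voters end up at roughly 45% of the weighted sample rather than their true 51%, which is exactly the mechanism by which the latent Democratic vote could be undersampled.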
People do not change their partisan stripes readily, at least not in today’s environment. Whether it is honest or not, the fact that a respondent says they were an Obama supporter is significant as an indicator of their political inclination. And note that USC constructed its panel before the Trump (and Sanders) phenomenon hit.
In a sense, I may not be impressed by this critique by the NYT because of my own bias: I don’t care much for aggregate predictions from any poll, but am deeply interested in tracking how different demographics are moving, how their “swingability” (or the variability in their choice) evolves over time. When the variability is high, predictions become inherently hard to make. The composition of the electorate will be slightly different this time from 2012, to say the least. We should want to know how it will be different and how the different components of the electorate will behave: will they behave as they did in 2012, or will they do something else? The way USC has been conducting its polls is potentially much better at gauging that variability, even if it may be poor (perhaps!) at predicting the outcome.
PS. A better way of describing my view of the use of statistics is that, as long as the data itself is true, that is, not made up out of whole cloth, every analysis reveals some aspect of the real world. The “prediction” is not so important as the revelation of the real variability in the data, conditional on the approach. The catch is that, in order to learn what a given approach to the data has to teach us, we need to pay attention to how they got to their conclusions, rather than the “conclusions” themselves. It may be highly improbable that Trump is doing as well as the USC polls suggest. If he is indeed not doing so well, what is it about the technique used by the USC team that biases the result? If he is indeed doing so well, and everyone else is missing it, what are they catching that others have missed? And even if Trump is not doing so well, as long as the data is true and the methodology is sound, and both of these seem to be the case, they are still capturing something about the “truth” that others are not, even if the “prediction” might be off. At this stage, even if we might be quite sure that the USC polls are “inaccurate” as a predictor of the result, they are using a novel technique that, quite frankly, makes a good deal of logical sense. It is in the deconstruction of what they have done that we will learn, not in trash-talking over whether they got the conclusion right or wrong.