Information, Uncertainty, Incentives, and Trust.

Sandeep Baliga at the Cheap Talk blog has an outstanding summary of the contributions of Bengt Holmström and Oliver Hart, the latest winners of the Nobel Prize in Economics.

The Holmström-Hart Nobel is a bit personal to me, albeit through an indirect route, via one of my former teachers, Paul Milgrom.  Paul liked to talk about how he came to graduate school not for a PhD but for an MBA, because he wanted to be an actuary, and how he found ways to apply actuarial thinking to economic theory.  Given that the contributions by Holmström and Milgrom I found most enlightening brought statistics and epistemology together into a theory of incentives, this is an apt starting point for my reflections on their work.

The example by Baliga is an excellent illustration of the basic problem:  a worker at a burger joint does two things, one easily observable (the number of burgers sold), the other not so observable (the work in the kitchen).  By tying incentives only to the number of burgers sold, the principal winds up discouraging kitchen work and, in so doing, subverting his own interests.  The solution is a low-powered set of incentives that depends comparatively little on burger sales.
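To see the logic in miniature, here is a deliberately crude sketch in Python (my own toy parameterization, not Holmström and Milgrom's actual model; the 8-hour day, the per-hour "pride" value, and the dollar figures are all invented): a worker splits a fixed workday between the observable task and the unobservable one, and a sufficiently high-powered bonus on the observable task crowds out the other entirely.

# Toy multitask sketch (my own parameterization, not the Holmstrom-Milgrom model):
# a worker splits 8 hours between flipping burgers (observable, bonus-paid)
# and kitchen upkeep (unobservable, sustained only by the worker's own pride).
HOURS = 8.0
PRIDE = 2.0          # assumed per-hour private value the worker gets from kitchen work
VALUE_BURGER = 3.0   # assumed per-hour value of burger sales to the owner
VALUE_KITCHEN = 4.0  # assumed per-hour value of kitchen upkeep to the owner

def allocation(bonus_per_hour):
    """The worker puts each hour wherever the private return is higher."""
    if bonus_per_hour > PRIDE:
        return HOURS, 0.0   # all burgers, no kitchen work
    return 0.0, HOURS       # low-powered pay leaves the kitchen work intact

for bonus in (0.0, 1.0, 2.5, 4.0):
    burgers, kitchen = allocation(bonus)
    owner_net = VALUE_BURGER * burgers + VALUE_KITCHEN * kitchen - bonus * burgers
    print(f"bonus={bonus:>3.1f}/hr  burger hours={burgers:.0f}  "
          f"kitchen hours={kitchen:.0f}  owner's net={owner_net:>5.1f}")

The flat salary is omitted because it does not affect the allocation, which is precisely why the low-powered contract has to lean on it.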

But this opens up a whole slew of other questions.  Two pop into my mind immediately, because they concern my eventual work in political science, especially with regard to the relationship between voters and elected officials.  First, does the principal really know where the unseen parts of the business are?  Second, how does the principal know whether the kitchen is being genuinely looked after?

In the legislative arena, the functional equivalent of burger sales comes from the public record of legislative accomplishments and actions:  the number of bills, the voting record, and so on.   Yet these constitute a comparatively small (and often easily “faked”) slice of legislative work. Fenno and Mayhew, back in the 1960s and 1970s, wrote about how the “gnomes” (to borrow Mayhew’s terminology), who slave away at the unseen aspects of legislative and policymaking work without public accolades, are valued by legislative insiders, who reward them with currencies that are particularly valuable intralegislatively.  Yet this understanding is not shared by the members of the voting public, nor, apparently, by political scientists lately.  Very few regular voters appreciate how complicated the inner workings of the legislative process are, or the kind of hidden negotiations and compromises needed to put workable bills and coalitions together–especially bipartisan coalitions.  Still, there is an implicit understanding that, without legislative outcomes, something isn’t being done right, that their agents are shirking somewhere and somehow in a way that prevents their production–and perhaps the voters are right in that suspicion.

More problematic might be political science’s obsession with putting data in place of theory (notwithstanding the immortal Charlie Chan quote, “Theory, like fog on eyeglass, obscures facts”–because “data” is not the same as “facts”).  The visible portion of legislative accomplishment, often padded by “faked” votes designed only to put positions on the record (for example, the increasingly numerous but meaningless “procedural” votes in the Senate designed mainly to show publicly who’s on which side), is used to generate various statistics that purport to measure things like “ideology,” which, in turn, are assumed to be homologous to Euclidean space and are fitted into models.  Since the measures are derived from the observed facts, they describe what goes on fairly accurately–but with significant exceptions that change over time, which are usually dismissed as mere “errors” and “nuisance.”

Fenno and Mayhew saw things differently.  Granted, they didn’t have the kind of legislative data, or the tools for analyzing it, that their more modern counterparts do (this is literally true:  the changes in Congressional rules around 1975 immediately tripled the number of recorded votes in the House, for example–coinciding neatly with the changes in House organization that followed the ouster of Speaker McCormack, engineered by the liberal Democrats).  They saw the paucity of data that prevented data-intensive analysis on their part as a normal feature of the political process, in which the seen and the unseen coexist and the unseen aspects of politics are deemed important even by those who do not know the specifics–e.g. the voters.  That brings the question back to what prompted Holmström to wonder why so few contracts are written on the “sufficient statistic” criterion, and, as such, echoes an argument Weber made a century earlier (to be fair, there’s a paper by Oliver Williamson on this very point–if only I could find it).

Weber’s argument was twofold.  First, the compensation for the “professional” (the “bureaucrat,” in his terminology) should be low-powered, set without much regard for visible indicators of performance, because how exactly the professional “performs” is too noisy and complicated to measure with precision.  In turn, the professional should develop a code of ethics and honor–“professional conduct,” literally–whereby the work is carried out dutifully and faithfully without regard for the incentives in the contract.  If you will, the mail will be delivered with utmost effort, as a point of honor, through rain, snow, or sleet, because that’s what mailmen do. Most important, both halves must be common knowledge:  the professionals “know” that they will be paid no matter what, while the principals “know” that the professionals are doing their utmost, even though the results are not necessarily obvious.  In other words, I don’t know what exactly they are doing, but whatever it is, I know it’s important, dang it.

This is a difficult equilibrium to sustain, with a LOT depending on the players’ beliefs, and potentially open to a lot of abuse and suspicion.  Mike Chwe might say that these beliefs, in turn, would require a lot of cultural trappings to sustain, various rituals carried out to show that the “professionals” indeed are being “professional.”  The “home style” of legislators, whereby they return home and engage in various ritualistic interactions with their voters to show their tribal solidarity, might be seen in the same light.  One might say that a lot of seemingly irrational socio-cultural activities, such as belief in creationism, are exactly that as well.  Of course, this is exactly the kind of equilibrium that IS being subverted by the tilt toward visible data, as can be seen below in the correlation between Democratic shares of House votes and the DW-NOMINATE scores of the incumbents (with signs adjusted):

[Figure: correlation over time between Democratic shares of House votes and incumbents’ DW-NOMINATE scores (signs adjusted)]

What the graph shows is that, if you know the voting record of a House member in the preceding session of Congress, you could predict his vote share with increasing accuracy as the 20th century progressed.  It does mean that voters were becoming more “policy-minded,” in the sense of basing their evaluations of politicians more on the visible record, but does it mean that voters were becoming more “rational”?  To claim that would presuppose that the performance of the burger joint depends only on burger sales and that the kitchen is irrelevant to its success. Holmström (and Max Weber before him) would say in no uncertain terms that that’s stupid.  But what does this mean for the trends in politics today?  I’ve been making a series of arguments (and was halfway through a book manuscript) on this very point, but shockingly few people seemed to care, even though, I strongly suspect, the mess of the 2016 elections is a sharp reminder of this problem.

This is an illustration of the potential danger that today’s data-intensive environment poses for us:  because we have so much data, we become contemptuous of the unquantifiable and unaware of the limitations of the data that we are using.  If the data were always right, i.e. had zero error, there would be no statistics to be done with it; we’d simply know THE answer.  We do statistics to be less wrong, not necessarily to be “right” (I’m paraphrasing my old stats prof).  If we insist on mistaking statistics (or indeed “science”) for the “right answer,” woe be upon us.

PS.  One great irony is that, while, intellectually, Paul was one of the major influences on my way of thinking, I had precious little interaction with him when I was actually at Stanford. By the time he was teaching his “module” (Stanford econ reorganized its graduate courses when I was there so that we had four “modules” instead of three quarters–go figure), I was fairly deep in my occasional depressive spirals and was unable to do practically anything, let alone prepare for prelims.  In a sense, studying for econ prelims is easy–you literally have to study the textbooks and know the formulas, so to speak–just the answers you are supposed to know, even though, admittedly, the questions will be hard.  But depressed people have the hardest time doing routine chores when locked up, figuratively speaking, without anyone talking to them.  It is easy, in a sense, for people who have no stakes to think that depressed people ought to be locked up by themselves until they are better.  In practice, what almost always happens is that, after being locked up for a few months, they will be far more damaged than when they began.  But talking to depressed people requires way too much commitment for people without stakes of their own, too much to be asked of “strangers.”

 


Poststratification, or Why Look at Crosstabs.

Andy Gelman has an excellent post that almost constitutes a direct salvo in defense of the USC poll being conducted for the LA Times.  (NB:  I think the USC poll is still skewed somehow, but it also provides important bits of information that are lost in the usual polls.)

The truth about elections and parties is simple:  party IDs don’t change much and people tend to vote their party.  If a poll shows a big swing in one direction or another, accompanied by a big swing in the partisan composition of the respondents, then the poll is probably capturing a big swing in survey response bias that correlates with partisanship.  We want to know what the choice will be, conditional on all sorts of factors known to be correlated with the vote choice–and partisanship is indeed highly correlated with the vote choice.
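A minimal sketch of what conditioning on partisanship buys you (all numbers below are invented for illustration): if Democrats happen to be overrepresented in this week’s sample, reweighting each party-ID cell back to a fixed target mix removes most of the phantom swing.

import pandas as pd

# Hypothetical raw poll: Democrats happen to be overrepresented this week.
poll = pd.DataFrame({
    "party":         ["D", "R", "I"],
    "n":             [480, 360, 160],   # respondents in each party-ID cell
    "clinton_share": [0.90, 0.08, 0.42] # share backing Clinton within each cell
})

# Assumed stable population party mix (illustrative targets, not real ones).
target = {"D": 0.36, "R": 0.33, "I": 0.31}

raw = (poll["n"] / poll["n"].sum() * poll["clinton_share"]).sum()
post = sum(target[p] * s for p, s in zip(poll["party"], poll["clinton_share"]))
print(f"raw estimate: {raw:.3f}   poststratified on party ID: {post:.3f}")
# The raw number jumps around with the partisan mix of respondents; the
# poststratified number moves only when choices move within the cells.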

In the context of 2016, this is especially relevant since Trump’s unpopularity is especially acute among Republicans–at least among those voters who are reliably Republican in most elections:  wealthier white suburbanites.  He is barely tied with Clinton in this subsample, according to various polls, and trailing significantly among the women within it.  These are, in a sense, far more significant predictors of Trump’s defeat than how unpopular he is with minorities, who tend to be overwhelmingly Democratic anyway.

Soggy!

I have to confess that I did not know about the “crunchiness” vs. “sogginess” distinction laid out by British journalist Nico Colchester until now.   Unfortunate, since it captures my idea of variance-centric thinking quite well.

The premise behind variance-centric thinking is that the world is soggier than we’d like to believe.  Consider the following scenario, where we have some input into a system that generates a distribution of outcomes.

X -> [Black Box] -> Y

We want to learn how this “black box” works by working through the relationship between X and Y.  But, while we’d like to believe that a neat “functional” relationship exists between X and Y, i.e. where every value of X is associated with exactly one value of Y, this is not really true most of the time.  Every value of X is associated with a range of values of Y–a probability distribution, if you will.  Instead of a neat and “comfortable” formulaic universe where pushing a red button produces a can of Coke with certainty, we might get a situation where you get a can of Coke with 75% probability, a ball of yarn with 15%, and a bowl of worms with 10%, or whatever.  Even if you do exactly the same thing again, you will probably get a different outcome, and–to turn a well-known saying on its head–you’d be crazy only if you expected the same thing to happen every time.
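A minimal sketch of that red button, using the made-up probabilities above:

import random

# The "red button" as a black box: the same input X does not map to one Y.
# The outcome probabilities are the invented ones from the paragraph above.
OUTCOMES = ["can of Coke", "ball of yarn", "bowl of worms"]
PROBS = [0.75, 0.15, 0.10]

def press_red_button(rng):
    """One identical press of the button; the outcome is a draw, not a value."""
    return rng.choices(OUTCOMES, weights=PROBS, k=1)[0]

rng = random.Random(42)
presses = [press_red_button(rng) for _ in range(10)]
print(presses)  # identical inputs, a distribution of outputs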

An article in Computer Weekly from a few years ago, it seems, arrived at a similar conclusion:  even if you have huge data piles, sometimes the insights they offer have such large variance that you still don’t know what will happen if you push the red button.  All that the data tells you is that you don’t really know what you will get (exactly, at any rate) if you do press it.  The data about red buttons, if you will, is not sufficiently crunchy, in that the relationship between the input and the output is not clearly and neatly defined.  Perhaps it is better to seek out crunchier data, where you know what you will get if you do exactly the same thing over and over again.

The problem with the search for crunchier data is that, often, crunchy data does not exist.  Even if some crunchy patterns exist for a while (e.g. Christmas rallies in stock markets), human strategic behavior, e.g. arbitrage, quickly wipes them out once they become known.  Besides, sometimes you have to deal with the problems you have, not the problems you wish you had–and red buttons may be all that you have to deal with.

Or, in other words, you need to understand how crunchy or soggy your data is.  This is where variance-centric thinking comes in.  Variance tells you how soggy your data is, and how to deal with it once you figure that out.  Maybe you do want to learn its mean, if the data is crunchy enough…but with the proviso that, depending on its crunchiness, the mean may not be good enough.  If the data is very soggy, the mean may not be worth knowing.

(Sometimes) The Noise is the Signal!

One misleading notion that the use of statistics has put into people’s heads is that there is some kind of “truth” that statistics can get at, if only we can cut through the “noise” that interferes with the search.  I think this is a very dangerous idea.

Most of statistics concerns itself with conditional means of a population (as represented by a sample).  If we say that the average Hungarian is, say, 171cm tall, this is not the “truth.”  The “truth” is the height of an individual Hungarian, whether he or she is 195cm tall or 152cm tall.  All that the summary statistics of Hungarian height data will tell us is that the “truth” consists of some distribution centered around 171, not that “the truth” is 171cm, obscured by all the Hungarians who are not approximately 171cm tall.  In other words, we want to know whether the “truth” we seek has to do with the distribution (i.e. the variance and other higher-order moments) or with the mean. Sometimes the mean is a reasonable approximation of the truth (e.g. what percentage of voters might vote for HRC over Trump), but sometimes it isn’t (e.g. what is the average Trump voter like?).

Of course, even when the mean is the meaningful quantity given the truth we seek, the noise is sometimes still informative.  That 45% of voters might support HRC over Trump is meaningful only if people don’t change their minds.  If the support from a given population subset changes wildly from poll to poll, it may be a methodological folly, a product of small sample size coupled with sampling bias in each poll, or, perhaps, something actually meaningful about the unsettled and ambiguous state of mind of a certain group of voters.  In other words, rather than something that interferes with the discernment of immutable truth, the noise can easily be a clue to how variable the truth itself is.  Which one is it?  Now, that’s where things get interesting–and that question deserves much additional thought.
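One rough way to ask “which one is it?” is a simple overdispersion check: compare the observed poll-to-poll variance of a subgroup’s support with the variance that sampling noise alone would produce. The numbers below are invented for illustration.

import numpy as np

# Hypothetical support shares for one subgroup across ten polls, with the
# subgroup sample size in each poll (numbers invented for illustration).
shares = np.array([0.41, 0.52, 0.38, 0.47, 0.55, 0.40, 0.51, 0.44, 0.58, 0.39])
n_per_poll = 150

p_bar = shares.mean()
observed_var = shares.var(ddof=1)
sampling_var = p_bar * (1 - p_bar) / n_per_poll  # variance from sampling noise alone

print(f"observed poll-to-poll variance:        {observed_var:.4f}")
print(f"variance expected from sampling alone: {sampling_var:.4f}")
# If the observed variance is much larger than the sampling variance, the
# "noise" is telling us the subgroup itself is unsettled (or the polls'
# methodologies differ), not merely that the samples are small.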

Politics and Curiosity.

Dan Kahan, whose work I like a lot, has a fascinating new paper out.

The great advance that Kahan and his coauthors make is the attempt to systematically define and quantify “curiosity.”  I am not sure that what they are doing is quite right:  enjoying science documentaries, for example, does not mean one is or is not “curious.”  (I’ve found some science documentaries to be so pedantic and so assertive of the filmmakers’ own views that they were nearly unwatchable; good science documentaries point to the facts and then raise the questions that follow from them without overtly giving answers.)  But a more useful perspective on curiosity comes from how one reacts to an unexpected observation:  a curious person reacts by wondering where the oddity came from and investigating its background; an incurious person dismisses the oddity as irrelevant.  The third component of their instrument, the so-called “Information Search Experiment,” gets at this angle more directly.

Observe that curiosity is, potentially, at odds with simple scientific knowledge.  On the surface of the Earth, the gravitational acceleration is approximately 9.8m/s^2.  There was a physicist with a web page dedicated to scientific literacy (which I cannot find!) who had a story about how his lab assistant “discovered” that, under some conditions, the measured gravitational acceleration is much smaller.  While this finding was undoubtedly wrong, there are different ways it could have been dealt with:  the incurious approach is to dismiss it by saying that this simply cannot be, because the right answer is 9.8m/s^2.  The curious approach is to conjecture the consequences that would follow if the different value of the gravitational acceleration were true and investigate whether any of them also materialize.  The usual approach taken, even by scientifically literate persons, is the former, especially since they know, with very little variance, that the gravitational acceleration has to be 9.8m/s^2.  It is rare to find people who take the latter path, and to the degree that “scientific literacy” means “knowing,” with little variance, that 9.8m/s^2 is the correct answer, it is unsurprising that “scientific literacy” is often actually correlated with closed-mindedness and politically motivated reasoning (which Kahan had found in earlier studies).

This does make for an interesting question:  I had mused about why creationism can be a focal point, but the proposition that 1+1 = 3 cannot.  Quite simply, 1+1 = 3 is too settled a question (or rather, ruled out by too settled a consensus) to serve as a focal point, while, for many, evolution is not yet a sufficiently settled question.  To the degree that, on average, social consensus tends to converge on the truth (even if not always), overtly false “truisms” cannot serve as focal points indefinitely–even if they might persist far longer than one might expect, precisely because they are so useful as focal points.  But the more accepted truisms are, the more likely that contrary findings–even true ones–are to be dismissed without further question as simply being “abnormal.”  In the example above, the probability that a lab assistant simply made a mistake that led to an abnormal finding is simply too high compared to the probability of an actual discovery.  As such, it is not worth wasting time investigating further, beyond berating the hapless lab assistant for not knowing what he is supposed to be doing.  However, to the extent that “knowledge” is simply an awareness of the conventions, it systematically underestimates the variance in reality and discourages curiosity as a waste of time.  This, furthermore, is not without justification, as the conventions reflect “established truths” that are very nearly certainly true (i.e. with very little variance).  When people become too sure of the received wisdom where the true variance is actually quite high, a lot of legitimate discoveries are bound to be tossed out with dismissiveness.

Underestimating variance in the name of received wisdom is exactly how the great financial meltdowns happen:  to borrow a line from the movie The Big Short, those who defy the conventional wisdom will be ridiculed by being badgered with “are you saying you know more than Alan Greenspan?  Hank Paulson?”  Well, physics progressed because, on some things, some insignificant young man named Albert Einstein knew more than Isaac Newton–before he became the Albert Einstein.  Physicists took the chance that Einstein might actually know more than Newton, rather than dismissing him for his pretensions.  The rest is history.  (NB:  one might say that the structure of physics as a way of thinking probably made this easier:  Einstein was able to show that he might be smarter than Newton because he showed what he did, without any obvious mistake, using all the proper methodology of physics.  But then, physics had established that it is about the right methodology and logic, not about the “results.”  This is, in turn, what bedeviled Galileo:  he might have gotten the answer more right than the contemporary conventional wisdom, in retrospect, in terms of approximating reality–although he was still more wrong than right overall–but he could not precisely trace the steps he took to get to his answers, because the methodology to do so, quite frankly, did not yet exist–it would be invented by Newton decades later.)

Real scientific literacy, one might say, should consist of a blend of scientific knowledge and curiosity:  knowing where the lack of variance is real and where the lack of variance only reflects the reflected consensus, so to speak.  Is 1+1 = 2 really true, or does it seem true because everyone says it is?  I have to confess that I do not know what the best answer to this question is.  On simple questions like 1+1, demonstrating the moving parts may be easy enough.  On more complex questions, it is far easier to simply tell people, “trust us:  X is true because it is true, and we should be trusted because of our fancy credentials that say we know the truth.”  Perhaps, beyond some level, truth becomes so complex that a clear demonstration of the moving parts is no longer possible.  If so, this is the only path to even partial “scientific literacy,” especially since a simple broad awareness of social conventions that are approximately right (i.e. right mean, wrong variance) might be more socially desirable than everyone wandering about looking for real answers without finding them.

Unfortunately, this turns “science” back into a question of religion and faith.  Rather than the product of scientific investigation doused with a suitable amount of skeptical curiosity, “science facts” simply become truisms that are true because the “high priests” say so, with the real moving parts consigned to “mysteries of the faith.”  That carries the potential for a great deal of abuse, including the persecution of heretics, literal or figurative–most of whom may be cranks, but some of whom may carry real insights that happen to deviate from the received wisdom more than expected.  This is, of course, politically motivated reasoning revisited, with the sober implication that we may not be able to separate “politics” and “religion” from “science” easily.

 

Variances and Regressions

Usually, when we run regressions or any of their near relatives, we are dealing with changes in the means.  If you are thinking about variances, however, this is often not the question you want answered.

It is not unreasonable that a variable which raises test scores on average also increases their variance–i.e. some students’ scores might drop while, at the same time, others gain even more.  This is not just a matter of statistical nuance but a substantive pattern with real consequences (see the whole debate over inequality following Piketty’s book).  Of course, taking it seriously means dispensing altogether with the homoskedasticity assumption–if that assumption held, this would not even be a problem.  But precisely because homoskedasticity is built into the standard assumptions of OLS regression, the problem rarely gets thought about systematically.

There are approaches one could take that leverage the tests and “remedies” for heteroskedasticity to deal with this, perhaps at a systematic level–treating the variance as something to be modeled rather than corrected away.  That, however, might take a bit more work…  Has anyone already done significant work on this?  A rough sketch of the idea appears below.
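For what it’s worth, here is one rough sketch of the kind of thing I have in mind (simulated data, and a Harvey-style trick of regressing log squared residuals on the covariate–nothing definitive): the variance equation is treated as a finding in its own right, not as a nuisance to be corrected away.

import numpy as np

rng = np.random.default_rng(0)

# Simulated example: x raises the mean of y AND its spread (heteroskedasticity).
n = 2000
x = rng.uniform(0, 1, n)
y = 1.0 + 2.0 * x + rng.normal(0, 0.5 + 1.5 * x, n)  # the sd grows with x

# Step 1: ordinary least squares for the mean, via the normal equations.
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

# Step 2 (Harvey-style sketch): regress log squared residuals on x to see
# whether the variance itself moves with the covariate.
gamma, *_ = np.linalg.lstsq(X, np.log(resid**2 + 1e-12), rcond=None)
print("mean model slope:", round(beta[1], 2))
print("variance model slope (log scale):", round(gamma[1], 2))
# A clearly positive variance-model slope is the substantive finding here,
# not something to be "fixed" with robust standard errors and forgotten.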

Variance Centric Thinking and Making Sense of Trumpism

Carl Beijer’s argument about sources of support for Trump is an excellent example of how variance-based thinking beats means-centric thinking.

For a means-centric thinker, Trump’s message is about trade, racism, and other ideas that don’t make a great deal of sense.  Since his message can be placed on a map, so to speak, this line of thinking holds that his followers are motivated by the calculable mean of that message.  And since the mean of his message lies in the realm of insanity, it follows that those who follow it must be just as crazy.

Variance-centric thinking does not presume to measure the mean of Trump’s message:  it is all over the place, so calculating the mean does not make much sense.  (If the distribution is very fat, you can be very far away from the mean and still catch a meaty part of the distribution.)  The key is simply that Trump is different, drawing an unusual but very identifiable following who may or may not take his proposals seriously.  The important thing is to identify who those followers are–and why they don’t fit the existing paradigm, rather than why they fit Trump’s paradigm.  To paraphrase Tolstoy, if all unhappy families are different, asking why they are unhappy will yield too many answers, none of which will be universally true.  It makes more sense to ask why they aren’t happy–that is, what their deviations are from the formula of the happy families–and start the analysis from there.  (In a like vein, the taste preference of Bud Light drinkers cannot be cleanly identified–they are a heterogeneous group who choose Bud Light because it is cheap, because they like the ads, because they have no taste, because they have an odd taste that no one else is meeting, or because they genuinely like Bud Light.  If you think you can identify the “Bud Light taste” and profit by jacking up the price, you will be in for a huge disappointment.)

So the Trump voters are…different and heterogeneous, but they are similar in that they are unhappy with the status quo and are drawn to Trump for many different but understandable reasons–most of which are directly or indirectly tied to economic and social changes.  Trying to put them all in one box, by identifying their mean and using it as a stand-in for them all, is bound to disappoint because their variance is so huge.

PS.  In a sense, it is really a matter of knowing the limits of your data:  can you profit by knowing the true mean of Trumpists’ (or Bud Light drinkers’) preferences?  What is the variance of their tastes, preferences, or motivations?  If they come with sufficiently small variance, then yes, you have a lot to gain by knowing their mean preference with precision.  If they don’t, then all the extra data will only give you the erroneous belief that you know the truth, not the truth itself, which is inherently too variable to be pinned down.

Variance vs. Means Centric Statistical Thinking: An Illustration.

I’ve written a lot about means-centric vs. variance-centric statistical thinking.  So what do I mean by it?  In the former, we focus on the ability to “predict” something, at its mean or whatever.  In this sense, variability in the data is a nuisance, even an enemy, something to be minimized.  In the latter, however, we want to know which variables cause the biggest variability in the data.  The variance is not only an essential component, it is the main subject of our thinking.  If we can forecast outcomes predictably, that is not only boring, it is also something that can be manipulated and “pumped” until it breaks (as per my discussion of how signals in a correlated equilibrium can be abused until they break, or indeed as in the logic behind the Grossman-Stiglitz paradox, which is really just a variation on the same argument).

In the end, really, the math that accompanies both approaches turns out to be the same:  in the means-centric approach, you identify the predictor variables that help minimize the “errors”; in the variance-centric approach, you identify the variables that happen to be correlated with the variability–which turn out to be the same ones that minimize the “errors.”  This convergent evolution, unfortunately, obscures the fundamental philosophical difference between the two approaches.

An illustration might be found using some data from the 2016 primaries.  Consider the following plot.

[Figure: Trump primary vote share vs. 2012 Democratic vote share by county, split by counties above and below 75% white population]

The graph plots support for Trump in the primaries as a function of the Democratic vote share from the 2012 election, for two types of counties:  those above and those below 75% white population, which is roughly the national average (the red dots indicate counties below 75% white).  The biggest variability in Trump support is found in the areas where Romney did well in 2012 (i.e. small Democratic vote shares):  Republican primary voters in Republican-dominated areas with large minority populations did not like Trump, while those from counties with largely white populations did not have much problem with him.  Yes, it is true that Trump did well in many counties with both large Republican majorities and significant minority populations, but the counties where he performed poorly, conditional on large Republican majorities, are mostly characterized by large minority populations.  As a predictor, this is terrible:  the conjunction of a large minority population and a large Republican majority in 2012 does NOT necessarily predict weak support for Trump–there are too many exceptions for that.  But the reality is that these variables rarely all move in the same direction–to pretend that they do feeds into the conjunction fallacy identified by Tversky and Kahneman, in which people judge a conjunction of characteristics believed (rightly or wrongly) to be correlated with each other to be more likely than either characteristic alone–e.g. “Linda is a bank teller and is active in the feminist movement” rather than “Linda is a bank teller.”  People are already prone to believe that conjunctions happen with too great a frequency (which partly accounts for the beauty contest game–how people trying to follow a “narrative” systematically downplay the variance)!

From the variance-centric perspective, the large gap that opens up in Trump’s support in Republican-friendly areas with large minority populations is not merely interesting–it IS the point.   It is the variability that we are interested in.  Incidentally, this is why Trump’s support numbers are jumping wildly–his support in many red states (i.e. the South, where Republican electoral dominance and large minority populations coincide) is highly uncertain, leading to what Nate Cohn calls “Trump’s Red State problem,” which, to be fair, should have been apparent from the primary data already–and the polls showing Trump’s serious national-level unpopularity consistently indicated that his popularity among Republicans is particularly low.

The key reason this cannot be readily translated into a prediction is that we know more than the data itself, or rather, we have a broader context, including data from elsewhere, in which to place the present data.  As Gelman et al observe, a respondent’s report of having voted for a particular party in the last election is a significant piece of information known to be highly correlated with the present latent choice, even if we may not entirely trust the response to be accurate or honest.  To insist that this be ignored is foolish–even if it cannot be taken at face value, especially if it is correlated with a particular pattern of variability seen in the data.  To the degree that reality is inherently complex and uncertain, coming up with a fully specified model that can predict everything is, quite literally, in the realm of impossibility.  Much better to adopt a two-step approach to learning:  identify the sources of variability, then investigate the correlates of that variability, with the awareness that the variability is itself a random quantity–i.e. the variance itself may be correlated with particular variables.  (NB:  homoskedasticity is an absurd assumption, and not really a necessary one except to make OLS BLUE, since variance is always variable…)
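A rough sketch of that two-step approach, with invented county-level numbers standing in for the real data:

import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Fake county-level data in the spirit of the plot above (all invented):
# Trump primary support is noisier where the 2012 GOP margin was large AND
# the minority share is high.
n = 500
dem2012 = rng.uniform(0.2, 0.8, n)
minority = rng.uniform(0.0, 0.6, n)
noise_sd = 0.05 + 0.25 * (dem2012 < 0.4) * minority   # the variance depends on covariates
trump = np.clip(0.40 + rng.normal(0, noise_sd), 0, 1)

df = pd.DataFrame({"dem2012": dem2012, "minority": minority, "trump": trump})

# Step 1: where is the variability?  Bin by the covariates and look at the spread.
df["gop_area"] = df["dem2012"] < 0.4
df["high_minority"] = df["minority"] > 0.25
print(df.groupby(["gop_area", "high_minority"])["trump"].agg(["mean", "std"]))
# Step 2 would then ask which variables track the cells with the large std,
# treating the spread itself as the quantity of interest rather than as noise.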

 

On Polling Outliers

This article on NYT Upshot used to be titled “A favorable poll for Donald Trump has a problem,” I think.  Whatever the case, it has now been retitled to say the poll “may have a problem,” which I think is wise.

This addresses something that’s been bugging me, albeit more generally than the USC polls: what do you do with polling outliers?

To be entirely honest, I think the USC poll is doing something innovative and insightful:  the point raised by Gelman et al in this article is that accounting for the party affiliation of the voters, which pollsters seem very averse to doing, is in fact quite a sound thing to do, given the stability of party affiliation and vote choice in today’s electoral environment.  Indeed, the way the USC folk have constructed their panel essentially achieves through research design what Gelman et al did in their study by poststratification–creating a sample that incorporates a reasonable mix of Republican and Democratic partisans.

I am not so sure that the problem of people overreporting a vote for the winner is as big as the NYT folk make it out to be:  in the end, past vote choice is itself a proxy for the voter’s party affiliation.  As it were, past vote choice ought to be closely correlated with party affiliation today, with Republicans recalling (probably correctly) voting for Romney and Democrats recalling voting for Obama.  Unless an unexpected number of Republicans who remember voting for Obama turned up in the course of sampling–which would be a dead giveaway of the kind of overreporting in favor of the winner that could cause problems–I would not make too much of the possible bias from overreporting.

Now, the problem that could arise from the type of weighting the USC folk have done is that, if support for Obama were overrepresented in the sample to begin with, weighting it down to reflect the overall 2012 partisan balance would leave the latent Democratic vote underrepresented.  It is in this sense that the USC poll might be overstating the support for Trump–by undersampling, in effect, the latent Democratic voters and, relatively speaking, oversampling Republican voters.  But this is not necessarily a fatal flaw once you have the full data:  if someone does say that he or she voted for Obama in 2012–whether it is true or not–that is a valuable piece of information.  This, in turn, can be used to pose other interesting questions which, although somewhat indirectly, provide an understanding of what’s going on in this election.  Just what kind of voters say they voted for Obama but would support Trump, for example?  How does this square with the actual electorate Obama had in 2012?  Assuming their answers are honest, what implications does this have for the potential “swingability” of a given slice of the electorate?
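A toy version of the weighting concern, with every number invented: suppose 60% of panelists recall voting for Obama when his actual 2012 share was about 51%, partly because some misremember in favor of the winner.

# Toy arithmetic for the weighting concern (all numbers invented).
panel_obama_share, true_obama_share = 0.60, 0.51
clinton_support = {"obama_2012": 0.85, "romney_2012": 0.10}

def weighted_estimate(obama_weight):
    """Clinton share implied by a given weight on self-reported Obama voters."""
    return (obama_weight * clinton_support["obama_2012"]
            + (1 - obama_weight) * clinton_support["romney_2012"])

print("unweighted panel:      ", round(weighted_estimate(panel_obama_share), 3))
print("weighted to 2012 result:", round(weighted_estimate(true_obama_share), 3))
# If part of that 60% reflects misreporting rather than true oversampling,
# weighting down to 51% pushes some genuinely Democratic-leaning respondents'
# influence below where it belongs, flattering Trump's number.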

People do not change their partisan stripes readily–at least, not in today’s environment. Whether it is honest or not, a respondent’s saying that they were an Obama supporter is significant as an indicator of their political inclination–and note that USC constructed its panel before the Trump (and Sanders) phenomenon hit.

In a sense, I may not be impressed by this critique from the NYT because of my own bias:  I don’t care much for aggregate predictions from any poll, but am deeply interested in tracking how different demographics are moving–how their “swingability” (or the “variability” in their choice) evolves over time.  When the variability is high, predictions become inherently hard to make.  The composition of the electorate will be at least slightly different this time from 2012, to say the least.  We should want to know how it will be different and how different components of the electorate will behave–will they behave as they did in 2012, or will they do something else?  The way USC has been conducting its poll is potentially much better at gauging that variability, even if it may be poor–perhaps!–at predicting the outcome.

PS.  A better way of describing my view toward the use of statistics is that, as long as the data itself is true–that is, not made up out of whole cloth–every analysis reveals some aspect of the real world.  The “prediction” is not as important as the revelation of the real variability in the data, conditional on the approach.  But the catch is that, in order to learn what a given approach to the data has to teach us, we need to pay attention to how they got to their conclusions, rather than to the “conclusions” themselves.  It may be highly improbable that Trump is doing as well as the USC polls suggest he might be.  If he is indeed not doing so well, what is it about the technique used by the USC team that biases the result?  If he is indeed doing so well–and everyone else is missing it–what are they catching that others have missed?  And even if Trump is not doing so well, as long as the data is true and the methodology is sound–and both of these seem to be the case–they are still capturing something about the “truth” that others are not, even if the “prediction” might be off.  At this stage, even if we might be quite sure that the USC poll is “inaccurate” as a predictor of the result, it is using a novel technique that, quite frankly, makes a good deal of logical sense.  It is in the deconstruction of what they have done that we will learn, not in trash-talking over whether they got the conclusion right or wrong.

Critical Thinking and the Variance, Formulaic Thinking and the Mean.

I might have been completely missing the point, but this article in the LA Review of Books had me refining my thinking about means-centric vs. variance-centric thinking.

In effect, the article bemoans the decline of critical thinking in all manner of venues in favor of “content,” or, as the author terms it, #content.  What counts as #content, in turn, is determined by whatever it is the audience wants.  In other words, the problem quickly comes to resemble the beauty contest game.  The statistically meaningful consequence of beauty contest games–in which players base their decisions not on what they themselves want but on what they think other players want–is not that the players necessarily misjudge, at least on average, but that they become too stereotyped in their thinking, so to speak.  The distribution of true preferences usually features substantially larger variance than the distribution of anticipated preferences.  Put differently, when the media attempt to deliver the #content they expect people to want, rather than trusting their own judgment to come up with something, the result may not differ much on average across many instances, but the variances certainly will.  For individual draws, in turn, this has significant implications.

A relatively simple demonstration is in order.  What is the expected distance between observations drawn randomly from a distribution with a small variance and another distribution with a larger variance?  For the sake of simplicity, take a standard normal distribution and another normal distribution with a standard deviation of 10.  The difference between the two draws is itself normally distributed, with mean 0 and a standard deviation of about 10.05 (the square root of 101), but the mean of 0 arises only because the negative differences exactly cancel out the positive ones.  The squared distances are distributed as 101 times a chi-squared distribution with one degree of freedom–i.e. the square of a normal draw with variance 101–so the expected squared distance is 101.  This is practically definitional:  variance = E(x squared) – (mean of x) squared.  So as the gap in variance between the reflected conventional wisdom (what the players think other players want) and what the players really want grows, the actual average distance between the two grows as well, on both sides!
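A quick simulation check of that arithmetic (a standard normal against a normal with standard deviation 10):

import numpy as np

rng = np.random.default_rng(0)

# X ~ N(0, 1), Y ~ N(0, 10^2), D = X - Y.
x = rng.normal(0, 1, 1_000_000)
y = rng.normal(0, 10, 1_000_000)
d = x - y

print("sd of the difference:  ", round(d.std(), 2))      # ~ sqrt(1 + 100), about 10.05
print("mean squared distance: ", round((d**2).mean(), 1))  # ~ 101, even though E[D] = 0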

It is also striking that this bears a curious resemblance to the coalition politics of “populism” today.  The talk of “dangerous radicals of left and right” became an epithet, but it captures a certain truism:  “radical” political leaders draw support from both the left and the right, even if not exactly in the same proportions.  So conventional politicians, operating on their anticipation of what the electorate wants, get the mean right but hideously underestimate the variance, and the gap between reality and their program keeps growing even as they estimate the mean with ever greater precision, thanks to Big Data and associated technologies?  This is an interesting thought….

PS.  An important caveat is that being able to guess the mean with ever increasing precision will not necessarily increase the gap.  The key is simply that the gap, if the true underlying data is sufficiently variable, cannot be made to go away even if you do know the mean precisely.  So the more accurate rendering of the argument is that the marginal gain from a more accurate guess of the mean is small when the natural variance is high.  Mathematically, the variance of the differences will decrease if the variance of one of the distributions goes down.

An interesting problem emerges, however, if this is viewed in the context of polarization.  One might say that both parties, while reducing their variance, have grown farther from the mean of the popular distribution.  So whereas, in the old era, the distribution of the differences might have been N(0, a+b), where both a and b were fairly large variances, we now face N(c, a+b’), where b’ < b but c is significant.  The expected squared gap (i.e. E of the squared difference) is now a + b’ + c^2, while it was simply a + b in the past.  If this gap is construed as the extent of “mean representation,” it is not at all clear that the present parties are any more “representative” than those of the past.
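Writing out the arithmetic behind that comparison, using the identity that the expected squared difference equals the variance plus the squared mean:

\[
E\!\left[D_{\text{old}}^{2}\right] = (a+b) + 0^{2} = a+b,
\qquad
E\!\left[D_{\text{new}}^{2}\right] = (a+b') + c^{2}.
\]

So the present arrangement is “more representative” in this mean-squared sense only if the variance reduction outweighs the squared shift, i.e. only if b - b' > c^2.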

PPS.  Yes, I am assuming independence and model completeness in the distributions, which assuredly is not the case–even in the high-variance era, things like “popular politics” and “home style” ensured that outliers were, in fact, reflecting unobserved variables–which, for all practical purposes, meant that the outliers in one distribution were correlated with outliers in the other distribution somehow.  But this goes well beyond a simpleminded model.