The devil in the data that left election forecasters with egg on their face this week has a familiar name — it’s the same villain that tripped up the banks that financed subprime mortgages back in 2007, causing the financial crisis. Its name is “correlated error.”

Prediction models can make very accurate forecasts based on many not-so-accurate data points, but they depend on a crucial assumption — that the data points are all independent. In election forecasting, the data points are polls, which are clearly imperfect. Every individual poll has a relatively large margin of error amounting to several percentage points, sometimes favoring one candidate, sometimes the other, all skewed by hundreds of small things — the specific respondents chosen, the means of contact, the phrasing of questions, the representation of voter demographics and so on. These errors can be magically smoothed out by poll aggregation, giving a much more accurate mean polling number — provided the errors in individual polls were all due to different causes, and were therefore independent and uncorrelated. We saw this magic in the accurate predictions made by forecasters like Nate Silver and Sam Wang in the 2012 elections.

But this year we saw something different: Almost all the swing state polls overscored Clinton’s numbers by two to six percent. This error is called “systematic” or “correlated error.” Since it affected most or all polls, it was probably caused by some common disrupting factor or factors that were outside the well-established and hitherto reliable poll methodology itself. It was this correlated error that completely threw off the prediction models. Likewise, leading up to the 2007 crisis, financial institutions misjudged the probability of massive subprime loan defaults because they failed to realize that the chances of individual defaults were correlated, not independent.
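The difference between independent and correlated polling error is easy to demonstrate with a quick simulation. The sketch below uses toy numbers of my own (not real polling data): it averages 30 noisy polls of a race with a true 1-point lead, first with purely independent per-poll errors, then with a shared bias term that hits every poll alike.

```python
import random

random.seed(0)
TRUE_MARGIN = 1.0   # hypothetical true Clinton lead, in points
POLL_NOISE = 3.0    # per-poll standard error, in points
N_POLLS = 30
N_TRIALS = 10_000

def aggregate(correlated_bias_sd):
    """RMS error of an average of N_POLLS noisy polls, with an
    optional shared (correlated) bias term added to every poll."""
    sq_errs = []
    for _ in range(N_TRIALS):
        shared = random.gauss(0, correlated_bias_sd)  # hits every poll alike
        polls = [TRUE_MARGIN + shared + random.gauss(0, POLL_NOISE)
                 for _ in range(N_POLLS)]
        avg = sum(polls) / N_POLLS
        sq_errs.append((avg - TRUE_MARGIN) ** 2)
    return (sum(sq_errs) / N_TRIALS) ** 0.5

print(f"independent errors only  : RMS error {aggregate(0.0):.2f} pts")
print(f"with 2-pt correlated bias: RMS error {aggregate(2.0):.2f} pts")
```

Averaging beats the independent noise down by roughly a factor of the square root of 30, but the shared term passes straight through to the aggregate, no matter how many polls are averaged.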

What could have caused this correlated error to skew all the polls in 2016? That is a subject pollsters are trying hard to research and pinpoint right now, and it will take several months for their findings to be released. But here are two speculative causes that could plausibly explain it. I have to give credit for these to Michael Moore, who back in July wrote an amazingly prescient article in which he predicted exactly how Trump would win in excruciating detail. Both of these reasons are ultimately related to the well-documented enthusiasm gap in this election, just as in the Brexit vote, where a similarly large polling error took place.

1) Emotional voters: All of us are familiar with the situation where our minds incline one way and our hearts tug another. Answering a poll is a boring intellectual exercise, while casting a ballot in the solitude of a voting booth is an empowering, emotional one. It is easy to imagine somewhat conflicted voters who answered “Clinton” to a pollster but in a fit of emotion cast their vote for Trump. If a small, but consistent proportion of Trump voters acted this way, it would have affected all polls and given them all the same correlated error.

2) Depressed voters: Most pollsters try to determine how likely a respondent is to vote and factor that in their final numbers. If there were sizable numbers of Clinton voters who told pollsters that they fully intended to vote but on election day did not find the will or enthusiasm to actually go cast a ballot, that could also explain some of the correlated error. As Moore put it back in August, “If people could vote from their sofa via their Xbox or remote control, Hillary would win in a landslide.”

Other factors like the inability to contact rural voters have been proposed, but it seems to me that good pollsters should have been able to overcome those kinds of problems.

So even the best of the pollsters have a lot to learn. How about the modelers?

I think modelers need to make some changes too.

Consider a hypothetical state that had numbers similar to Michigan this year. The raw polls showed about a 3.5 percent edge for Clinton. I’ve tried to reverse engineer two simple models with predictions and behavior similar to the FiveThirtyEight and PEC models using the same kinds of tools they used. Imagine that Model 1 predicted a 70 percent probability of Clinton winning and Model 2 predicted a 99 percent probability. Here is how these predictions would have to be modified in the presence of systematic correlated polling error:

| Correlated error | 0% | 1% | 2% | 3% | 4% |
| --- | --- | --- | --- | --- | --- |
| Model 1 (probability of Clinton win) | 70% | 65% | 59% | 53% | 47% |
| Model 2 (probability of Clinton win) | 99% | 95% | 84% | 63% | 37% |

The actual correlated error for Michigan turned out to be four percentage points. If Model 1 had known and taken into account this magnitude of correlated error, its prediction of Clinton winning would have changed from 70 percent to just 47 percent, and Model 2’s prediction would have changed from 99 percent to 37 percent. Both models would have predicted a Trump win in this hypothetical scenario. What’s interesting is how large the swings in the probabilities are with very small changes in the correlated error.
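The whole table above can be regenerated from just the two zero-bias predictions with a one-line Gaussian model. This is a minimal sketch under an assumption of mine (each model reduces to a normal distribution over the final margin); the implied sigmas are back-solved from the stated 70% and 99% figures, not the forecasters' actual parameters.

```python
from statistics import NormalDist

norm = NormalDist()

def win_prob(poll_lead, sigma, bias=0.0):
    """P(actual margin > 0) when a uniform bias is subtracted from the
    polled lead and the residual uncertainty is Gaussian with sd sigma."""
    return norm.cdf((poll_lead - bias) / sigma)

LEAD = 3.5  # polled Clinton lead, in points

# Back out each model's implied uncertainty from its zero-bias prediction.
sigma1 = LEAD / norm.inv_cdf(0.70)  # Model 1: 70% at zero bias
sigma2 = LEAD / norm.inv_cdf(0.99)  # Model 2: 99% at zero bias

for bias in range(5):
    p1 = win_prob(LEAD, sigma1, bias)
    p2 = win_prob(LEAD, sigma2, bias)
    print(f"bias {bias}%:  Model 1 -> {p1:.0%}   Model 2 -> {p2:.0%}")
```

Under that assumption the printed rows match the table. The same 3.5-point lead yields very different sensitivity to bias depending on how confident the model is to begin with: the more confident model implies a much smaller sigma, so a fixed 4-point bias moves it much further.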

Some readers here defended Nate Silver’s forecast, which had the probability of Clinton winning at 71.4 percent, on the grounds that it should not surprise anyone that about a one in three chance materialized. Technically, that is correct. I also agree that Silver’s model had some built-in defense against correlated error, while the other models had much less or none. But remember how large the swings in probabilities were in the models above. The modelers knew about the Brexit fiasco, which had a correlated error of four points, in an election with a similar “enthusiasm gap.”

As I argued last month, it is extremely misleading to state such a potentially fragile probability to one decimal place: It implies that you are confident about the accuracy of the prediction to the precision stated. Most people are not deeply familiar with the technical details of a probability and tend to think of it as a “score” of the race. They are easily misled by the falsely stated precision.

As I recommended then, probabilistic election forecasts should be dispensed with altogether and replaced with the seven-point qualitative scale already in wide use. If probabilities have to be stated, they should include a hedging statement that shows how much they would change in the presence of, say, a two or four percent correlated error as “margins of error.” If forecasters had done this, the potentially large error swings would have discouraged people from taking the forecasts as gospel truth. It would have saved the entire field of election forecasting from public embarrassment.

Hopefully, further research will identify the causes of correlated polling errors and find ways to detect them, and the modelers will build on the lessons learned from this humbling experience.

That is an interesting analysis of the testing limitations.

I would point out that a Bayesian approach to these polls as tests of the likelihood of a Trump or Clinton win would have left no one surprised. Since there were similar pre-election likelihoods for either to win, any testing by polls that were not exceedingly accurate had a very poor chance of predicting the outcome, as your model reveals.

I know why they got it wrong. Yet another reason is that Trump supporters were told by mail not to trust the media: "98 percent of the media is liberal. Don't speak to them." So they did not speak to you. It was a very effective tactic. I received mail and spoke with people who said as much. You were not given the information on purpose. That is what a media blackout is intended to do. If your opponent does not know what you are up to, you have the upper hand.

>> Almost all the swing state polls overscored Clinton’s numbers by two to six percent. This error is called “systematic” or “correlated error.”

Everyone, but everyone analyzing this election is assuming that the vote count we received was accurate, while every expert, every pollster, every commentator and political pundit got it wrong.

What basis do we have for trusting the vote count? Votes are counted by software programs. The software is coded by employees of a few private companies. It wouldn't take much — just a few lines of code — to boost Trump's results. Isn't it suspicious that the pollsters missed in the swing states, but they got it right elsewhere?

There is a <a href="http://www.thepetitionsite.com/?z00m=27308900&redirectID=2235833444">nascent movement</a> of people asking for an audit of the election. It's the least we should expect.

@Josh,

The pollsters didn't just miss swing states. There were large misses in Missouri, Utah and Maine among others, but they did not affect the outcome. Also, there was far more campaigning and a receptive audience in the swing states, so it should not be surprising that whatever caused the miss should show up there.

Sure, election audits should be performed, and I'm sure they will be. But there seem to be enough safeguards: vote counting machines are tested in front of all party representatives, they have audit trails, and candidates can ask for recounts. Cheating this way would be very risky, especially as close races may trigger automatic manual recounts.

@Pradeep

Nate Silver's team repeatedly talked about this type of systematic, correlated error as their reasoning for why Trump's chances were far higher in their forecast than in all the others. I don't think anyone can say it would be mathematically defensible for them to predict a >50% chance of a Trump win based on all the data. They even had an article named "Trump is a normal polling error away from winning." The direction of the polling error is very difficult if not impossible to predict, though. Yes, there was the secret Trump voter and the enthusiasm gap, but there was also the massive data-driven GOTV effort that Clinton had and Trump lacked.

Your classification of 2012 is incorrect. The polling error in 2012 was as large as this year, it just happened to favor Obama in a way that didn't flip the states in their forecast.

In very plain terms, Clinton was ahead but there was a very significant chance of a correlated polling error in either direction (both nationally and regionally). 70-30 chance of winning was justified based on the data.

Nate did talk about what happened if he assumed that polling errors in every state were completely independent – his model went to 98% instead of 70%, which is one of the big mistakes made by many others.

There is another source of systematic correlated polling error that becomes obvious when looking at the sampling in many of the polls that predicted a Clinton win: the agenda. Larger proportions of Democrats were included in the samples, if you look at the polling data. There is a clear advantage granted by the appearance that one campaign is ahead of the other: it implies strength, both motivating the base and discouraging the opposition.

@Jesse,

Of course it was not mathematically defensible for Nate Silver to predict a >50% chance of a Trump win based on the data. I’m not saying that he should have done that. My point is related to presentation: I’m saying that he, and the other forecasters, oversold false precision, and did not give a “margin of error.”

Look, after Silver’s brilliant 2012 forecasting run, the public anointed him “Official Scorekeeper of the 2016 Presidential Race.” There were thousands of people who checked his site several times a day to see the current state of the race. The single number stated there with authoritative precision either reassured or terrified them, and certainly changed their behavior.

Most people, heck, even many mathematicians, don’t have a gut feeling of what a 71.4% probability is, even if they intellectually understand it. And even those who understand probabilities did not get a sense of how fragile that number was, how easily prone to change due to systematic polling error that pollsters knew was possible, as it had happened recently in the similarly emotional, identity politics-based Brexit vote. There is a yuge difference between a 71.4% probability that moves 5% with a given systematic polling error, and one that moves 25%. Which was it? What was the “margin of error”?

So here, in hindsight, is my recommendation for modelers on how they should report their calculations:

Probability of Clinton win (Range ± 1 Brexit*): 71% (47–95%)

Expected Clinton Electoral Votes (Range ± 1 Brexit*): 318 (242–337)

1 Brexit* = Systematic Polling Error (Bias) of 4 percentage points

Had the reporting been done this way, the possibility of a Clinton loss would have been palpable in a way that it wasn’t with a single probability number that gave no indication of how fragile that probability was. There would not have been the loss of credibility that followed.

As for your second point, you state “The polling error in 2012 was as large as this year.” This was indeed true for the national polls. The fact that states did not flip, however, indicates it was probably normal stochastic variation. There was not a large systematic polling error in the state polls in 2012, whereas this year it occurred in almost all the swing states, and many of the others. Of course we did not know about the Brexit phenomenon then. I believe the Brexit phenomenon is related to strong identity politics, which I’m afraid we’re probably going to be stuck with for many years. Hopefully new polling methods will emerge that can estimate it.

In any case, I think national polls are simply a waste of time and resources. If we are going to elect presidents by electoral vote, the popular vote is simply irrelevant.

So my second recommendation, this time for pollsters, is:

Forget national polls. Divvy up the country amongst yourselves and invest your time and resources to produce really high quality state polls.

If anyone wants to get the national picture, they can simply add up all the state polls. You know the poll aggregating sites will do that, so don’t worry about it.

Finally, to round things up, here’s my final recommendation, this time for the media.

Do not ever report the results of just one poll. Every election update must include reports of several polls together, or a report of a poll of polls. No exceptions.

I don’t think the media will listen, but I hope pollsters and modelers might. After all, they have a scientific reputation to salvage.

P.S. The field of Robust Statistics is relevant here.

Correlated error? That's a fancy name for bias. Prejudice. Bigotry. Corruption. Crime.

The "analysts" know in advance what the result must be, and adjust "facts" and "data" to match the expected result.

@polistra,

I must step in here. While correlated or systematic error is indeed known in statistical circles as "bias," this is a scientific term that carries no connotation of intent. Rather, it is a measure of inaccuracy caused by unknown systematic factors. It has absolutely no connection with prejudice unless proved to be so, which would require a massive conspiracy involving thousands of hard-working pollsters whose reputation for accuracy is their meal ticket. Were they to allow prejudice to enter their results, they would be shooting themselves in the foot in the worst way — not only by creating complacency among voters on the side putatively favored, but by reducing their own credibility, which would likely result in the future loss of their jobs. Also remember that the polling bias in 2012 was 2% towards the other side — favoring Romney. (These polling errors generally occur towards the side that has less enthusiastic voters. Perhaps one day we will be able to quantify it.)

Such things happen despite the best intent. There is absolutely no justification to introduce terms like bigotry, corruption and crime into a serious scientific discussion. Please desist from doing so.

I'm a Trump zealot, and something of a wonk, with a PhD in social psychology.

I may have missed the point here

(1)

"Correlated polling errors" are sometimes called multicollinearity, or correlated predictors.

https://en.wikipedia.org/wiki/Multicollinearity

"In statistics, multicollinearity (also collinearity) is a phenomenon in which two or more predictor variables in a multiple regression model are highly correlated, meaning that one can be linearly predicted from the others with a substantial degree of accuracy."

There is no mention here of which predictor (error) variables were correlated

(2) the 2008 banking meltdown is NOT (imho) this correlation error, as follows –

Collateralizing mortgage debt, which is what was done, assumes an orderly or predictable pace or rate of mortgage liquidations (pay-offs), aka tranches or slices, with the prediction based on past performance (yadda yadda, 'past performance is no guarantee of future…'), but we make actuarial bets all the time.

Thing is – we ALWAYS have the Black Swan (rare) event, which occurs of a certainty ! – and not too far out in time

=

nothing to do with polling error

=

the polls called a Hillary win /she won the popular vote,

the polls are unable to swap popular and electoral votes, at least not yet

=

we have the Bradley effect: shy (or inhibited, intimidated) Trump voters; I'm one

=

and as polls tightened but stayed close to Clinton's lead, I said – NOT – and said that the undecideds would have a come-to-Jesus moment at 11:59 PM and break for Trump, which happened

=

@Pradeep

You write:

—

There is a yuge difference between a 71.4% probability that moves 5% with a given systematic polling error, and one that moves 25%. Which was it? What was the “margin of error”?

So here, in hindsight, is my recommendation for modelers on how they should report their calculations:

Probability of Clinton win (Range ± 1 Brexit*): 71% (47-95%)

Expected Clinton Electoral Votes (Range ± 1 Brexit*): 318 (242 – 337)

1 Brexit* = Systematic Polling Error (Bias) of 4 percentage points

Had the reporting been done this way, the possibility of a Clinton loss would have been palpable in a way that it wasn’t with a single probability number that gave no indication of how fragile that probability was. There would not have been the loss of credibility that followed.

—-

Based on what you just wrote, I am quite sure you have a very fundamental misunderstanding of how the model arrived at its 71.4% chance.

The 71.4% chance *already* incorporated the chance of a polling error as a fundamental input to the model. The relative probabilities of various magnitudes (and directions) of systematic polling errors (nationally, by state, and by region), estimated by analyzing the historical polling data in presidential elections from 1972 onwards, were used (along with the current poll standings) to seed the 10,000 scenarios in the Monte Carlo simulation from which the final win probability was derived.

In other words, we have the polls as a baseline input. For each run of the model, actual vote results = polls + national error A, state Errors B, C, D … etc.

Then we have run 2 of the model where A, B, C, D, etc. are different than in run 1.

Then we have run 3, 4, … up to 10,000.

How are these systematic error parameters varied for each run? Well, the error parameters A, B, C, D are randomly generated, and the probability distribution used to generate them is based on historical polling data versus election results. Also, the state errors are correlated with larger correlation factors regionally and smaller correlation factors nationally, and the final state and national values have to make sense in regards to a total vote balance.

One *then* tabulates how many times Clinton or Trump won to arrive at the final probability of winning the election.

If one read all the articles they put out, one would quickly realize they were reporting and explaining the meaning of the model's output very clearly. Basically – 71.4% of the time, the systematic polling errors (nationally, regionally, and by state – A, B, C, etc. in the model description I wrote above) will not be enough to hand Trump the Electoral college, and 28.6% of the time they will. Don't forget, there was a 10.5% probability of Trump winning in the Electoral college but losing the popular vote – so more than a third of the scenarios where Trump won, he did so while losing the popular vote.
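For readers who want the mechanics, here is a heavily simplified sketch of that kind of simulation loop: one shared national error per run plus an independent error per state. The state leads, electoral vote counts, and error magnitudes below are invented for illustration; the real model also layers in regional correlations, a vote balance constraint, and all fifty states.

```python
import random

random.seed(1)

# Hypothetical state polling leads (Clinton minus Trump, in points) and
# electoral votes. Illustrative numbers only, not the actual 2016 inputs.
states = {"A": (3.5, 16), "B": (6.0, 10), "C": (-1.0, 29), "D": (0.5, 20)}
EV_TOTAL = sum(ev for _, ev in states.values())

N_RUNS = 10_000
NATIONAL_SD = 3.0  # sd of the shared (correlated) error, in points
STATE_SD = 2.0     # sd of each state's independent error, in points

clinton_wins = 0
for _ in range(N_RUNS):
    national_err = random.gauss(0, NATIONAL_SD)  # hits every state alike
    ev = 0
    for lead, votes in states.values():
        margin = lead + national_err + random.gauss(0, STATE_SD)
        if margin > 0:
            ev += votes
    if ev > EV_TOTAL / 2:
        clinton_wins += 1

print(f"Clinton win probability: {clinton_wins / N_RUNS:.1%}")
```

The shared `national_err` term is what makes state outcomes flip together in some runs, which is why the aggregated win probability is far less certain than treating each state independently would suggest.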

Because there are various errors, it becomes very difficult to present the results. That being said, multiple articles were published at various points along the race by fivethirtyeight saying "if national polls shifted Trump's way by X points, he'd be at Y chance of winning instead of Z."

It seems like the closest thing to what you want is some kind of chart showing a further breakdown of the 10,000 models run according to the average national error parameter:

| Error A | C+5 | C+4 | C+3 | C+2 | C+1 | 0 | T+1 | T+2 | T+3 | T+4 | T+5 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Result | C (%) | C (%) | C (%) | C (%) | C (%) | C (%) | C (%) | C (%) | T (%) | T (%) | T (%) |

@Jesse,

Yes, I agree that the probability itself includes some of the uncertainty. But the identity-politics based systematic shift seen in Brexit and this election is a new phenomenon, which there are no methods yet to estimate. Pollsters have not yet had a chance to develop ways to detect it and incorporate it into their results. Until that is done, and ways to estimate its magnitude are developed (by quantifying "enthusiasm" in some way, say), the possible disruption caused by unknown factors like it must be made explicit.

Also, as you imply, only professionals and people who pay close attention to the modeling methods can understand such distinctions at a deep level. When conclusions are presented to lay audiences and are taken seriously by them to the extent of influencing behavior, care must be taken that they are not misunderstood. This is "data journalism" after all, not pure mathematics. That's why my primary suggestion was to not have probability based forecasts at all, except maybe as the widely used 7-point qualitative scale (solid red, likely red, leaning red, tossup etc). If probabilities must be used (since people are addicted to having a score to follow), something like my recommendation would diminish misunderstanding greatly.

A range of predicted electoral votes would probably be best. Error bars are always a good idea.

@Jesse,

My chart was exactly an attempt to present your 0, T+1, T+2, T+3 and T+4 percentages which I obtained with a simple reverse-engineered model.

The ±1 Brexit range was your range C+4 to T+4.

@Pradeep

I agree with you on highlighting the pitfalls of polling; I just don't see the benefit of what you propose, displaying the probability given a ± Brexit range, versus the probability given *all* potential errors, including those greater than Brexit. The second case, the probability over all errors, including and beyond the Brexit range and in either direction, is what informed the final output probability of victory in the fivethirtyeight scenario.

In other words, the ± Brexit range you propose is *more* restrictive than what the model allows for: the various cases that make up the 10,000 runs in any Monte Carlo simulation allow for swings of even more than 4 points.

For example, looking at Oregon, where the polling average was 45.6% to 35.9%, Trump was still rated at 6.3%. In a strict ± Brexit model, it would be 0%. In Wisconsin, the final polling average was 46.4% to 40.5% against Trump – even worse than his national lag. Yet the model *still* gave Trump a 16% chance of winning the state – likely due to good polling for him in surrounding, demographically similar states such as Iowa, and volatile polling in places like Pennsylvania – which highlighted a potential polling error in that polling region.

In any case – I agree that it's obvious that sites like PEC and Huffington Post vastly underestimated the potential for polling error and how systematic polling error could affect the outcome. I think that fivethirtyeight – especially when including their written pieces explaining the model and why it was so bullish on Trump as compared to the others – did a far better job at highlighting how polling errors could lead to a Trump win. It's difficult to say what the exact percentage should have been, but I think his probability was also a victim of herding in the national polls and should have been closer to the 35% value given before those final national polls came out.

I don't know how to quantify the effect of things like the ground game, advertising, endorsements, and enthusiasm aside from their effect on the polls. Things like advertising and endorsements should inform the polls, as should enthusiasm to some degree in terms of likely voter models. Ground game, in terms of knocking on doors and discussing things with voters, should show up in the polls as well, but not in terms of election day turnout.

Anecdotally, I know that my friends and I, registered independents, all received daily texts and (at a minimum) bi-weekly phone calls during the entire early voting period in Florida from the Clinton campaign, yet heard nothing from the Trump campaign aside from some fundraising calls earlier in the election. Either pundits over-estimated the impact OR correctly estimated the impact yet under-estimated the lack of enthusiasm for Clinton.

@Jesse,

The Brexit range is like a set of error bars. Yes, polling errors could be even higher than that, but this is a mental yardstick that people can use because it has already happened before.

In an ideal world, polls should incorporate information from all the factors you mention. However, this is a fledgling science, so there are bound to be new challenges that polls haven't faced before. The disappearance of landlines is one difficulty that pollsters faced relative to 2012. Enthusiasm and passion are another factor that polls have not faced to this degree before (it was there earlier, but not named and recognized – Obama had about a 1/2-Brexit shift in 2012). So there are always going to be things that we don't know we don't know that skew polls. By definition, we can't estimate their magnitude, because they are not in the polls, and therefore not in the model calculations. The Monte Carlo simulations have no principled way of estimating the probability of a given magnitude of polling error if the factor that generated it has no history.

The only way to warn people that such unknown errors could exist is to tell them how much your forecast will vary with a given systematic error in the polls.

It's important that we understand how the data got the Tories, Brexit, and Trump so wrong since they are likely just the beginning of a new worldwide populist movement. France and Germany hold elections next year, and there are already signs there of populists doing much better than expected.

Just want to point out that one of the big reasons for the financial crisis was bad modeling. One huge failure was that the models for housing prices assumed that prices would never go down. Had the modelers run this assumption by any experienced banker or realtor, it would have been debunked. Also, the data used for the models only looked at recent experience (the last 5–10 years), in which prices had only increased.

As far as error correlation, in good models, that can be resolved (at least partly) by testing the model against an independent validation sample (not to be confused with a holdout sample).

Finally, make sure that the model is transparent enough to identify which variables are driving the predictions. This will help determine whether the model makes sense and will help to identify correlated variables within and across models.

Of course, none of this is easy or simple and cannot be easily automated, so when trying to make predictions on the fly (like during an election) many of these steps are just too time consuming.

There is only an error in the predictive and exit polls if you think the official tallies are correct.

If you trust the official counts, why don't you also trust the pre-vote and post-vote polls?

Why do the official counts show a red shift vs. the exit polls in all electronic voting precincts but not paper-trail precincts? Why is there "correlated error" in the red precincts and not the blue?

Richard Charnin finds that in 300 exit polls between 1988 and 2008, there are 138 that differ from official counts by more than a standard deviation; 132 of those show a red shift favoring Republicans. What is this correlated error that works across multiple decades? Are there supposed to be shy Bush voters as well as shy Trump voters? In 2012 in Ohio, the ad hoc explanation was that Democrats over-responded to exit polls. In fact the opposite was true: the Dems under-responded. You need to consider the possibility that the exit and pre-vote polls were correct and the official vote tabulations were rigged.

But I think computerized election fraud is not your specialty. Read Code Red by Jonathan Simon for background.

Enthusiasm gap: HRC could not draw crowds; in fact she needed surrogates to draw crowds throughout the campaign. Furthermore, the Democrats' core positions on climate change, immigration, and globalization are all antithetical to the labor vote in the Rust Belt. Democrats will lose to any Republican who runs on Trump's platform. Nixon and Reagan both captured the blue-collar vote.

Fake climate change will be the dagger that kills liberalism going forward; there is no reconciliation between labor and the left. In fact DJT, a man who lives in a gold penthouse, reached an electoral majority in Pennsylvania with the support of the labor vote that the robber barons extinguished with the Keystone cops.

Statistics are no substitute for a keen sense of observation and history.