*Updated on November 9, 2016:*

There is only one person who correctly forecast the U.S. presidential election of 2016. His name is not Nate Silver or Sam Wang or Nate Cohn. It is Donald Trump. Trump made a mockery of the predictions of all the erudite analytical election forecast modelers. Uttering the battle cry of “Brexit Plus,” he confidently grabbed the thin sliver of a chance that the models gave him by winning the Sun Belt states of Florida and North Carolina and then, in a near-miraculous example of threading the needle, flipping not just one but three of the ordinarily blue Rust Belt states that formed Hillary Clinton’s “firewall” — Wisconsin, Michigan and Pennsylvania — to red.

Like everyone else, I am stunned. In my pre-election *Abstractions* post below, I commented that the “science of election modeling still has a long way to go,” but I must admit that the distance is far beyond what I had imagined. It seems pointless now to try to dissect the statewide predictions of the various models as I had promised to do — none of them were even remotely in the ballpark. It is unclear how long it will take before election forecasting is trusted again.

You could be kind and say that the election results were not incompatible with the model that showed the most uncertainty (Nate Silver’s), but there is no doubt that all the model builders completely missed the Trump win. His surprise victory took perfect advantage of the vagaries of the electoral vote system, even as the margin in the popular vote was razor-thin in favor of Clinton. But the modelers also missed something more fundamental, and they will have to revise their models to accommodate it. This was a systemic undetected polling error — a kind of invisible “dark matter” of polling — that underestimated support for Trump in key states by two to six percentage points.

In a comment on my original *Abstractions* post below, I said, “The uncertainty we have to account for is the uncertainty of things ‘we don’t know we don’t know.’” It turns out that this uncertainty is far larger than I thought. It will be interesting to read the full details of how Trump won, as they will no doubt be dissected in numerous post-mortems by both data analysts and pundits. In the meantime, it might be worth following Sean’s suggestion and reading about the professor who has been calling every single election correctly since 1982 on the basis of just 13 simple questions. He, too, predicted a Trump victory, as did Michael Moore in an article that now seems prescient.

*Originally posted on November 8, 2016:*

## Why Nate Silver and Sam Wang Are Wrong

*Will the results of the U.S. presidential election discredit or vindicate popular election forecasting models?*

As voters head to the polls, two of the most celebrated and successful election forecasters, Sam Wang of the Princeton Election Consortium (PEC) and Nate Silver of FiveThirtyEight, disagree “bigly.” The former predicts the likelihood of a Clinton win at over 99 percent, while the latter put the chance of a Clinton win in the mid-to-high 60s all of yesterday, rising to the low 70s today. This has resulted in Silver blasting a Huffington Post article that criticized his low estimate and calling very high estimates “not defensible” in an invective-laced Twitter storm, and Wang responding with a defense of his 99 percent model. It was bound to happen: These two models represent the extremes in public prognostication. Most other models, including those of The New York Times’ Upshot and betting sites, rate Clinton’s chances somewhere in the middle, at about 85 percent.

Interestingly, both the PEC and FiveThirtyEight models agree that the aggregated polls show Clinton ahead of Trump by about 3 percent nationally, and their predictions for the number of electoral votes Clinton will get are 307 and 302, respectively (270 are needed to win). This disparity in the probabilities, and relative agreement in the number of electoral votes, validates my comments in the last Insights column — that aggregating poll results accurately and assigning a probability estimate to the win are completely different problems. Forecasters do the former pretty well, but the science of election modeling still has a long way to go.

The problem is twofold: First, modelers do not estimate a margin of error for their uncertainty, and second, there is far too little empirical data to validate probability models. Mathematically, there are standard ways to generate a probability number, as Sam Wang demonstrates. You let your model run, generate the difference between polling percentages (about 3 percent), calculate the expected error (say, 0.8 percent), and out pops the probability (say, 92 percent). This is based on the premise that all the assumptions that went into the model are true (Sam Wang considers the median of recent polls to be the true number). However, to get the margin of error of this probability figure, you need to assign weights to all the other assumptions that you did not make — after all, an assumption is a plausible but unproven proposition, and therefore its opposite also has some probability of being true. You would need to calculate the probabilities of a Clinton victory in hundreds of alternate models in order to find the margin of error of the probabilities. While this meta-modeling would put the probability estimate in perspective and be more accurate, notice that in the absence of enough empirical data, the likelihood of alternate assumptions would still be arbitrary. True accuracy would require a complex model that incorporated many more features than current models do, using data from hundreds of presidential elections, and we don’t have that luxury. An aggregation of existing models is the best simulation of such a meta-model that we have today.
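As a toy illustration of these mechanics, the sketch below shows how sensitive the headline probability is to a single modeling assumption — the total error. The 3-point lead comes from the text; the alternate error values are invented for illustration and stand in for the "alternate models" discussed above.

```python
from statistics import NormalDist

def win_probability(lead, sigma):
    """P(true margin > 0), assuming the aggregate polling lead is an
    unbiased estimate of the true margin with total error sigma."""
    return 1 - NormalDist(mu=lead, sigma=sigma).cdf(0)

lead = 3.0  # Clinton's approximate national polling lead, in points

# Each sigma stands in for a different bundle of model assumptions;
# the specific values are invented to show the sensitivity.
for sigma in (0.8, 2.0, 4.0, 6.0):
    print(f"sigma = {sigma}: P(Clinton win) = {win_probability(lead, sigma):.3f}")
```

A change of a point or two in the error assumption moves the output by tens of percentage points, which is exactly why a probability quoted without its own margin of error overstates its precision.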

There is historic evidence against Wang’s tight forecasts — he predicted a gain of 53 plus or minus 2 House seats for the Republicans in 2010, and they actually gained 63, as Harry Enten of FiveThirtyEight pointed out. On the other hand, there is no evidence that there is as much uncertainty as the FiveThirtyEight model suggests — it is far more likely that the observed tightening in the polls at this time is caused by people returning to their “home” party, as Wang has said. The best we can do is to aggregate models and discard the outliers — FiveThirtyEight and PEC — just as the modelers aggregate polls and discard outliers there.

Tomorrow, the results will either validate election forecasting models or show that this is a fledgling, imprecise science. Here at *Abstractions*, we’ll compare the detailed state predictions and try to determine which poll aggregation model was the best.

No matter which candidate wins, there is something to look forward to.

Some more thoughts:

The uncertainty we have to account for is the uncertainty of things "we don't know we don't know."

Unlike probability estimates done a month ago, we can reasonably assume that the chance of a huge event (like the Trump tape or FBI director James Comey’s letter to Congress) that alters people’s minds in the few remaining hours is very small. So only two things could realistically move the needle: first, if all the polls are systematically wrong on account of the “shy voter” or Bradley effect; or second, if there is a late momentum swing of greater than 3 to 4 percentage points in Trump’s direction that the pre-election polls cannot capture, because they take a few days to reflect the electorate’s mind. This happened in the case of the Brexit vote, and in the 2012 election in favor of President Obama. There seems to be very little evidence of the shy Trump voter, and after the tightening that brought Clinton’s margin down from 5 to 6 percentage points to 2 to 3 points last week, the last-minute momentum seems to be toward Clinton, with the latest polls showing a small uptick in her favor.

Is the uncertainty nil, or less than 1 percent? Probably not.

Is this uncertainty enough to reduce Clinton's chances to around 70%?

I don't think so.

Probability models can never truly be confirmed or refuted. Let's say Trump wins on a 269-269 tie (all of Fla, Nev, NC, Ohio, Iowa, NH go Trump). There is no way of knowing if that was a 1-in-100 event, a 1-in-6 event or a 1-in-3 event. All we would know is that it happened. Let's say Clinton wins 323-215 (Fla, NC, NH, Nevada go for Clinton). Was this the most likely scenario? Or was it a 1-in-4 event? When the probability models showed that the Cubs had only a 10% chance of coming back from a 3-1 deficit to win the World Series, the fact that they did come back does not mean the probability model was wrong.

Pierre-Paul, I placed a bet this summer on the Golden State Warriors *not* coming back from down 3-1 (I was given +400 odds and thought it was sure money). And then just a few weeks later, the Cavaliers came back from a 3-1 deficit to win the Finals (against the Warriors!). Who would have thought we'd see back-to-back series with such a low probability of occurring? And of course, to top it off, we just saw the Cubs come back from 3-1 to finish off the third major post-season comeback by teams down 3-1 this calendar year. Incredible.

This is all to say…who the hell knows?! lol

Neither is wrong, nor right. Both say anything could happen. 1 percent is not rare. None of us would get on an airplane with a 1 in 100 chance of crashing, unless the danger on the ground had an obviously higher risk than that.

Predictions like these without alternatives to compare them with are a parlor game.

The "science" of election polls and models is a sham. They all are based on assumptions that are not known, and the amount of data is relatively scarce. When you have polls within a single state that vary by large degrees, you are seeing that the different assumptions around the polling are what are driving the differences. Building a statistical model on junk data results in junk results. The most interesting recent data I am seeing is that exit polls show voters looking for a strong leader. That sounds like a Trump surge to me, but of course, the data point is so small that it is meaningless.

@Pierre-Paul,

You are right, probability models can never be confirmed or refuted *absolutely*, but they can be confirmed or refuted *for all practical purposes*, in two circumstances: first, if you can do empirical tests millions of times — the probabilistic predictions of quantum mechanics have been confirmed to a dozen decimal places; and second, if the predictions are extremely strong and are disconfirmed. For example, if Sam Wang makes three consecutive predictions with 99% probability and they all come out wrong, the chances of that happening would be one in a million.

More practically, science views any result with a p<.05 (probability less than 1 in 20) as an indication that there is probably something going on. If Trump wins, Sam Wang would see a less than 1 in 100 event, and that would most likely cause him to tweak his model. For Nate Silver, three wrong 70% predictions in a row would yield a p of 0.027, and would very likely prompt a tweaking of the model.
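The arithmetic behind both figures is simple independent multiplication, assuming the stated probabilities are honest and the predictions are independent of one another:

```python
def all_wrong(p_correct, n):
    """Chance that n independent predictions, each with probability
    p_correct of being right, all turn out wrong."""
    return (1 - p_correct) ** n

print(all_wrong(0.99, 3))  # about 1e-06: "one in a million"
print(all_wrong(0.70, 3))  # about 0.027: below the p < .05 bar
```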

Remember it is a national poll which distorts to begin with. (Just think of HRC winning California and NY by 10 million total votes but losing the rest of the country by 6 million.) It's really about the dozen or so toss up states. If HRC wins the total vote in those states by 2 million, she's in, same for Donald. I think that's a better predictor.

Evan, one thing that I love about sports stats is that they clearly demonstrate that improbable things happen all the time. In the 2016 MLB season there were 5610 HRs in 184,580 plate appearances – about 1-in-33. Even home run king Mark Trumbo only hit a HR in 1-in-14 plate appearances. It's highly improbable that a home run will be hit on any given plate appearance. Yet there is a very high probability (90%+) that a HR will be hit during the course of a given game. But since we have tons of baseball data, few doubt the baseball probability models.
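The commenter's per-game figure checks out on the back of an envelope. In the sketch below, the season totals are the ones quoted above, but the game count is my own rough assumption, and plate appearances are treated as independent, which is only approximately true:

```python
hr, pa = 5610, 184_580    # 2016 MLB home runs and plate appearances (quoted above)
games = 2430              # approximate 2016 regular-season games (my assumption)

p_hr = hr / pa            # about 1-in-33 per plate appearance
pa_per_game = pa / games  # about 76 plate appearances per game, both teams

p_hr_in_game = 1 - (1 - p_hr) ** pa_per_game
print(f"P(at least one HR in a game) ~ {p_hr_in_game:.1%}")
```

An event that is a long shot on any single trial becomes near-certain over enough trials — the same logic that makes a 30 percent election outcome unremarkable.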

I keep seeing this argument, and it strikes me that people don't know what they are talking about. You do not put confidence bands on your probability model. You put confidence bands on your aggregated polling numbers, and how those confidence bands overlap defines the probability. Yes, it's slightly more complex than that, because state-by-state votes are what matter. But that just means there are 50(ish) probabilities to model and integrate. It doesn't change the basic idea.

As you have pointed out, Nate and Sam start with basically the same aggregate numbers; Nate is simply putting much larger confidence bands on them (i.e., he's less confident in them) because he doesn't pretend to know what the abnormally large number of declared third-party voters will do come election day. Therefore, there is more overlap of these confidence bands, which translates to a higher number of pro-Trump scenarios.

This seems like the most reasonable position to me. If everyone asked 3rd party/undecided voters which major party candidate they'd vote for if they were forced to choose, maybe we could make assumptions about how they'd break. But they generally don't. So that's a mighty big guess (a.k.a., a lot of uncertainty).
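The "50 probabilities to model and integrate" idea can be sketched with a Monte Carlo toy. Every number below is invented — a five-state mini-map, not real 2016 data — but it shows the key design choice behind wider confidence bands: whether polling errors in different states are independent or share a common component.

```python
import random

random.seed(7)

# Hypothetical mini-map: (polling margin, poll error sigma, electoral
# votes) for five invented battleground states.
battlegrounds = [(1.0, 3.0, 29), (-0.5, 3.0, 20), (2.5, 3.5, 18),
                 (0.8, 4.0, 16), (-2.0, 3.0, 10)]
safe_votes = 232   # electoral votes assumed locked up (also invented)
needed = 270

def simulate_win_prob(trials, shared_sigma=0.0):
    """Monte Carlo over state outcomes. shared_sigma adds a polling
    error common to every state — the correlated unknown that larger
    confidence bands are meant to cover."""
    wins = 0
    for _ in range(trials):
        shared = random.gauss(0.0, shared_sigma)
        ev = safe_votes
        for margin, sigma, votes in battlegrounds:
            if random.gauss(margin + shared, sigma) > 0:
                ev += votes
        wins += ev >= needed
    return wins / trials

print(f"independent errors: {simulate_win_prob(200_000):.2f}")
print(f"with shared error:  {simulate_win_prob(200_000, shared_sigma=2.5):.2f}")
```

Adding the shared error term pulls the leader's probability toward a coin flip, which is qualitatively what the wider FiveThirtyEight-style bands do.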

I'm more curious to know how these models compare against the other predictions that say Trump will win (and many of those predictions have never been wrong, including one by a professor who has been calling every single election correctly since 1982).

Not even a U.S. citizen, but, well… em… "oops" is maybe the right word?

302/307 for Clinton vs. an actual maximum of 264 at the time I write this — that's more than 12% off.

No model saw that much uncertainty, including 538.

You wrote there was "no evidence of the uncertainty" that 538 was seeing, but they were far off too. Why? How? What did they all miss?

I would also like to differ about the "extremes" models you listed: PEC was one extreme sure. But 538 was not the other extreme. USC/LA was the true outlier for months. Why haven't you listed it as such?

Anyway, I am going to read your next article with great interest and I hope to get the great insights I usually get from your articles.

1. Models like Silver's are only as good as the data that goes into them. The polls didn't provide accurate numbers apparently. For example, Pennsylvania had been for Clinton for months. There's only so much one can do to correct for that.

2. It's hard to correct for sudden changes in support or what undecided or third-party supporters will do.

3. A 25% probability of winning means that on any given day, Trump would win 1/4 times. You only get to run that experiment once, but those are not bad odds. It's not hard to get two heads on two coin flips. And to be frank, Silver's predictions are better than the ones that said Trump had <5% chance of winning.

4. Modeling like this is still fairly new. We have no real idea how well it works.

@EcoGuy,

Yes, oops is right!

What happened was a Brexit-like phenomenon of systematic polling error, combined with the quirks of the U.S. electoral vote system, which favored Trump. I will discuss this in more detail in a comment or another article.

USC/LA was an outlier poll. Here, we were discussing poll-aggregation models and not individual polls. However, the USC/LA poll was a national poll, which was pro-Trump, but it was wrong too, just in the other direction. The last three USC/LA polls before the election, on November 6, 7 and 8, showed Trump at +6%, +4% and +3%, respectively. So the poll was showing momentum in Clinton's direction at the end. Also, nationally, Trump actually lost the popular vote by 0.1%. So I don't think it was a particularly good poll. The IBD/TIPP poll was probably better. But all the dozens of other polls had an average Brexit-like pro-Clinton error of 2-4%.

Sam Wong was trying to influence the election on NPR yesterday. I believe most of these pollsters were shilling for Clinton. They wanted to create a belief that Trump had no chance. Sam Wong is a hack.

Yeah, guys. His name is Sam Wong. Jason researched it, as he researches all things.

I suggest this link on work by Nassim Nicholas Taleb: https://dl.dropboxusercontent.com/u/50282823/binary%20forecasting%20538.nb.pdf

Using his "trader approach," all the pollsters would be bankrupted by their cumulative errors over the last few months. Either these pollsters' methods are invalid, or they were fudging the numbers, or both.

The reproducibility crisis in social science is a bigger scandal than just for political science. http://pss.sagepub.com/content/22/11/1359

We have forgotten this truism: "It's what you know that ain't so."

I think Trump had said lots of his supporters were hiding it during phone polls.

If that is true it could explain wrong predictions.

Garbage-in-garbage-out. In a race this close, even if a tiny percentage of people lie at polls it would not be surprising if all prediction models fail.

Sam Wang was wrong, historically wrong. He gave Clinton a greater than 99% chance of winning, and defended it, and poof! Like magic, all that work and statistics are out the window and into the trash. If the prediction had been 55, 60 or even 70% and she lost, it would be embarrassing, but not like this. Greater than 99%? He is a brilliant man, but there must have been a personal slant to his model — he just did not want to believe there was a chance. He admitted he was off in his numbers, and that is where I believe his personal beliefs crept in. Good luck.

Nate Silver wasn't wrong. Given the available data, his model was spot on. He correctly factored in the known unknown (systematically faulty polls) that turned out to be the case. There was indeed "as much uncertainty as the FiveThirtyEight model suggests".

Silver was torn a new hole because of the mere suggestion that "Donald Trump is just a normal polling error from winning." Silver didn't have to predict Trump winning with probability > 50% to be more-or-less right. He gave Trump a 30% chance, and this is virtually the same chance as a baseball player getting a hit or an NBA player missing a free throw. Happens all the time, and nobody is surprised. Silver's model was the least surprising of all the models. That is the standard we should have when grading these models.

Clearly, the underlying polling data was massively flawed. The real lesson here is that the polling data needs to be fixed. It's broke!

Pradeep

Lichtman's (the "professor") work sounds impressive on the surface, but it would seem to be subject to drastic overfitting, in that one can simply tune the answers to the keys to match the outcomes. I find it difficult to believe that he "answered" the key questions for all those elections and only then looked at the outcomes to evaluate his model. That would take superhuman restraint, and given the wide range of possibilities for such terms as "charismatic," it would be nigh impossible for a mere mortal. Cheers.

PS: Mr. Moore is scarily spot-on.

Don't forget Art Laffer, who also predicted a Trump win using history as a guide:

http://www.weeklystandard.com/art-laffer-trump-should-win-easily/article/2003371

1. sampling error

2. confirmation bias – 1-in-3 odds of a Trump win is not "he loses"; it's "why are the odds of such an outlier winning so high?" NEED MORE DATA.

3. Professor Lichtman's questions plus polls yield a nice Bayesian approach, with prior experience helping to gauge the predictive value of tests (polls). If his pre-test probability is high for a Trump win, the polls have to have a very high specificity for a Clinton win to be predictive. Boom. Otherwise, we need better data. Gotta love Bayes.

First, Allan Lichtman's Thirteen Keys model predicted Trump's win. The objection that the keys are not linear, assessed by accepted numerical standards, probabilistically quantized or explanatory of polls is the objection that the hammer of polls must be effective in driving screws, because, dammit, it's the only scientific tool we've got. This kind of thinking seems to me to be Popperism.

Second, my best guess as to systematic error in Silver's model is that his model systematically underestimated the level of economic distress, both now and in fears for the future. Judging from what I've seen on his site, his main economics person, Ben Casselman, is a very skilled orthodox economist. The thing there is that orthodox economists are not skilled at analyzing real world economics, as witnessed by the world economic crisis of 2008.

Third, the idea that Trump might lose the popular vote but win in the electoral college wasn't a difficult notion at all, given the election of 2000. I myself had made comment to this effect to the noted political theorist Corey Robin when he was insanely carrying on about the possibility of the Republican defeat leading to a realignment election, or at least to the Republicans' disappearance a la the Whigs.

@dmck,

I share your skepticism. I think Lichtman has identified some interesting factors that might influence elections, but that's about it. It'll be interesting to see how well his predictions do in the future. Technically, he was wrong about Trump in this election, because he counts his prediction of Al Gore's popular-vote win in 2000 as correct — by that standard, both calls cannot be scored as right.

I agree that Michael Moore's intuition is far deeper.

Trafalgar Group got it right with their polling. As far as I know, they were the only polling group to call the Midwest correctly.

I find one issue with these election predictions is that they could have an influence on the outcome. If a potential voter sees these predictions, and sees that one candidate has a large "probability" of winning, then they may be less likely to vote (especially if they have to take off work, find babysitting, walk during a transit strike, etc.). Conversely, if they see that there is a close race, they may be more likely to vote. So there is feedback to making election predictions if they are made publicly available which could affect the outcome (and hence the probability). If this is indeed an effect, I'm not sure how it may have influenced this election.

Have you all considered that we cannot really predict with the tools we have? Gaussian models don't work out too well when people counting does not work like crops in an ANOVA.

Indeed, the system looks a lot like a classic nonlinear dynamical system. The electorate is more similar to a stable attractor than some nice distribution. Small changes in initial conditions will yield significant changes due to period doubling. If, in fact, what we have is a chaotic sample space then we cannot predict by definition.

The pollsters missed it because their frame of reference was too small. Remember a few years ago when the Tea Party sprang up? The media dismissed it. Nancy Pelosi said it was Astroturf and dismissed it. The tea party had a strong grass-roots organization that did not go away. As an astute businessman, Donald Trump jumped out in front to drive that train. The sentiments of the tea party stayed under the radar of an elite (in their own minds) media. The tea party was a diverse collection of like-minded individuals who came out to vote on November 8. What the media missed was the true significance of real grass roots movement of a few years ago. Donald Trump gave voice to those concerns which lacked the elegance of academia and claimed victory.

Trump was NOT the only person to have correctly predicted his win. Did you not notice Michael Moore who was emphatic and vocal that Trump would win? He even made a documentary about it for Pete's sake.

Thinking they had it all logically figured out based on historical trends, and ignoring the voices warning that times had changed, is why the pollsters got it wrong. It's the same reason you don't keep betting on the same team every year, forever. No matter how many times in a row they've won, when things have changed enough, all bets are off. You lose big, along with the team, if you fail to continually imagine that a new outcome is possible and insist on clinging to a system that has become outdated.

How can you lead off an article with "Trump is the only one who forecast this race" but then end it with the fact that two other people predicted the outcome? This makes your lead statement a lie, which you know is a lie, because you have the proof of it in your summation. This is the kind of misleading, in fact totally erroneous, comment that sends people, and pollsters, in the totally wrong direction unless they read the entire article to find the truth and expose the lie. Why not say right off that only three people correctly predicted it? What's your purpose — generating hype over facts?

The repeated failure of pollsters to give reliable results contains an important lesson in a surprisingly little-known theorem in statistics:

mean-square error = variance + bias^2

Thus to control the total error in the prediction, one needs to minimise variance resulting from stochastic effects by using as large a sample as possible….but it's even more important to minimise bias, because it enters as a squared effect. This is hard, but clearly pollsters need to do far, far better, as we clearly live in times when voters seem especially prone to giving biased responses.
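The identity the commenter cites can be checked numerically. The "poll" below is simulated, with the true value, bias and spread all chosen arbitrarily for illustration:

```python
import random

random.seed(1)

# A simulated poll: true support is 50, but the sampling procedure
# systematically overstates it by 2 points (the bias); sigma is the
# spread from sampling noise. All three numbers are arbitrary.
true_value, bias, sigma = 50.0, 2.0, 1.5
polls = [random.gauss(true_value + bias, sigma) for _ in range(100_000)]

n = len(polls)
mean = sum(polls) / n
variance = sum((x - mean) ** 2 for x in polls) / n
mse = sum((x - true_value) ** 2 for x in polls) / n

# mse equals variance + (mean - true_value)**2, up to float rounding
print(f"variance + bias^2 = {variance + (mean - true_value) ** 2:.4f}")
print(f"mse               = {mse:.4f}")
```

A larger sample shrinks the variance term toward zero, but leaves the bias-squared term untouched — which is why systematic polling bias cannot be fixed by polling more people.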

## Responses to Readers

@Ian Agol,

Yes, I am sure these authoritative over-precise predictions do influence people’s behavior. That’s another reason why I think election forecasters should temper their predictions and give margins of error.

@wjr

Yes, there is definitely chaotic behavior involved, especially in the case of long-term forecasts, as I mentioned in my Insights column.

However, here we are only considering the final forecasts made just before the election. At this late stage, the very low chance of a race-deciding swing can be ignored. Therefore, you should, theoretically, be able to make fairly accurate forecasts using Gaussian models if you have accurate polling data. This was demonstrated in the 2012 election.

@Crystal,

Sorry about that! My statement was hyperbolic and meant to be a figure of speech. I don’t really consider Trump’s to be a true prediction, because all candidates “predict” they are going to win. What I wanted to highlight was that people with extreme self-belief like Trump can generate their own Steve Jobs-style “Reality Distortion Field” and can turn what seems like a low-probability possibility into a self-fulfilling prophecy.

I do respect Michael Moore’s perceptive prediction, and I’ve quoted and praised it several times in this blog and comments.

@Robert,

Your comment is spot on! Thanks.

@david carlson,

Thanks for your comment. You're right, polls done by Robert Cahaly of the Trafalgar Group in the week prior to the election accurately predicted Trump wins in several swing states.

Nate Silver had given this pollster a 'C' rating, and others found the claims these polls made about the voting of ethnic groups implausible. So this success may be genuine or just a coincidence.

What is interesting is that Cahaly looked for the "shy voter" effect, and claims to have detected it by asking people "Who do you think your neighbor will vote for?" Apparently people overwhelmingly answered "Trump" even when they said that they themselves did not support him. It remains to be seen how reliable this kind of indirect technique is and whether it gains acceptance among other pollsters.

Maybe publicly-available predictions can never be correct. Certainly, many people who liked Clinton (or more likely, didn't want Trump) stayed home because, knowing that Clinton was going to win, they had no reason to inconvenience themselves. I'll leave it to the professionals to decide how many, and whether it was enough to change the outcome.

But whatever we decide about this election, there is something self-referential about making a perfect prediction of an election and then announcing it to the electorate. This is the kind of thing that lets you build paradoxes and "Incompleteness Theorems" in mathematical logic.

If we decide that perfect predictions can affect the outcome (thereby turning themselves into imperfect predictions), my "vote" is that we stop trying to predict. Why not just let the people decide on election day?

This is a pitiful attempt to cover your tracks on calling out FiveThirtyEight as an 'outlier.' Their model was the only one that took seriously the uncertainty in polling margins. Indeed, there seems to be plenty of evidence that there was 'as much uncertainty as the FiveThirtyEight model suggests.' If you were paying attention to their analysis leading up to last Tuesday, they made attempt after attempt to justify this uncertainty, and in the face of b.s. criticism like this post, they stood behind their model's prediction of nearly a 1/3 chance of a Trump win. And they repeatedly described exactly the kind of systematic polling error it would (and did) take for this result. How you can claim that FiveThirtyEight got it wrong is puzzling.

I apologize for my first post being so transparently self-serving, but I predicted 301 EV for Trump compared to his actual total of 306. My model was simple but by no means a guess. It is easy to dismiss criticism of FiveThirtyEight because it did allow a 1/3 chance of Trump winning, but it is hard to dismiss predicting the EV within 5 votes.

http://foolishdoings.blogspot.com/2016/11/how-i-predicted-election_9.html

Before you decide the polls were wrong, you must decide if the official counts were correct, rather than just assuming they were not manipulated.

Charnin finds that in 300 exit polls from 1988 to 2008, there were 138 that deviated from the official counts by more than the standard error.

Shockingly, 132 of those were in favor of the Republicans. This red shift is impossible by chance — or rather, it has odds of about one in 2^100.

If you think Trump voters were shy and did not do exit polls, this must be proven by finding shy Trump voters who did not take exit polls. If you find none, then it looks like the tabulations were rigged. This consistent red shift must be explained. In Michigan, the Alternet Election Integrity Team finds that there is no shift where paper ballots are used, and a red shift in red counties with electronic voting. Please explain without ad hoc dismissals.
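Whatever one makes of the underlying claim, the quoted odds are a straightforward binomial tail. The sketch below takes the 132-of-138 count at face value; the calculation says nothing about why the deviations occurred:

```python
from math import comb

n, k = 138, 132  # total deviations, and those favoring one side (as quoted)

# Chance that k or more of n deviations all land on the same side,
# if each side were equally likely on every deviation:
tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
print(f"P(X >= {k}) = {tail:.2e}")
```

The result is astronomically small, on the order of the commenter's 2^100 figure, though that only tells us the split is not a fair-coin fluke, not which of many possible explanations (polling methodology, differential response, or anything else) accounts for it.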

I think the most obvious flaw was the incorrect sampling methods.

The "professor" you refer to only picked a Donald Trump victory if the 3rd-party candidate got over 5% of the vote. Since the 3rd-party candidate got less than 5%, he actually predicted a Clinton victory in the end.

People lied. My mother and I lied to every pollster. So did most of my Trump friends. We did not want the crap for it, so we sidestepped. My nieces both voted for Trump but kept quiet about it and had no bumper stickers. They are both RNs. I remember having Reagan stickers and never worrying. Now your car would be egged or keyed. So the "shy voter" effect, I suspect, is much higher than you suspect.