“Didn’t the polls and statistical models get everything wrong in 2016? How can I trust them?”
Every political data analyst has been asked that same question over and over since the 2016 election. Prior to Election Day 2016, many people watched the coverage and concluded that the polls ruled out a Trump win and signaled clearly that Hillary Clinton would be president. That perception wasn’t correct (polls can miss in the same direction across multiple key states, and the models and analysts who priced in that correlated error saw the possibility of a Trump upset ahead of time), but it stuck, and after Election Day people felt burned and lost faith in the polls.
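To see why that correlated-error point matters, here’s a minimal simulation; every number in it is hypothetical rather than taken from 2016. Suppose the trailing candidate is down 3 points in three must-win states, and the total polling error in each state has a 3-point standard deviation. If the errors are independent, flipping all three states at once is rare; if the errors are mostly shared, a sweep becomes far more plausible:

```python
import math
import random

# Hypothetical setup: trailing candidate is down 3 points in three
# must-win states; total polling error per state has a 3-point std dev.
MARGINS = [3.0, 3.0, 3.0]
ERR_SD = 3.0
N = 100_000

def sweep_chance(w: float) -> float:
    """Probability the trailing candidate flips all three states.

    w weights a shared (correlated) error component; the state-specific
    weight is chosen so each state's total error variance stays the same."""
    idio_w = math.sqrt(1.0 - w * w)
    sweeps = 0
    for _ in range(N):
        shared = random.gauss(0.0, ERR_SD)  # one error draw hits every state
        if all(w * shared + idio_w * random.gauss(0.0, ERR_SD) > m
               for m in MARGINS):
            sweeps += 1
    return sweeps / N

print(f"independent errors (w=0.0): {sweep_chance(0.0):.1%} sweep chance")
print(f"mostly shared error (w=0.9): {sweep_chance(0.9):.1%} sweep chance")
```

In this toy setup the underdog’s chance of sweeping rises by roughly an order of magnitude in the correlated case, which is the difference between “ruled out” and “a live possibility.”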
But hopefully 2018 will help people trust survey research a little bit more. This year, the data did a good job of predicting the final election outcome.
Heading into the election, the consensus in both the world of data and the world of reporting was that (1) the most likely outcome in the House was a decent Democratic win, with both a blue landslide and a GOP hold still possible, and (2) Republicans would likely hold the Senate and probably add to their majority.
These general predictions were right, or close to it. Democrats took the House and Republicans padded their margin in the Senate by a bit more than expected.
In fact, at the time this is being filed (and this is subject to change as more results come in), the best estimate for the number of Democratic seats is in the high 220s or low 230s. Yesterday I used the data to estimate that they’d end up with 228 seats, and statistical forecast models projected about 227 to 231 seats. The RCP average indicated that Democrats would win the overall House popular vote by seven points, and they seem poised to get somewhere very close to that. That all adds up to a solid level of accuracy.
On the Senate side, the TWS Forecast thought that 52 GOP seats was the likeliest outcome. We don’t know the final composition yet (Arizona and Montana are still out), but Republicans will probably settle at 53 seats. That’s not perfect, but it’s well within the plausible range of outcomes the model laid out.
It’s harder to say how good the polls were on a race-by-race basis at this point. Votes are still being counted in some of the most important races, and the exact accuracy of the polls is something I intend to revisit in more detail soon. But I will note a few things about what happened on the Senate side.
In some cases, there were real polling errors. RCP’s poll average in Indiana (one of the best aggregates out there) put incumbent Democratic Sen. Joe Donnelly ahead by about a point, and Mike Braun, the Republican, will likely win by a large margin (he’s ahead by about 9 points at the time of filing). In Tennessee, the final average of polls was off by about 6 points. The polls also seem to have undershot Republican Josh Hawley, who looks headed for a mid-single-digit win despite being virtually tied with Sen. Claire McCaskill heading into the election.
Some of the misses were smaller. In Florida, Democratic Sen. Bill Nelson led Republican Gov. Rick Scott by 2.4 points heading into the election, but Scott narrowly won. So the “call” was wrong there, but the polls were only off by a couple points. The polls appeared to be off by a similar margin in West Virginia, where Joe Manchin won by 3 points after leading in the polls by about 5 points. And Texas falls somewhere between this category and the last one: Beto O’Rourke will likely outperform his polls by a few points.
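One way to put these misses on a common scale is to take each race’s signed error: the final polling-average margin minus the actual margin, both from the Democrat’s perspective. A quick sketch using the approximate figures cited above (the Florida result is a rough placeholder, and all of these counts are preliminary):

```python
# Signed polling error per race: polled margin minus actual margin, both
# Democrat-positive, so a positive error means the polls overrated the Democrat.
# Margins are the approximate, still-preliminary figures cited above;
# the Florida result is a rough placeholder for "Scott narrowly won."
races = {
    "IN-Sen (Donnelly vs. Braun)":   {"poll": +1.0, "result": -9.0},
    "FL-Sen (Nelson vs. Scott)":     {"poll": +2.4, "result": -0.2},
    "WV-Sen (Manchin vs. Morrisey)": {"poll": +5.0, "result": +3.0},
}

for race, m in races.items():
    signed_error = m["poll"] - m["result"]
    print(f"{race}: polls off by {signed_error:+.1f} toward the Democrat")
```

On that scale, Indiana was a roughly 10-point miss while Florida and West Virginia were only 2 to 3 points off, even though Florida’s “call” was wrong and West Virginia’s was right.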
In other key races, the polls were better. In New Jersey, they suggested a roughly 10-point win, and that appears to be about what scandal-ridden Democratic Sen. Bob Menendez got. In North Dakota, the polls look about right despite the fact that the race was under-polled in the final stretch. The Ohio Senate polls were off by a couple points, but they were trending in the right direction at the end, and the Wisconsin polls got the Vukmir-Baldwin result basically right. The “sleeper” races stayed asleep, too: Tina Smith won the Minnesota special by low double digits (which is what the polling suggested), and Bob Casey won re-election in Pennsylvania by about 13 when the polls showed him up by 14.
The data was also decent but not perfect in governors’ races. Most of the races that weren’t considered “toss-ups” ahead of time stayed off the map. In Georgia, the pre-election polls seemed basically right. The Florida polls were off by about four points, a real but not unheard-of error. In Wisconsin, the average of the last three polls showed a roughly 2.3-point Evers lead, and the race is currently deadlocked. Kristi Noem, who led the South Dakota polls by 2, won by 5. Kate Brown beat her polls by a bit in Oregon, Republican Mike DeWine beat his by quite a bit (he won by about 3 after trailing by mid-single digits in the final polls), and Republican Kris Kobach underperformed his (he lost to Laura Kelly by 5 after leading by 1).
Once we get full results from every district and state, we’ll be able to calculate the exact level of polling error this cycle. But these numbers look pretty good from a 30,000-foot view. Democrats did about as well as expected in the House. Republicans did better in the Senate than projected, but the result was well within the range of plausible outcomes. Republicans did a bit better than some expected in governors’ races, but the result didn’t seem crazy (e.g., at the time of filing, Democrats had netted six governorships, and on the morning of Election Day I guessed they would net six to eight).
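When those full results arrive, the cycle-wide calculation is straightforward, and two summary numbers matter: the mean signed error (a nonzero value means the polls systematically leaned toward one party) and the mean absolute error (how big a typical miss was). A minimal sketch, with placeholder inputs until every race is counted:

```python
from statistics import mean

def polling_error_summary(signed_errors: list[float]) -> tuple[float, float]:
    """Summarize per-race signed errors (poll margin minus result margin,
    Democrat-positive).

    Returns (bias, typical_miss):
      bias         -- mean signed error; systematic lean toward one party
      typical_miss -- mean absolute error; size of an average miss"""
    return mean(signed_errors), mean(abs(e) for e in signed_errors)

# Placeholder inputs for illustration only; swap in the real per-race
# errors once every district and state has reported.
bias, typical_miss = polling_error_summary([10.0, 2.6, 2.0, -3.0, 1.0])
print(f"bias: {bias:+.1f} points; typical miss: {typical_miss:.1f} points")
```

Keeping the two numbers separate matters: a cycle can have a small typical miss but a consistent partisan lean, or large misses that cancel out with no lean at all.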
There are good reasons to be careful about how you use and examine the polls (low response rates, differing methodological choices across pollsters, complicated sources of error, correlated error, and so on). But this election should prove that they’re not garbage. In fact, they’re arguably the best tool we have for understanding public opinion.