Introducing the TWS SwingSeat Model

After a lot of number-crunching, the SwingSeat model is officially live, with probabilistic forecasts for each 2018 Senate race and chamber control.

For those who weren’t following me during the build, the basic goal is to use past and current data to empirically generate probabilistic forecasts for every Senate race and for chamber control. This forecast will be updated daily with new data, and it’ll be used to (1) explain what’s happening in the election and (2) help demystify some of the statistical models used in politics (and other areas of life).

SwingSeat is a bit more complicated than the analyses I usually produce, so it deserves some explanation. I’ll answer four questions:

What is the model?

What is it saying?

How does it work?

And what’s next?

I’d encourage you to at least read the first two sections (they’re short) since they’re the most helpful for understanding what’s happening and how I think people should understand my forecast.

But if you can’t wait and want to skip directly to the probabilities, the actual model page is over here.

So what is the model?

A model is, at its core, a way of thinking about a specific part of the world. It’s a system for taking in some type of information, processing it, and reaching a conclusion. People use models all the time. If you oversleep your alarm, look at the time and predict that you’re going to get to work late and miss a morning meeting, you’ve run a mental model. You’ve taken in information (what time it is, your personal knowledge of traffic patterns) and predicted something (that you’re not going to get to work in time). Good job!

Similarly, this model takes in current information (mostly polls) to make probabilistic predictions about which candidate is going to win each Senate race in November and which party is going to control the upper chamber come January. SwingSeat eats data, processes it, and spits out projections. And I’ll be updating the predictions with new data every day from now until Election Day.

If you’re interested in details, I’ve written an explainer on statistical models here. But before moving on, I need to emphasize one point: the model isn’t capital-T Truth. It’s one way to look at some (important) parts of the data we’ll see heading into 2018. It’s not a replacement for traditional reporting, expert analysis, or polling (in fact, it’d be much less accurate without polling). Right now, I think SwingSeat is capturing some of the key features of the landscape, but the outputs don’t match my intuitions about every race. And that’s okay—the model is one tool out of many that I’ll be using to analyze races from now until November.

What’s it saying now?

Right now, there are two main takeaways from the model.

First, Democrats must pitch a near-perfect game to win the chamber—that’s because Republicans have a great map.

Most analysts (myself included) have argued that the 2018 Senate map is bad for Democrats. They’re defending 26 seats while the GOP is defending only nine, and five Democratic senators are defending seats in very red states. Republicans, on the other hand, are defending only one state that Hillary Clinton won (Nevada) and only a couple of light-red states. Democrats are currently slightly ahead in the polls in most toss-up races (though the polling is sparse and dated in some cases). But if the GOP manages to win one or two of those races, the Democratic path to 51 seats becomes extremely tough.

And SwingSeat basically understands this. It assigns Democrats a win probability that’s noticeably north of 50 percent in key races such as Florida, Missouri, and Nevada, but gives the GOP a slightly better than 2-to-1 chance of holding the Senate. That’s partially because it knows that the GOP could, even in a strongly Democratic environment, win one or two competitive Democratic seats. Moreover, if the GOP improves its numbers across the board, it could net a few seats in an otherwise tough midterm. Obviously the Democrats are still in the game. There are some simulations where the polls move towards the Dems—and events that have a roughly 1-in-3 chance of happening happen all the time. But SwingSeat thinks that Republicans have a real advantage.

Which brings me to the second main takeaway: Republicans have a higher ceiling than Democrats.

Republicans have a lot of potential targets on this map. If GOP candidates generally improve their standing (or if the polls are biased against them), they could actually gain seats from where they are now. But similar across-the-board improvement for Democrats would yield lower returns. If Democrats hold all their seats and win Nevada and Arizona, they win a bare Senate majority. Beyond that, the landscape gets tougher. Phil Bredesen is currently leading Republican Marsha Blackburn in Tennessee, but it’s a red state and Bredesen’s lead is far from safe. Beyond Tennessee, the map gets harder still: Ted Cruz has a roughly 4-to-1 chance of winning, and Republicans shouldn’t have much trouble holding off Democrats in deeply red states such as Utah, Wyoming, Nebraska, and Mississippi.

SwingSeat also generates state-by-state probability estimates, but I wouldn’t sweat those too much right now. There isn’t a ton of polling, and the model is doing its best to estimate win probabilities based on what little data we have. In some cases, the model is spitting out pretty good estimates. Ted Cruz does seem to have a big advantage in Texas, and Florida probably leans slightly towards Nelson. But I think SwingSeat is underestimating the probability of a Republican win in Arizona, Montana, and West Virginia and overestimating the likelihood of a GOP victory in North Dakota and Indiana.

As we get closer to the election, these probabilities will get more accurate, and the model will become more confident.

How does this thing work, exactly?

Statistical models often sound fancy, but the basic idea behind SwingSeat is simple. It does four things: (1) attempts to get an accurate picture of public opinion as it stands now; (2) estimates how much that could change between now and Election Day; (3) simulates election scenarios based on those estimates; and (4) calculates probabilities.

Here’s the detailed, step-by-step version:

First, the model divides the races into two groups—races where polls have been conducted and races where they haven’t—and performs a fundamentals-based projection in the races where no polling is available.

The idea here is simple. If you don’t have any polling in a race but you still want to make a prediction, you should use the fundamentals—factors like the results of the last presidential election, whether an incumbent is running, candidate experience, presidential approval, and whether it’s a midterm—to predict the result of the race. And that’s exactly what SwingSeat does: It uses those factors and data from 1992 to 2016 to project the results of races where no polling data is available.
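
To make that mechanic concrete, here is a minimal sketch of a fundamentals-based projection. The features, training rows, and resulting coefficients are placeholders I made up for illustration, not SwingSeat’s actual inputs; the point is just fitting past races and projecting an unpolled one.

```python
import numpy as np

# Hypothetical fundamentals sketch. Columns: last presidential margin in
# the state (Dem minus Rep, points), incumbent running (+1 Dem, -1 Rep,
# 0 open), president's net approval, midterm flag for the president's
# party (+1/-1). All rows are invented for illustration.
X_train = np.array([
    [ 12.0,  1, -5,  1],
    [ -8.0, -1,  3, -1],
    [  2.0,  0, -5,  1],
    [-15.0, -1, -5,  1],
    [  6.0,  1,  3, -1],
    [ -3.0,  0,  3, -1],
])
# Outcome: Dem-minus-Rep vote margin in those past races.
y_train = np.array([14.5, -10.0, 1.0, -12.0, 9.0, -2.5])

# Fit ordinary least squares with an intercept term.
A = np.column_stack([np.ones(len(X_train)), X_train])
coefs, *_ = np.linalg.lstsq(A, y_train, rcond=None)

# Project a new, unpolled race from its fundamentals (leading 1.0 is the intercept).
new_race = np.array([1.0, -20.0, -1, -5, 1])
projected_margin = new_race @ coefs
print(f"Projected Dem margin: {projected_margin:+.1f} points")
```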

This sort of projection typically isn’t very accurate, but that’s not a big problem: We have polls in almost every key Senate race, and most of the time when we don’t have polling (such as in Wyoming, Nebraska, Vermont, or New York) it’s because the race isn’t supposed to be competitive. So when it’s in use, the fundamentals model has a lot of cushion.

The second step of the model is to deal with races where polls are present. The basic idea here is simple: The model uses polling to get a sense of where public opinion is and, using past data, estimates how far the eventual result will be from the current numbers.

The technical details aren’t all that much more complicated, to be honest. I’ll get into more specifics in my running series on how to build a model (here are the first three parts), but basically I estimate where public opinion is by using a weighted average of the polls. Recent polls get more weight (exponential decay with a long half-life early in the race and a short half-life at the end), and polls that survey more people get more weight. I also weight the polls based on their past accuracy relative to each other (more details to come on this, but I use a regression similar to FiveThirtyEight’s simple plus-minus along with a conservative standard for when and how much to weight polls up or down). But you really don’t need to worry about the latter two factors much—most of the weighting in SwingSeat comes simply from time.
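
Here is a minimal sketch of that time-and-sample-size weighting, leaving the pollster-quality adjustment out. The 30-day half-life and the square-root sample-size weight are my own placeholder choices, not necessarily the values SwingSeat uses.

```python
import numpy as np

def weighted_poll_average(margins, days_old, sample_sizes, half_life=30.0):
    """Toy poll average: exponential decay on age, sqrt weight on sample size.

    margins      -- Dem-minus-Rep margin in each poll (points)
    days_old     -- days since each poll was in the field
    sample_sizes -- respondents in each poll
    half_life    -- days for a poll's weight to fall by half (placeholder value)
    """
    margins = np.asarray(margins, dtype=float)
    days_old = np.asarray(days_old, dtype=float)
    sample_sizes = np.asarray(sample_sizes, dtype=float)

    # A poll loses half its weight every `half_life` days...
    recency_weight = 0.5 ** (days_old / half_life)
    # ...and bigger samples count for more (sqrt keeps huge polls from dominating).
    size_weight = np.sqrt(sample_sizes)

    weights = recency_weight * size_weight
    return np.average(margins, weights=weights)

# Three hypothetical polls: a fresh small poll, an older big one, a stale one.
print(weighted_poll_average(margins=[4.0, 1.0, -2.0],
                            days_old=[2, 20, 60],
                            sample_sizes=[500, 1200, 800]))
```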

I try to include as many reputable polls as possible in the average—basically, if a poll shows up on RealClearPolitics or would have shown up in the Huffington Post Pollster, it’ll almost assuredly be included in SwingSeat. If there’s any suspicion that a poll may be fake, I won’t put it in the model. I tend to include polls from partisan pollsters, but I don’t include polls done on behalf of a specific group or campaign. These simple rules should be enough to guide most decisions, but I’ll continue to formalize this standard as time goes on and explain any decisions that fall into a gray area.

I also only use head-to-head polls that feature the most probable general election matchups. Most of the upcoming Senate primaries either aren’t very suspenseful (we have a good idea of who will win) or aren’t very consequential (subbing in one candidate for the other doesn’t make a big difference). The only big exception to this rule is Arizona, where a three-way primary between Martha McSally, Kelli Ward, and Joe Arpaio is still undecided. At this point, my forecast assumes that McSally will win. That’s a shortcoming of the model that I’m open to fixing. I just haven’t come up with a Senate primary prediction model that I’m confident in yet.

So once I’ve calculated a weighted average of the polls in the most likely head-to-head matchup, I estimate how far off the average is from the eventual result. I get this estimate from a regression (very similar to what other models do) that uses the number of days until the election, the number of recent polls, the level of agreement between those polls, the number of undecided and third-party voters, the margin, and how recently the latest polls were conducted to decide how much I expect the poll average to change.

These factors all point in the direction you’d expect—the average gets more accurate as you approach Election Day. More polls (and more agreement between polls) translate into a lower expected error. Error is higher in races where one candidate is leading by a wide margin, and there’s more error when there are more undecided voters. And if it’s been a long time since we’ve had a poll in a given race, the model becomes less confident about its estimates.
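
As a hedged illustration of how those factors can roll up into one number, here is a toy error model. Every coefficient below is invented for the example; the real version fits them by regression on historical races.

```python
import numpy as np

def expected_poll_error(days_out, n_recent_polls, poll_spread,
                        pct_undecided, abs_margin, days_since_last_poll):
    """Toy expected error (points) of a poll average vs. the final result.

    All coefficients are invented for illustration. Directions match the
    text: error shrinks near Election Day and with more (and more
    agreeable) polling, and grows with undecideds, lopsided margins,
    and stale polling.
    """
    error = 2.0                               # baseline error floor
    error += 0.03 * days_out                  # further out = less certain
    error -= 0.4 * np.log1p(n_recent_polls)   # more polls = more certain
    error += 0.3 * poll_spread                # disagreement among the polls
    error += 0.1 * pct_undecided              # undecideds can break either way
    error += 0.05 * abs_margin                # blowout races are polled loosely
    error += 0.02 * days_since_last_poll      # stale data = less certain
    return max(error, 1.0)                    # never fully certain

# A sparsely polled race five months out:
print(expected_poll_error(days_out=150, n_recent_polls=2, poll_spread=5.0,
                          pct_undecided=12.0, abs_margin=3.0,
                          days_since_last_poll=45))
```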

Together, the poll average and the error give us the two main ingredients of the model—an estimate of where we are and an estimate of how different Election Day might be from today. Which brings us to the third step: simulation.

The idea here is to use what we know about the current state of the race to play out different Election Day scenarios.

Each simulation involves two steps. First, the model randomly picks a “correlation”—that is, it decides if one party is going to, across a wide range of races, do better on Election Day than they’re doing now. (Or if each race is going to mostly do its own thing.)

This is a key part of the model. We’ve seen elections where the polls generally moved together in one direction (or where the polls have systematically underestimated one party). If you don’t try to account for that possibility, your model (mental or statistical) can become too certain of its overall seat projections. That’s part of why some people didn’t take Trump seriously enough in the 2016 general election—they didn’t realize that the polls (which generally favored Hillary Clinton) could be systematically underestimating Trump’s support. This step is an attempt to take that sort of possibility into account.

After picking a correlation, SwingSeat uses the estimated error to guess what the result might be in each race.

(For the statistically inclined: I used a random correlation and predicted error to construct a covariance matrix and then used a multivariate normal distribution to simulate outcomes. The normal distribution isn’t perfect here, but it works reasonably well on past results, and I’m open to tweaking it as the election goes on.)
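
Here is a minimal sketch of that simulation loop, under simplifying assumptions: every pair of races shares a single randomly drawn correlation in each simulation, the per-race expected errors sit on the covariance matrix’s diagonal, and the uniform draw for the correlation is a placeholder rather than SwingSeat’s actual distribution. The last line previews step four, tallying the simulations into a probability.

```python
import numpy as np

rng = np.random.default_rng(2018)

# Current Dem-minus-Rep margin estimates and expected errors (points)
# for a few hypothetical races. These numbers are illustrative only.
margins = np.array([3.0, 1.5, -2.0, 6.0, -8.0])
errors = np.array([5.0, 6.0, 5.5, 7.0, 4.0])  # expected std. dev. per race

n_sims = 10_000
dem_wins = np.zeros(n_sims)

for s in range(n_sims):
    # Step 1: randomly pick how correlated the races are this time.
    # (A uniform draw is a placeholder; the real choice matters a lot.)
    rho = rng.uniform(0.0, 0.9)

    # Build the covariance matrix: error_i * error_j * rho off the
    # diagonal, error_i ** 2 on the diagonal.
    cov = rho * np.outer(errors, errors)
    np.fill_diagonal(cov, errors ** 2)

    # Step 2: draw one simulated Election Day result for every race at once.
    sim_margins = rng.multivariate_normal(margins, cov)
    dem_wins[s] = (sim_margins > 0).sum()

# Probability Democrats win at least 3 of these 5 hypothetical seats:
print((dem_wins >= 3).mean())
```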

SwingSeat then repeats this two-step process (picking a correlation and then simulating results) thousands of times, providing us with a huge number of possible scenarios.

And that’s most of the model. I use these simulations to generate the probabilities you see in the maps, charts, and text of the model page.

A few quick housekeeping notes before closing this section:

I manually assign Democrats a 100 percent probability of winning the California Senate race because two Democrats advanced in the top-two primary.

And I used an ad-hoc method to deal with the Mississippi special Senate election (Thad Cochran left the Senate due to health issues, triggering this special election). In that race, all the candidates (regardless of party) will run in a non-partisan primary scheduled for Election Day. If no candidate gets above 50 percent of the vote, the top two candidates head to a December run-off. My model isn’t well-suited to predict this sort of race, so I combined the Cook, Sabato, and Inside Elections race ratings with a fundamentals forecast (done as if one Republican and one Democrat were running in November) to get an estimate. Cook and Sabato rate the race as “Likely Republican” and Inside Elections has it at “Solid Republican.” So I tallied up how often those handicappers (who rate races as Toss-up, Likely Republican, Leans Democratic, etc.) correctly called the winner in past races where there was a “Likely” or “Solid” favorite, and averaged the resulting probability with my fundamentals-based forecast. The result makes intuitive sense—Republicans are strongly favored there, with an outside chance of a Democratic win.
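
As a quick sketch of that blending arithmetic (every number here is made up for illustration):

```python
# Toy illustration of the Mississippi blend; all numbers are invented.
# Historical hit rate of handicappers when a race is rated "Likely"/"Solid":
handicapper_gop_prob = 0.92
# Fundamentals forecast, run as if one Republican faced one Democrat:
fundamentals_gop_prob = 0.84

# Average the two probability estimates to get the race's GOP win probability.
gop_prob = (handicapper_gop_prob + fundamentals_gop_prob) / 2
print(f"Blended GOP win probability: {gop_prob:.0%}")  # -> 88%
```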

One final note: Mississippi is not correlated with the other races. That is, if Republicans had a great or terrible year overall in one simulation, they were no more or less likely to win Mississippi. It’s not clear to me how one general election result or another would influence a potential run-off in Mississippi, so I left its error uncorrelated.

So what’s next for the model?

If you made it through the methodology, congratulations! I know it was a slog. I only have a few more housekeeping notes.

There are going to be mistakes in the data and code. I’m just one man and slip-ups are inevitable. If any of these mistakes make a big difference, I’ll make sure that’s known here or elsewhere on the site. But if I make a fix and it’s not a big deal (conceptually or in terms of the results) I won’t announce the change with a formal post.

I’m also going to keep experimenting with the model, both in terms of visuals and the math.

The model currently has the basics visualized—control probability, state-by-state probability maps, and a time series showing the projection and a range of outcomes. But I’ll be adding to that display as time goes on. If you (as a layman or a professional) have ideas about what you want to see displayed, let me know. Part of the fun of modeling is experimenting with different ways to display the reams of data you end up generating.

I’m also going to keep experimenting with the math. This is my first large-scale, poll-based election model, and I know it’s not perfect. I’m open to making changes that clearly improve the predictive power of the model, but I’m going to be conservative about making those sorts of changes. It’s easy to trick yourself into putting a thumb on the scale by “fine-tuning” your model, and I want to avoid that. But I’m not opposed to careful changes, and I’ll announce any major changes with an explanatory post.

So go have a look. SwingSeat is here.
