Which party is going to win control of the Senate in the midterm elections? It’s a simple question. But also a difficult one. And right now, I’m in the middle of building a model that will try to shed some light on it by calculating win probabilities for every Senate contest.
I’ve done a lot of data journalism (primarily for The Weekly Standard and RealClearPolitics), but I haven’t built this type of model before. So rather than building the model and unveiling it all at once, I’m going to write through the process, talking to you about the individual pieces of the model as I build them, the challenges inherent in modeling elections, the uncertainty in the data and anything else that stands out to me during the build process. Instead of having a secret sauce, I want to show you how the sausage is made. This way, after the election, it’ll be easier for us to understand what the model got right and what it got wrong.
* * *
For our first installment, I’m tackling basic, high-level questions. But before getting into the details, I want to acknowledge that this project begins with many enormous debts. FiveThirtyEight, the Upshot, and many others have built excellent models and have spent huge amounts of time and energy figuring out how to communicate well about them. Before them, political scientists have been studying elections and building predictive models for decades. I’ve already collected data from the New York Times Upshot, FiveThirtyEight, RealClearPolitics, the Huffington Post, the Argo Journal, Project Vote Smart, and others. Each of these outlets has made critical datasets public, and I’m grateful to them.
So let’s start at the most basic level possible: How should we think about modeling?
At their best, models are empirical ways of thinking about the world and processing information. Even if you’ve never built a statistical model, you create mental models that help you process the world. If you go outside, see a bunch of dark clouds and guess that there’s a high probability of rain—congratulations! You’re running a predictive model about the weather in your head.
Models aren’t perfect—like every human endeavor, they have limitations. For instance, they generally assume that the same rules that governed the past will apply to the future. That’s often a better approach than guessing or constructing some ad hoc forecasting method, but there are times (e.g. the 2016 Republican presidential primary) when the old rules break and using an empirical model might lead to incorrect predictions.
Also, models are what they eat. If you feed your model unreliable, incorrect, or otherwise bad data, then it may make incorrect projections even if the behavioral assumptions are solid.
Then there’s the basic truth that models are usually attempts to simplify the messiness of reality. Most people think of a coin toss as having a 50-50 chance of landing on heads (or tails). But that’s not the capital-T Truth. Given the right equipment, a physicist might be able to look at air resistance, how fast the coin is rotating, the height from which it was flipped, and a whole gaggle of other factors to track the real, physical events and see the process of a specific coin toss in a specific environment in a decidedly more deterministic light. The 50-50 heads-or-tails model doesn’t capture all of those details. It’s just a highly useful, highly accurate simplification.
Perhaps most important is this: Models can only see what they’re designed to see. Imagine that every day I looked south out of my apartment window and decided, based on the amount of cloud cover, whether or not I should bring my umbrella with me that day. If a giant storm cloud were approaching my apartment from the north, I might miss it and get drenched. When you’re building a model, it’s important to keep track of what you’re not looking for. You can’t take into account every possible influence (see our coin flip example), but you can try to be certain that whatever you’re overlooking, you’re doing so intentionally.
Next question: Should we even be building predictive models for politics?
Obviously I think the answer is yes, which is why I’m building one. But I took the question seriously and I want to explain my thinking.
It’s possible to argue that predictive models aren’t the best use of our time: They encourage people to watch politics as if it’s a sport where only the score matters. The data is better used to explain, rather than predict, election outcomes. It’s even possible that they could lower turnout. I take those concerns seriously, and I’ll address them point by point in the future.
But for now, let me say this:
First, I hope that my model can be used as a helpful explanatory tool. My primary goal in building a model isn’t to perfectly nail the win probabilities of every candidate (though I’m obviously aiming for that). Instead, my primary goal is to create a useful anchor that can help us understand the true state of the horse race and serve as an entry point to explain why the race is where it is, what events do and don’t shape public opinion, how candidates do and don’t shape races, and more. I want the model to be another tool that complements and points towards other data sources (e.g. good reporting, rigorous quantitative and qualitative analyses of races, polling aggregates and other tools both at TWS and other outlets) in order to investigate important questions about the races in real time.
Second, I hope my model will help people understand probabilities. I want to design visuals and write analyses that don’t just tell people that a candidate has 2-to-1 odds of winning a race—I’d like them to come away from my work with a better understanding of what 2-to-1 odds actually mean.
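To give a flavor of what I mean, here’s a back-of-the-envelope sketch (purely illustrative, and not a piece of the model itself). 2-to-1 odds translate to a win probability of 2/(2+1), or about 67 percent, which means the underdog should still win roughly one such race in three.

```python
# Purely illustrative: converting odds into a probability and simulating how
# often the "underdog" wins anyway. None of this comes from the actual model.
import random

def odds_to_probability(favorite, underdog):
    """Convert odds such as 2-to-1 into the favorite's win probability."""
    return favorite / (favorite + underdog)

p_favorite = odds_to_probability(2, 1)  # 2-to-1 odds -> roughly 0.67

# Simulate 10,000 races in which the favorite truly has that probability.
trials = 10_000
upsets = sum(random.random() > p_favorite for _ in range(trials))

print(f"Favorite's win probability: {p_favorite:.0%}")
print(f"Upsets in {trials:,} simulated races: {upsets} (about 1 in 3)")
```

That “about 1 in 3” is the part I want readers to internalize: a 2-to-1 favorite losing isn’t, by itself, evidence that the forecast was broken.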
Third, I want to preemptively de-magic the model. (That’s partially why I’m going to write this ongoing series about it.) I want to create visual tools that help readers (especially those who have never written a line of code or break out in a cold sweat when they touch a math book) intuitively grasp what’s going on inside the model. When my model’s projections change over time, I want readers to be able to easily understand why. If my model does something that other models don’t, I want them to be able to make educated guesses about what’s going on.
Finally, I want to produce reasonably accurate predictions that contribute real knowledge to the broader conversation about elections. My model will use a lot of the same raw materials as others—polls, past election results, information about candidates, possibly fundraising data, and more—so it’ll probably produce some similar results. But I’m hoping that in the process of building and maintaining the model, I can contribute to our collective knowledge on how elections work and how to predict them.
* * *
So what comes next?
Sometime in the next few weeks, I’m planning on publishing a couple of pieces on the “fundamentals” (basically everything outside head-to-head polls) of Senate elections. I’m not sure how much the fundamentals will improve my forecasts once I add polls in (FiveThirtyEight and the Upshot both have fundamentals-based components, so there’s good reason to think they will help). But not every Senate race will have solid, reliable polls, so at the very least the fundamentals-based forecast will help fill in the gaps.
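To make the gap-filling idea concrete, here’s a rough sketch of how a fundamentals estimate and a polling average could be blended, with the weight shifting toward polls as more of them come in. The function name, the margins, and the “five polls for full weight” rule are all invented for illustration; they are not how my model will actually work.

```python
# An invented illustration of "fundamentals fill in the gaps": lean on polls
# where they're plentiful, and fall back on the fundamentals estimate where
# they're scarce. The weighting rule here is made up for demonstration only.
def blended_margin(fundamentals_margin, poll_average, n_polls,
                   polls_for_full_weight=5):
    """Blend a fundamentals-based margin with a polling average.

    The more polls a race has, the more weight the polls get; a race with
    no polls relies entirely on the fundamentals estimate.
    """
    if poll_average is None or n_polls == 0:
        return fundamentals_margin
    poll_weight = min(n_polls / polls_for_full_weight, 1.0)
    return poll_weight * poll_average + (1 - poll_weight) * fundamentals_margin

# A heavily polled race leans on the polls; an unpolled race uses fundamentals.
print(blended_margin(fundamentals_margin=4.0, poll_average=1.5, n_polls=8))   # 1.5
print(blended_margin(fundamentals_margin=4.0, poll_average=1.5, n_polls=1))   # 3.5
print(blended_margin(fundamentals_margin=4.0, poll_average=None, n_polls=0))  # 4.0
```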
After that, I’ll continue to work on the polling-based segment of the model. And when I’m happy with the calibration, predictive ability, and general quality of the model, I’ll design interactive displays that will help users understand what the predictions mean and what’s going on inside the model itself.
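Since I just used the word, a quick note on what I mean by “calibration”: races the model calls 70-30 should break toward the favorite roughly 70 percent of the time, races it calls 90-10 roughly 90 percent of the time, and so on. A toy check might look something like this (the predictions below are made up purely to show the mechanics):

```python
# A toy calibration check: bucket predictions by stated probability and see how
# often those outcomes actually happened. The data is fabricated for illustration.
from collections import defaultdict

# (predicted win probability, did that candidate actually win?)
predictions = [
    (0.9, True), (0.9, True), (0.9, True), (0.9, False),
    (0.7, True), (0.7, True), (0.7, False),
    (0.5, True), (0.5, False),
]

buckets = defaultdict(list)
for prob, won in predictions:
    buckets[prob].append(won)

for prob in sorted(buckets):
    outcomes = buckets[prob]
    actual_rate = sum(outcomes) / len(outcomes)
    print(f"Predicted {prob:.0%} -> won {actual_rate:.0%} of {len(outcomes)} races")
```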
Once the model is up and running, I’ll continue to write about the inputs, the outputs, and how the model transforms one into the other.
Talk to you soon.

