These days, where there are elections there are also election forecasters. But political scientists have been doing this for a long while. Back in the early 1990s, when I was in graduate school in political science, I wrote a seminar paper on methodologies of election forecasting, which at that time I took a pretty dim view of (I still do!).
Where prediction is concerned, it is always worth evaluating our forecasts, lest we trick ourselves into thinking we know more than we actually do. So for fun, I am going to evaluate predictions of the upcoming UK elections.
Courtesy of Will Jennings (@drjennings), a political scientist at the University of Southampton, via Twitter, below is a summary of various predictions of the outcome of the upcoming election.
The 12 predictions span a huge range, +/-33 seats for the Conservatives and +/-25.5 for Labour. Six of the 12 forecast Labour holding more seats than the Conservatives, and six forecast fewer. With such a wide spread, it is mathematically safe to say that some predictions will be better than others.
I am going to evaluate two questions using these data after the results are in.
1. Which forecast showed the most skill?
2. Does the collection of forecasts demonstrate any skill?
To evaluate #1, I will use a naive baseline as the basis for calculating a simple skill score. The naive baseline I will use is just the composition of the current UK Parliament.
Party | Seats |
---|---|
Conservative | 302 |
Labour | 256 |
Liberal Democrat | 56 |
Democratic Unionist | 8 |
Scottish National | 6 |
Independent | 5 |
Sinn Fein | 5 |
Plaid Cymru | 3 |
Social Democratic & Labour Party | 3 |
UK Independence Party | 2 |
Alliance | 1 |
Green | 1 |
Respect | 1 |
Speaker | 1 |
Total number of seats | 650 |
Current working Government Majority | 73 |
Some methodological details:
- I am evaluating predictions of actual seats, not percentage gains or losses.
- I am counting all seats equally.
- I am not evaluating the prediction of specific seats, but overall parliamentary composition. Yes, this means that skill may occur for spurious reasons.
- Yes, there are other, likely "better," naive baselines that could be used (e.g., using recent opinion poll results). Such a choice will reflect upon absolute skill, but not relative skill.
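To make the skill-score calculation concrete, here is a minimal sketch in Python. It assumes a mean-absolute-error-based skill score relative to the naive baseline; the forecast and outcome numbers below are placeholders for illustration only, not real data, and only the three largest parties are shown.

```python
# A minimal sketch of a simple skill score against a naive baseline.
# Skill = 1 means a perfect forecast, 0 means no better than the baseline,
# and negative values mean worse than the baseline.

def mean_abs_error(predicted, actual):
    """Mean absolute seat error over the parties being evaluated."""
    return sum(abs(predicted[p] - actual[p]) for p in actual) / len(actual)

def skill_score(forecast, baseline, outcome):
    """Skill of a forecast relative to the naive baseline forecast."""
    return 1 - mean_abs_error(forecast, outcome) / mean_abs_error(baseline, outcome)

# Naive baseline: composition of the current Parliament (from the table above).
baseline = {"Conservative": 302, "Labour": 256, "Liberal Democrat": 56}

# Hypothetical forecast and election outcome, purely for illustration.
forecast = {"Conservative": 280, "Labour": 270, "Liberal Democrat": 28}
outcome  = {"Conservative": 290, "Labour": 265, "Liberal Democrat": 30}

print(skill_score(forecast, baseline, outcome))
```

Other error measures (e.g., squared error) would work just as well here; the point is only that each forecast is scored against the same naive baseline.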
Given that there is a wide spread among multiple forecasts, we have to be very careful about committing the logical fallacy of using the election outcomes to select among forecasts. This is of course a very common problem in science (which I described in this paper (PDF) in the context of hurricane forecasts in reinsurance applications). I have described this problem as the hot hand fallacy meets the guaranteed winner scam. It is easy to confuse luck with skill.
So I will also be evaluating the forecast ensemble. I will do this in two ways: I will evaluate the average among forecasts, and I will evaluate the distribution of forecasts, against both the naive forecast and the election outcome. We can expect to be able to conclude very little about the skill of a forecasting method (as compared to a specific forecast) because we are looking at only one election. So my post-election analysis will necessarily include the empirical and the metaphysical. But we'll cross those bridges when we get there.
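Here is a sketch of how the ensemble evaluation might look, reusing the hypothetical `skill_score`, `baseline`, and `outcome` from the sketch above. The individual forecasts are again placeholders, not the 12 real predictions.

```python
# Hypothetical ensemble of forecasts, for illustration only.
forecasts = [
    {"Conservative": 280, "Labour": 270, "Liberal Democrat": 28},
    {"Conservative": 300, "Labour": 250, "Liberal Democrat": 25},
    {"Conservative": 270, "Labour": 275, "Liberal Democrat": 30},
]

parties = list(forecasts[0])

# Ensemble average: mean predicted seat count for each party.
ensemble_mean = {p: sum(f[p] for f in forecasts) / len(forecasts) for p in parties}

# Skill of the ensemble average relative to the naive baseline.
print(skill_score(ensemble_mean, baseline, outcome))

# Distribution check: does the outcome fall within the spread of the forecasts?
for p in parties:
    lo, hi = min(f[p] for f in forecasts), max(f[p] for f in forecasts)
    print(p, lo <= outcome[p] <= hi)
```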
This exercise is mainly for fun, but because my new book has a chapter on prediction that I am in the midst of completing, it is also a useful way for me to re-engage some of the broader literature and data in the context of a significant upcoming election.
Comments and suggestions are most welcome from professionals and amateurs alike!