27 March 2015

Evaluating Predictions of the UK General Election

The United Kingdom is going to have a general election on May 7th, just over five weeks from now. There are high stakes and a lot of uncertainties, probably more so than usual. Yesterday, David Cameron (Conservative leader and Prime Minister) and Ed Miliband (Labour Party leader) squared off in parallel interviews with Jeremy Paxman in a broadcast watched by 12% of the UK TV audience.

These days, where there are elections there are also election forecasters. But political scientists have been doing this for a long while. Back in the early 1990s, when I was in graduate school in political science, I wrote a seminar paper on methodologies of election forecasting, which at that time I took a pretty dim view of (I still do!).

Where prediction is concerned, it is always worth evaluating our forecasts, lest we trick ourselves into thinking we know more than we actually do. So for fun, I am going to evaluate predictions of the upcoming UK elections.

Courtesy of Will Jennings (@drjennings), a political scientist at the University of Southampton, via Twitter, below is a summary of various predictions of the outcome of the upcoming election.

The 12 predictions span a huge range, +/-33 seats for the Conservatives and +/-25.5 for Labour. Six of the 12 forecast Labour holding more seats than the Conservatives, and six forecast fewer. With such a wide spread, it is mathematically safe to say that some predictions will be better than others.

I am going to evaluate 2 questions using this data after the results are in.

1. Which forecast showed the most skill?
2. Does the collection of forecasts demonstrate any skill?

To evaluate #1, I will use a naive baseline as the basis for calculating a simple skill score. The naive baseline I will use is just the composition of the current UK Parliament.

Liberal Democrat                            56
Democratic Unionist                          8
Scottish National                            6
Sinn Fein                                    5
Plaid Cymru                                  3
Social Democratic & Labour Party             3
UK Independence Party                        2
Total number of seats                      650
Current working Government Majority         73

It is important to note that there are, effectively, a limitless number of ways that a forecast evaluation might be structured, with different results as a consequence. Always beware post hoc forecast evaluations. I have no horse in this race, so I am producing a very simple evaluation, based on methods I have used before on many occasions. These choices could of course be made differently.

Some methodological details:

  • I am evaluating predictions of actual seats, not percentage gains or losses.
  • I am counting all seats equally.
  • I am not evaluating the prediction of specific seats, but overall parliamentary composition. Yes, this means that skill may occur for spurious reasons.
  • Yes, there are other, likely "better," naive baselines that could be used (e.g., using recent opinion poll results). Such a choice will reflect upon absolute skill, but not relative skill.
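To make the idea of a simple skill score concrete, here is a minimal sketch in Python. It assumes one common form, 1 minus the ratio of the forecast's root-mean-square error to the naive baseline's, computed over party seat totals with every seat counted equally. That formula is my assumption for illustration, as are the function names and the three made-up parties; none of the numbers are real forecasts or real results.

```python
from math import sqrt


def rmse(predicted, actual):
    """Root-mean-square error over parties, treating every seat equally."""
    parties = list(actual)
    squared = sum((predicted[p] - actual[p]) ** 2 for p in parties)
    return sqrt(squared / len(parties))


def skill_score(forecast, baseline, outcome):
    """Assumed form: 1 - RMSE(forecast) / RMSE(baseline).

    1 is a perfect forecast, 0 is no better than the naive baseline,
    and negative values are worse than the baseline.
    """
    base_err = rmse(baseline, outcome)
    if base_err == 0:
        raise ValueError("baseline matched the outcome exactly; skill is undefined")
    return 1 - rmse(forecast, outcome) / base_err


# Made-up seat totals for three hypothetical parties (illustration only).
baseline = {"A": 300, "B": 250, "C": 50}   # stands in for the current Parliament
forecast = {"A": 285, "B": 262, "C": 44}   # stands in for one forecaster
outcome = {"A": 290, "B": 260, "C": 40}    # stands in for the election result

score = skill_score(forecast, baseline, outcome)
print(round(score, 3))
```

Under this form, every forecast's error is divided by the same baseline error, so swapping in a different naive baseline rescales the scores without reordering the forecasts.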

Given that there is a wide spread among multiple forecasts, we have to be very careful about committing the logical fallacy of using the election outcomes to select among forecasts. This is of course a very common problem in science (which I described in this paper (PDF) in the context of hurricane forecasts in reinsurance applications). I have described this problem as the hot hand fallacy meets the guaranteed winner scam. It is easy to confuse luck with skill.

So I will also be evaluating the forecast ensemble. I will do this in 2 ways. I will evaluate the average among forecasts and I will evaluate the distribution of forecasts, both against the naive forecast as well as the election outcome. We can expect to be able to conclude very little about the skill of a forecasting method (as compared to a specific forecast) because we are looking at only one election. So my post-election analysis will necessarily include the empirical and the metaphysical. But we'll cross those bridges when we get there.
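As a rough sketch of what those two ensemble checks might look like for a single party's seat total: the average of the forecasts is compared with the naive baseline, and the spread of forecasts is summarized by whether it brackets the outcome and what share of forecasts beat the baseline. The function name, the choice of summaries, and all numbers below are hypothetical, chosen only to illustrate the idea.

```python
from statistics import mean


def ensemble_summary(forecasts, baseline, outcome):
    """Summarize a set of seat forecasts for one party against a naive baseline."""
    avg = mean(forecasts)
    baseline_error = abs(baseline - outcome)
    # Count the forecasts that landed closer to the outcome than the baseline did.
    beat = sum(abs(f - outcome) < baseline_error for f in forecasts)
    return {
        "mean": avg,
        "mean_error": abs(avg - outcome),
        "baseline_error": baseline_error,
        "outcome_in_range": min(forecasts) <= outcome <= max(forecasts),
        "share_beating_baseline": beat / len(forecasts),
    }


# Made-up seat forecasts for one hypothetical party (illustration only).
summary = ensemble_summary([260, 280, 290, 310], baseline=305, outcome=285)
for key, value in summary.items():
    print(key, value)
```

In this toy case the ensemble mean hits the outcome exactly even though half of the individual forecasts do no better than the baseline, which is one way an average can look skillful for spurious reasons.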

This exercise is mainly for fun, but because my new book has a chapter on prediction (that I am in the midst of completing) it is also a useful way for me to re-engage some of the broader literature and data in the context of a significant upcoming election.

Comments and suggestions are most welcome from professionals and amateurs alike!