01 November 2010

Did Paul (RIP) Have Skill?

Paul the Octopus died last week.  For those unaware, Paul, who spent his days at the Sea Life Centre in Oberhausen, Germany, gained fame by correctly picking the winners of all seven of Germany's World Cup matches last summer, as well as the final.  In this post I ask a few questions about the Paul phenomenon and then distill some broader lessons for making sense of predictions.

First, did Paul have predictive skill? That is to ask, were his picks better than those that would have been made at the time by a naive forecasting methodology?

The answer is yes. Paul had skill.  The naive forecasting methodology that I employed last summer in my World Cup pool was the estimated market value of each team in the transfer market, under the assumption that the higher-valued squad wins.  For the 8 games that Paul picked in the World Cup, the naive forecasting methodology would have gone 5-3, missing out on Germany's wins over Argentina and England, and Serbia's victory over Germany in the group stage.  Paul's 8-0 record easily bested that skill threshold.

What can we learn from the Paul phenomenon?

One lesson might be that his results indicate that some octopi have Delphic capabilities and can see the future.  Call me a cephalopod skeptic, but I don't think that Paul could actually pick the winners of World Cup matches.  If that is right, then skill, by itself, is not a sufficient basis for evaluating a forecast. Rather than being about Paul, his fame and predictive successes say something about us, and how we act in response to forecasts made in many practical venues such as finance and science.

Consider the math of Paul's feat.  If each team has an equal chance of winning a World Cup match, then the odds of picking 8 of 8 winners are 1 in 256 (2^8).  Those are long odds to be sure, but not impossibly odd.  By the time Paul attracted attention, he had already picked several games correctly, which increased the odds that he'd be viewed as an oracle (for instance, if you only heard of Paul before the World Cup final, he then had a 50% chance of "proving" his predictive capabilities to you).
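The arithmetic here can be sketched in a few lines of Python.  This is only an illustration under the same simplifying assumption used in the text (every pick is an independent coin flip); the function name `chance_of_perfect_record` is made up for this example.

```python
from fractions import Fraction


def chance_of_perfect_record(games_remaining, p_correct=Fraction(1, 2)):
    """Chance that a coin-flipping forecaster gets every remaining pick right.

    Assumes each pick is independent and correct with probability p_correct
    (1/2 here, i.e. no draws and no real information).
    """
    return p_correct ** games_remaining


TOTAL_GAMES = 8

# The later you first hear of the oracle, the fewer picks he still has to
# get right in front of you, and the better his chance of looking perfect.
for already_correct in range(TOTAL_GAMES + 1):
    remaining = TOTAL_GAMES - already_correct
    chance = chance_of_perfect_record(remaining)
    print(f"first noticed after {already_correct} correct picks: "
          f"chance of finishing 8 of 8 = {chance}")
```

The first line of output is the 1-in-256 case, and the line for 7 correct picks is the 50% case of someone who only tuned in for the final.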

I would go so far as to argue that the odds of a Paul -- some predictive oracle -- emerging were in fact 100%.  Wikipedia's recounting illustrates why this is so:
Some other oracles did not fare so well in the World Cup. The animals at the Chemnitz Zoo in Germany were wrong on all of Germany's group-stage games, with Leon the porcupine picking Australia, Petty the pygmy hippopotamus spurning Serbia's apple-topped pile of hay, Jimmy the Peruvian guinea-pig and Anton the tamarin eating a raisin representing Ghana. Mani the Parakeet of Singapore, Octopus Pauline of Holland, Octopus Xiaoge of Qingdao China, Chimpanzee Pino and Red River Hog Apelsin in Tallinn zoo Estonia picked the Netherlands to win the final. Crocodile Harry of Australia picked Spain to win.
If the solution space is covered by a range of predictions, then it is a guarantee that one of those predictions (or sets of predictions) will prove correct.  We tend to selectively forget about the bad predictions (who among us knows of Petty the pygmy hippo?) and focus on the successes.  To put a judgment of skill into context, we have to know something about the universe of competing predictions and methods.

This sets up a rather up-is-down situation in which we allow reality to select our oracles, rather than our oracles selecting our futures.  We thus very easily risk being fooled by randomness into thinking that forecasters have enhanced chances for future skill based on past performance, when in fact those successes may just be a combination of (a) coverage of the solution space by a range of forecasts, and (b) our selective memories and focus of attention on forecast successes.  Anyone who has invested in last year's hot mutual fund will likely have learned this lesson.
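Point (a) can be made concrete with a small sketch.  The setup is hypothetical: each of 8 games is coded as 0 or 1 for which side wins, every possible set of picks is assigned to one imaginary oracle, and the names `count_perfect_oracles` and `chance_some_oracle_is_perfect` exist only for this illustration.

```python
from itertools import product

GAMES = 8

# One imaginary oracle for every possible sequence of 8 picks (2^8 of them).
all_oracles = list(product((0, 1), repeat=GAMES))


def count_perfect_oracles(outcomes, oracles):
    """Number of oracles whose picks match the actual outcomes exactly."""
    return sum(1 for picks in oracles if picks == outcomes)


def chance_some_oracle_is_perfect(n_oracles, games=GAMES):
    """Chance that at least one of n independent coin-flipping oracles
    gets every game right (assumes 50/50 picks and no coordination)."""
    p_single = 0.5 ** games
    return 1 - (1 - p_single) ** n_oracles


# Whatever actually happens, exactly one oracle in a fully covered
# solution space ends up with a perfect record.
for outcomes in all_oracles:
    assert count_perfect_oracles(outcomes, all_oracles) == 1

# With uncoordinated oracles a perfect record is not guaranteed, but it
# becomes likely quickly as the number of oracles grows.
for n in (1, 10, 100, 256, 1000):
    print(n, round(chance_some_oracle_is_perfect(n), 3))
```

The perfect oracle in the first loop has no more insight than the other 255; it is singled out only after the fact, which is the sense in which reality selects the oracle.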

Scholars who study judgment and decision making are well aware of these sorts of cognitive biases.  One is called the "hot hand fallacy," which is based on the assumption that a recent pattern will continue.  An example is the assumption that because Paul got 8 of 8 right in the World Cup, he'd have good chances to do well in predicting the next competition.  In my view the "hot hand fallacy" is very poorly named because it is based on studies of basketball players (who in making a streak of baskets show a "hot hand") who actually do have "hot hands" at times. They also get lucky.  So the phenomenon actually combines true skill and illusory skill.

Another implication of the Paul phenomenon is that in some instances, it may not be possible to rigorously evaluate a forecast methodology.  With an octopus, it is easy to assert that whatever methods he employed, they probably were not very rigorous.  But what if it had been JP Morgan or Goldman Sachs with a remarkable record based on their lengthy quantitative analyses?  Then it would be more difficult to assess whether their results were the consequence of a true forecasting ability, or just luck.

If you look around, on a daily basis you'll see all sorts of examples of the potential challenges presented by predictions in important settings.  Are the economists who anticipated the financial downturn in the past few years actually smarter than others?  Or were they just the few outliers in a fully covered distribution?  Or both? 

You can see a discussion of these subjects in a bit more depth in the following articles and book chapters:

Pielke, Jr., R.A. (2009), United States hurricane landfalls and damages: Can one- to five-year predictions beat climatology? Environmental Hazards 8 187-200, issn: 1747-7891, doi: 10.3763/ehaz.2009.0017

Pielke, Jr., R.A. (2003), The role of models in prediction for decision. Models in Ecosystem Science 111-135, Princeton University Press.

Pielke, Jr., R.A., D. Sarewitz, and R. Byerly (2000), Decision making and the future of nature: Understanding, using, and producing predictions. Prediction: Decision Making and the Future of Nature 361-387, Island Press (D. Sarewitz, R.A. Pielke, Jr., and R. Byerly, editors).