## 16 August 2010

### Why Rare Events are a Certainty

[UPDATE 8/25: See this interview of NOAA's Marty Hoerling.]

[UPDATE 8/17: See this interview of Peter Stott by Tom Yulsman.]

The Russian heat wave has finally broken, but people will be talking about it for a long while.  In this post I am going to discuss the statistics of rare events.

Consider the following statement:

> Meteorologist Rob Carver, the Research and Development Scientist for Weather Underground, agrees. Using a statistical analysis of historical temperature records, Dr. Carver estimates that the likelihood of Moscow’s 100-degree record on July 29 is on the order of once per thousand years, or even less than once every 15,000 years — in other words, a vanishingly small probability.

How rare is a 1 in 15,000 year event?  It is not as rare as you might think, and here is why.

Imagine that you have a fair coin (50-50) and you flip it three times.  Suppose that you want to know the chances of observing one or more heads in that sequence.  The only sequence with no heads is tail-tail-tail, which will occur, on average, only 1/8 of the time (1/2 × 1/2 × 1/2).  So the probability of observing at least one head in that series of three flips is 7/8, or 87.5%.  You can generalize this approach to coins with different odds and to any number of flips.  The generalized formula is called the binomial probability distribution, and there are many useful calculators for the distribution on the web (e.g., here).  (Technical note: The binomial distribution can be approximated by other distributions, such as the Poisson distribution, which has been shown to approximate the occurrence of certain weather extremes well.)
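If you would rather check the coin example in code than in a web calculator, here is a minimal Python sketch of the binomial formula. The function names `binom_pmf` and `prob_at_least` are my own for illustration; they do not come from any particular calculator.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_at_least(k: int, n: int, p: float) -> float:
    """Probability of k or more successes: 1 minus P(fewer than k)."""
    return 1.0 - sum(binom_pmf(i, n, p) for i in range(k))


# Three flips of a fair coin: chance of at least one head.
coin = prob_at_least(1, 3, 0.5)
print(coin)  # 0.875, i.e. 7/8
```

For k = 1 the sum has a single term, the probability of zero successes, which is the tail-tail-tail case described above.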

We can use the binomial probability distribution to evaluate how rare the Russian heat wave was under a variety of assumptions.  But to do so we need two numbers.

One number that you need to know is the odds of an event.  In this example I'll use the 1 in 15,000 year event provided by Rob Carver.  Whether that number is accurate or not doesn't really matter for this example.  If you'd like to use another, you can, and below I'll show you how.

The second number that you need is the number of relevant events.  This is a bit tricky, and it is not at all clear to me what an "event" is according to Carver.  One possibility would be to use the number of meteorological stations in Russia or the number of grid points in a spatial reanalysis.  But this has some problems as the "event" that we are discussing is not just an extreme at a point, but a more systemic event associated with a persistent atmospheric pattern.  So we could ask how many high pressure systems typically occur in the northern hemisphere over a summer season.  I asked my father this question and he suggested perhaps 10-12 per season at the latitude of Moscow.  Again, if you don't like this number you can alter it to your liking.

So next, open up the Vassar binomial probability calculator.  We can use it to answer a few questions.

1) What are the odds of at least one 1 in 15,000 year heat wave event occurring over a 1,000 year period (picked because Russian meteorologists say that nothing of this magnitude has been observed over the past 1,000 years)?

To answer this enter the following into the calculator:

n = 10,000 (1,000 years * 10 high pressure systems per year)
k = 1 event
p = 0.00006667 (that is, a probability of 1/15,000)

The odds of such an event occurring over 1,000 years are 48.7%! Given these statistics, it is not at all surprising to see one such event in Moscow over the past 1,000 years.
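The same number can be reproduced without the calculator. The sketch below assumes, as in the setup above, 10 independent "events" per year with a probability of 1/15,000 each; the helper names are hypothetical. It also compares the exact binomial answer with the Poisson approximation mentioned in the technical note.

```python
import math


def prob_at_least_one(n: int, p: float) -> float:
    """Exact binomial: 1 - (1 - p)**n, computed stably for tiny p."""
    return -math.expm1(n * math.log1p(-p))


def poisson_at_least_one(n: int, p: float) -> float:
    """Poisson approximation with mean n * p: 1 - exp(-n * p)."""
    return -math.expm1(-n * p)


n = 1000 * 10   # 1,000 years * 10 high pressure systems per year (assumed)
p = 1 / 15000   # assumed per-event probability

exact = prob_at_least_one(n, p)
approx = poisson_at_least_one(n, p)
print(f"binomial: {exact:.1%}, Poisson: {approx:.1%}")
# binomial: 48.7%, Poisson: 48.7%
```

With p this small the two distributions agree to several decimal places, which is why either one works for back-of-the-envelope estimates like this.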

2) Part of the problem of course is that the Russian heat wave has already occurred, and this can create a form of hindsight bias in our consideration of rare events.  So looking forward, what are the odds of another such event occurring in Russia over the next decade, assuming these same odds (which, again, may or may not be accurate)?

To answer this enter the following into the calculator:

n = 100 (10 years * 10 high pressure systems per year)
k = 1 event
p = 0.00006667 (that is, a probability of 1/15,000)

The answer is 0.7%, pretty small, but not zero.  If you were laying odds in order to bet on such an occurrence they would be about 150 to 1.  A longshot, but not impossible.
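Converting a probability into betting odds is a one-line calculation: odds against are (1 - P) / P. Here is a small sketch under the same assumed inputs (100 events, 1/15,000 each); `odds_against` is a name I made up for this example.

```python
import math


def prob_at_least_one(n: int, p: float) -> float:
    """1 - (1 - p)**n, computed stably for tiny p."""
    return -math.expm1(n * math.log1p(-p))


def odds_against(prob: float) -> float:
    """Fair betting odds against an outcome with probability prob."""
    return (1 - prob) / prob


n = 10 * 10     # 10 years * 10 high pressure systems per year (assumed)
p = 1 / 15000   # assumed per-event probability

prob = prob_at_least_one(n, p)
odds = odds_against(prob)
print(f"probability: {prob:.2%}")       # probability: 0.66%
print(f"odds against: {odds:.1f} to 1")
```

Working from the unrounded probability (about 0.66%) rather than the rounded 0.7% gives odds of roughly 150 to 1.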

3) With weather there are all sorts of events that can be classified as extreme -- floods, hurricanes, drought, temperature, and so on.  I have no idea how many such weather "events" there might be in a year.  But for fun, let's assume that there are 1,000 weather "events" in a year.  We might ask, what is the probability of seeing a 1 in 15,000 year event (of any type) over the course of a year?

n = 1000 (1 year * 1,000 events)
k = 1 event
p = 0.00006667 (that is, a probability of 1/15,000)

The odds of at least one 1 in 15,000 year event are 6.4% for one year.  How about over the next 10 years?  48.7%, the same as in the first question, because n is again 10,000.
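To see how the probability grows with the time horizon, here is a short sketch that repeats the calculation for 1, 5 and 10 years. The 1,000 events per year is the just-for-fun assumption from above, and the constant and function names are hypothetical.

```python
import math

EVENTS_PER_YEAR = 1000   # assumed, as in the text
P_EVENT = 1 / 15000      # assumed per-event probability


def prob_at_least_one(n: int, p: float) -> float:
    """1 - (1 - p)**n, computed stably for tiny p."""
    return -math.expm1(n * math.log1p(-p))


results = {
    years: prob_at_least_one(years * EVENTS_PER_YEAR, P_EVENT)
    for years in (1, 5, 10)
}

for years, prob in results.items():
    print(f"{years:>2} years: {prob:.1%}")
#  1 years: 6.4%
#  5 years: 28.3%
# 10 years: 48.7%
```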

If you want to test out different numbers than I used above, it is easy to do so.  But whatever numbers you use, you'll find that individual rare events are not so rare when considered over time and space.  This is one reason why the issue of attribution of causality is so frustratingly difficult (it is also difficult because of uncertainties in both "n" and "p" in the calculations above).
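If you want to try several combinations at once, a small sweep over "n" and "p" does the job. This is only a sketch: the return periods and event rates in the example call are placeholder values, not estimates, and `sweep` is a made-up helper.

```python
import math


def prob_at_least_one(n: int, p: float) -> float:
    """1 - (1 - p)**n, computed stably for tiny p."""
    return -math.expm1(n * math.log1p(-p))


def sweep(return_periods, events_per_year, years):
    """Map (return period, events per year) to P(at least one event)."""
    table = {}
    for period in return_periods:
        for rate in events_per_year:
            table[(period, rate)] = prob_at_least_one(rate * years, 1 / period)
    return table


# Placeholder inputs: substitute whatever "n" and "p" you find plausible.
table = sweep(return_periods=[1000, 15000], events_per_year=[10, 1000], years=10)

for (period, rate), prob in table.items():
    print(f"1 in {period:>6,} | {rate:>5,} events/yr | {prob:.1%}")
```

The pattern is the same whatever values you plug in: the probability climbs quickly as the number of opportunities grows.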

More specifically, there are three reasons why the question of whether extreme events are increasing due to specific causal factors is difficult to answer with certainty: (1) a short data record, (2) the infrequent occurrence of specific extremes, and (3) a range of legitimate methodological approaches to the issue.  In such circumstances it would be easy to be fooled by randomness and black swans.  The good news is that the best policies in these conditions do not require certainty about causality; they instead emphasize robustness to uncertainty and ignorance.