29 December 2009

The "Consistent With" Fallacy: How Not to Compare Predictions and Observations

Over at Real Climate there is a misleading post up about IPCC global temperature projections as compared with actual temperature observations, suggesting success where caution and uncertainty are more warranted.

The scientists at Real Climate explain that to compare a prediction with observations one must assess whether the observations fall within a range defined as 95% of model realizations. In other words, if you run a model, or a set of models, 100 times, you would take the average of the 100 runs and plot the 95 individual runs closest to that average, and define that range as an "envelope" of projections. If observations fall within this envelope, then you would declare the projection to be a predictive success.
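
To make the procedure concrete, here is a minimal Python sketch of the envelope test as described above. It is an illustration under stated assumptions, not Real Climate's actual code: the ensemble is synthetic, and the function names `envelope` and `consistent_with` are hypothetical.

```python
import random
import statistics


def envelope(runs, keep_fraction=0.95):
    """Return (low, high) spanning the runs closest to the ensemble mean."""
    center = statistics.mean(runs)
    n_keep = max(1, round(len(runs) * keep_fraction))
    # Rank runs by distance from the mean and keep the closest ones.
    closest = sorted(runs, key=lambda r: abs(r - center))[:n_keep]
    return min(closest), max(closest)


def consistent_with(observation, runs, keep_fraction=0.95):
    """True if the observation falls inside the envelope."""
    low, high = envelope(runs, keep_fraction)
    return low <= observation <= high


# Hypothetical ensemble: 100 synthetic "model runs" of a trend in degC/decade.
rng = random.Random(0)
runs = [rng.gauss(0.2, 0.15) for _ in range(100)]

low, high = envelope(runs)
print(f"envelope: [{low:.2f}, {high:.2f}]")
print("observation 0.06 consistent?", consistent_with(0.06, runs))
```

Note that the test only asks whether the observation lies between `low` and `high`. It says nothing about how far the observation sits from the central estimate.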

Imagine if a weather forecaster said that he ran a model 100 times and it showed that tomorrow's temperature was going to be between 25 and 75 degrees, with 95% confidence, with a "best estimate" of 50 degrees. If the temperature came in at 30 degrees you might compare it to the "best estimate" and say that it was a pretty poor forecast. If the weather forecaster explained that the temperature was perfectly "consistent with" his forecast, you'd probably look for another forecaster. If the true uncertainty was actually between 25 and 75 degrees, then one might question the use of issuing a "best estimate."

Gavin Schmidt explains how this works in the context of the current "pause" (his words) in the increase in global average surface temperatures over the past 11 years (emphasis added):

"The trend in the annual mean HadCRUT3v data from 1998-2009 (assuming the year-to-date is a good estimate of the eventual value) is 0.06+/-0.14 ºC/dec (note this is positive!). If you want a negative (albeit non-significant) trend, then you could pick 2002-2009 in the GISTEMP record which is -0.04+/-0.23 ºC/dec. The range of trends in the model simulations for these two time periods are [-0.08,0.51] and [-0.14, 0.55], and in each case there are multiple model runs that have a lower trend than observed (5 simulations in both cases). Thus ‘a model’ did show a trend consistent with the current ‘pause’. However, that these models showed it, is just coincidence and one shouldn’t assume that these models are better than the others. Had the real world ‘pause’ happened at another time, different models would have had the closest match."

Think about the logic of "consistent with" as used in this context. It means that the larger the model spread, the larger the envelope of projections, and the greater the chance that whatever is observed will in fact fall within that envelope. An alert reader points this out to Gavin in the comments:

"I can claim I’m very accurate because my models predict a temperature between absolute zero and the surface temperature of the sun, but that error range is so large, it means I’m not really predicting anything."

Gavin says he agrees with this, which seems contrary to what he wrote in the post about 11-year trends. Elsewhere Gavin says such statistics are meaningful only for 15 years and longer. If so, then discussing them in terms of "consistency with" the model spread just illustrates how this methodology can retrieve a misleading signal from noise.

About 18 months ago I engaged in a series of exchanges with some in the climate modeling community on this same topic. The debate was frustrating because many of the climate scientists thought that we were debating statistical methods, but from my perspective we were debating the methodology of forecast verification.

At that time I tried to illustrate the "consistent with" fallacy in the context of IPCC projections using the following graph. The blue curve shows a curve fit to 8-year surface temperature trends from 55 realizations from models used by IPCC (the fact that it was 8 years is irrelevant to this example). With the red curve I added 55 additional "realizations" produced from a random number generator. The blue dot shows the observations. Obviously, the observations are more "consistent with" the red curve than the blue curve. We can improve consistency by making worse predictions. There is obviously something wrong with this approach to comparing models and observations.
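
The effect can be reproduced in a few lines of Python. This is a sketch under stated assumptions: the trend values are synthetic stand-ins, not the 55 IPCC realizations or the actual observation, and "consistency" is measured simply as the distance from the ensemble mean in units of the ensemble standard deviation. The name `z_score` is hypothetical.

```python
import random
import statistics


def z_score(observation, ensemble):
    """Distance of the observation from the ensemble mean, in standard deviations."""
    return abs(observation - statistics.mean(ensemble)) / statistics.stdev(ensemble)


rng = random.Random(42)

# Hypothetical stand-ins: 55 "model" trends clustered near 0.2 degC/decade...
model_trends = [rng.gauss(0.2, 0.1) for _ in range(55)]
# ...plus 55 "realizations" from a random number generator with a much wider spread.
random_trends = [rng.gauss(0.2, 0.5) for _ in range(55)]
padded = model_trends + random_trends

observation = -0.05  # hypothetical observed trend

print(f"models only:     z = {z_score(observation, model_trends):.2f}")
print(f"models + random: z = {z_score(observation, padded):.2f}")
```

With these made-up numbers, padding the ensemble with noise lowers the z-score, so the observation looks more "consistent with" the ensemble even though no predictive information was added.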

What should be done instead?

1. A specific prediction has to be identified when it is being made. A prediction in this case should be defined as the occurrence of some event in the future, that is to say, after the prediction is made. For the IPCC AR4 this might generously be defined as starting in 2001.

2. Pick a quantity to be forecast. This might be global average surface temperature as represented by GISS or CRU, the satellite lower tropospheric records, both or something else. But pick a quantity.

3. Decide in advance how you are going to define the uncertainty in your forecast. For instance, the IPCC presented an uncertainty range in its forecast in a manner differently than does Real Climate. Defining uncertainty is of critical importance.

For instance, eyeballing the Real Climate IPCC Figure one might be surprised to learn that had there been no temperature change from 1980 to 2010, this too would have been "consistent with" the model realization "envelope." While such uncertainty may in fact be an accurate representation of our knowledge of climate, it is certainly not how many climate scientists typically represent the certainty of their knowledge.

If any of 1, 2 or 3 above is allowed to vary and be selected in post-hoc fashion it sets the stage for selections of convenience that allow the evaluator to make choices that pretty much show whatever he wants to show.

4. A good place to start is simply with the IPCC "best estimate." One can ask if observations fall above or below that value. Real Climate's post suggests that actual temperatures fall below that "best estimate."

5. You can then ask if falling below or above that value has any particular meaning with respect to the knowledge used to generate the forecast. To perform such an evaluation, you need a naive forecast, some baseline expectation against which you can compare your sophisticated forecast. In the case of global climate it might be a prediction of no temperature change or some linear fit to past trends. If the sophisticated method doesn't improve upon the naive baseline, you are not getting much value from that approach.
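
One common way to express this comparison is a skill score, which measures the reduction in error relative to the naive baseline. The sketch below uses mean squared error and invented numbers. The series and the function names `mean_squared_error` and `skill_score` are illustrative assumptions only, not values from any IPCC or Real Climate analysis.

```python
def mean_squared_error(forecast, observed):
    """Average squared difference between forecast and observed values."""
    return sum((f - o) ** 2 for f, o in zip(forecast, observed)) / len(observed)


def skill_score(forecast, naive, observed):
    """1 = perfect, 0 = no better than the naive baseline, < 0 = worse."""
    return 1.0 - mean_squared_error(forecast, observed) / mean_squared_error(naive, observed)


# Hypothetical temperature anomalies (degC), invented for illustration.
observed = [0.40, 0.42, 0.38, 0.45, 0.41]
model = [0.42, 0.46, 0.50, 0.54, 0.58]  # "sophisticated" forecast with a steady trend
naive = [0.40, 0.40, 0.40, 0.40, 0.40]  # naive forecast: no change from the start

print(f"skill vs. naive baseline: {skill_score(model, naive, observed):.2f}")
```

In this made-up case the score is negative, meaning the trended forecast did worse than assuming no change. A real evaluation would fix the baseline, the quantity, and the period in advance, as in items 1 through 3.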

The bottom line is that with respect to #4 Real Climate shows that actual temperatures are running below a central estimate from the IPCC AR4 as well as below the various scenarios presented by Jim Hansen in 1988. What does this mean? Probably not much. But at the same time it should be obvious that this data should not be used as evidence to announce the successes of climate predictions, as Real Climate does: "the matches to observations are still pretty good, and we are getting to the point where a better winnowing of models dependent on their skill may soon be possible." Such over-hyping of the capabilities of climate science serves neither science nor policy particularly well. The reality is that while the human influence on the climate system is real and significant, predicting its effects for coming years and decades remains a speculative enterprise fraught with uncertainties and ignorance.


  1. "the matches to observations are still pretty good, and we are getting to the point where a better winnowing of models dependent on their skill may soon be possible."

    I get SO tired of seeing this statement, without seeing any changes. They have been making this statement for years. Sounds like HOPE without CHANGE to me.

  2. Roger,

    It has been very clear for years that Gavin and his friends desperately need some help from quality professionals in the areas of statistics, software, and forecasting. It's past time for amateur hour to be over. It should have been over when it was revealed that an error in Mann's home made code was the cause of his hockey stick (why write new code to do ordinary principal component analysis when commercially prepared standard statistics packages could do it easily? Other than they didn't produce hockey sticks from his data.)

    With trillions of dollars in costs and massive infringements on life, liberty and property at stake, why can't climate scientists get some professional help? Besides, if they got pros to do the stats, code and forecasting, they could start to focus on real climate science concerns -- like calibrating their thermometers and siting them properly. Or providing their data and methods so that others could check their work.

  3. Well, I think it's OK to say that the uncertainty of the predictions is large - whenever more accurate predictions are impossible, for whatever reason.

    But entertainingly enough, the completely constant temperatures - no warming for 30 years at all - is consistent with the IPCC predictions, given the indicated uncertainties, as well. See the horizontal line at


  4. Excellent analysis and comment, thanks. We are getting to the point where the observed evidence is now being treated as "noise" and that models predicting models is becoming reality. This fits post-modern politics where the rhetoric is reality, despite evidence showing that the actual data doesn't support the rhetoric.

  5. More models, with a wide range of results, allow more accurate Texas Sharpshooting. Just wait for ten years of observations and find the results that most closely match the data. Paint the bull's-eye right there and proclaim that the most accurate model has been verified.


    The problem is with the concept of “consistent with”. Climate science should devise experiments or observations that attempt to invalidate the hypothesis of global warming. At least that’s the way science worked before the atomic bombs went off.

    Hmmmm Nuclear weapons are consistent with changes in the scientific method. Or maybe we just shouldn’t allow the IPCC to frame the argument.

  6. And if the lack of warming falls below their 95% confidence interval, does that mean a 95% chance their climate models are all wrong?

  7. I'm not comfortable with the practice of taking the mean of several different model runs and calling that a "best estimate". That would be a valid approach if the model estimates were data containing random, independent errors. However, there is no reason to believe the model errors are either random or independent.

    In my own experience, output from simulations is best described as a probability distribution, i.e. 90% probability the outcome will be X or greater, 50% of Y or greater and 10% probability of Z or greater. If the actual outcome is less than X or more than Z, then you haven't characterized it properly.

  8. Thanks for the great post.

    As a comment, the finance industry's stock market funds look a little like the performance of models. Each company starts many funds. Over the years some under-perform the market and some over-perform. So the people in the under-performers are "rolled up" into the better performers. And the under-performing funds are wound up.

    So the finance company can say things like "the Platinum fund has produced above market returns of 25% year on year for 5 years!" - then usually the year after it doesn't do so well, but the sales pitch has brought in more punters.

    This system works even if - as many believe - the fund managers are no more effective than chimps throwing darts into the stock list at the back of the WSJ.

    Like models perhaps? As you point out, without some definite claims at the start as to what we will see over coming years we *may* be seeing the same cynical cherry picking that goes on with stock funds.

    The request - for us amateurs the models and their specific "predictions" are not accessible. I'm very interested in how the models do for the past with data other than 1 GMST value per year.

    I would love to see the comparison between model value and observed for cloud cover, humidity, temperatures in different parts of the globe, vertical profile data. Not sure whether you can bring that to pass or can point to work already done in this area.

  9. The greater problem to me is that the models and the generic predictions never seem to come with falsifiability tests. Having to wait 100 years to test a 100 year climate projection/prediction is hardly satisfactory. This allows them to keep moving the goalpost - and claiming "consistent with" - as data comes in year after year. We've already seen post hoc assertions that there could be as much as thirty years pause in global warming before "the trend" asserts itself. Since we're already ten years in to a flat trend, it's not surprising that some benighted soul would reach for thirty years - it's long enough to protect the guilty until we're all old - or dead.

    Someone please show me the papers that verified the global warming hypothesis against strong falsification. What great experiments proved the case? Every scientific field has them, and teaches them to grad students. Molecular biology has Crick and Watson for DNA, Crick and others for the Central Dogma and the nature of the genetic code, etc. The work was done with brilliant, elegant experiments that gave what was called proof at the time. Where is the proof for the CO2 global warming apocalypse hypothesis? I'd love to read these papers if they're really out there.

  10. It's interesting that the financial markets have some regulations (arguably, not enough), while climate modelers and projections don't seem to have an independent body overseeing them.

    It seems like the scientific community's standard operating procedures are simply not fitting something this big and this critical to the world.

  11. Modelers need to defend their models. Their financing depends on them being good enough, but never that good.

    It is nice to compare the predictions of a model. But if the model achieves the correct prediction without having the correct proportion of each of the GHGs and aerosols, it doesn't mean anything. For example, to be considered correct, both the prediction of the temp and of the different forcings must be right. If not, they are meaningless.

    One way to verify whether the models were good or not would be to rerun them with the observed evolution of GHGs and aerosols and then compare the results to the different temp reconstructions.

    McIntyre inserted himself in the debate between Michaels and Hansen about scenarios A, B, C. To use scenario A (BAU) was incorrect because the evolution of GHGs was closer to B. But the problem was that temps were closer to scenario C, which guessed a steep decrease in GHG concentrations.

  12. The match of course depends on how accurate the Temperature graphs actually are, as the models have been “trained” to fit the curve. IF the temperature data is incorrect then the model fit means absolutely nothing – it just reflects a figment of a computer modeller's imaginings.

    As Phil Jones has said: "If anything, I would like to see the climate change happen so the science could be proved right, regardless of the consequences."

  13. No.

    Your argument, Roger, is arguing based on "predictions". Surely you are aware of the difference and the implications, so why argue this way here?

    The IPCC and adaptive management are about "projections". Your policies don't work in an adaptive management framework. Sorry.

    And IIRC Tamino showed that the...ahem...projections of temp are within bounds. Can't find it in a quick scan of his site, but surely someone here remembers.



  14. If you want to talk about misleading, what about the chart labeled "IPCC AR4 Individual Realizations"? The models were adjusted for the 2007 report so they fit the past temperature record, but the chart goes back to 1980. It is hardly surprising that there is a good fit on temperature there. Someone who didn't know this would think the climate models predicted the Mount Pinatubo eruption based on this chart.

  15. I finally got fed up with these buffoons. With millions in grants, they don't want to know the "truth". Once upon a time, someone tried to stretch the sparse temperature data to cover the globe. Unfortunately, many scientists have wasted their time working with the "end product" from excessive smoothing & smearing of the data. The Fortran77 code they're still using at CRU doesn't look like it has been updated in years.

    Fortunately, GHCN has raw climate data online. Feel free to work it up yourselves, or drop by http://justdata.wordpress.com and post a request.

  16. "Obviously, the observations are more "consistent with" the red curve than the blue curve. We can improve consistency by making worse predictions."

    Certainly. There is nothing paradoxical about that.

    "There is obviously something wrong with this approach to comparing models and observations."

    Not really. When people claim that "the models fail to predict the present circumstance", the argument of consistency shows that this claim is not supported by any appropriate interpretation of the models.

    You yourself say "actual temperatures are running below a central estimate from the IPCC AR4 as well as below the various scenarios presented by Jim Hansen in 1988. What does this mean? Probably not much." This is exactly right, and the "consistency" claim is simply a way of formalizing that exact statement.

    "Probably it doesn't mean that much". That's all.

    This doesn't mean that such consistency is a useful design goal, of course. Your argument does show that such an approach would be meaningless. But nobody actually does that.

  17. We always hit our target in Texas.....

  18. Dano - Tamino's estimation of uncertainty intervals without treating the effect of volcanic eruptions like Pinatubo as exogenous is very misleading. Why a trained statistician would present distortions of this kind is mystifying.

    Of course, he's developed a following of people who know no better-- but that's less mystifying.

  19. Simulations / predictions / realizations / projections / what-ifs / ensembles / EWAGs; what you label them doesn't make any difference at all.

    If they don't accurately reflect the measured data they are all useless.

    In the real world, this is called Validation. Absent Validation, the numbers are worthless and should be junked into the trash cans.

  20. Lucia -

    Of course no one can predict eruptions with certainty, and they are noise in a long-term trend regardless. Such arguments are not compelling.

    Nonetheless, what is happening here is a complete disregard for what is actually done on the ground. Of course, policy-types know about adaptive management and scenario analysis. One expects a certain audience to consistently turn their heads away from such analyses and instead find seemingly deliberate confusion of prediction and projection very compelling. This is not to say that informed decision-making finds such argumentation compelling.



  21. While IPCC models are not actually predictions, each of us is entitled to make our own specific predictions for temperatures in individual years. A little-noticed experiment was run on this at climateaudit in February 2008. My "prediction", based on my expectation of continued warming based on the models, beat out *every single climateaudit poster*. And my 2009 prediction looks like it'll be equally accurate. The expectations of cooling that many "skeptics" had in early 2008 have been rather clearly falsified. Details here.

    And 2010 looks like it'll be warmer than ever.

  22. Dano-
    Don't understand what you mean by
    "The IPCC and adaptive management are about "projections. Your policies don't work in an adaptive management framework. Sorry. "

    I don't know about the IPCC, but the term "adaptive management" as used in my field has a range of more to less disciplined forms, but does have an important basis in real-world monitoring, as ecosystems are thought to be too complex to predict in advance. So I don't understand how you can have adaptive management without depending on real world data following interventions. Maybe IPCC uses the term in a different way?

  23. Sharon, adaptive management uses data. That's how you adapt. And as you collect more data, your models get better and your projections more accurate. So rigidly using predictions is the last thing adaptive management does, and the enumerated policies in the original post utterly disregard this, throwing the issue into confusion and uncertainty. Shocking, I know, but that is the effective outcome of the post. I hope there is no one here who supports such outcomes and rather everyone wishes for more openness for our nominally democratic process to work effectively.



  24. Everyone -

    I was abused as a college student by a statistics professor. That is, he was very insistent that certain concepts were understood and explained correctly such that a good third of his tests were as much about vocabulary as math.

    I say "abused" in jest, while emphasizing that some of the fine points he emphasized were difficult to intuit and made for some ugly exam scores but I am better for having gone through it.

    I preface with that because his biggest peeve was "confidence intervals". He insisted that the term "confidence" did not belong there and that a confidence interval in no way should be interpreted that "we are 95% confident that the actual value (were 100% of the population sampled) lies in the interval range." If you offered such an explanation you got an "F".

    Instead, he explained that the confidence interval represents the range in which another prediction would fall if the same number of samples were independently taken from the same population. So we would say we were 95% confident that another such sampling would deliver a prediction between the two values establishing the range. A subtle but vital distinction.

    If the actual value were known (eg if the entire population were sampled) then there'd be no need for an estimate - so stating that one has a degree of confidence that the actual value lies between two is a bit silly. One can only describe what the statistician knows - the sampled data. This is especially true when the sampling itself affects that which is sampled - then the "actual" value can never be known - which drives Heisenberg's uncertainty principle.

    From that it becomes intuitive that a wide interval at 95% (relative to a smaller one) describes either a smaller sample size or a set of numbers with a wider range of values. The wider the interval, the less predictive the estimate.

    As I look at predictions and defenses generated or restated at RealClimate I can't help but wonder if the "confidence interval" is being sold a bit loosely. The wide gray bands become a catch all for any number of outcomes that they can use to say "See - we predicted that" when what the wide gray bands really mean is that the data aren't altogether tight enough to make a meaningful prediction.

    Am I wrong here? Was I misled 30 years ago by an abusive stats professor?

  25. adam--
    You were not misled by your stats prof. What RC is showing is a distribution of temperatures from individual runs that one would expect if they reran their computer models. This was not what they described as the expected range for earth's surface temperatures when they wrote AR4-- and they had good reasons to not use RC-type as confidence intervals for the earth's surface temperature.

  26. Forgive me, Dano, I am still a bit befuddled.

    In my kind of adaptive management, suppose we wanted to increase the prevalence of species x. First we measure how prevalent it is today. We have some ideas of what might work (perhaps you could call them heuristic models), so we try a couple of different things in different parts of the range. We then measure to see if any of the treatments were effective.

    So are you saying the equivalent for climate is that we measure it today, then try some different policy options, and then measure it tomorrow to see if our policy options worked to change the climate? Maybe I’m missing something but I don’t see a clear role for models here at all.
    It seems to me that the point of models is to understand the climate, not to project potential futures with any degree of certainty. Yet many are, in fact, using these projections as the basis for policies.

    So what are you saying the models should be used for, and how does that relate to adaptive management?

  27. You may have run across the term "formative evaluation".

  28. Sharon, your models give you a trajectory for your scenario analysis. You implement policies (or refrain) according to the trajectory. You adjust according to the trajectory and emergent properties. You re-run models with new data to see if trajectory changed and if needed, adapt. Companies, militaries, municipalities do this today along a spectrum of formalization and complexity.




  29. I find Professor Pielke's post to be unambiguously clear and focused with precision.

    Happy New Year to everyone.

  30. Of course we know it's a prediction when the temperature goes up and a projection when it goes down. At least perhaps now there's a tiny, tacit admission that hindcasts are of very little use for validation because it's akin to peeking at the exam paper.

    Maybe though the data collectors should not also be in the same team as the modelers. Likely the "pause" would already have been adjusted out if it wasn't for Roy Spencer's stubborn independence. As it is the 30's bump is getting less prominent every year: In another 10 years it'll have disappeared altogether.

  31. Gosh Dano,

    Sounds like you'd be 100% behind this guy.


    Implement policy according to trajectory? It's in there.

    Adjust according to the trajectory? It's in there.

    Adapt? It's in there.

    What's not to like?

  32. http://www.realclimate.org/images/hansen09.jpg

    An interesting aspect of this Real Climate analysis:

    Notice how the regression lines don't run through the year 1984, even though that appears to be the "starting" year (in that the 3 scenarios, and the surface temperatures, all produce essentially the same number for 1984).

    That little trick of course makes the Scenario B look more reasonable compared to the actual temperatures because there is an offset to start with. If all the regressions started at the same number in 1984, the regression for Scenario B versus the actual surface temperatures would look much worse.

  33. There are several misconceptions here.

    Gavin's statement is a refutation of the idea that current temperatures lie outside the range of model predictions. It doesn't follow (as the commenter suggests) that Gavin would argue in favor of comparing forecasts on the basis of conformity to confidence bounds alone, so this is a strawdog argument.

    Specifically, the argument about, "observations are more 'consistent with' the red curve than the blue curve" is yours, not RC's. For a comparative observation it would be better to look at likelihoods, in which case the observation has a higher likelihood on the blue curve than the red curve. Adding more noise to the red curve would lower the likelihood, i.e. worsen the informativeness of consistency of the spurious red curve with the observation. I don't think Gavin would contest that idea.

    In #5, you argue for use of naive forecasts as a benchmark. Good idea. However, your linked post then misuses the naive forecast by violating #3 above (failing to specify uncertainty for the naive forecast or consider it in the comparison). It also specifies a trended naive forecast that's not physically realizable in the long run. Sensible naive forecasts need to be stationary in some sense, or at least have some explanation for nonstationarity, and are likely to have wide confidence bounds, and thus low likelihood for any particular outcome. This is primarily a property of the noise in the system rather than a problem with forecasts.

    Your idea in #3 that a no-temp-change trajectory would be consistent with the RC graph isn't necessarily supported by the graph. The graph shows marginal distributions of temp at each time, not joint distributions over time, and you'd have to know the latter to know if the statement were true.

    As you say, what we should conclude from this is probably 'not much.' In fact, less than it would appear from the RC figure, because the AR4 ensemble is an ensemble of best guesses, rather than a full representation of uncertainty, including climate sensitivity, forcings, etc. However, it seems premature to pass judgment on winnowing of models by skill as hype, when we don't know yet what Gavin means.

    If only your last sentence were taken to heart on both sides of the debate ...

  34. John M -

    The T3 tax is indeed an example of an adaptive tax, but it's a vastly suboptimal one, because it doesn't account for dynamics (delays in temperature response) and is subject to extreme noise (endogenous climate variability). It might better be called "reactive management."

    Sharon -

    From your example it sounds like you have the advantage of multiple realizations and a short time horizon, permitting a kind of plan-do-check-act management. The climate system is noisy, delays are long, and we only have one, which is why models need to be in the loop.

  35. The model realizations are like hugely distorted "beer goggles" thus making all of the girls look hot.

  36. 6 etc.

    Our ecosystems are also "noisy" if you mean they are full of complex interactions that are not well understood. I'm not sure that ecosystems are less complex than climate systems. We also have the problem with ecosystems that short term trends might not reflect some real long term trends of importance.

    People in the natural resource world live with that tension and mostly go with what we observe. Climate scientists seem to live with that same tension and mostly go with what they project. If I understand you correctly, it's a matter of emphasis.

  37. Dano (23) writes,

    "Sharon, adaptive management uses data. That's how you adapt. And as you collect more data, your models get better and your projections more accurate."

    Contrast that with Kevin Trenberth's description of what the IPCC does:

    "But they (the IPCC projections) do not consider many things like the recovery of the ozone layer, for instance, or observed trends in forcing agents. There is no estimate, even probabilistically, as to the likelihood of any emissions scenario and no best guess."


    So Kevin Trenberth says that the IPCC projections do NOT consider "observed trends in forcing agents."

    So much for collecting more data in order to make the projections more accurate.

  38. 6, etc. and Dano,

    I am becoming suspicious that there is some kind of disconnect here either among different members of the climate science community and IPCC (as Mark Bahner just pointed out), or between the combined climate science community as an entity and other science communities, or both which leads to a lot of confusion on this topic.

    It sounds as if
    1) we are supposed to use these models to get an idea of what might happen in the future
    2) But we won't know if they are accurate or not because there is too much noise in the system, so
    3) we can't tell from comparing reality with projections how good they are.

    So we plan to adaptively manage to model projections and not to observed data.

    Is this what you are saying?

  39. Maybe someone should buy RC - 'Biostatistics The Bare Essentials' by Geoffrey Norman and David Streiner.

  40. Sharon:

    "Is this what you are saying?"

    It might be what you wanted me to say. I looked back over my text upthread and I see no indication that your enumeration was what I actually said.


    So Kevin Trenberth says that the IPCC projections do NOT consider "observed trends in forcing agents."

    Thank you for the lack of context in the snippet you chose to highlight (plus ça change, plus c'est la même chose!). The IPCC merely provides the analysis and context for decision-making.

    The larger issue is humanity's inability to address this and many other issues, not whether what scientists are working on now is compelling or informs decisions made by policy-makers. If informed policy-makers need additional analysis for their governance, then you can bet they will ask for it, highlighted by the very next sentence after your preferred quote: "There is no estimate, even probabilistically, as to the likelihood of any emissions scenario and no best guess", which, to me, in effect says more about the process and communication of such than most of the ink spilled wailing and gnashing and rending of raiment over whatever the ink-spillers are wailing about this time.



  41. Sharon -

    It's not what I'm saying anyway.

    It's narrowly true that global mean temp anomalies are not very informative over short horizons, but there are many other sources of information available. One problem is that, while there's a record of past climate projections (not always what one would like for ex post evaluation, but at least something), there's almost nothing substantive on past skeptical, null, or naive forecasts or models. It's hard to compare something to nothing.

    The model-data dichotomy is basically false. Nearly all measurements involve some kind of implicit model of the underlying process. I don't see how one could adaptively manage to data alone, without some underlying model - mental or mathematical - of the process to be managed.

    What does adaptive management of climate to data mean to you?

    Steve (8) -

    The kind of multivariate information you seek is in reports like AchutaRao, K. M., C. Covey, C. Doutriaux, M. Fiorino, P. Gleckler, T. Phillips, K. Sperber, K. Taylor "An Appraisal of Coupled Climate Model Simulations", Edited by D. Bader, 2004. UCRL-TR-202550 - or try googling "CMIP". Data and model runs are available with a good interface at http://climexp.knmi.nl

    If climate modelers are running the "perfect prediction scam" there should be a paper trail in the literature - easy to demonstrate.

  42. "The T3 tax is indeed an example of an adaptive tax, but it's a vastly suboptimal one,..."

    Suboptimal compared to what other adaptive carbon dioxide tax?

  43. Dano,
    In your message 28..
    "You implement policies (or refrain) according to the trajectory. You adjust according to the trajectory and emergent properties. You re-run models with new data to see if trajectory changed and if needed, adapt."

    Now, I am thinking that you might be using different language for things than I would use and we are really saying the same thing. "You re-run models with new data to see if trajectory changed and if needed, adapt" is, I think, the key statement.

    Let's take a company that has sales figures that are less than they would like. They know those figures. They then try an option to improve them. They measure and the sales remain the same or go down. Even in that relatively simple case, it could be due to other factors, such as a bad economy, rather than the failure of their intervention per se.

    They would then have a robust and rich discussion about their next steps. If you would take your statement "you rerun models with new data to see if trajectory changed, and if needed, adapt" and relate it to this example, I think we would finally reach across our language barriers.
    Thanks for your patience.

  44. In addition to markbahner's comment, how is the T3 tax any more "reactive" than what Dano has described?

    Indeed, McKitrick specifically argues that if the models are correct, his tax will be even more punitive to fossil fuels than the other "adaptive" policies.

  45. "The bottom line is that with respect .....below the various scenarios presented by Jim Hansen in 1988. What does this mean? Probably not much."

    It depends on what has happened vis-a-vis the forcing assumptions. Hansen's scenario 'A' included huge increases in CFC's, which leveled off in the mid 1990's and have begun to drop.


    Hansen argued as late as 2001 that addressing the greenhouse gases other than CO2 would be 'cheaper, faster' and that CO2 would eventually address itself as a result of resource limitation.

    CFC and N2O emissions have been addressed. So Hansen's scenario 'A', the business-as-usual scenario, didn't occur.

    Scenario 'B' was 2.6 degrees/century, assuming 'business as usual' in regard to CO2 emissions, which is based on the assumption that CO2 emissions would continue to grow.

  46. "Hansen argued as late as 2001 that addressing the greenhouse gases other than CO2 would be 'cheaper, faster' and that CO2 would eventually address itself as a result of resource limitation."

    Actually he argued for soot reduction which was something you'd have thought everyone could agree on - as it's the easier one, being solid and was (according to him then and many more since) responsible for 50% of the warming.

    I'm hugely suspicious about GISS being the only record to have 2005 as the hottest year. Rather than models being adjusted by the data as Dano suggests; it is a bit too often that the data is adjusted to reflect the model. Santer et al 2008 was a classic in that regard - if you look with a skeptical eye. I'll just stick with % error validations rather than fool myself into thinking I'm correct and the data is wrong. I don't think it would stand up in court. And if I used the wide-error-bars-makes-it-better argument, they'd throw away the key.

  47. Hansen 2001


    "We suggest equal emphasis on an alternative, more optimistic, scenario that emphasizes reduction of non-CO2 GHGs and black carbon during the next 50 years....

    By mid-century improved energy efficiency and advanced technologies, perhaps including hydrogen powered fuel cells, should allow policy options with reduced reliance on fossil fuels and, if necessary, CO2 sequestration."

  48. Harry
    For your information, CFCs were replaced by HFCs, which are very powerful greenhouse gases with a tendency to leak far more than CFCs did. For that reason they are now being phased out and replaced by propane, iso-butane and CO2 as refrigerants (i.e. less powerful greenhouse gases that don't leak). Ergo, no net reduction in non-CO2 greenhouse gases was achieved. Has N2O been limited in any way? As I understand it, that is very difficult to estimate but highly unlikely. Hansen's scenario A was business as usual and business as usual is exactly what we had. Reality of course is even less than his C scenario. It's ok to overpredict though as long as you are man enough to acknowledge it. Nobody expects infallibility, only honesty.

  49. jgdes,

    NO2 has been addressed in vehicle emissions standards. The original standards regulated CO + HC. This actually made NO2 emissions worse. It's a simple carburetor adjustment to tilt a vehicle towards limited HC + CO emissions. Unfortunately, this increases NO2 emissions. Carburetors just aren't that precise a fuel metering device. The advent of EFI and its popular acceptance made the precise metering necessary to produce a relatively 'clean burn' possible.

    In 2008 the manufacturing cost of EFI vs Carburetor became negligible. As a result the Chinese started manufacturing EFI everything right down to mopeds and adopted fairly strict vehicle emissions standards.

    Sometimes 'environmentally good' things happen because it makes economic sense.

  50. "NO2 has been addressed in vehicle emissions standards."

    NO2 is a tropospheric ozone ("smog") precursor, but is not significant in global warming.

    It's N2O that's a global warming gas. N2O is actually produced by the same catalytic converters that reduce NO2:


  51. Keep it simple.

    Did Hansen's 1988 scenarios A or B provide any more accurate a prediction of temp rise than a simple linear trend estimate from prior data? Data to 1950 (prior to massive CO2 emissions) or to 1970 (prior to AGW warming) or to 1988 (the year of Hansen's testimony)? NO.

    Using HadCRU data to 1950 the linear trend projection would have given a current anomaly of -0.04, data to 1970 would've given a linear projection of about 0, and data to 1988 would give an anomaly of +0.02. Hansen's A & B scenarios look like +1.0 to +1.5?

    We are at + 0.4 according to current data. Hansen's predictions currently look worse than a linear trend line projection from before the industrial revolution.

    Whatever your personal views, and having complete faith in the global temp records, Hansen's scenarios are farther from the mark than a sixth grader with a ruler. How much money has he been granted?
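A minimal sketch of the kind of comparison comment 51 describes: fit a straight line to data up to a cutoff year, extrapolate it to a target year, and compare its error with that of another projection. The series below is a synthetic placeholder (not HadCRU or GISS values), and the function names are illustrative only.

```python
def fit_linear_trend(years, values):
    """Ordinary least-squares line through (year, value); returns (slope, intercept)."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def project(years, values, target_year):
    """Extrapolate the fitted line to target_year."""
    slope, intercept = fit_linear_trend(years, values)
    return slope * target_year + intercept


def abs_error(prediction, observed):
    """Absolute miss of a prediction against an observation."""
    return abs(prediction - observed)


# Synthetic placeholder anomalies (assumption, NOT real temperature data):
# an exact line rising 0.005 per year, sampled once a decade up to 1950.
years = [1900, 1910, 1920, 1930, 1940, 1950]
anomalies = [-0.50, -0.45, -0.40, -0.35, -0.30, -0.25]

naive_2009 = project(years, anomalies, 2009)
```

To reproduce the comparison, one would substitute a real anomaly series truncated at 1950, 1970, or 1988, and then call `abs_error` on both the linear projection and the scenario value against the observed anomaly.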

  52. Here's an update on Hansen '88, using GISS data and Hansen's own graph from 2005.


  53. Gavin said: that these models showed it, is just coincidence and one shouldn’t assume that these models are better than the others

    This is true; since none of the models are validated each will fit certain parts of the historical (or future) record more closely, and it's a crap shoot which one will fit which chunk of time best.

    Gavin said: we are getting to the point where a better winnowing of models dependent on their skill may soon be possible

    How does that square with the previous statement? There is some advantage to doing something like Bayesian Model Averaging, but it still doesn't protect you from the fundamental lack of validation.

    TSL (#7) said: there is no reason to believe the model errors are either random or independent

    In fact there is evidence to the contrary, model ensembles remove some of the error, but too fast, because there likely is a systematic bias.
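One way to illustrate the role of a systematic bias in ensemble averaging is a toy error model. The sketch below assumes each model's error is a bias shared by all models plus independent Gaussian noise; that decomposition is an assumption made for illustration, not something taken from the comment or the references it mentions. Under it, averaging N models shrinks the noise term as 1/N but leaves the shared bias untouched.

```python
import math
import random


def expected_rmse(n_models, shared_bias, noise_sd):
    """RMS error of the ensemble mean: sqrt(bias^2 + noise_sd^2 / N)."""
    return math.sqrt(shared_bias ** 2 + noise_sd ** 2 / n_models)


def simulated_rmse(n_models, shared_bias, noise_sd, trials=2000, seed=0):
    """Monte Carlo check of expected_rmse under the same toy assumptions."""
    rng = random.Random(seed)
    total_sq = 0.0
    for _ in range(trials):
        # Each model's error = common bias + its own independent noise.
        errors = [shared_bias + rng.gauss(0.0, noise_sd) for _ in range(n_models)]
        mean_error = sum(errors) / n_models
        total_sq += mean_error ** 2
    return math.sqrt(total_sq / trials)


# With no shared bias, 4 models halve the error of a single model.
unbiased_four = expected_rmse(4, 0.0, 1.0)
# With a shared bias of 0.3, even 100 models cannot get below 0.3.
biased_hundred = expected_rmse(100, 0.3, 1.0)
```

If real model errors are neither random nor independent, as TSL suggests, the floor set by the shared term is what the ensemble mean converges to.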

    6p00e54ed96fd48833 (#33) said: it seems premature to pass judgment on winnowing of models by skill as hype

    I collected some references about this a while back; past performance doesn't seem to be indicative of future skill, model averaging helps, but is no panacea, and it's hype until someone publishes some real results. Bragging about something in your own echo chamber on the intertubes is almost the definition of hype.

  54. jstults
    The trouble is that there is no sound theoretical justification for model ensembles. As combining in a frequentist fashion would have to be done with random inputs and many more runs, then yes, the combining of models must perforce be some sort of pseudo-Bayesian approach.

    However what some people seem to forget is that Bayesianism relies on solid evidence that the priors compiled by the "experts" have a track record of being substantially correct. Guesses just don't count. Notwithstanding that the experts in this case are either the computer codes (impossibly bizarre) or the selectors of the runs to go in the ensemble (utterly biased). It just doesn't work and should be abandoned forthwith! Every individual model should really be given a percentage error and the most useless ones should be dumped in the skip.

    And in fact every modeler knows it's very easy to correctly hindcast if you have enough loose parameters but yet still be completely wrong about the underlying theory. That's why only forecasts are useful for validation. Mind you I'd accept some spatial validations in part exchange but that's apparently even worse than the temporal results.
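The point about loose parameters can be shown with a toy example. In this sketch, which uses synthetic data and is not a climate model, the underlying process is assumed to be a straight line with a small alternating wiggle. A six-parameter polynomial reproduces all six past points exactly (a perfect hindcast) yet forecasts far worse than a two-parameter line.

```python
def lagrange_predict(xs, ys, x):
    """Evaluate the unique polynomial through all (xs, ys) points at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        basis = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                basis *= (x - xj) / (xi - xj)
        total += yi * basis
    return total


def linear_predict(xs, ys, x):
    """Evaluate the least-squares straight line through (xs, ys) at x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((a - mean_x) ** 2 for a in xs)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    slope = sxy / sxx
    return mean_y + slope * (x - mean_x)


# Synthetic history (assumption): true signal y = x, plus a +/-0.1 wiggle.
xs = [0, 1, 2, 3, 4, 5]
ys = [x + 0.1 * (-1) ** x for x in xs]

# Hindcast: worst miss of the many-parameter fit on the points it was tuned to.
hindcast_miss = max(abs(lagrange_predict(xs, ys, x) - y) for x, y in zip(xs, ys))

# Forecast: miss at x = 8, where the underlying signal is 8.
poly_forecast_miss = abs(lagrange_predict(xs, ys, 8) - 8)
line_forecast_miss = abs(linear_predict(xs, ys, 8) - 8)
```

Here the polynomial's hindcast miss is zero while its forecast miss is orders of magnitude larger than the line's, which is the sense in which only out-of-sample forecasts test a model.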

    Worse though, and something that even fewer people seem to realize, is that any sensitivity above 1K should be considered as decreasingly likely because all the extra warming above 1K comes from as yet unproven positive feedbacks (discussed currently on Pielke Sr.'s blog), i.e. it is not the neat Gaussian distribution of likelihood centred around 3K that is commonly promoted. Such an idea has even less theoretical justification.

    Somehow I'm reminded of the calculation of Black-Scholes options pricing by Finite Element computations. Beautiful in concept but utterly useless in reality thanks to the limitations of the underlying assumptions.

  55. jgdes (#54): "However what some people seem to forget is that Bayesianism relies on solid evidence that the priors compiled by the "experts" have a track record of being substantially correct."

    That's not really the case, you can have uninformative priors, or priors that reflect things like 'negative mass is nonphysical', though in a lot of cases that reduces things down to simple averaging.
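A minimal sketch of how uninformative priors can reduce model weighting to simple averaging, assuming the usual form in which each model's posterior weight is proportional to its prior times its likelihood. The function names and numbers are hypothetical and purely illustrative.

```python
def bma_weights(priors, likelihoods):
    """Posterior model weights: prior * likelihood, normalised to sum to 1."""
    unnormalised = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]


def weighted_average(predictions, weights):
    """Combine model predictions using the given weights."""
    return sum(p * w for p, w in zip(predictions, weights))


# Hypothetical predictions from three models (illustrative numbers only).
predictions = [1.0, 2.0, 6.0]

# Uniform (uninformative) priors with equal likelihoods: every model gets 1/3.
flat = bma_weights([1.0, 1.0, 1.0], [0.2, 0.2, 0.2])
flat_average = weighted_average(predictions, flat)

# Same priors, but the data favour the first model.
skewed = bma_weights([1.0, 1.0, 1.0], [0.6, 0.2, 0.2])
skewed_average = weighted_average(predictions, skewed)
```

With flat priors and data that cannot discriminate between models, `flat_average` is just the arithmetic mean of the predictions; the weights only depart from equal when the likelihoods differ.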

    "And in fact every modeler knows it's very easy to correctly hindcast if you have enough loose parameters but yet still be completely wrong about the underlying theory."

    That's true, if you read those references I linked you'll see we don't have it 'completely wrong', sometimes the models are good (we picked the right physics!), but they are unpredictably bad too (crap, we left something important out).

    "Somehow I'm reminded of the calculation of Black-Scholes options pricing ...Beautiful in concept but utterly useless in reality..."

    Things are better in applied physics than in economics because even when we can't do controlled validation experiments we still have basic conservation laws, which are non-existent in markets. The thing that is similar, that one of those linked references points out, is the non-stationary characteristics of climate feedbacks, which is similar to the non-stationary statistics in a market.