12 August 2009

We Lost the Original Data

Steve McIntyre, of ClimateAudit, is a determined individual. While this may be no fun for those who fall under his focus and happen to have something to hide, more sunlight on climate science cannot be a bad thing.

Lately Steve has been spearheading an effort to get the Climatic Research Unit (CRU) at the University of East Anglia to release the data that underlie its analysis of global temperature trends. Such a request should not at all be controversial. Indeed the atmospheric sciences community went to great lengths in the 1990s to ensure that such data would be openly available for research purposes, culminating in World Meteorological Organization (WMO) Resolution 40 on the international exchange of meteorological and related data and products. The Resolution states:
Members should provide to the research and education communities, for their non-commercial activities, free and unrestricted access to all data and products exchanged under the auspices of WMO . . .
WMO recognized the need to protect commercial activities, but placed no restrictions on the exchange of climate information described as follows:
All reports from the network of stations recommended by the regional associations as necessary to provide a good representation of climate . . .
Obviously, the ability to do good research depends upon good data with known provenance. At the time WMO Resolution 40 was widely hailed in the atmospheric sciences community as a major step forward in data sharing and availability in support of both operations and research.

Thus it is with some surprise that I observe CRU going through bizarre contortions to avoid releasing its climate data to Steve McIntyre. They first told him that he couldn't have it because he was not an academic. I found this to be a petty reason for keeping data out of the hands of someone who clearly wants to examine it for scholarly purposes. So, wanting to test this theory, I asked CRU for the data myself, being a "real" academic. I received a letter back from CRU stating that I couldn't have the data because "we do not hold the requested information."

I found that odd. How can they not hold the data when they are showing graphs of global temperatures on their webpage? However, it turns out that CRU has, in response to requests for its data, put up a new webpage with the following remarkable admission (emphasis added):
We are not in a position to supply data for a particular country not covered by the example agreements referred to earlier, as we have never had sufficient resources to keep track of the exact source of each individual monthly value. Since the 1980s, we have merged the data we have received into existing series or begun new ones, so it is impossible to say if all stations within a particular country or if all of an individual record should be freely available. Data storage availability in the 1980s meant that we were not able to keep the multiple sources for some sites, only the station series after adjustment for homogeneity issues. We, therefore, do not hold the original raw data but only the value-added (i.e. quality controlled and homogenized) data.
Say what?! CRU has lost track of the original data that it uses to create its global temperature record!? Can this be serious? So not only is it now impossible to replicate or reevaluate homogeneity adjustments made in the past -- which might be important to do as new information is learned about the spatial representativeness of siting, land use effects, and so on -- but it is now also impossible to create a new temperature index from scratch. CRU is basically saying, "trust us." So much for settling questions and resolving debates with empirical information (i.e., science).

To be absolutely clear, none of what I write here should be taken as implying that actions to decarbonize the global economy or improve adaptation do not make sense -- they do. However, the fact that climate change is important, and that there are opponents of action who will seize upon whatever they can to make their arguments, does not justify overlooking or defending this degree of scientific sloppiness and ineptitude. Implementing successful climate policy will have to overcome the missteps of the climate science community, and this is a big one.

31 comments:

  1. One option could of course be that CRU's explanation is true, namely that they didn't have the required resources in the 1980s to store the data in its original form. Sloppy as the decision to discard the raw data may have been, it also reflects research funders' lack of priority to climate data analysis and storage in those days. It's regrettable, but you may wish to consider the possibility that there is no malintent here.

    Now that the need for climate data analysis (and re-analysis) is more widely recognised and supported, I hope CRU can work with WMO and others to restore the original database. That is, if the original data is still available somewhere.

  2. -1-Richard

    Thanks, but my comment has nothing to do with "intent". The boy whose dog ate his homework probably had no malintent either.

  3. Very true, Roger. Perhaps my "you" in the last sentence of my comment's first paragraph is directed less to you and more to those who may be inclined to see CRU's decision in the 1980s as part of a big conspiracy.

    I think it's good that you raise the issue though, because I do agree with you that science can't be hurt by openness. Back in the 1980s computers' storage capacity and scientific funding priorities were different. I don't know if CRU had any alternatives to discarding the raw data - I would like to think they looked for them. Be that as it may, I hope CRU won't be defensive about a decision that may well be perfectly justifiable given the circumstances at the time.

  4. rjtklein--
    I think very few people think CRU's decision to not store data in the 80s was a conspiracy.
    But of course they had storage options back in the 80s. Options included:
    1) Typing the raw data into the appendix of a "report". (Ph.D. Theses in engineering used to do this back in the 80s.)
    2) Storing it on cards.
    3) Placing all data in manila folders, putting it in a box and sending it to some storage archive. (DOE national labs do this with some data, and have since WWII)
    4) Placing it on magnetic tape and storing that.
    5) Placing it on floppy disks and storing those.

    Storage methods have changed, but people have always been able to store data if they value storing the data. They can also generally get funding for this activity if they value it as much as other parts of the project.

    I'll readily admit to the fact that I personally do not always value archiving data as much as a librarian might. Universities often don't value archiving data as much as weapons labs do. Maybe CRU never valued archiving data and did not request or reserve funding for this part of the task. Maybe they preferred to spend money on page charges for peer reviewed journal articles, sending staff to international conferences, or just adding more value to the product. I can't say.

    But even if that is true, that wouldn't mean they had no alternative but to discard the data.

  5. Roger sez:

    “Implementing successful climate policy will have to overcome the missteps of the climate science community, and this is a big one.”

    I am supposed to believe that a government that cannot even come close to competently managing the public forests will competently manage THE CLIMATE?

    Are you freaking kidding me? Come on!

  6. Assuming CRU's explanation is true, is it reasonable to say, with at least one alternative out there in GISS, that this is cause enough for climate researchers to stop using CRU?

    Even if it is, I could see it being very difficult. If my understanding's right that the CRU reconstructions were used quite a bit in past studies, I feel like there could be certain cases where new work building on the past studies would be difficult or impossible without also using CRU...

  7. CRU seems to have stored their raw temperature data in the same banker box with their credibility. Their work product cannot be used to justify any public policy. No speculation about motives can affect the value of their product to the public.

    Or to place the quality of science in context: The Chinese have saved sunspot data for two millennia. Through revolutions, floods, and wars, and they can still retrieve the data with provenance assured.

  8. Roger,

    You write "CRU is basically saying, "trust us."

    I am not sure I agree with this 100%. Their work has been peer-reviewed in several different iterations, has it not? While I understand that peer-review is not infallible, to me, its intent is to greatly lessen the reader's feeling of having to "trust" the authors. Yes, there is still some of that, but when I read a peer-reviewed paper, especially in fields other than my own, my first assumption is not one of "well, I'll just have to trust the authors on this one," because I know (or "trust") that folks with far more knowledge than me in the field have read it critically, and agreed that its content is worthy of being considered by the general scientific community.

    That is the whole purpose of peer-review, right?

    I have plenty (probably the vast majority) of papers which I couldn't provide the raw data for if asked, much less many of my analysis routines. But, in my published papers, I include Data, Methods, and Results sections where I describe my work. And the fact that it is peer-reviewed means that someone, somewhere, with some qualifications in the field thought it was reasonable. So readers of my papers don’t simply have to “trust me” even if I can’t provide the data and/or the routines at some later date. If what I have done is wrong, it’ll be replaced by new and improved science, either by me or others -- that is one of the primary ways that science moves forward -- historically with or without the co-operation of all interested parties.

    Perhaps my way of thinking about this is old-school and a new era is upon us (one which I have yet to fully embrace and not sure I ever will, especially the latter) in which everyone has to use the same data archiving techniques and the same analytical tools and reviewers will be required to precisely replicate the results before they are published—if not the reviewers themselves, perhaps a staff of analysts employed by the journals. But even if this will someday be the case, I don't see how it should be retroactively applicable. Will all the journals be wiped clean of all past material, only to have it reinstated once each and every article has been replicated?

    I don’t agree with Lucia that the only thing left to do in this case is “disregard the data” just because at some step along the way something now is missing.

    Rather than challenging the CRU historical temperature dataset based upon its (potential) warts (some of which it seems now may be impossible for anyone to completely identify, much less rectify, if they exist at all), isn't it far more constructive to go forward into this new era of openness and challenge the CRU history with a new and improved analysis showing how it should be done, rather than how someone wished it were done 20 years ago?

    So if the CRU is guilty of requiring people to just “trust them,” I would imagine that so too are 90% or more of all the authors ever published in the scientific literature.


    -Chip

  9. Is it also possible that they never documented the adjustments or that this documentation is also lost such that the existing data sets can not be de-adjusted?

    Roger, (and others), this is a very difficult field you've involved yourself in

  10. -8-Chip

    You write:

    "If what I have done is wrong, it’ll be replaced by new and improved science, either by me or others -- that is one of the primary ways that science moves forward -- historically with or without the co-operation of all interested parties."

    But (as you later allude to) this is exactly the problem. With the data missing the science cannot go forward. It must go back to the beginning.

  11. Back in the day I worked for NCAR as a Fortran programmer responsible for updating data archives from satellite missions (e.g., TOMS) with massive amounts of data, much of which was never even analyzed.

    My job was to take data stored on old-technology (reel-to-reel) tapes and convert it to a readable format on the new technology (either new tapes or cartridge).

    I did this work on the NCAR supercomputers. It was not simple and it came at a cost. So I fully understand the challenges. However, the analyses being discussed here are at the center of major policy debates. If nothing else this experience should serve as a lesson going forward about the importance of ensuring the availability of policy-relevant data.

    Chalk up one more reason for data to be made available at the time of publication.

  12. I don’t agree with Lucia that the only thing left to do in this case is “disregard the data” just because at some step along the way something now is missing.
    Good. Because I didn't say that; kmye did. I disagree with kmye on that point. I think we can perfectly well continue to assume CRU is valid, and compare things to CRU. At the same time, I think it's best for the basis for CRU to be as transparent as possible.

    I only observe that it's unlikely that CRU's only option was to discard data. They had other options but didn't exercise them. That's a common enough behavior, but it's not quite the same as having been compelled to lose the data. It also doesn't invalidate CRU.

    isn't it far more constructive to go forward into this new era of openness and challenge the CRU history with a new and improved analysis showing how it should be done, rather than how someone wished it were done 20 years ago?
    The data are necessary to do such an analysis and the data are being requested.

    Even though the ideal thing is for someone to improve the analysis, it's still good for someone to get the original data and examine it for lapses. In the process they can also see whether it might be supplemented by other data. Therefore, it would still be useful if the data underlying CRU were available.

    This doesn't mean we ignore CRU in the meantime. It only means that it's a bit disappointing the raw data seems to have been misplaced.

    Of course, all this assumes the raw data have been misplaced. The recent Nature article suggests the data have not been misplaced and Jones is going to post the data requested under FOI within a few months.

  13. "new and improved analysis" assumes the ability to replicate and extend prior work.

    Without the original data, CRU's prior work can be neither replicated nor extended.

    Which is why their work could no longer be considered scientific if Jones cannot produce the data.

    I sincerely hope he will succeed, when all is said and done! How embarrassing.

  14. But Roger, Jones et al. at the CRU didn't take the data measurements themselves. So if the "data are missing" I don't see where the blame lies with them. The fault would lie with the responsible data archival service (I assume that the CRU does not serve in that role for data collected across the world). Perhaps some people feel that Jones et al. should have done a better job archiving the data that they used, but they didn't, so be it. They did document their work in various publications, which up until recently (apparently) seemed to be a perfectly acceptable (and normal) thing to do.

    I have never taken a single measurement that I ever used in writing a paper and have no idea whether all of the original data currently exist somewhere nor have I gone to efforts to save what I used. I documented what I did at the time I did it. If it has subsequently been established that the data I used were wrong, then citing my paper in support of something is probably not a good idea. But if the data have been lost, then my results should stand until proven otherwise. They shouldn’t be dismissed simply because the data have been discarded.

    If there is a push for a universal data archival requirement and that practice becomes ingrained in the scientific publication processes, then that will become part of the process and everyone will do it. And it will be taught in school. If that was required back in the 1980s or even now, then most certainly I would have had to do it in order to publish. But it wasn’t and still (widely) isn’t. So holding Jones et al. to some yet-to-be adopted standard doesn’t seem to make sense to me.

    The tragedy for science would be if all the raw temperature data that were ever collected were forever lost. Not that someone’s compilation of it is now hard to replicate. Archiving the data at the time of publication would have been useful, but its absence is hardly insurmountable. The science of understanding the past history of surface temperatures is not dead-in-the-water because Jones et al. can’t (or won’t) give up their data.

    If there is a need for a new compilation of global temperatures that is clearly documented and openly accessible, then by all means someone ought to set forth to undertake such an effort and not rely on outdated and unverifiable techniques.

    -Chip


    PS. Sorry Lucia, I misread your statement!

  15. Lucia,

    Just to clarify, my above comment wasn't meant to be a point; it was an honest question from a position of ignorance. It seems you've answered it there. Thanks :)

    A further honest question: Would it be reasonable for CRU to consider going back and attempting to recollect that raw data, in case of the eventuality that "new information is learned about the spatial representativeness of siting, land use effects, and so on"? Is that what you're suggesting in the second part of your comment? I'm sure it would be quite a bit of legwork, but doing so would make sense to me...

  16. My experience is that there are two fundamentally different science activities; science for science's sake and science towards some policy goal. It is fine to have the standard peer review for science for science's sake as if scientist 1 finds "A" and scientist 2 finds "not A" scientist 3 may come along and add to the total knowledge; or not if no one else cares.
    I would argue that when scientific results are to be used in policy or in medicine, it calls for a different level of scrutiny. QA QC procedures, data backup and posting, testing of prediction's accuracy by independent panels, etc. may be needed.
    It sounds as if 1) this view isn't universally held (greater risks call for greater scrutiny) and 2) people didn't know in advance how policy relevant their data would be. I don't think we can blame the authors for either.

    We can blame the science community at large for not clarifying/agreeing to any distinction between policy relevant and not. We can blame our human inadequate ability to predict the future for not knowing how important their data would become.

  17. 8-Chip

    You seem to have a confused idea of exactly what peer review is expected to do. Peer review does not certify results. It merely gets you in the door. I can say with great confidence that most peer reviewed scientific papers end up being wrong in the long run. At best, with peer review you get approval to publish - without necessarily agreement - by two busy people, who have their own publishing to do.

    There is nothing - absolutely nothing - in peer review that should make you trust a word of any published paper. Much of the grad school experience consists of learning to tear apart published papers. No doubt you'd be surprised how easy it is to do.

    The need to "show your work" should be familiar to any school-boy. It is not required in scientific publishing to provide raw data - there is an assumption of competence and honesty granted to keep the process going. That does not mean that you don't need to keep the raw data available for later examination. Without it, there is no way to validate statistical manipulations. Steve McIntyre has shown clearly and repeatedly that the climate science community needs to have their work audited. The loss of this original data means that the chain has been broken, and there is no way to know if the statistical manipulations done by CRU have been legitimate. The logic of the scientific method does not allow for 'he's a nice boy' allowances - as Ronald Reagan said - "trust, but verify."

  18. I began working with climate data while I worked for the Swedish Building Research Council in the late 1970's and continued working with that kind of data in Africa and the far east for the next 25 years. Climate data, not to put too fine a point on it is garbage. The weather stations are typically badly located. Instruments therein are most often out of calibration, the kinds of data that would really be useful, like solar insolation and long wave IR environment measurements are rarely, if ever taken. When I see people like Hansen then propose that they know what not just one weather station is doing but the whole world's climate is doing, I'm still left gasping at their chutzpah. Fraud is too kind a word for what people like Hansen, Mann, Bradley and Hughes are perpetrating.

  19. Chip said...

    "So if the CRU is guilty of requiring people to just “trust them,” I would imagine that so too are 90% or more of all the authors ever published in the scientific literature."

    That would imply that maybe 10% of the authors published have reliable, replicable results. That sounds about right.

    Speaking as a former scientific journal editor, I can say that "peer review" typically bears more than a passing resemblance to Bismarck's famous comment about the making of laws and sausages. It is a very clean looking phrase that describes a process which is often anything but clean.

    Mind, I edited during the late 1970's. Scientific publishing and the peer review process seems to have got a lot worse these days and it was pretty bad then.

  20. Mark B: It might come as a surprise to you, but Reagan was actually paraphrasing Lenin, who said, "Trust is good, but control is better."

    But as I understand now from the news item in Nature, this isn't about missing data anymore. Much of the above discussion, while interesting, therefore doesn't actually apply. What still does apply is Roger's argument for transparency in science, which in this case relates to the application of WMO resolution 40.

    If anything good could come out of this, it would be for meteorological offices around the world fully to adhere to the resolution, and exchange their data and products "without charge and with no conditions on use", as Annex 1 stipulates. I know that the (non-)availability of climate data has been raised as an issue by some Parties to the UNFCCC in discussions on adaptation. But to my knowledge Parties have never been requested to adhere to WMO resolution 40 (or take similar measures) for the specific purpose of advancing climate policy (adaptation or mitigation).

    The current debate, embarrassment aside, could convince negotiators to attach more priority to Article 5 of the UNFCCC (http://unfccc.int/essential_background/convention/background/items/1364.php).

  21. A further honest question: Would it be reasonable for CRU to consider going back and attempting to recollect that raw data, in case of the eventuality that "new information is learned about the spatial representativeness of siting, land use effects, and so on"? Is that what you're suggesting in the second part of your comment? I'm sure it would be quite a bit of legwork, but doing so would make sense to me...
    Sort of. Since CRU doesn't work for free, and universities' reward structures do not reward maintaining archives or handing them out, I would suggest that some other agency would be better suited to recollecting and archiving data. The data could then be made available to the public. CRU could write a proposal to analyze data and create a value added product, as could others.

    If the US DOE is going to fund such an effort (and Jones did at different times get grants from DOE) I think they would be wiser to set up an archiving/distribution program under the national laboratories. ARM has a good track record of making data public. A structure like ARM would make sense.

  22. This is an admirable critique of Hadley's/Jones's mislaid data. I am perplexed by only one sentence: "To be absolutely clear, none of what I write here should be taken as implying that actions to decarbonize the global economy or improve adaptation do not make sense -- they do."

    Really? Why? As we add CO2 to the atmosphere, the globe cools. Plants love it, and the world is fed. Sounds pretty good to me.

    RW

  23. To someone who entered academe in 1961 and left it in 2007, the claim that CRU lost its original data actually rings true.

    We are, in fact, in a data Dark Ages. Nobody backs up anything. Everyone is too busy to bother.

    Recording media and formats change rapidly, and nobody transfers the old data to the new formats/media. How many punch cards do you have? Can you input them into anything? Do you have data on 5 1/4 disks, wires, paper tape, ...?

    When our department switched over to desktop computers from IBM Selectrics, the secretaries stopped making copies. They wouldn't even back up files on other hard drives. The person in charge of the secretarial pool said such backups were insecure. Why the backup drives couldn't be locked up at night was never explained.

    Not only that, but the secretaries wouldn't even keep data files on their own computers. Letters, memos etc were routinely deleted to open up disk space. I was department P&T chair for a number of years, and the only backup P&T records in the department were on my personal computer at home. The originals were on my (insecure) office computer. The front office had no records.

    So, Hadley is merely going with the flow. You might want to quietly investigate record keeping in your own department before you doubt Hadley's claim.

  24. Welcome to 2009 and the damning electronic trail indicating that this data was DELIBERATELY DESTROYED by Phil Jones, et al.

  25. "The two MMs have been after the CRU station data for years. If they ever hear there is a Freedom of Information Act now in the UK, I think I'll delete the file rather than send to anyone."

    Letter from Phil Jones to Michael Mann, 2/2/2005.
    1107454306.txt

    Not ironclad proof, but wouldn't such a letter, if authentic, point to a criminal inquiry?

  26. @Lucia

    you say:

    "I think we can perfectly well continue to assume CRU is valid, and compare things to CRU. At the same time, I think it's best for the basis for CRU to be as transparent as possible."

    What is the basis for this belief? If the original data is not available, why should we assume it is valid, or even assume it is not valid?

    I think that if the original data is NOT available it simply means you can NOT in the proper sense of the word, rely on it.

  27. It is perfectly absurd to make any policy decision using CRU analysis. No amount of crafty finger-pointing can be substituted for peer review. The organization at best has set back any climate change policy decisions, in fact making them less feasible even if their strenuous assertions are true.

    Smoke and mirrors science is not worth anyone's breath. What a bunch of poor scientists, or boldface liars.

  28. -28-Shep sez:

    “What a bunch of poor scientists, or boldface liars.”

    Merely as a point of trivia, the phrase is “bald faced liars”. It derives from the fact that a lie is more easily detected by facial expressions which are more easily concealed by a man wearing a beard.

    This -- no doubt -- explains why so many so-called “Progressives” sport facial hair!

  29. I've read this whole blog now and I am shocked that many, if not all, of you more knowledgeable types on this stuff than I have missed the recent revelations that CRU and IPCC scientists were PEER-REVIEWING EACH OTHERS' PAPERS!!! And why the reticence to say "conspiracy"? You worthy global warning community citizens who are still apologizing for this sad group of "scientists" at CRU and IPCC apparently can't see what the rest of us "children" in the world can clearly see: These Global Warming Science Emperors have no clothes!.... and you reveal your unworthiness, or academic dishonesty, by not seeing and saying it also. Aren't you ashamed? Maybe not if you keep upholding each other - - Seems to have worked for your heroes at CRU and IPCC.... for a while.
