Friday, April 25, 2014

Heat sinking, temperatures rising in the US of A

Sou | 10:36 PM

By request, I'm closing a previous discussion, which turned, among other things, into a discussion of the siting of weather stations, and opening up this new thread for a continuation. The other thread was getting way too hard to load and it was almost impossible to follow the discussion.

To help kick off, if you want to repeat your latest comments from the old thread here, feel free to do so. 

Addendum: (I've added the following as a comment.)

If anyone is new to the reason for Evan's work, this article from ClimateProgress, together with Andy Revkin's piece and an article by Jeff Masters of Wunderground are as good as any. The comments at Climate Progress are useful as well.

And then there is this article, too, in which Anthony Watts more clearly explains his mindset.

76 comments:

  1. This comment has been removed by the author.

    ReplyDelete
  2. Major climate data sets have underestimated the rate of global warming in the last 15 years owing largely to poor data in the Arctic, the planet's fastest warming region. A dearth of temperature stations there is one culprit; another is a data-smoothing algorithm that has been improperly tuning down temperatures there. The findings come from an unlikely source: a crystallographer and graduate student working on the temperature analyses in their spare time.

    http://www.sciencemag.org/content/344/6182/348.summary

    ReplyDelete
    Replies
    1. Behind a paywall, but I am very interested to see what Tamino has to say about the smoothing algo, not to mention the pros over at RC....

      Delete
    2. Oh, tamino is cool with Cowtan & Way alright, because the stats work is sound:

      http://tamino.wordpress.com/2014/02/06/cowtan-way/

      Delete
  3. Evan,

    Thank you for responding to my questions way down at the bottom of the last thread. If I may, though, can I repeat a couple that you may not have spotted (rewritten in light of your responses):

    What is the breakdown of temperature trend across different site classifications? Are all "non-pristine" site classes equally different from Class 1/2 sites? What is the make-up of your 4 out of 5 "badly located" sites - are they all class 5 or are a decent proportion Class 3 or 4?

    Are Class 3 sites, Class 4 sites and Class 5 sites equally different in trend from Class 1/2 or is there a continuum that you are averaging across your comparison between "pristine" & "non-pristine"? Additionally, is there any difference in trend between Class 1 & Class 2 sites?

    I am also concerned with your statement that you are classifying sites "according to Leroy (2010)"; this does not chime with the following statements, also made by you:


    "Vegetation cannot be judged with any sort of completeness without actually visiting the sites and measuring it specifically, That cannot be judged even with photographs."

    "A more pertinent issue is shade: The thumbnail answer is that the two factors (shade and heat sink) are not independent variables. In the large majority of cases where there is shade, the shade comes from the obstruction, itself (of course). Also, Leroy is not assuming min-max. So he has to be concerned with shade at all times dinurnal. But in the case of min-max, all one has to be concerned with is shade during the short period preceding Tmax. Tmin is not an issue (no shade at 5AM. ... So South shade (for Northern Hemisphere angle) and West shade would be our only concern."
    And
    "We are addressing heat sink only for this study."

    All of which indicate deviation from Leroy (2010) methods. This is perhaps understandable, given the aims of the study, but be careful of making incorrect claims on methodology.

    Also, I'm not sure you got the gist of my second comment - I was asking how well Class 1/2 stations reflected the makeup of the land surface of the US/the globe. If, for example, most of the planet would be classified according to Leroy as, say, Class 3, how does that affect your analysis?

    A simple thought experiment to elucidate this point - if you were to stick 1,000 pins randomly into the US land surface on Google Earth & then rate their suitability as weather stations according to the Leroy criteria, what proportion of these pins would be Class 1/2/3/4/5? Whether this has any bearing on the study I'm not sure, I certainly don't know the answer, but it would be interesting to hear your (and others') thoughts on this aspect.

    ReplyDelete
    Replies
    1. meh, apologies for the typos, writing this whilst on hold phoning the taxman...

      Delete
    2. An important point. If WUWT had followed Leroy exactly, they would have had much less leeway to produce spurious statistical results. Given that all they have is statistics and a workable physical mechanism is missing, that is important.

      Delete
    3. Thank you Sou. I hope we can be happy with each other even as we disagree. It is an honor and a privilege to interact with Dr. Venema.

      I welcome the questions from everyone. I consider them valuable. Yes, Quiet Waters, I am talking to you. I promise I will address them, though I may have to get back to it a little later.

      If WUWT had followed Leroy exactly, they would have had much less leeway to produce spurious statistical results.

      Gosh yes. Heck, I doubt we would have over a dozen Class 1\2 stations in the entire lot. There would be nothing left to compare with.

      I don't mind breaking out the individual considerations and evaluating them in tandem. That works for me just fine.

      But throwing it all together, especially as they are not even all necessarily warming biases, is as bad as, um, homogenization.

      Surfacestations.org is concerned with the heat sink bias discovered by Anthony before we had even heard of Leroy. It is Leroy (2010), however, that provides a consistent mode of evaluation for our considerations.

      Other considerations are all very well. But they remain other considerations.

      Delete
    4. Odd: But throwing it all together, especially as they are not even all necessarily warming biases, is as bad as, um, homogenization.

      And odder: Anthony Watts has odd ideas about heat sinks. (archived here)

      Delete
    5. I hope that every time something like that is written, the reader by now will automatically hear my voice shouting in frustration: please provide evidence for your cheap shots against statistical homogenization.

      Delete
    6. Victor, I may be wrong but, as I now understand it, Evan is not looking at the US record as a whole but only at those weather stations that he deems have "heat sinks" located closely enough that they affect the recorded temperature, which he thinks will also affect the trend.

      If what I understand is correct, then it's really a classification exercise of a quality yet to be determined - not an assessment of the reliability or otherwise of the US temperature record.

      I'm not so sure about his thermodynamics, either. And I see you've commented on that aspect too.

      Evan will probably pop in to clarify this point.

      Delete
    7. @Sou: We are looking at the USHCN, which may be considered to be the US record as a whole. Four out of five stations are so affected. The remaining 20% show significantly lower trends, on average.

      Homogenized results show the well sited station data matching the poorly sited data and yet it barely affects the poorly sited stations.

      So we have three findings:

      -- Well sited station raw trends are significantly lower than those of poorly sited stations.
      -- Well sited stations raw trends are significantly lower than the official adjusted record by nearly the same amount.
      -- Well sited station data is adjusted to match that of the poorly sited stations rather than the other way around.

      We feel those are very important findings.

      I apologize, Dr. Venema, but I obviously feel strongly on this point.

      I think homogenization is the wrong approach. I think if a station is an outlier and is shown to have problems it must be fixed or dropped. The USHCN oversamples. That makes the need for homogenization moot.

      I understand that sometimes adjustment is necessary, although I dislike doing so. But homogenization changes a station's data from what it is to what it is not, for no other reason than that it is different from its neighbors. This introduces an inherent margin of error, whatever else it does, and narrows the numerical distribution while maintaining sample size. It therefore creates a smaller error bar, an effect I consider to be spurious.
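
      To see the error-bar point in isolation, here is a minimal sketch with made-up numbers. The shrink-toward-the-mean step is a caricature used only to isolate the effect, not an implementation of any actual homogenization algorithm.

```python
import statistics

# Hypothetical station trends (C/decade) -- illustration only, not USHCN data.
raw = [0.10, 0.15, 0.20, 0.30, 0.35, 0.40, 0.45, 0.55]

def stderr(xs):
    """Standard error of the mean."""
    return statistics.stdev(xs) / len(xs) ** 0.5

# Caricature of the narrowing effect: pull every value 70% of the way
# toward the sample mean, leaving the sample size unchanged.
m = statistics.mean(raw)
narrowed = [m + 0.3 * (x - m) for x in raw]

print(f"raw:      mean {statistics.mean(raw):.3f}, stderr {stderr(raw):.3f}")
print(f"narrowed: mean {statistics.mean(narrowed):.3f}, stderr {stderr(narrowed):.3f}")
# Same mean, same n, but the quoted uncertainty drops by 70% even though
# no new information was added.
```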

      Sometimes homogenization goes to wild extremes. I do not recall exactly which station, but one of those out in the western desert of Australia has "neighbors" at ridiculous distances.

      Where homogenization stabs me in the heart (and makes me want to shout) is that it tends to identify the well-sited minority as outliers and adjusts them to match the poorly sited stations. Therefore not only is the average spuriously increased even further, but all trace of the disparities vanish from the record and you cannot even know there was a problem unless you deconstruct.

      At least if the data is not homogenized you can see some sort of issue. The trend is a little lower and the error bars a little wider. The patient is still ill, but at least one can tell something may be amiss. You can look at the data, bin it and examine it for reasons it may be showing atypical results.

      But homogenization removes what I consider to be the true signal and it does so without leaving a trace that it ever existed.

      It is killing the signal from my precious Class 1\2s. It silences their song.

      And that is a source of endless frustration to me.

      Delete
    8. As an aside, I am very ambivalent about Anthony's proposed "organization" (I posted my thoughts on that on WUWT). Because it will be incumbent on me to take part, at least at first.

      And I fear that such an organization will homogenize me. All of us. Either that or tear us apart.

      And if I were part of such an organization, it would cut me off from the other side, because they would cut me off. I do not ascribe blame; it is only human nature, after all.

      I need to interact with everyone in this tragic conflict. I earlier said I do not want my thinking to be done by a crowd. But that includes the crowds on both sides. And I cannot think in an echo chamber.

      Delete
    9. Also the trends of the "Well sited stations" can be wrong for other reasons and thus in need of correction. That the trend of the well sited stations after homogenization happens to be near the trend of the badly sited stations in the raw data is thus suggestive, but no proof.

      If the artificial additional "trend" is due to steps it should be possible to remove a part of the artificial trend by homogenization. In case of micro-siting I would expect the problems to occur in steps, because they happen by definition near to the station.

      If the artificial additional "trend" is due to local trends in the majority of badly sited stations, you would be right that homogenization would make the good stations worse. I would be surprised if you could give a mechanism that could do that, especially a mechanism that did not work the last decade, but only in the 1990s. So we are back to the question you did not answer yet on the physical mechanism.

      Yes, in case of isolated stations (in Australia) homogenization will probably not be able to improve the data (much). You are studying the US of A, however, with a very dense network where homogenization should work pretty well.

      Delete
    10. I'm still not sure I have this straight.

      Stations are discarded if there is a break in record, movement of a station or any other disruption in continuity of the record without adjustment, except for the change in station type.

      Four out of five stations are classified as "poorly sited" according to Leroy and all of these are so classified because they are near heat sinks.

      No station is classified as "poorly sited" for any other reason.

      Two out of five stations are classified as "good" according to Leroy.

      The "poorly sited" stations show a warming trend faster than the "good" stations.

      There are other stations that are not classified as "good" that we have not included because they are not near a heat sink and might have a lower warming trend. (This seems to contradict the other statements about all the poorly sited stations being near heat sinks.)

      I've gleaned the above from putting together your various comments, Evan. Have I got it right yet?

      Delete
    11. Typo: *One* out of five stations are classified as "good" according to Leroy.

      A further clarification question. If I understand it right, sometimes it is known that the categorisation changes (after talking to the curators of the station?). But if the curator describes a change that could change the temperature, but does not change the category, which will especially easily happen for the less good categories, then this change is ignored and the station is seen as homogeneous, right?

      Delete
    12. If there is a localized station move and the Class remains the same, we do not drop the station. But there are very few of those (1% of sample maybe?).

      In the overwhelming majority of cases, if there is a move it is either to or from an unknown location and we drop the station.

      Delete
    13. I would be surprised if you could give a mechanism that could do that, especially a mechanism that did not work the last decade, but only in the 1990s.

      Of course it wouldn't have any effect over the last decade of atmospheric/surface non-warming. There is no trend to exaggerate. That's the point, isn't it?

      It was the '90s that saw a strong warming trend. That's when your differences pop out. If you know what to look for, it sticks out of the data a mile.

      I am not a mathematician. But I am a wargame designer and developer. I roll in the mud with numbers. I can pick up a random bunch of them in each hand, heft them and tell you which one is heavier every time.

      And I am telling you that you have a fatal design flaw, here. You don't get these results with straight dice unless the process is screwy.

      Delete
    14. Strategic games are a lot of fun. And the nice thing is that you can create the rules yourself. That is somewhat of a problem with reality.

      Thus I am looking for the rule that produces an "exaggerated trend" on decadal scales. Why would problems with micro-siting exaggerate the trend on decadal scales? We already know it does not do so on annual scales. The year to year variability of the homogenized data is well matched by the climate reference network.

      I have an idea, but it does not relate to the quality of the measurements, but to a difficulty of your study. I would expect that the information on the station history, which you got by calling the custodians of the stations, is more reliable for the recent past than for the 1990s. Thus for the last few years your approach of doing without decent homogenization might still be acceptable; not too many non-climatic changes will be missed. However, the further you go back in the past, the more problems you get with non-climatic changes that are forgotten.

      That is why I would like to encourage you to at least detect inhomogeneities by comparison with neighbouring stations. You do not have to correct the data, if you fear that this leads to smoothing, but just detect the stations that have a clear non-climatic jump in them. You could use SNHT, which detects both breaks and gradual trends. Thus you could remove only those stations that have a break (or a very short gradual trend) and not the ones that have a gradual trend over the entire decade (due to your unexplained amplification). See the sketch below.
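
      For concreteness, a minimal sketch of the single-break SNHT statistic (the classic Alexandersson 1986 single-shift form; operational pairwise homogenization algorithms are considerably more elaborate):

```python
import numpy as np

def snht(diff):
    """Single-break SNHT statistic on a candidate-minus-reference
    difference series (Alexandersson 1986). Returns (T0, k): the
    maximum statistic and the most likely break position. A larger T0
    means stronger evidence of a non-climatic break; compare it with
    tabulated critical values for the series length."""
    q = np.asarray(diff, dtype=float)
    n = len(q)
    z = (q - q.mean()) / q.std(ddof=1)  # standardized anomalies
    stats = [k * z[:k].mean() ** 2 + (n - k) * z[k:].mean() ** 2
             for k in range(1, n)]
    k_best = int(np.argmax(stats)) + 1
    return stats[k_best - 1], k_best

# Hypothetical example: a 0.5 C jump halfway through 30 years of noise.
rng = np.random.default_rng(42)
series = rng.normal(0.0, 0.2, 30)
series[15:] += 0.5
print(snht(series))  # large T0, break located near year 15
```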

      If the effect is just as strong in such a dataset without stations with an artificial jump as in your current one, that would make the statistical result a lot stronger. I would still be interested in an explanation about what causes such an amplification, however.

      Delete
    15. I will have to answer in chunks. (BTW, thanks again for this invaluable interaction.)

      Strategic games are a lot of fun. And the nice thing is that you can create the rules yourself. That is somewhat of a problem with reality.

      Do the rules write the war, or does the war write the rules? If you are not doing it the second way, you are doing it wrong. At least in terms of simulation. Not a lot unlike science, actually.

      What games developers (and the designers, if they are good) can tell you is how a rule can affect a game system under unimaginably different practical (or not) conditions during playtest. And if an artifact appears, they can nail it then and there.

      That's what the homogenization designers failed to do. They only looked at what homogenization effects would be on a basically good dataset. They failed to account for its effects on a bad dataset. They didn't even consider the possibility.

      A game developer worth half his salt would never have missed that. Heck, I am a game developer (and designer). And I did not miss that. #B^)

      Delete
    16. Thus I am looking for the rule that produces an "exaggerated trend" on decadal scales. Why would problems with micro-siting exaggerate the trend on decadal scales?

      Here's what I think it is. Call it the Delta Sink rule. It works for both a warming and a cooling trend (both will be spuriously exaggerated).

      As it warms (or cools), the Δtemp between the heat sink and the air surrounding the sensor diverges. Therefore, at Tmin the release of heat is proportionately greater (or lesser), and the sink is at an earlier (or later) stage of its release process by the end of the study period than it was at the start. This produces a disproportionate trend effect.

      Therefore, a 1C offset in 1979 becomes a 1.5+C offset by 2008.

      In a cooling phase, the process reverses itself, and cooling is exaggerated. That is demonstrated by the data from 1998 to 2008.

      So, having looked at the hypothesis from opposite sides of the coin and seen it operate in both directions, the hypothesis appears astoundingly robust.
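
      For anyone who wants to poke at this, here is a minimal runnable toy model. It assumes simple Newtonian cooling with a fixed time constant and a sensor that reads a fixed fraction of the sink-air gap; those assumptions are mine, not the study's method.

```python
import numpy as np

# Toy model of the Delta Sink idea (a sketch, not the study's method).
# Air = diurnal cycle + linear warming trend; the sink follows the air
# with a first-order (Newtonian cooling) lag; the sensor reads the air
# contaminated by a fixed fraction of the sink-air gap.
TAU = 6.0             # assumed sink time constant, hours
ALPHA = 0.2           # assumed sink contamination of the sensor reading
TREND = 0.3 / 8760.0  # 0.3 C/year of real warming, expressed per hour
YEARS = 30

t = np.arange(YEARS * 8760, dtype=float)
air = 10.0 + 10.0 * np.sin(2.0 * np.pi * t / 24.0) + TREND * t

sink = np.empty_like(air)
sink[0] = air[0]
for i in range(1, len(air)):  # explicit Euler step, dt = 1 hour
    sink[i] = sink[i - 1] + (air[i - 1] - sink[i - 1]) / TAU

sensor = air + ALPHA * (sink - air)

# Compare 30-year trends of annual Tmin: clean air vs. contaminated sensor.
air_tmin = air.reshape(YEARS, 8760).min(axis=1)
sen_tmin = sensor.reshape(YEARS, 8760).min(axis=1)
yrs = np.arange(YEARS)
print("true Tmin trend   (C/yr):", np.polyfit(yrs, air_tmin, 1)[0])
print("sensor Tmin trend (C/yr):", np.polyfit(yrs, sen_tmin, 1)[0])
```

      Note that in this minimal form a steady warming rate produces a constant sink-air offset at Tmin (roughly the lag times the warming rate), so the level is biased but the trend is not; vary TAU, ALPHA, or the shape of the trend to see what it would take to produce genuine amplification.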

      We already know it does not do so on annual scales. The year to year variability of the homogenized data is well matched by the climate reference network.

      We are looking at decadal trends over 30 years. What may have error bars so large as to be meaningless tightens up and defines itself over time.

      ΔTrend is a direct function of the ΔTemp produced by homogenization. Trend cannot be anything but a reflection of the data.

      As the data is adjusted from year to year, the trend effect simply falls into place.

      The year to year variability of the homogenized data is well matched by the climate reference network.

      That is because CRN has only been in operation a few years. There has been no overall trend since then. Heat sink effect is a trend amplifier. When there is no trend to amplify, there will be no divergence.

      I would expect that the information on the station history, which you got by calling the custodians of the stations, is more reliable for the recent past than for the 1990s.

      I obtain the history of moves and TOBS from NCDC/HOMR. USHCN metadata going back to 1979 is quite excellent; it has improved dramatically. Someone in NCDC made a good hire. Pimp my USHCN!

      So we do that quite systematically.

      Besides, all the metadata does is dictate to us which stations we drop. Clean and simple. And the stations we do drop have even lower trends than the ones we do not drop; other than that they show the same results.

      That is why I would like to encourage you to at least detect inhomogeneities by comparison with neighbouring stations.

      Agreed. I think we would not find what you suspect, but that is the way to find out. A good subject for followup.

      First we must construct. Then we shall deconstruct. Besides, if we don't do it, someone else will. You make a good recommendation. Good top-down thinking.

      If the effect is just as strong in such a dataset without stations with an artificial jump as in your current one, that would make the statistical result a lot stronger.

      Yes. Too much for this paper, but definitely worth a look going forward. Rome wasn't burnt in a day.

      I would still be interested in an explanation about what causes such an amplification, however.

      I gave you my own wargamer's assessment. It all comes down to the Δt of the sink vs. the ambient temperatures over time in the presence of a real trend (either cooling or warming).

      There may be some fiddling with the details, but we do have a physicist on the team to address that exact question.

      Delete
    17. Thus you could only remove those stations that have a break (or a very short gradual trend) and not the ones that have a gradual trend over the entire decade (due to your unexplained amplification).

      Well, this part I still think is being done backwards. Inhomogeneous results occur all the time and can be quite correct if there is nothing wrong with the sensor. Sometimes it just gets colder or warmer in that neck of the woods. The data is still good, break or no break. The disparities (assuming they are not artifacts) are simply a part of the song. ("Melodies decaying in sweet dissonance . . . into the ever Passion Play.")

      But, having said that, Mosh and crew do exactly that. They will certainly plug our Leroy (2010) results into their BEST method and see how it comes out. Those results will be interesting. (Currently, they are using the obsolete, fatally flawed Leroy (1999).)

      Delete
    18. I see I missed one of your questions:

      What is the breakdown of temperature trend across different site classifications? Are all "non-pristine" site classes equally different from Class 1/2 sites? What is the make-up of your 4 out of 5 "badly located" sites - are they all class 5 or are a decent proportion Class 3 or 4?

      Here is the breakdown:

      Class 1: 7%
      Class 2: 15%
      Class 3: 34%
      Class 4: 30%
      Class 5: 14%

      Delete
    19. Evan Jones: "They only looked at what homogenization effects would be on a basically good dataset. They failed to account for its effects on a bad dataset. They didn't even consider the possibility."

      One of the most common problems of the climate "sceptics" is that they assume that scientists are stupid. If you need that assumption, you most likely have to study the scientific literature more.

      They did consider the possibility. It is known that when the majority of the stations have a gradual inhomogeneity, the algorithms do not work. They do work when all stations have break inhomogeneities or when some have a gradual inhomogeneity. If there were a break every year, the methods would also not work.

      They just think that such problematic situations are not common and that their methods thus on average improve the usefulness of the data for trend analysis.

      You have not convinced me yet that there was a gradual inhomogeneity in the USA in the 1990s.

      And your story of 20% good and 80% bad stations directly leading to adjusting good stations to bad ones is also too simple. You list the frequency of every class in your last comment. Surely it would be rather artificial to expect classes 1 and 2 to have exactly the same artificial trend and classes 3, 4 and 5 to have the same wrong artificial trend.

      It seems more natural to assume that if there is a problem, it gradually becomes worse, the worse station classes become. In that case you would notice problems as you would see gradual inhomogeneities in many difference time series of one station with its neighbours.

      Delete
    20. What you are saying is that they considered the possibility but, seeing nothing wrong, they applied it.

      I'd have been more interested in the reason for the data spread in the first place. But all the efforts simply went towards making those differences disappear. And disappear they certainly did.

      Then when Leroy (2010) came along they did not re-rate the stations, which would have brought out the problem then and there. They were already using Leroy (1999); it's not as if they never heard of Leroy.

      Be that as it may, they applied it wrongly. And when you deconstruct, the problem sticks out a mile.

      I am saying, quite simply, that the well sited stations have significantly lower trends than the poorly sited stations over the warming period from 1979 - 2008. That's after accounting for all the objections -- TOBS bias, moves, MMTS conversion.

      After homogenization, the well sited stations and the poorly sited stations have identical readings. You pointed that out yourself back in 2012 if you will recall. Well, we agree. They do.

      The only problem arises when it turns out that the poorly sited stations were adjusted 0.01C/decade lower and the well sited stations were adjusted 0.14C/decade higher.


      And your story of 20% good and 80% bad stations directly leading to adjusting good stations to bad ones is also too simple. You list the frequency of every class in your last comment.

      Yes. The differences are quite stark. The 20% sing +0.185C/decade and the 80% sing +0.335C/decade.
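
      (And the arithmetic closes: 0.185 + 0.14 = 0.325 C/decade, while 0.335 - 0.01 = 0.325 C/decade, so those two adjustments bring both classes to the same adjusted trend.)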

      Surely it would be rather artificial to expect classes 1 and 2 to have exactly the same artificial trend and classes 3, 4 and 5 to have the same wrong artificial trend.

      Emphatically no! Not in the slightest. I would expect the badly sited stations to be adjusted down a notch and I would expect the well sited stations to be adjusted through the roof. Which is exactly what happened.

      After adjustment, the outliers will be like the majority, because that's what homogenization does. It's what it's for. And if the majority has a systematic defect, then the result will be wrong and the correct (minority) data will be adjusted to look like the majority result -- again wrong.


      All that, after the application of an algorithm that identifies outliers and brings them into conformity with the majority.

      It seems more natural to assume that if there is a problem, it gradually becomes worse, the worse station classes become. In that case you would notice problems as you would see gradual inhomogeneities in many difference time series of one station with its neighbours.

      And in this case the gradual divergence just happened to be between well and poorly sited stations. Fancy that. And there is the tipoff NOAA missed. They never looked at the stations most strongly adjusted for a commonality. Well, the "defect" appears to be that they were well sited.

      I know how smart these guys are. But the mistake they made here is one that takes experience to identify. It caused them to adjust in exactly the wrong direction.

      Delete
  4. A question, Evan Jones, why do you not detect non-climatic changes (inhomogeneities) by comparison with neighbors?

    Even if you think that the correction of the data smooths it (which is not true), there is no reason not to detect inhomogeneities to remove inhomogeneous stations. You claim that removing stations is fine.

    ReplyDelete
    Replies


    1. Believe me when I say this. I ran error bars for literally hundreds of sets and subsets of USHCN data. The error bars are always much smaller for homogenized compared with raw (or raw+MMTS). When we archive you can run those numbers.

      there is no reason not to detect inhomogeneities to remove inhomogeneous stations. You claim that removing stations is fine.

      Yes. But there needs to be a reason other than that they are different. One would examine or calibrate or repair. Or even drop it.

      The other problem is that if there is an artifact which affects the majority of the sample, the artifact is enshrined and the correct result vanishes.

      I think that is what is overlooked.

      Delete
    2. A question, Evan Jones, why do you not detect non-climatic changes (inhomogeneities) by comparison with neighbors?

      First fully account for microsite, and only after that homogenize; then, and only then, would it do what it was designed to do.

      If you homogenized Class 1\2 stations you would probably get lower trend results because high-trend Class 1\2 ratings are in the minority. We would never know there were some warm Class 1\2s. Homogenizing will have removed all trace. That would be spurious, I think.

      The process is just as pernicious when it helps as when it hurts our hypothesis.

      I say just use Class 1\2 and do not homogenize. Hire good thermometers. Keep them healthy. Keep them away from smooth-talking tall dark buildings and the cheesecake of paved surfaces. Forbid them to hang around strange parking lots. Then let them do their jobs.

      Delete
    3. Evan Jones.

      How many buildings, paved surfaces, and/or parking lots are there near the Arctic sea ice? Near mountainous glaciers? Near pine forests? Near oceanic depths? Near...

      And there's another point to which I would like to see you respond, which I posted a few days ago on the previous thread but which seems to have disappeared in a moderation hole*. If, as you seem to want to prove, the contemporary warming is less than the professional, expert science indicates, then the direct inference from such a result is that the responses to date of the cryosphere, biosphere and hydrosphere indicate that each is disturbingly more sensitive to changes in temperature than the respective scientific disciplines understand them to be. Unless you are also going to argue that equilibrium climate sensitivity itself is little more than a fraction of a degree Celsius greater than zero (a position which is becoming more and more indefensible) you need to explain why your work wouldn't indicate that we are already up to our armpits in do-do of the very existential sort.

      And if you are going to argue that ECS is less than 3°C, or even 2°C (heck, under the inference arising from your "not much warming" notion even 1.5°C is dangerous), then you are going to be sorely pressed to dismiss or otherwise refute the mounting evidence to the contrary.

      We await, however, any such response with our curiosities piqued.



      [*A number of people's posts appeared on the "Latest comments" list, Sou, but they never made it to the thread. Is there a backlog somewhere awaiting release?]

      Delete
    4. I can check, Bernard, if I know what I'm looking for. The thread got so long it took me a while to find posted comments. Are there any in particular you are wanting me to resurrect?

      (One thing you might try is empty the cache, load the page, click down the bottom and keep clicking load more or whatever till it stops appearing, then do an F3 search for the text. You can then copy and paste it here. Or if it's disappeared altogether maybe repost the gist of it on this thread.)

      Delete
    5. It could also be that Bernard has to load more comments (button at the end of the comments) to see his comment. If the comments were in the latest comments list, they should also have appeared on the page, if they were not later deleted, I would expect.

      Delete
    6. I only moved (deleted from the thread) one comment and it wasn't from Bernard.

      Delete
    7. Thanks Victor, it was a matter of "load[ing] more comments" several times before the missing comments appeared.

      All is well in the world (except for climate change denialism and other human insanities, of course...).

      Delete
    8. What, no thanks to me? (just kidding)

      Glad you found the missing comments.

      Delete
    9. Sorry Sou, I worked my way up from the end of the thread and saw and responded to Victor's comment before I even noticed yours.

      And you know that you always have my gratitude... ;-)

      Delete
    10. Actually, I addressed the Arctic issue in the other thread. But I prefer not to address it here.

      Delete
    11. I'll add (to BJ) that I am concerned with the correct readings. Whatever sensitivity (or other) recalculation results from that . . . results from that. The idea is to get it right and let the chips fall where they may.

      Note well that we did not suppress our Fall et al. results. We published even though the results disputed our hypothesis.

      The followup disputes the Fall et al. results, but that is because we now use the upgunned Leroy (2010) ratings as opposed to the older Leroy (1999), which considered only distance to heat sink and not its area.

      Delete
  5. An important problem for the WUWT theory that a large part of the warming in the USA is due to micro-siting is that satellite data and the US Climate Reference Network show very similar temperature increases to the *homogenized* climate network of NOAA.

    Now the satellite data may not be of great concern. It is anyway highly unreliable when it comes to trends. Satellites come and go out of the dataset and have different instruments on board, the electronics get old but cannot be calibrated because they are in space, and the orbits change. Thus a large number of adjustments are necessary, and they are hard to estimate as you do not have the oversampling you have for the climate station network. (It is somewhat ironic that the UAH satellite temperatures are so popular with pseudo-sceptics, who simultaneously claim to be sceptical about any adjustment.) Furthermore, the trend in the troposphere is thought to be different from the one at the surface. Another highly uncertain fudge factor.

    The US Climate Reference Network was specially designed for climate change monitoring. It has double instruments to reduce missing data and is located in pristine places that are expected to stay that way the coming decades. Consequently, this reference network has no micro-siting or urbanization problems.

    If micro-siting or urbanization would thus warm the incoming air of the normal stations by a significant and increasing amount, you would see a difference between the temperature trend of the normal stations and the reference stations. However, these reference stations and the homogenized normal stations match very accurately. The main caveat is that the reference network only started in 2004 and thus only has about a decade of data. Still, if these data problems were as serious as Jones claims, about 0.15 °C per decade, you would expect to see a difference over the last decade.

    The solution of Jones is to say that there is not an artificial additive trend from micro-siting, but that it produces an amplification (that it is a multiplicative problem) of the US warming on decadal scales. However, what is multiplied here?

    I would expect that the outside air can be heated by micro-siting problems. And with more energy use, you could expect that to create an artificial trend, but that would be, like urbanization, an artificial trend that would always be there, independent of whether the global mean temperature is increasing. It would be additive, not multiplicative.

    One reason for an amplification factor could be that when it is hot, the air-conditioning is running more and produces more heat towards the instrument. However even in this case, the additional factor is small compared to the already present constant one. Thus this cannot produce a purely multiplicative error. Furthermore, I do not think that 80% of the stations are next to an airco and even the WUWT crew would have noticed that the problem is only in the summer.

    ReplyDelete
    Replies
    1. The explanation Jones offers is:

      Evan Jones: Heat sinks such as structures or concrete absorb heat during the day and release it at night. This process begins quickly and steadily slows as the delta between the obstruction and the air temperature narrows. ... So over time, in a warming trend, the obstruction is in a continually earlier stage of release when Tmin arrives. That is why the sensor trend increases during a warming trend.

      My initial reaction was: "I am sorry, but this cannot explain artificial trends on decadal time scales. Heat stored in buildings during the day is released the following night, so much is right. However, the heat capacity of buildings is not sufficient to store heat much longer. I would expect a typical building to be able to store heat no more than a few days (and the heat given off becomes smaller and smaller during that time), a cathedral with meter thick walls may store heat over months, but also not more than a year."

      What I only realized later is that this mechanism is not only too small, but goes in the wrong direction for WUWT, the heat capacity of the buildings dampens short-term temperature changes. It will not amplify. When the temperature goes down, the buildings give off heat for a short time. When the temperature goes up, the buildings cool for a short time.

      Furthermore, as argued above, this mechanism is strongest at short time scales, likely days at best. If it were important you would still see differences between the climate reference network and the homogenized data in the year to year variability, which is not there. Thus WUWT would need a physical mechanism that only acts on decadal scales, but does not amplify the year to year variability of the US mean temperature.

      On the long term, if the global temperature trend were constant, the local temperature trend would be the same after a short delay.

      Without mechanism, all WUWT has is a statistical pattern. That would normally be interesting to entice people to search for a physical explanation. Given the reputation of WUWT, I do not expect that the manuscript would have much influence on the scientific literature. Especially as an amplification seems hard to achieve. It would still be useful PR.

      Delete
    2. For bunnies interested in some more background on the Climate Reference Network, see this post.

      Delete
    3. Here's my view from the trenches. I surveyed quite a number of those. Metadata clean as a whistle, not that you need any with such a perfect setup. The network is so beautiful it staggers the mind.

      I don't think I saw a Class 3 among them yet and most are so Class 1 it hurts.

      Makes you think America is a big country all of a sudden.

      They have triple redundant, PRT sensors and all the equipment is compatible, which is great, and it is no longer max-min, so no TOBS and full coverage. Too wonderful for words. In 20 years we will be proud of the CRN tree NOAA has planted.

      I do not know how PRT calibrates with MMTS, CRS, or ASOS, though. I do know MMTS adjustment is a very interesting dog. Not to Tmean so much, but to Tmax and Tmin.

      Delete
    4. I see I failed to address the satellite issue.

      I will take this to the bottom of the thread and do so.

      Delete
    5. I see I failed to address the satellite data. (Also a bit more.)

      I agree there are the problems with LT satellites, particularly because of the necessity to sacrifice lookdown capability for a wider sweep (resulting in poor polar coverage). And I am sure we all remember the problems with satellite drift that was resulting in spuriously reduced trends. And then RSS and so on and so forth. And, after the dust all clears, UAH is running noticeably higher than RSS.

      But that is LT and not surface. Dr. Christy hypothesized some time ago that warming should be ~20% greater in mid latitudes for satellite than for surface. He was perplexed that this discrepancy never appeared in the data. But there it was: it didn't. (Thinking back to the Menne and Fall et al. days.)

      Furthermore, after Watts et al. (2012 pre-release, which started all this mess) we drew criticism from William Connolley that while our results might be expected to be 20% below, we were way at the bottom end of that, and I had to agree there was a discrepancy.

      One of his posters also made a remark to the effect that he would not believe the results until all station moves were dropped, so I resolved to improve on that using the new HOMR metadata. So we now address all three issues adduced by criticisms of the pre-release: moves, TOBS, and MMTS.

      By the way, the MMTS adjustment seems small, but the effects on Tmax and Tmin are quite large (as per Menne, which we use as a basis) and produce fascinating results and explain much. It is quite necessary.

      As a result of these corrections and continually improving Google Earth #B^) our Tmean trend results are ~19% higher than the W-2012 results.

      It now turns out we are 25% below UAH (and less so for RSS) if I have my numbers straight.

      P.S., All of this has been said before in various places, but this is the concatenation. The only breaking news story I have for you is that I made lots and lots of copy edits for the 2012 pre-release that somehow did not make it into the actual release. #B^U

      Delete
  6. Evan Jones: Now, at Tmax, the obstruction is busy absorbing (and radiating) heat towards the sensor.

    The sensor (thermometer) is protected against (heat) radiation. That is why thermometers are placed within a Cotton Region Shelter, a Stevenson screen, or multiple white cone shields, and are more and more often also mechanically ventilated and made smaller. Such direct heating errors should be very small nowadays; they were important in the 19th century before Stevenson screens were introduced. If there is an influence from the heat radiated by the buildings and ground nowadays, it is because buildings heat the air.

    ReplyDelete
    Replies
    1. They do partly protect. Tmax trend does increase for poorly sited. The difference is there but is not great.

      We were going to embark upon a statistical study alone, but we have a physicist on the team who will be able to answer you better than I, or even shrug unknowingly, but with better knowledge than I. I took only a little physics.

      It is at Tmin that a very large difference occurs. So much so that Tmean is heavily affected.

      Note that while Stevenson screens were absolutely the most important step up, MMTS is said to have a better designed gill structure, and in a conflict between CRS and MMTS, the MMTS provides more accurate data.

      My favorites are those amazing Hazen screens. There are only three or so left.

      Delete
      My apologies for my confusing language. When a scientist asks, could you provide a physical mechanism, a normal person would write: your data looks very weird, I could not think of anything that could produce such a thing, could you at least give a hint on what the hell could have happened, so that we could have an intelligent discussion about it? What happened that produced an artificial trend in the 1990s and stopped doing so as soon as the climate reference network was installed in 2004?

      A mechanism, a mechanism, my kingdom for a mechanism.

      Delete
    3. It appears primarily at Tmin. For some reason, the object radiating the heat will release disproportionately more heat as temperatures increase over time.

      Perhaps it is related to the rate at which a heat sink releases its heat. A heat sink releases its heat faster the colder the surroundings. It has still not completely done so by Tmin, and that is why your Tmin reading will be too warm. That explains the reading but does not explain the trend.

      What I think is going on is that as temperature rises, the heat sink is earlier in its cycle of heat release when Tmin arrives. That means it is releasing heat disproportionately faster. This creates a spurious amplifying effect on the trend of a poorly sited station during a period of warming.

      It also creates a spurious amplifying effect on the trend of a poorly sited station during a period of cooling, as the phase of release is increasingly closer to completion at Tmin.

      Bottom line: Poor microsite causes stations to exaggerate both warming and cooling trends. The only reason that the net effect of microsite is a warming bias is that it has occurred during a net warming phase.

      That is what I think is going on. That's me. I do not know what a physicist would make of it. But that's my hypothesis: The warmer it is the earlier in the sink's release cycle it is at Tmin.

      Delete
  7. I will be back to address all of this. I will cease utterly with all off-topic discussion. We-all can spitball over positive feedback and nearterm paleo reconstructions some other place and time.

    ReplyDelete
    Replies
    1. Agree. I would suggest not discussing climate sensitivity on this thread. It was not fruitful in any way on the previous thread, it distracts from the siting issue where Evan Jones has some expertise, and it only creates a bad atmosphere.

      Delete
    2. I am not going to continue that discussion. But please note that I did not initiate it. If Evan had only kept to the original exchange with you, Victor, none of that would have happened. But he didn't, so it did.

      Delete
    3. I agree on all points. And I regret going down that path.

      I will answer as best I can. There may be some aspects I have to be careful of out of respect for my co-authors. As I remarked earlier, we have been burned: data I personally had a hand in collecting was used without permission twice, one of those times incorrectly, in light of Leroy (2010).

      But I will do the best I can with what I have. I need to do some living-earning at the moment, but I will answer.

      Delete
    4. BBD, I wasn't blaming anyone. As any moderator knows, off-topic topics somehow appear out of the nothing.

      Delete
    5. @BBD April 26, 2014 at 3:43 AM

      Right. I made an offhand comment, got an unexpected response, elaborated, and made it spin out of control. I will not do so henceforth.

      Delete
  8. Also, I'm not sure you got the gist of my second comment - I was asking how well Class 1/2 stations reflected the makeup of the land surface of the US/the globe. If, for example, most of the planet would be classified according to Leroy as, say, Class 3, how does that affect your analysis?

    Ah. One or two percent would be Class 5 (cities). There would be maybe a couple percent Class 3 and 4 in suburban areas, though most of those would be Class 1 & 2. All the rest would be Class 1. 95% or so.

    Even more so for the rest. It's a Class 1 world. 80% of land mass is not even populated.

    ReplyDelete
    Replies
    1. That is for heat sink only of course.

      Delete
  9. In case of micro-siting I would expect the problems to occur in steps, because they happen by definition near to the station.

    Leroy (2010) is not a sophisticated tool. There are several ways Class 3, 4, and 5 can occur. Examining these differences should be a subject for followup.

    ReplyDelete
  10. Anthony Watts has odd ideas about heat sinks.

    Over the study I think it can be said that we have all had some pretty odd ideas about heat sink. #B^)

    ReplyDelete
  11. Also the trends of the "Well sited stations" can be wrong for other reasons and thus in need of correction.

    Certainly. Any outlier should be examined for problems with the station. That is a red flag. Perhaps the station is in some way defective and the defect must be addressed. But maybe it just got colder or warmer in that neck of the woods. If the station is in good working order and there is no other outside impact, then the stat should be used.

    I say that only Class 1\2 should be used in any event, and they should ideally be located so as to be representative of overall mesosite (2% in urban, 20% in cropland, etc., etc.).

    That the trend of the well sited stations after homogenization happens to be near the trend of the badly sited stations in the raw data is thus suggestive, but no proof.

    Tmin trend is so seriously affected that it is very hard to believe this is coincidence.


    If the artificial additional "trend" is due to steps it should be possible to remove a part of the artificial trend by homogenization.

    If you must insist on homogenizing it is absolutely essential to correct for microsite first and then -- and only then -- homogenize. Either that or just drop the station. You cannot homogenize until any and absolutely all adjustments are made.

    If the artificial additional "trend" is due to local trends in the majority of badly sited stations, you would be right that homogenization would make the good stations worse.

    That is what is happening. Homogenization lowers the very bad station trends very slightly and raises the good station trends by a very large amount.

    I would be surprised if you could give a mechanism that could do that, especially a mechanism that did not work the last decade, but only in the 1990s. So we are back to the question you did not answer yet on the physical mechanism.

    There was unequivocal longterm warming over the period.

    But I do have 1979 - 1998 and it tells the same story. And my 1998 - 2008 data shows the process reverse as it cools a bit. That suggests the hypothesis is doubly supported: shown to work in a warm trend and reverse itself in a cool trend.

    We are not using this data because it is less than 30 years and we prefer longer term data, but we do have both sets and we can dust them off if you wish.

    Yes, in case of isolated stations (in Australia) homogenization will probably not be able to improve the data (much).

    It will take good data and turn it into something that has nothing to do with the actual reading. That is not much improvement.

    You are studying the US of A, however, with a very dense network where homogenization should work pretty well.

    Except it didn't. It spuriously raised an already-too-high trend -- and it deep-sixed the evidence for the true signal. I'm not blaming anyone; that's just what the infernal process does.

    I am an old hexpaper wargamer and I have designed, developed and playtested games of great mathematical complexity. I spot this sort of error in game rules all the time. Well, it's happening here.

    And the denser the stations, the greater the oversampling and the less need for homogenization in the first place. You are teaching the sensors how to take the temperature. But the sensors already know how to take the temperature.

    Let them do their job and examine them for defects if an outlier occurs. And if the outlier is well sited and functioning -- check the surrounding sites for bad microsite. That is the cross-check that is missing. But that does not occur.

    Attend to the sensors' health and then just let them do their job. You make them nervous reading over their shoulder.

    ReplyDelete
    Replies
    1. And I should point out that the raw data from NOAA is not (and should not be) 100% raw. The very first thing they do is check for outliers and transcription errors. They could cap them at a given maximum, but they choose to remove them (which I prefer).

      So you do not need homogenization to do your outlier spotting for you.

      CRN has triple-redundant PRT sensors so the problem is addressed for the future. That will be a wonderful set of data someday if they have good distribution (funding has been an issue, I think). By this time I would guess distribution is okay, but I haven't checked.

      Delete
  12. I seem to remember that Chris Colose (?) knocked up a little script that sampled the contiguous USA temperature record and that with as little as 7 or 8 random stations the warming signal could still be discerned - and that this signal was apparent even with restriction to the highest quality stations.

    I'd like to see Evan Jones unpick Chris's work on this subject.

    ReplyDelete
    Replies
    1. Yes, of course the warming can be discerned. In order for a warming trend to be exaggerated there must be a very real warming trend to exaggerate in the first place.

      But let's take your 7 or 8 random stations.

      Case 1: Are they well sited (Class 1\2)? If so, they will be adequately representative. If you homogenize those results, the process will work as intended.

      Case 2: All are poorly sited. Your results will be too high, but homogenization will not make it any worse, on average. Just a few more whisks at an already unfortunately scrambled egg.

      Case 3: But say that 5 out of 7 are average poor site trend sites (warmer trend), and 2 are average good sites (cooler trend). If you take a normal mean your results are only off by the extra warming of the 5 warmer stations. But if you homogenize the results, the 2 cooler stations are identified as outliers and heavily adjusted upwards to match the poorly sited station trends. The result is that you are now off by the warming of 7 stations. You adjusted -- in exactly the wrong direction.

      And you have made your error bar pleasingly small: after all, you have just reeducated your outliers, haven't you? They are now just the same as all the other good citizens.

      The fly in the ointment turns out to be what happens when the majority of good citizens (i.e., your sample set) turn out to be bad citizens. You cannot use 7 or 8 sensors to cover the US unless those sensors are providing a correct signal.
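
      Case 3 is easy to put in numbers. Here is a minimal sketch, with hypothetical trends, and with homogenization caricatured as replacing each station by the mean of its neighbors (real pairwise algorithms are more subtle, but the outlier logic is the point):

```python
from statistics import mean

# Hypothetical Case 3 trends (C/decade): five poorly sited stations
# reading high, two well sited stations reading the true 0.20.
stations = [0.35] * 5 + [0.20] * 2

print("plain mean:", round(mean(stations), 3))  # 0.307 -- biased high

# Caricature of homogenization: replace each station's trend with the
# mean of its six "neighbors". The two good stations are the outliers
# and get pulled up hard; the bad ones barely move.
homog = [mean(stations[:i] + stations[i + 1:]) for i in range(len(stations))]
for raw, adj in zip(stations, homog):
    print(f"raw {raw:.2f} -> adjusted {adj:.2f}")
print("homogenized mean:", round(mean(homog), 3))
# The network mean is unchanged, but the good stations now read like the
# bad ones, and the spread (hence the quoted error bar) has collapsed.
```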

      In the old SPI days, they would have scolded a developer for missing a bug of this nature during playtest. But these things happen. There was usually at least some errata for SPI games. (And that got worse over the years.) But I am okay with that, provided always that they correctly address the bug.

      Delete
    2. "In order for a warming trend to be exaggerated there must be a very real warming trend to exaggerate in the fist place."

      Which begs the question for a third time - do you realise that your insistence that the warming is less than that described by the consensus climatology directly implies that the biosphere, cryosphere and hydrosphere are profoundly more sensitive to warming than indicated by mainstream science?

      And that this is a Very Very Bad Thing Indeed?

      Delete
    3. Not addressing ECS in this thread.

      Delete
  13. Major climate data sets have underestimated the rate of global warming in the last 15 years owing largely to poor data in the Arctic, the planet's fastest warming region. A dearth of temperature stations there is one culprit; another is a data-smoothing algorithm that has been improperly tuning down temperatures there. The findings come from an unlikely source: a crystallographer and graduate student working on the temperature analyses in their spare time.

    I assume there are many issues with GHCN regarding distribution. And those algorithms are the point where things can get very difficult. It would be better to use an adequately dense, evenly distributed network and not need to rely on algorithms. I have often wondered why GHCN does not include the DMI Arctic network.

    But our study is about what, if any, impact heat sink proximity has on trend data for surface stations.

    Stipulating that the study demonstrates that microsite significantly affects trends not only at Tmin, but also at Tmean, it then becomes necessary to evaluate the GHCN (indeed, all other surface networks) for microsite.

    That will prove difficult, and may be impossible, depending on the state of the metadata. Even the nicely documented Dutch network provides coordinates to only two decimal places. Fatally imprecise by a factor of ~100 for my purposes. I need to be able to pinpoint them on Google Earth, or in many cases I cannot evaluate them, even with a photo. Also, Google Earth is getting very good for the US, but maybe not so much for Inner Mongolia, and stations need to be everywhere.
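
    (For scale, if those are decimal degrees, 0.01° of latitude is roughly 1.1 km, against the ~10 m needed to pinpoint a station on imagery -- hence the factor of ~100.)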

    Be that as it may, I think we can demonstrate that ground stations -- GHCN in particular -- need to be examined for microsite bias, and until that happens their results are suspect.

    ReplyDelete
  14. If anyone is new to the reason for Evan's work, this article from ClimateProgress, together with Andy Revkin's piece and an article by Jeff Masters of Wunderground are as good as any.

    ReplyDelete
    Replies
    1. However, it is important to note that the Revkin article was written in 2010. That was before even Fall et al. And it wasn't until months after Fall et al. that I made my initial re-rating of the stations, updating the rating from Leroy (1999) to Leroy (2010).

      So we had not made any of the findings we are discussing now at the time the article was written.

      Delete
    2. The Wunderground article (also from 2010) cites Menne (2009), which I actually use as a basis for my work regarding MMTS adjustment.

      But at that point the stations had been rated using Leroy (1999), not Leroy (2010). The set was also incomplete and unreviewed, and neither TOBS, moves, nor equipment conversion was accounted for. And that was the set Dr. Menne used.

      It was not until late 2011 or early 2012 that I made the Leroy (2010) ratings, which are what we are currently discussing.

      Delete
  15. "I'm still not sure I have this straight."

    "Stations are discarded if there is a break in record, movement of a station, or any other disruption in continuity of the record without adjustment, except for the change in station type."

    Not quite. Climate can shift sharply and still be a correct signal. We drop moved stations (we retain a very small handful where the move is localized and the rating has not changed).

    We use HOMR metadata to determine if there is TOBS bias, and drop any station that will suffer from TOBS bias. (J-NG ran the actual TOBS-adjusted numbers for the stations I retained in order to confirm that the work was done correctly.)

    There are a very few we never found, plus a much larger number that were long since closed, with no way to locate them.
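
    In code terms, the retention rule reads something like the following sketch (the Station fields and the retain() helper are my own invention for illustration, not the study's actual workflow):

    ```python
    from dataclasses import dataclass

    @dataclass
    class Station:
        station_id: str
        moved: bool            # per HOMR metadata
        localized_move: bool   # moved, but nearby and rating unchanged
        tobs_biased: bool      # time-of-observation bias per HOMR

    def retain(s: Station) -> bool:
        """Drop only for moves or TOBS bias -- never for the trend itself."""
        if s.tobs_biased:
            return False
        if s.moved and not s.localized_move:
            return False
        return True

    stations = [
        Station("A", moved=False, localized_move=False, tobs_biased=False),
        Station("B", moved=True,  localized_move=False, tobs_biased=False),
        Station("C", moved=True,  localized_move=True,  tobs_biased=False),
        Station("D", moved=False, localized_move=False, tobs_biased=True),
    ]
    print([s.station_id for s in stations if retain(s)])   # ['A', 'C']
    ```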

    "Four out of five stations are classified as 'poorly sited' according to Leroy, and all of these are so classified because they are near heat sinks."

    Yes, we rate for heat sink, only.

    "No station is classified as 'poorly sited' for any other reason."

    Yes. We do not rate for shade, slope, or vegetation. Shade is not an independent variable because the shade usually comes from the obstruction itself. The other two are impossible to measure with the resources at hand. But we may address such issues in followup.

    "Two out of five stations are classified as 'good' according to Leroy."

    Roughly one out of five. And that is for heat sink alone.

    The "poorly sited" stations show a warming trend faster than the "good" stations.

    On average, yes.

    "There are some poorly sited stations with low trends and some well sited stations with high trends."

    We do address the "9 regions", because what may be a high trend in the Southeast would, of course, be a low trend in the Southwest. We take this into account.
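
    Presumably something like this sketch, with hypothetical region names and numbers (my own, not the study's): a trend is judged against its own region's mean rather than the national mean.

    ```python
    # All trends in deg C/decade; the values are invented for illustration.
    regional_trends = {
        "Southeast": [0.10, 0.12, 0.08],
        "Southwest": [0.30, 0.35, 0.28],
    }

    def relative_trend(region: str, trend: float) -> float:
        """A station's trend minus its region's mean trend."""
        peers = regional_trends[region]
        return trend - sum(peers) / len(peers)

    # The same 0.20 deg C/decade is high for the Southeast, low for the Southwest.
    print(round(relative_trend("Southeast", 0.20), 2))   # +0.1
    print(round(relative_trend("Southwest", 0.20), 2))   # -0.11
    ```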

    "There are other stations that are not classified as 'good' that we have not included because they are not near a heat sink and might have a lower warming trend. (This seems to contradict the other statements about all the poorly sited stations being near heat sinks.)"

    A thousand times, no. If a station has not moved and it is not biased by TOBS, we retain it, no matter what the readings are. Anything else would be a travesty.

    "I've gleaned the above from putting together your various comments, Evan. Have I got it right yet?"

    The first two points are correct. If I pulled any shenanigans like that in point three, my head would be on a pike faster than you can say "as an example for the next ten generations".

    Our other major finding (and the most "red meat") is that the raw Class 1\2 (good station) Tmean trend average is over a third lower than the final adjusted Tmean trend average of the entire sample (all Classes).

    That is no more important a scientific observation than any of the others (e.g., the Tmin aspect). But it is the potboiler of the saga.
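
    To see what "over a third lower" cashes out to, a toy example with made-up round numbers (not the study's actual trends):

    ```python
    # Hypothetical round numbers, deg C/decade -- NOT the study's results.
    adjusted_all_classes = 0.30   # final adjusted trend, entire sample
    raw_class_1_2 = 0.19          # raw trend, well sited (Class 1\2) only

    shortfall = (adjusted_all_classes - raw_class_1_2) / adjusted_all_classes
    print(f"{shortfall:.0%}")     # 37% -- i.e., over a third lower
    ```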

    ReplyDelete
  16. To clarify:

    This applies to all of the above: we are not rating according to shade, vegetation, or ground angle. We do not even know how the stations would be rated in these other categories.

    We rate for heat sink and heat sink only.

    We are only rating according to what Leroy (2010) considers to be a heat source. We would never remove a low-trend poorly sited station because it had a low trend. And likewise, we would not remove a station if it is well sited and has a high trend. That would be cherrypicking and the results would be a travesty.

    If a station passes muster for siting and TOBS, it is included no matter what its rating or trend.

    We may do a followup on these other factors and maybe even deconstruct Leroy a little and examine the pieces (Different cases of Class 3, etc.). I have tagged a number of the considerations here as possible feed for followup. Plus a few I haven't mentioned, such as altitude. But we cannot address all of this at once in one paper.

    I look forward to examining all of these issues.

    ReplyDelete
    Replies
    1. "We rate for heat sink and heat sink only.

      We are only rating according to what Leroy (2010) considers to be a heat
      source."

      You do realise that they are two completely different things, don't you?

      Delete
  17. What you are calling a heat source, we are calling waste heat. Yes, the two have different effects.

    Leroy's definition of "heat source" covers what we refer to as "heat sink" (see the paper). We distinguish between heat sink and waste heat and use those terms.

    Waste heat may actually dampen trend, particularly at Tmax. It raises the offset, of course, but the trend can be swamped. Waste heat tends to be more variable.

    Heat sink is a steady influence. The overwhelming majority of poorly sited stations are subject to heat sink and only a small handful are exposed to waste heat.

    Those stations exposed to waste heat are usually Class 5 stations, and this may explain the lower Tmax trends of Class 5 stations (they have elevated Tmin, however). Class 5 trends are higher than Class 1\2, but lower than Class 3&4.

    ReplyDelete
  18. Has this study produced a paper yet? Or was the 2015 AGU poster the last of it?

    ReplyDelete
    Replies
    1. Hi Katy, according to something I saw recently, Evan Jones says they are still hoping to publish a paper but haven't finished their analysis yet (what is it, at least five years down the track?).

      Would be strange if they could get it published anywhere decent, especially if the data still stops in 2008.

      Delete
  19. Ah! Thank you. I couldn't resist continuing my own 'work' on it, so am hoping it won't be in vain & I eventually get something to critique.

    ReplyDelete
