Saturday, June 7, 2014

Ridiculous Richard Tol sez 12,000 is a strange number...


Anthony Watts has put up another promo for Richard Tol (archived here, latest here). Richard is an economist who agrees there is an overwhelming consensus among the experts that global warming is real and caused by human activity. Over the last year or so, however, he's been on a crusade to try to argue that 97% isn't 97% or something.

I've already written how Richard's "arguments" range from the idiotic to the preposterous and have been well and truly demolished. For a more orderly, less snarky and highly readable account, see the paper by John Cook and co where they identified at least 24 major blunders in Richard's silliness.

This time, because there have been a number of articles in the UK Guardian about Richard - his errors in his economics papers and now his "verging on the lunatic" crusade against John Cook and SkepticalScience.com - the Guardian has allowed him an article of his own.


Richard contradicts himself


Richard doesn't start off his article too well, contradicting himself right up top, writing:
I show that the 97% consensus claim does not stand up. 
At best, Nuccitelli, John Cook and colleagues may have accidentally stumbled on the right number.

It gets worse from that point onward.



Richard thinks 12,000 is a strange number


Richard's Guardian article is as silly as all the rest of his silliness on the subject. Richard, as you're aware, is an adviser to the UK science-denying lobby group, the GWPF. I don't know if he's acting under Nigel Lawson's instructions or if he's playing the jester all by himself. At one point he wrote:
Cook and co selected some 12,000 papers from the scientific literature to test whether these papers support the hypothesis that humans played a substantial role in the observed warming of the Earth. 12,000 is a strange number. The climate literature is much larger. The number of papers on the detection and attribution of climate change is much, much smaller.

Is Richard Tol that dense or is he deliberately deceitful?


Richard is being, deliberately I presume, deceitful here.  (It's hard to imagine he is that dense.)

Firstly, he's wrongly implying that Cook and his colleagues hand-picked the list of abstracts, deciding themselves which papers to include or exclude. That isn't correct. The roughly 12,000 papers were returned by a search of the Web of Science, using key terms to filter for papers relating to global warming. Richard is probably correct that the climate literature is larger than that, though he provides no evidence in the Guardian article.

Secondly, he's wrongly acting as if the Cook study was focused only on research about the detection and attribution of climate change. Richard is trying to fool you. Theirs wasn't a study of papers on "the detection and attribution of climate change". What it was, as described in their paper, was an analysis of the extent to which the climate literature as a whole accepts the findings of those detection and attribution studies.

That is an important and fundamental distinction. Richard has been missing it for months - since the paper first came out, in fact. It's hard to tell if he's really that dense or if he's just acting as a mouthpiece for his denialist masters, the GWPF. In the Guardian article, Richard writes: "Most of the papers they studied are not about climate change and its causes". No, they aren't. And they were never intended to be.

The research was looking at the extent to which the scientific literature on climate accepts the basic science - that human activity, such as our emissions of CO2, is causing global warming. It wasn't examining the basic research itself; rather, it measured the extent to which the basic science has been accepted by experts in the field. (The equivalent would be looking at the extent to which the scientific literature accepts evolution, or gravity, or the expanding universe, as fact.) As described in the abstract of Cook13 (excerpt):
We analyze the evolution of the scientific consensus on anthropogenic global warming (AGW) in the peer-reviewed scientific literature, examining 11 944 climate abstracts from 1991–2011 matching the topics 'global climate change' or 'global warming'. We find that 66.4% of abstracts expressed no position on AGW, 32.6% endorsed AGW, 0.7% rejected AGW and 0.3% were uncertain about the cause of global warming. Among abstracts expressing a position on AGW, 97.1% endorsed the consensus position that humans are causing global warming. 
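
As a quick sanity check, the headline figure drops straight out of those category shares. Here's a minimal sketch in Python, using the rounded percentages quoted above (the paper's unrounded abstract counts give 97.1%):

    # Shares of all 11,944 abstracts, as quoted in the Cook13 abstract
    endorse, reject, uncertain = 32.6, 0.7, 0.3

    # Consensus = endorsing share among abstracts that take a position at all
    with_position = endorse + reject + uncertain      # 33.6% of abstracts
    print(f"{100 * endorse / with_position:.1f}%")    # prints 97.0%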

Richard and his "tired" theory - the "sleepy scientist syndrome"


I've written before about Richard's "tired" claim - how he mistakes the researchers for market research subjects, when, if there's any parallel with market research or public polls, it's the abstracts that are the respondents and so it's the abstracts that would have got "tired". This time he goes further. He writes:
There are patterns in the data that suggest that raters may have fallen asleep with their nose on the keyboard. 
Except that Richard doesn't have any "data" that could possibly show this. He's been playing around with the data, making all sorts of wrong assumptions, in an effort to "prove" not that the results were wrong, but that the method was wrong or that the researchers got sleepy and messed up. (Which would imply that the editors at ERL and the paper's reviewers were also sloppy if not sleepy.)

I think that what might have happened here is that Richard sorted the abstracts by year and within each year, alphabetically by title (or author). He wrongly assumed that the researchers went through the abstracts from oldest to most recent. Since the paper describes an increasing consensus over time, that false assumption manufactured his "pattern". This was discussed a bit last year on various blogs - and Ridiculous Richard became a laughing stock for this reason alone. The wrongness of his assumption was pointed out to him by numerous people, but Richard couldn't let it go. Addendum: My info on this is outdated. Tom Curtis has clarified what happened in the comments below. - Sou.

Anyway, his claim of "falling asleep" points to utter nuttery.  Do researchers generally fall asleep when they are doing their research? If they do, does it mean that all scientific papers suffer from the "sleepy scientist syndrome"?

The method wasn't rocket science. All that happened was that a team of researchers categorised abstracts into one of several categories. There's nothing magic about it. The data is all there. Most of the work has already been done, so had Richard wanted to, he could have categorised all the abstracts himself by now. He would only have had to rate about 33 abstracts a day for the past year and he'd have finished the job - something he could have managed in half an hour or so before he went to bed each night. But no - lazy Richard, like many science deniers, finds it easier and less arduous to make completely unfounded and nonsensical statements.


Ridiculous Richard lurches from one flawed assumption...


Richard also goes for more silliness and flawed arithmetic. He wrote:
The data is also ridden with error. By Cook’s own calculations, 7% of the ratings are wrong. Spot checks suggest a much larger number of errors, up to one-third.
First of all, the "data" are the abstracts. What I think Richard is trying to argue is that the results are wrong. But he's relying on an erroneous assumption. Abstracts were rated by two people (at a minimum). Where the two raters agreed, there was no need for an umpire. Where they disagreed, either they considered it again or a third person looked at the abstract and made the final call. This, obviously, happened mostly with "line-ball" categorisations - for example, whether an abstract accepted AGW explicitly or only implicitly, or whether it implicitly accepted AGW or was neutral. On rare occasions it might have been between implicitly accepting that humans are responsible for most warming and suggesting humans have only a minimal impact on warming.

Anyway, it's not John Cook who calculated that 7% of the ratings were "wrong". Indeed, the validation by the authors of the papers demonstrated that the Cook13 assessment could hardly have been any closer to actuality - Cook13 found a 97.1% consensus and the authors' self-ratings gave a 97.2% consensus.

What Richard is basing this on is his ridiculous paper, where he wrote:
Cook reports “disagreement” on “33% of endorsement ratings”. If errors are random, 18.5% of abstracts were incorrectly rated. That implies that 0.6% of abstracts were identically but incorrectly rated. About half of the discrepancies were solved by reconciliation; the rest was referred to a third rater. Assuming the same error rate in reconciliation and re-rating, 6.7% of ratings are wrong.
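
To see where those numbers come from, here's a minimal sketch of the arithmetic in Python. Richard's paper doesn't spell out the underlying model, so the assumptions below - each rating independently wrong with probability p, and a wrong rating landing uniformly on one of the six other categories - are my reconstruction, not his stated method:

    import math

    # Two independent ratings agree if both are right, or both are wrong in
    # the same way: P(agree) = (1-p)^2 + p^2/6, assuming 7 categories.
    # Setting P(disagree) = 1 - P(agree) = 0.33 (the reported 33%) gives the
    # quadratic (7/6)p^2 - 2p + 0.33 = 0.
    a, b, c = 7 / 6, -2.0, 0.33
    p = (-b - math.sqrt(b * b - 4 * a * c)) / (2 * a)
    print(f"implied per-rating error rate: {p:.3f}")   # 0.185 - Tol's 18.5%

    # "Identically but incorrectly rated" = both wrong, in the same category
    print(f"identically but wrongly rated: {p * p / 6:.4f}")   # 0.0057 - his 0.6%

    # Tol assumes reconciliation and third ratings err at the same rate p, so
    # his final figure adds p times the 33% of abstracts that disagreed
    print(f"final share of 'wrong' ratings: {p * p / 6 + 0.33 * p:.3f}")   # 0.067 - his 6.7%

Every step of that chain inherits the "errors are random" assumption, which is exactly what the next few paragraphs take apart.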

First of all, Richard is assuming that "disagreement" can be equated to "wrong". That's not so. It could just as well be that the abstract wasn't clear enough to determine which of (usually two) categories it should be placed in.

Secondly, Richard assumes that "errors are random". There's no basis for that assumption either. It is much more likely that some pairs of categories are harder to tell apart than others - for example, whether an abstract indicates implicit acceptance of the science or is neutral, not indicating a position on AGW one way or the other. In other words, Richard has no grounds for assuming that 18.5% of abstracts were "incorrectly rated".

Thirdly, Richard assumes that abstracts that were resolved by reconciliation or a third rater had "the same error rate". This is a stretch. If both raters agreed in the end, then an "error" is less likely. If a third person came in and, in the light of the two prior ratings, made the call, then equally an "error" is less likely.

That means that Richard's "implication" that 0.6% of the 12,000 or so abstracts were identically but incorrectly rated is flawed. At best his calculations put an upper limit on the number of ratings that could be in "error" - not an absolute, definitive number. Even then it's a stretch to call them "errors". It would be more correct to say the abstract may not make clear to which category it belongs.


...to a flawed analogy: Researchers are thermometers


Ridiculous Richard gets even more ridiculous, lurching from one failed analogy to another. He wrote at the Guardian:
At other times, Cook claims that the raters are not interviewees but interviewers.
The 97% consensus paper rests on yet another claim: the raters are incidental, it is the rated papers that matter. If you measure temperature, you make sure that your thermometers are all properly and consistently calibrated. Unfortunately, although he does have the data, Cook does not test whether the raters judge the same paper in the same way.

He's obviously doubling down on his "abstracts got tired" meme. What Richard's now arguing is that the researchers are thermometers. In a proper version of his analogy, it's the abstracts that are the thermometers; the researchers are the people who read and report the temperatures the thermometers display.


Scientific consensus is irrelevant - every bit of research has to go back to first principles

I'll just comment on one more bit of idiocy from Ridiculous Richard. He claims that:
Consensus is irrelevant in science. There are plenty of examples in history where everyone agreed and everyone was wrong. Cook’s consensus is also irrelevant in policy. They try to show that climate change is real and human-made. It does not follow whether and by how much greenhouse gas emissions should be reduced.

Consensus is hardly irrelevant. If it were then knowledge could never be built upon or added to. Every bit of research would first have to go back and "prove" the knowledge on which it was built. Imagine if a virologist had to "prove" the existence and role of DNA and RNA before being able to publish a new paper on virology. Imagine if every astronomer had to "prove" that earth existed within a larger universe before writing up research on black holes.

As for policy not being based on the scientific wisdom of the day - imagine if policy makers said that about public health. How would Ridiculous Richard feel if cities went back to open sewers? How would he feel if governments removed any requirement for hand-washing from hospital accreditation? Perhaps he'll be arguing for the tobacco lobbyists next - although he's missed the boat there.


From the WUWT comments


Although most of this article is pointing out the ridiculousness of Ridiculous Richard at the Guardian, Anthony Watts pointed his readers to the article too. So let's see what they think about it all. There aren't too many comments yet.

Brute says:
June 6, 2014 at 10:28 pm
Tol’s dedication is impressive.

Perhaps I shouldn't pick on the simpletons, but I will. norah4you writes some gibberish and says:
June 6, 2014 at 11:27 pm
There never ever been a consensus of 97% scientists among scholars who knows and live up to Theories of Science.
But what’s worse for those who still believes that academic titles no matter in what subject or a high degree or a Professor’s title show proof when and if a concensus ever happens,
what’s worse for them is that they all show complete lack of knowledge of differences in using Fallacies in argumentation which they show above all, on one side and true facts leading up to valid arguments permitting a sound conclusions. One isn’t the other and vice versa……

Charles Nelson knows that Richard Tol is one of those nasty "alarmists" and says:
June 7, 2014 at 12:00 am
They’re starting to scratch each other’s eyes out now.
When a single discordant note is heard in that hallowed choir that is The Guardian, you sense that end times are nearing for CAGW!
Siberian Hussey and Rusty Bed-springs won’t like the dissent, they won’t like it one little bit! 

NikFromNYC is a known paranoid conspiracy nutter. This time he says:
June 7, 2014 at 12:10 am
There is a sociopathic consensus. 

Stephen Richards (and Sandi) doesn't realise that Cook13 was a review of the scientific literature, not primarily a poll of scientists (although there was a validation done by asking scientists to rate their own papers), and says:
June 7, 2014 at 1:08 am
Sandi says: June 7, 2014 at 12:38 am The real scientists are the ones who knew better than to reply to Cook’s survey.
They are the cowards that allow the scam to continue.


Cook, John, Dana Nuccitelli, Sarah A. Green, Mark Richardson, Bärbel Winkler, Rob Painting, Robert Way, Peter Jacobs, and Andrew Skuce. "Quantifying the consensus on anthropogenic global warming in the scientific literature." Environmental Research Letters 8, no. 2 (2013): 024024. doi:10.1088/1748-9326/8/2/024024

Cook, John, Dana Nuccitelli, Andy Skuce, Robert Way, Peter Jacobs, Rob Painting, Rob Honeycutt, Sarah A. Green, Stephan Lewandowsky and Alexander Coulter. "24 Critical Errors in Tol (2014)." Skeptical Science (2014)

50 comments:

  1. In his Guardian article Tol says that "Science is not a set of results. Science is a method. If the method is wrong, the results are worthless."

    It's difficult to state how wrong the first two sentences are. There is no point doing science if there are no results. In fact, results are the basic fuel for science. The method is the means by which results are achieved. For an intelligent man, Tol does a good impression of an idiot. I'd go so far as to say that Tol knows what he is doing and is being deceitful. I think he is, like Bengtsson, a denier who hasn't come out of that particular closet yet.

    Oh, and he's a rather clumsy one judging by the number of mistakes he makes and poor assumptions he uses. Or is this typical of economics?

    Replies
    1. I enjoyed your article about that very point, Catmando.

      http://ingeniouspursuits.blogspot.com.au/2014/05/richard-tol-says-something-incredible.html

    2. That's actually one of the only things he got right (almost). Science *is* a methodology (or a set of them, depending on the nature of the subject studied), and it is *also* a set of results. It's both. His problem is he is not using a correct methodology, so his results are off. That's the problem with deniers in general. They look for the result they want, and half-ass the methodology (the hard work) it would take to get there. You have to have both.

  2. It's become almost a compulsion for deniers to deny the consensus. Remember the interview with Maurice Newman on Lateline. They HAVE to deny it, and dismiss as 'flawed' any studies that attempt to quantify it. They have to. It's all part of their mentality - to justify the denial of basic physics which, if it were magnetism or optics or geology, would be reasonably ridiculed; but when it's climate science, it's somehow different. I think that it's just plain crazy, just as crazy as denying gravity, or magnetic forces, or the polarisation of light. Yet we still get these seemingly intelligent people doing just that. Why is it that Kirchhoff's law of thermal radiation is somehow in its own category? It's a physical law just like any other, but since its consequence is global warming, it is denied. Just bizarre.

    Replies
    1. I agree, Dave. What's even more bizarre is that these laws only cease to work when they apply to climate science. If Kirchhoff's Law really stopped working, so would all their electrical and electronic appliances. Their fridges and air conditioners still function normally, despite the denialists' claims that AGW is not consistent with the Laws of Thermodynamics ...

  3. I only wish Tol had explained why there is a magical nanny goat in his list of acknowledgements. How does the goat fit in? I hope it isn't anything to do with entrails.

  4. I thought it was standard procedure not to read the last 500 words of any paper, because by then the writer would have been sleepy. I hope I haven't been missing anything important all of these years...

  5. For some reason, the art of economics seems to attract numbers of alpha-male practitioners. A-type personalities who just know that they are invariably correct. Often, they thrive within their discipline due to a combination of smarts and aggression (think silver-back gorillas in gorilla tribes).

    However, it can get more difficult when they seek to play the same game outside their discipline. Outsiders don't defer in the same way as same-discipline colleagues, and the silverbacks find they are expected to work to the same standards as other practitioners, and can no longer rely on damping disagreement by letting their reputation precede them.

    Professor Tol would appear now to be in this position. Having had his initial 'pronouncement from on high' dissected, he has been forced to justify his initial response, but this subsequent piece of transparent flakiness has only led to further public embarrassment for him.

    Bunnies are by nature burrowing animals, but in this case, the old advice that 'when you find yourself in a hole, the first thing to do is to stop digging' may be relevant.

    Replies
    1. This is very close to my own reading of Tol's behaviour and his incredibly weak response in the Guardian. In the self-referential world of Economics his orthodoxy shields him from questioning and he has come to feel that his opinion on anything, once arrived at, should be the end of the matter. In the wider world he's completely adrift and reduced to bluster.

  6. Tol exists on the margins of Economics. I read numerous economics blogs daily and have for a decade - I've never heard his name mentioned outside of climate blogs. He found a niche working in climate impacts - but that's not macro or micro, where the big boys play.

    A better analogy is that he was a big fish in a small isolated pond, but now the pond has a river running through it and has brought much bigger fish into play. He will never again be the big fish and he resents it.

    Replies
    1. I've noticed that Tol tries to hide behind econometrics being an arcane subject on which only he and a select few others are qualified to comment. This is even when the points being brought up are about general statistical and mathematical subjects, not his own gremlin-haunted speciality.

      When examined, Tol's work makes a great deal of reference to other Tol works and the model he uses is essentially specified by, you've guessed it, Tol. When the model is shown to throw up potential divides-by-zero he shrugs that off with "We are aware of that and monitor for it", when in fact it's a very solid sign that there's something fundamentally wrong with the model. Infinities are Nature's way of telling you you've screwed up somewhere. Of course if you're Feynman you can find a way through them, but Tol is no Feynman. In fact he seems to be a bit of a dick; no wonder the Denialati used to like him so much.

  7. Sou:

    "I think that what might have happened here is that Richard sorted the abstracts by year and within each year, alphabetically by title (or author). "

    The original release of the ratings record was ordered by year, and alphabetically within the year. Tol did not notice this, and performed his analysis showing "rater tiredness". When it was pointed out that raters received abstracts randomly selected from those not yet rated, so that the order of the data he received contained no information about rater tiredness, he amended his response to Cook et al to say:

    "The Web of Science presents papers in an order that is independent of the contents of the abstract: Papers are ordered first on the year of publication, and second on the date of entry into the database. Abstract were randomly reshuffled before being offered to the raters. The reported data are re-ordered by year first and title second.
    In the data provided, raters are not identified and time of rating is missing. I therefore cannot check for inconsistencies that may indicate fatigue. I nonetheless do so."

    Since then, however, he has received a file of the ratings in chronological order, and in theory that data can be assessed for consistency of rater performance as he purports to do. To do so, however, 1st ratings, 2nd ratings, and later ratings must be strictly separated from each other and analyzed separately, something Tol did not do. Further, raters are likely to have had small differences in the ratios of assignment to particular rating values. Because they rated at different rates, and in blocks rather than evenly distributed in time over the full period in which they rated, those small differences may be sufficient to generate the patterns Tol purports to have found. Consequently Tol has analyzed the wrong data (again), and not allowed for a confounding factor, rendering the analysis useless.

    Replies
    1. Thanks for the clarification, Tom. I'll link to this comment in the main article.

    2. It is this data that Cook has threatened to sue if anyone reveals to the outside world.

    3. No, Shub. You're probably thinking of the letter UQ wrote to some script kiddie / hacker who stole some stuff recently. AFAIK, those files contained data about the raters themselves that is considered confidential and is irrelevant to the research itself.

      Tom is referring to the ratings data file available on SkepticalScience. (The website seems to be down at the moment or I'd link to it.)

    4. Below is a link to the data file that Tom Curtis was talking about, with the description:

      First and second ratings by our team. Ratings are ordered sequentially. E.g., in order that original ratings were made (Article Id #, Original endorsement rating, Original category rating, Endorsement rating after consultation stage, Category rating after consultation stage)

      From this SkS web page:

      http://skepticalscience.com/tcp.php?t=home

    5. No, Sou. Cook has threatened to sue if anyone analyses ratings data from individual volunteers and presents anonymised results. Rating streams from volunteers with timestamps are needed to test for fatigue, stereotypy and bias. The composite first and second rating is, ... composite. It contains ratings from different volunteers in sequence and is not suited to draw inferences about fatigue effects as individual raters' data would be.

    6. What has he threatened to sue them for? Or, to save time, you could link us to where you got this from.

      It occurs to me that Tol's egregious errors in his recent work could be down to tiredness. Until we see his timesheets there's no way to know, of course. As for Lindzen and Spencer, at their advanced age I imagine they're easily tired, which would explain a lot.

    7. Shub, the tiredness thing is a red herring. I've just spent an evening marking exam papers. It's just gone 10.30pm. I have recognised I'm tired and have given up for the night. You have to be pretty stupid to think that tiredness is going to show up in the data. In the past I've got up in the middle of the night to make use of the peace and quiet to mark and regularly mark at six in the morning at the weekends. And since Tol bases his argument on one out of context stolen comment by someone who has contradicted what Tol says, it is pretty dead as an argument.

      If you think it is worth pursuing, I'd forget it.

    8. Shub said... "Rating streams from volunteers with timestamps are needed to test for fatigue, stereotypy and bias."

      Um... no it's not needed. Cook13 already tested for bias by getting researchers to self-rate their papers.

      And, as has been stated about a million times now: 1) there is no time stamp data, since it wasn't collected, 2) even if it had been collected, it would have been useless because the ratings were collected in groups of five, 3) it would have been useless because the raters were working at their own pace and on their own time, and 4) it's already been shown that "interviewers" exhibit increased efficiency over time.

      If you, or anyone else, wants to get a more exacting figure for the scientific consensus on climate change, all you have to do is perform your own study. That's how real science is done.

    9. Not so much a red herring as a dead parrot. What the Denialati can't handle is that the data is there - all the data that's needed anyway. Cook et al were hardly so naive as to leave that expedient open. So now they take the usual next step of demanding data that cannot possibly be provided and declare a conspiracy.

      This is how they deal with cognitive dissonance. It was the same with Lewandowsky. If they ever use data that hasn't been hacked it leads to such disasters as BEST and Tol's 91%. But the cherry on the cake is the way they object to people using their public statements as data for their own analyses.

    10. I'm not aware of John Cook threatening to sue anyone. I've no idea what Shub is referring to if he's not interpreting the letter to the script kiddie from UQ. Shub - who did John Cook threaten and when? Who else hacked a private website and stole data from the research apart from the script kiddie who boasts about it?

      Also, the "abstracts got tired" is dumb dumb dumb. Richard not only got his facts screwed up and confused the raters with the ratees, he couldn't even analyse the data he was given. Research shows, interviewers (the closest parallel to the raters) get more proficient with time, not less. Perhaps Richard is projecting and he's blaming his mistakes in his economics papers on the fact he got "tired". Should we disregard all science on the grounds that researchers got tired? A lot of science is repetitive observation and interpretation.

      The research isn't measuring how tired the abstracts got, it's a classification of the content of the abstracts. The only sound way to check that is for other people to categorise the abstracts.

      Oh, wait - that's already been done by the authors of those very same abstracts. Hmmm... they didn't categorise the abstracts, they categorised the content of the papers themselves. And guess what - 97% again!

      Why do deniers like squirrels so much? Why do they never do any research themselves?

      Did you know that fewer than 3% of scientific papers published about AGW are from deniers? With all the noise they make you'd think more of them would try to put their money where their mouth is. Too busy sounding off about squirrels I guess.

    11. Shub is concerned that individual raters got tired. At what point did they get tired? If they rated 50 abstracts in a day, were they tired for the last abstract? The last two? The last ten? All 50? Did tiredness mean that they miscategorised an abstract?

      Say each researcher was tired 10% of the time and of that 10% of the time they miscategorised 50% of the abstracts. Yes, that's pushing the envelope hugely, but stay with me. What are the chances that the second person rating those same abstracts independently was also tired rating those very same abstracts. Let's say on the balance of probabilities it was 10% of the time and 50% of those were miscategorised. Now given there are seven categories, that would mean that there could have been the same miscategorisation on 1/7 of that 50% of 10% or whatever.

      What's that come out at - around 0.036% of the abstracts could have been miscategorised through "tiredness". Well within the error margins given by Cook13. And that's with hugely inflated estimates!
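
      Spelled out (the 10% and 50% figures are my deliberately inflated assumptions, and 1/7 is only a rough stand-in for the chance of both raters landing on the same wrong category):

          # Back-of-envelope: chance that BOTH independent raters miscategorise
          # the same abstract in the same way through "tiredness"
          p_tired = 0.10          # assume a rater is tired 10% of the time
          p_miss_if_tired = 0.50  # assume half of tired ratings are wrong
          same_category = 1 / 7   # rough odds of matching wrong categories

          p_one_wrong = p_tired * p_miss_if_tired          # 5% per rater
          p_both_match = p_one_wrong ** 2 * same_category
          print(f"{p_both_match:.3%}")                     # ~0.036% of abstracts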

      That's why I refer to denialism as utter nuttery!

    12. And let's pause to not forget that shub has failed to provide evidence for his 'threatened to sue' claim - beyond the privacy issues.

      I submit that you appear to be trying to establish a (convenient) 'truth' in the minds of the susceptible - possibly including yourself - via repetition rather than (inconvenient) evidence. Prove me wrong.

      Also, the obvious thing for Deniers in general to do is to re-do the study yourselves, and yet, no matter how noisy, none of you has done so. I put it to you that this is because you are only too well-aware of what the result would be. Inconvenient. Hence all the cries of 'I can see smoke! I can see smoke!'

      Again: prove me wrong.

    13. I don't believe that John Cook has threatened to sue anyone. Shub hasn't provided any evidence nor indicated who John Cook could have threatened, when or where or by what means (private email, public comment). It would have been all over WUWT for starters - and it's not.

    14. Good point, Sou; it would surely be smeared across WUWT like ice-cream on a baby's face. Apart from the whining and victimhood (always popular with the Denialati) it would be groundwork for the Mann v Steyn case's likely outcome.

    15. Sou and others, there has been a letter from U Queensland telling Brandon Shollenberger he'd better not release the data he obtained illegally:
      http://davidappell.blogspot.dk/2014/05/the-university-of-queensland-letter.html
      (also on WUWT, but I don't want to link there)

    16. Yes, Marco. That's what I think Shub may be referring to. As I've been saying, that isn't from John Cook. So either Shub knows something that no-one else knows or he's mistaken. There is a third possibility, of course.

    17. I agree - the UQ letter is not, to any reasonable mind, what shub is describing (or should I say 'purveying'?) above.

      Actual evidence of this assertion, please; It is this data that Cook has threatened to sue if anyone reveals to the outside world. (Which rather implies Cook will be suing the data, we might add, but we all know what is being suggested.)

    18. Nanny goat is spewing pellets. John Cook isn't suing anybody.

      Truly amazing how goats will eat anything and then "poof," out it comes from the other end, all over the landscape. Phew.

    19. Sorry, Sou, but here I think it is OK to state that John Cook will sue, when in reality it is his employer. If John hadn't told them, I don't see them doing anything about the release of that data.

      So, that makes both of you correct in that John Cook will (not) sue.

    20. Fellows,
      I am a layperson, not a lawyer. As far as I am concerned, the University of Queensland and John Cook are the same w.r.t this episode. UQ would not have gone after Shollenberger had it not been for Cook. Cook has the capability to call off UQ's legal threats. The data belongs to Cook. There is zero intellectual property in the data set apart from its value as scientific data. So, this line from UQ's lawyer:

      "any and all activities involving the use or disclosure in any manner of the IP"

      as part of a legal threat, is in effect a threat against analysis.

      For the record, I told Shollenberger that Cook should give the go-ahead for such data to be released, as it is his data. Shollenberger agrees as well, which is why he did not simply release it.

      Fatigue is a real concern in research of this kind. If you perform repetitive tasks of any kind, you would know. Go to Google Scholar and put in 'survey fatigue'.

    21. We await your re-do with considerable interest then. But you won't, and absolutely everyone here, including yourself, knows why.

      As for the rest - how did Shollenberger get hold of this material again? What's the actual issue here; is it really the one it's convenient for you to claim it is? How, precisely, would you respond, or would you expect your institution to respond, in the circumstances?

      And did you read Tom's comment, or what?

      Rater fatigue; pfffft! The 97%, plus-or-minus not-much, consensus is rock solid, and if that wasn't important, and didn't gall you to the nth degree, you wouldn't be here.

    22. "Say each researcher was tired 10% of the time and of that 10% of the time they miscategorised 50% of the abstracts. Yes, that's pushing the envelope hugely, but stay with me. What are the chances that the second person rating those same abstracts independently was also tired rating those very same abstracts."

      That's not how abstracts were rated. A single person's first ratings for say 50 abstracts would get second ratings from x different volunteers. Which is why you need rater ids to access abstracts rated by individual volunteers. Furthermore, if such data is made available, it would still be realized that the same set of n abstracts were rated by two different raters at discontinuous time periods.

      There are further issues. The abstracts retrieved contain, for the most part, no consensus information. There would be no way of separating a fatigued volunteer dumping abstracts into '4' (the no position category) from a volunteer whose 'true' rating really is '4'.

      There would be no difference between two raters, rating an abstract/minute, with the first one falling asleep on his keyboard with his forehead on the '4' key for 70 minutes, then waking up and completing the remainder, and a diligent rater who stayed awake for 100 minutes to rate abstracts. Contrary to Tol, I actually believe Cook et al - most of these abstracts have no signal. They think it is ~70%, I calculate this close to 90%.

      The above means that rater ids and rating time stamps are needed to assess fatigue.

    23. Shub, perhaps you should look for "survey fatigue" yourself. You might learn something. Most, if not all, of the available literature is not relevant to Cook et al. Tol also hilariously refers to literature that is completely irrelevant. Then again, he also manages to clearly misinterpret Andy Skuce's comment. It's what happens when you are out on a mission: everything is evidence that you are right, even if an objective analysis shows it is not.

    24. First of all, it wasn't a survey. It was an analysis of abstracts. Shub is mixing up different things, you'll have noticed. First the ridiculous "sleepy" allegation. Second the separate allegation that people got worse at analysing abstracts as they did more. And finally that the researchers were dishonest.

      The researchers would have got more proficient over time. They worked at their own pace, not to a clock.

      As for this:
      A single person's first ratings for say 50 abstracts would get second ratings from x different volunteers

      That would all make it even less likely that dishonest ratings that Shub makes up out of thin air would have carried through the whole process.

      Shub isn't interested in checking whether there was 0.036% error or 0.0036% error - he's just making mischief. He's been around climate blogs long enough to know that almost all scientific papers on the topic support AGW, just like Cook13 and similar studies show.

      As for the implied allegation of dishonesty - well, we all know who are the dishonest people wanting to smear on no basis at all. People like Shub would be beneath contempt if they were worth that much. Since they aren't I won't even waste my contempt.

      No more fake made up allegations, Shub. Next step is the door.

    25. What dishonesty are you talking about, Sou? None of my comments above imply dishonesty.

      [1] From the '24' document:

      "The raters performed the function of a survey interviewer in the process of rating abstracts".

      In surveys, the questions are the passive agents and the survey takers are subject to fatigue. In Cook's project, the abstracts are passive agents and the survey interviewers are subject to fatigue.

      The literature on survey fatigue is entirely relevant here.

      [2] "The researchers would have got more proficient over time."

      Available data contradicts this. If true, ratings would progressively get more consistent. What is actually seen is the opposite: the second ratings are less than 50% consistent with the first. In other words, the probability of distinctly identifying a consensus category is worse than a coin toss. Among disagreeing ratings, ~80% belong to exact opposite ratings, i.e., 80% of '3' is identified as '4' and 80% of '4' as '3'.

      The passage of time did not improve rating accuracy.

    26. Marco, Shub isn't in the business of doing research. He's in the smear, uncertainty, doubt business - much less demanding and "no answer" is always the right answer in deniersville.

      Here's a link for you. These articles are all about respondents, not interviewers. Here's a sample where I've replaced "respondents" with "scientific abstracts", which, if there were any equivalence (there isn't), is what the respondents correspond to. It's an article about how researchers should design the survey to keep the abstracts interested, so the abstracts don't suffer survey fatigue and either fail to respond, give silly responses, or complain that they are too tired to take part.

      From surveygizmo:
      ...we see so many surveys fail to collect useful data simply because they’re not designed to keep the scientific abstracts interested.

    27. Shub you've made foul allegations that the researchers were dishonest and just filled in any response they liked.

      You're wrong about the false equivalence you've made. The researchers weren't taking a survey, they were evaluating abstracts and categorising them. The equivalent of a market researcher analysing responses to open ended questions at best. You've not provided any evidence of anything. Just made up stuff.

      I don't know where you got your numbers from or why you think they are relevant. Even if they were (though I doubt it), categories 3 (implicit endorsement) and 4 (no position) could often be line ball calls. The majority of first and second ratings agreed (66%). When they didn't they were either reconciled between the researchers or went to a third independent person. You might be working off Richard's numbers, which are up the creek. From the Cook response:

      The assertion of rater drift is based on analysis of the average endorsement level, using ordinal labels from 1 to 7. However, average endorsement level is not an appropriate statistic for making inferences about consensus percentages. C14 replicates T14’s analysis using the more appropriate consensus percentage calculated for 50-, 100- and 500-abstract windows. We find no evidence of the claimed rater drift. Consensus among initial ratings in a window falls outside the 95% confidence interval 2.8%, 3.2% and 1.7% of the time for 50-, 100- and 500-abstract windows respectively.
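
      To illustrate the windowed check described there, here's a minimal sketch - my own illustration of the idea, not C14's actual code. It assumes Cook13's seven-point endorsement scale, where 1-3 endorse AGW, 4 is no position and 5-7 reject:

          # Consensus percentage within fixed-size windows of ratings, taken
          # in rating order. Genuine "rater drift" would show up as windows
          # falling outside the confidence band far more often than chance.
          def consensus_pct(ratings):
              endorse = sum(1 for r in ratings if r <= 3)    # 1-3 = endorse
              position = sum(1 for r in ratings if r != 4)   # 4 = no position
              return 100.0 * endorse / position if position else float("nan")

          def windowed_consensus(ratings, window=100):
              return [consensus_pct(ratings[i:i + window])
                      for i in range(0, len(ratings) - window + 1, window)]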

      What you have failed to come up with is a single bit of evidence to show that the 97% finding - that 97% of the scientific papers attributing a cause to the current rapid warming point to human activity - is in error. Now why not focus on that?

      If you spent an hour a day doing it all by yourself you'd be able to finish the job in about six months. If you worked with a mate, you'd get it done in three months. Work with two mates, each for an hour or so a day, and you'd get them all categorised in less than two months. Then you could double up and rate each other's if you wanted to. Or you could rope in another couple of mates and get it done in no time at all. The researchers have prepared the abstracts for you and handed them to you on a platter, saving you a lot of work.

      Instead you pretend "something must be wrong" without the slightest shred of evidence - all you're good for is bluster and sleazy innuendo. What a lazy, good for nothing so-and-so you are. A credit to the denier illiterati - and that's about it.

    28. Shub, survey *interviewers* are quite different from survey *takers*. This is a crucial difference, and also the main reason why the available literature on survey fatigue is largely irrelevant: it focuses on survey *takers* getting tired of "yet another survey" or a "too long survey" in which they have limited interest. The survey interviewers, especially in this particular case, do have a special interest in the survey. Moreover, in this case the surveys were short and time availability was completely up to the 'respondent', unlike with yet another known survey fatigue issue: having to respond to the interviewer's questions rather quickly, with no walking away for a few minutes.

      And as noted by many, not only is there a potential "fatigue", there is also "experience", which increases with time. I remember my own early days reading scientific papers, easily spending 2-3 hours to understand a 16-page paper. Currently I can do it in 30 minutes if I really want to understand the details, or 5 minutes to do a simple BS check and get the main story.

    29. Captain Flashheart, June 11, 2014 at 12:33 AM

      Shub, go look up the Cochrane Database of Systematic Reviews. People working on systematic reviews routinely rate 1000s of abstracts. Of course you and Tol wouldn't understand this, because you're both systematic review amateurs (well, Tol is; I doubt you even know what a review is). You're putting your stock in someone who can't even get their own systematic review of 12 papers right (maybe he got tired, poor dear). Why?

    30. It's not much but this is worthwhile:

      http://ruptresearch.weebly.com/9-untitled.html

      It is a piece of research on examiner fatigue which is much more relevant because the assessment of the abstracts is somewhat like marking exam scripts. For Shub, the final two sentences say:
      "Finally, despite the fatigue experienced in this OSCE and regardless of the method in which the examiners scored student performance, all examiners were able to concentrate over time. These results suggest that an examiner can concentrate despite experiencing fatigue. "

      Not saying that's the end of the matter but it is another nail. Finding it difficult to find more room to bang in more nails.

    31. "I don't know where you got your numbers from ..."

      Sou, the numbers come from Cook's data.

      If you consider disagreement ratings, volunteers rated 1869 abstracts as '3' the first time. The same volunteers rated 1510 of the same abstracts as '4' the second time, an inconsistency of 81%.

      Volunteers rated 1089 abstracts as '4' the first time. The same volunteers rated 904 of the same abstracts as '3' the second time, i.e., an inconsistency of 83%.

      The above is one-tailed. You get similar results the other way around.

      The data presented graphically: http://nigguraths.wordpress.com/?attachment_id=4219

      What are the odds of Cook's method identifying a '3', assuming their final rating as the true positive? 0.41 (1187/2910). Would you get a test for cancer if you were told the chances of ruling out cancer were worse than a coin toss?

      Since Cook's group cannot reliably identify '3', it cannot reliably come to the '97%' conclusion.

    32. Given that only 59% of Cook's rankings are wrong, I can understand why it's so hard for shub to come up with an example of one miscategorized abstract.

  8. Sou:
    "Secondly, Richard assumes that "errors are random"."

    The errors certainly aren't random. Allowing for that only reduces the initial "error" rate to 17.3%, and increases the final "error" rate to 7.28% from Tol's estimate of 6.67%. As you point out, reconciliations and adjudications are likely to have a lower error rate than initial (first and second) ratings so that that represents an estimate of the upper bound.

    The fun thing here is that "A Scientist" has shown a method which demonstrates Tol's analysis of the "actual" consensus rate to be a load of codswallop. Unfortunately A Scientist used the wrong error matrix (S), and the wrong error rate in their demonstration (they should have used 6.1% rather than 11.8%). Correcting for these errors, however, shows that Tol's method predicts that Cook et al showed approximately -230 abstracts rated at 5 after first and second ratings. That is a straightforward reductio of Tol's method, at least as applied with his error matrix. As you know, using the correct error matrix makes virtually no difference to the consensus rate.

  9. Econometrics is just the feeble attempt to analyse 'variables' that are not governed by any real physical laws. These 'variables' can seem to show patterns of behaviour if one throws in enough assumptions and then suspends disbelief when predictions are worse than random. People who are used to looking in the rear view mirror as sure guidance to where they are going are quite deluded. This explains Tol's utter inability to see the errors he is making in judging work outside his field. To an economist all variables are random! That is why they have elaborate nonsensical statistical tests to prove how little they really know. They may as well be arguing about how many angels can dance on the head of a pin. Bert

    Replies
    1. As it happens I class economics with theology, in that they're entirely self-contained, not constrained by observable reality, and have influence only through the agency of human belief. Ditto string theory, but best not go there :)

  10. I forgot to say that it is obvious when angels do a slow waltz they occupy less space compared to when doing a jitterbug! Or any other sort of modern sinful dancing Min. Bert

  11. Tol is sinking along the same demented trajectory as Curry. Interesting phenomenon.

  12. I'll mention this here as it seems to have escaped most notice, except at DeSmogBlog, which produced a noteworthy assessment of Tol's position in Richard Tol Dons Cloak of Climate Denial, where a

    '...small gathering of climate science deniers, including Conservative MP and member of the UK House of Commons Select Committee on Climate change, oilman Peter Lilley, and Conservative MP for Monmouth, David Davies, met in a small room buried on the third floor of the UK’s House of Commons in London last week.'

    And one Piers Corbyn was up to stupid stunts:

    'Tol’s talk was weirdly interrupted by whooshing sounds from the back of the room where Piers “sunspots” Corbyn was inflating a huge plastic globe.'

    This shower would surely belong in a circus were it not for the Danse Macabre that will be enacted because of their dragging effect on taking the action that would help mitigation.

