Tuesday, June 18, 2013

How an economist seeks fame and riches...


Update: Apparently Richard Tol can't even categorise the abstracts of his own papers correctly, so he has a bit of cheek jumping up and down trying to find fault with Cook13.



This one is funny peculiar.  Anthony Watts of WUWT was so irate about yet another study showing the 97% consensus among scientists working in the field that humans are causing global warming, that he told big fat lies about the study.





See here and here and here and here for previous studies that found there is an overwhelming scientific consensus on the human causes of global warming.

Now Anthony has reported that an economist, Richard Tol, who happens to agree that humans cause global warming and doesn't appear to dispute the 97% consensus, has had a comment on the Cook paper rejected.


How (not) to become rich and famous


Tol tweeted that he wanted to become "rich and famous" by courting deniers at WUWT (Curry-style). Tol figured he'd write a formal comment to the journal that published the Cook et al study, Environmental Research Letters.


Maybe they got tired...


One of Tol's 'arguments' against the Cook et al paper was his speculation that the researchers surely got tired assessing so many abstracts.  I'm not kidding.  This is from the rejection letter as published on WUWT:
The author offers much speculation (e.g. about raters perhaps getting tired) which has no place in the scientific literature
Tol didn't make any rational argument that the method was unsound (which might have warranted a comment) or that he had come up with a different number using the same or different method (which might have warranted a comment or maybe a paper).  No - he argued that the authors might have got a bit sleepy.

Oh my!  What can I say.  Perhaps he's projecting his experience onto others?  Might be a new argument against all the hockey sticks that keep popping up in the literature - all the climate scientists are tired :)


It's a conspiracy!


As for Anthony Watts, he of Kenji fame decides it must be a conspiracy of one, writing:
Also, it appears the opinion of ONE board member is all it takes, so much for consensus.

Anthony doesn't know much about comments on scientific papers.  He says he thinks Tol's comment might have been rejected because Dr Gleick, who helped expose the Heartland Institute's dirty linen, is on the ERL Board.  I wasn't aware of a relationship between Richard Tol and the Heartland Institute - maybe by way of the GWPF?  (Richard Tol is a member of the Academic Advisory Council of the Global Warming Policy Foundation, along with climate science deniers like David Whitehouse and Ian Plimer.)  Anyway, Anthony Watts implies there is a connection, and he should know, I suppose.


Dogwhistling the dwindling, raggedy, dispirited troop of deniers


While Anthony Watts conspiracy-theorises, Richard Tol takes a guess at which Editorial Board member wrote the report recommending rejection.  That's enough for Anthony Watts, who posts the credentials of the Editor-in-Chief (which are very impressive) and blows his dog whistle calling for WUWT readers to spam that Board member, posting a link to the editor's email address "for those that wish to query him" (most WUWT readers don't know how to use a search engine).


Unabashed and uncaring...


Unabashed and uncaring of his professional reputation, Richard Tol has published the rejection letter and his rejected comment on his blog for all the world to see.  He really must want that "fame and riches" very badly.  Seeking a career change perhaps?  Maybe Richard Tol is tired of being a lead author of the IPCC AR5 report.

Time to take a nap.




Wake up to the 97% consensus


Okay, I'm awake again and have read a couple of the comments below, which brought to mind a tweet from a wise man who wrote that Anthony Watts at WUWT just "doesn't get it":

Science isn't strong because of the consensus; 
the consensus is strong because of science.





When is it time to stop digging the hole you've dug yourself into?




Perhaps when new-found "friends" say it's time?

One sideshow is the three-way fight among the denialati: poptech vs Shollenberger vs Tol, sort of.


Bad Hair Day: Eli points out that this silly episode was just one of three losses the deniosaurs had recently!

103 comments:

  1. Really? He published the rejection letter? If not straight-out wrong, that's definitely inappropriate.

    Also, I love Watts's passive-aggressive "so much for consensus." Does he not understand what "consensus" means?

    ReplyDelete
  2. The sad thing is that Tol and Co. won't even consider the possibility that the comment was rejected because it was crap, even though a whole bunch of people said exactly that when Tol submitted it.

    ReplyDelete
    Replies
    1. I beg to differ, Dana. IMO Tol and Co know very well that his comment was crap. Numerous people pointed out to him the weaknesses and errors in his logic (and his arithmetical errors) and it was obvious that he didn't have a leg to stand on.

      Richard Tol expressed his motivations and they were nothing to do with uncovering mythical weaknesses. His plan was very poorly executed. I wonder if it will backfire on him in the long run or sooner?

      Delete
  3. Ah yes. It's all one big conspiracy. By 97% of the world's climate scientists. Wait. Wasn't the conspiracy paper in a different journal?

    Maybe there's another explanation: Maybe the original paper was, um, good, and the comment on it, not good.

    ReplyDelete
  4. Interviewee fatigue is a common problem in survey research. If an interview is too long, the interviewee loses interest and starts giving random answers.

    In the Cook survey, 12 volunteers performed on average 1922 tasks. That should wear anyone out.

    There are standard statistical tests for fatigue. I applied them to the Cook data, and the null hypothesis of no-fatigue was rejected at the 5% level.
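
    A minimal sketch of what such a check could look like, in Python with synthetic data; the data layout, the numbers and the choice of test are illustrative assumptions, not the actual test or the Cook data:

```python
# Minimal sketch of one kind of fatigue/drift check: compare the mix of rating
# categories a rater gives early in a session with the mix given late in the
# session. All data here are synthetic stand-ins, not the Cook et al. ratings.
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(0)
ratings = rng.integers(1, 8, size=1922)   # hypothetical: one rater's 1922 ratings, in order

first_half, second_half = np.array_split(ratings, 2)
categories = np.arange(1, 8)
table = np.array([[np.sum(half == c) for c in categories]
                  for half in (first_half, second_half)])

chi2, p, dof, _ = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p:.3f}")
# A small p-value would indicate that the rating mix drifted between the early and
# late ratings, one indirect signature consistent with (but not proof of) fatigue.
```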

    ReplyDelete
    Replies
    1. ROTFL - You are a funny one Richard. "That should wear anyone out"! You don't give up easily do you.

      These researchers were not conducting interviews, were they. Not "back-to-back" interviews, not even one interview a day. In fact they didn't do interviews at all.

      Using your misplaced "logic", practically all of science is "wrong" because you seem to think researchers get tired so easily. Just imagine how "wrong" these researchers must be. Not only would they be excessively "fatigued" with all their tests and analysis of thousands of records, but they'd be cold as well.

      http://www.sciencemag.org/content/339/6123/1060.abstract

      Delete
    2. Richard, you could have saved yourself a lot of grief if you had investigated the standard methods used for meta-analysis and systematic reviews. You would then have seen that the paper is on solid ground.

      Can you explain why you have published the rejection letter? In my experience this is extremely unusual, unprofessional and impolite behavior.

      Delete
    3. @Sou
      Question 1. Answer on a scale of 1 to 7.
      Question 2. Answer on a scale of 1 to 7.
      Question 3. Answer on a scale of 1 to 7.
      ...
      Question 1922. Answer on a scale of 1 to 7.

      I don't really see the difference with a survey.

      You could, of course, reinterpret the ratings as a repetitive task in the workplace. Fatigue is a big concern there too.

      @Anon
      I am well-aware of meta-analysis, indeed have published both applications and new methods.

      Cook is of course not a meta-analysis. It falls far short of the standards of systematic review. For example, they report results for a single query to a single database.

      The ERL letter is anonymous, so there are no issues with privacy. Science is better in the light of day.

      Delete
    4. Oh, I reread your comment, Richard, and realise you seem to think the researchers were *being* interviewed, not *doing* interviews.

      That's wrong, too. They weren't being interviewed, let alone subjected to a "too long" interview.

      (I wonder how many tasks the average person performs in a year and if they take a nap some time in between, like 365 times.)

      Edit: Looks as if Richard didn't even read the paper in its entirety before making his comment. There were 21 people listed as contributing to the work (authors plus acknowledged). So the 12,465 papers yielded by the search, divided among 21 people, works out at 593.5 each on average.

      In regard to the papers that were categorised as expressing an opinion and that needed greater attention to detail, there were just under 4,000, which would be fewer than 200 papers per person on average.

      Luckily Richard wasn't involved in rating the abstracts; he might not have been too good at bothering with the details :(

      Delete
    5. @Sou
      There were 24 raters. 12 dropped out after performing an average of 50 ratings.

      The other 12 raters did the rest.

      All papers were rated twice. 16% were rated three times, and another 1000 were rated four times.

      Delete
    6. Richard, you have co-authored two meta-analyses, neither of which goes so far as to even specify the database it uses or the search terms by which the query was constructed, and neither of which is a systematic review. Are you sure you are "well-aware" of the standards of systematic review? Your recently rejected comment to ERL certainly suggests you are not, though if you are you could take ERL up on their suggestion, and do your own systematic review to a higher standard than Cook et al did. Do you think you would get a different result?

      The issue with publishing the ERL letter is not anonymity, but the fact that you published private correspondence containing critical opinions, without seeking the consent of the other party to the correspondence. This is generally looked down upon in academia. You did know that, right?

      Delete
    7. @Anon
      Do search a bit harder. You might find more meta-analyses.

      There is no such thing as private correspondence between two employees of public universities in democratic countries.

      Delete
    8. I'm sure they're as good as the two I found on your homepage. But do tell me, if the normal standards of meta-analysis are to use multiple databases, why did you publish two (listed on your own homepage) which don't even go so far as to identify the database from which the reviewed journals were obtained?

      There may be "no such thing as private correspondence between two employees of public universities in democratic countries," but you were in correspondence with someone in their capacity as a journal editor, writing to you on behalf of the editorial board of a private organization. And there is also such a thing as "good manners" and "professional behavior." Have you heard of those things?

      Delete
    9. @Anon
      I suspect you picked the two with meta-analysis in the title. One used multiple databases, including an earlier meta-analysis, EVRI, Scopus, WoS, EconLit. The other followed a similar strategy.

      Delete
    10. "There is no such thing as private correspondence between two employees of public universities in democratic countries." - You wish! Democratic Republic of North Korea is your ideal, Tol?

      Delete
  5. This comment has been removed by a blog administrator.

    ReplyDelete
    Replies
    1. Keep it nice please :) Snark is good but take note of the comment policy.

      Delete
  6. Ha ha ha - now *seven* questions is "too long"? Lol - I'd hate to think what Richard might make of twenty questions!

    Maybe Richard Tol thinks the research team did all this work in a day. That would be wrong. And how could he have missed the bit about each rating being checked at least once and often more than once by different people? And how did he miss the fact that comparison with the ratings by the scientists who authored the papers yielded a similar result?

    I would avoid asking Richard Tol to do research if you want a thorough and meticulous job done.

    ReplyDelete
    Replies
    1. The above was written before Richard's later comment appeared. What he's now saying is that not only did the researchers get it wrong in the first check because they got 'tired', but the people who double-checked also got it wrong, presumably because they were 'tired' too. And they must have made exactly the same "mistakes", even though Richard didn't do his own analysis to support his hypothesis.

      All this double-think is making me tired, fatigued, exhausted :)

      Delete
    2. @Sou
      You can compute the error rate.

      In the original ratings, 19% were in error. After reconciliation and re-rating, 4% or more were in error.

      There are three duplicate records. 33% error.

      The hypothesis that paper ratings and abstract ratings are the same is rejected at the 0.1% significance level.

      Delete
    3. Richard, I don't really want to start this whole discussion again, but anyway I'll have one more go. Indeed, the hypothesis that paper ratings and abstract ratings are the same is rejected at the 0.1% level. However, that was obvious from Cook et al. There was no claim that they were the same and, indeed, no claim that they should be the same. There are perfectly reasonable arguments as to why they may not be expected to be the same (does every paper with a position on AGW make that position clear in the abstract?). It's certainly clear that you can't use an abstract to determine the position of an individual paper with respect to AGW. However, it appears that one can use abstracts to determine the consensus within the literature with respect to AGW - which was the goal of the Cook et al. survey.

      In a sense the last part of your comment is an illustration of why I don't think your paper was very good. It's a statement that is correct. But there's no analysis of the implications of the statement. Simply doing some kind of statistical test and getting a result doesn't necessarily tell you if the test is appropriate or not, or if the result indicates anything significant.

      Delete
    4. Yeah, I don't know if Richard has figured out yet that the study was categorising abstracts, not papers. I don't get his stats either. 19%? 4%? Three duplicate abstracts out of 12,000 or so equals 33% "error"? How does that work? Especially when Richard hasn't classified the papers himself, so he's not claiming "I'm right and those scientists are wrong about their own papers!" (Those are rhetorical questions and I'm not looking for an answer. I expect Richard can come up with an answer that is every bit as stunning as his "tired" hypothesis :))

      Yet AFAIK he agrees there is a consensus.

      Maybe he will get some fleeting "fame" among the hard-core denialati. I'm not so sure about the riches, but who knows what this hissy fit of his will lead to. In any case, it might be time to...

      Delete
    5. Have any of you considered the independent qualities of RT: his ongoing energy and persistence, his reputation, his intelligence, his technical perfection, his public responsibility and search for openness in the public field, and his anti-groupthink? Within science there should be a certain prize for this kind of noble behavior for the sake of science. Particularly as it seems to evoke attacks on the personal level from other "scientists" (sometimes even anonymous) who seem to look for and find support from the mass by trying to undermine the professional scientist on a personal level. Not even able to understand the irony of his remark about "rich & famous". You go pro's!

      Delete
    6. Anonymous, in fairness some of what has been directed at Richard has been a little personal. I've tried to avoid doing so, and think I've been fairly balanced (others may disagree). Admittedly Richard has attempted to publish a paper in which he has accused the authors of another published work of being "secretive" and "incompetent". He's also accused me - and others - of being "apologists for pseudo-science" and suggested that comments I've made make me (in his view) like Phil Jones or Mike Mann - neither of which comparisons was intended to be complimentary, I suspect (I wasn't offended in this case though).

      As much as I would prefer these discussions to remain polite and professional (and some of what has been directed at Richard has been less than pleasant) I don't think Richard's behaviour has been beyond reproach either.

      Delete
    7. Hmm, yes, anonymous, my educated guess is that most people reading about the saga have considered all of the above.

      Richard Tol does indeed rate very highly on persistence as evidenced here. His score on the other attributes leaves much room for improvement as far as this little episode goes as evidenced here and elsewhere.

      AFAIK he has in the past done some quality research and undoubtedly some that may be somewhat lacking. However, he is a lead author of the IPCC WGII so that's an indication of regard by the IPCC Bureaux for his professional expertise.

      It's when he steps outside his professional area of competence (as in this instance) and/or lets his emotions get the better of him (again as in this instance) that he seems to run into most trouble.

      Critiques by real scientists focus on the poor quality work in this case rather than the personal. Criticism by bloggers such as myself raises questions such as motive and other factors, which were first raised by Dr Tol himself in his numerous tweets.

      As for irony you might want to check your irony meter. I believe you'll find most readers are well aware of the sarcasm intended by Dr Tol when he tweeted he was seeking "fame and riches". No one would believe that getting accolades from a denier blog like WUWT would lead directly to either "fame" or "riches" (although it might attract a stipend from one or other additional denier lobby groups, or at least win him some brownie points from the GWPF).

      Delete
    8. @Wotts
      You've been polite.

      I did not compare you to Mann. I did compare you to Jones, but only after you argued, like Jones, that it is okay to hide data from people who intend to find errors in that data.

      Hidden data have no place in science.

      Delete
    9. Richard, thanks. I was going to say that you mis-represented what I said, but maybe you didn't. I think it is okay to not give your data to someone who "intends to find errors" because if they "intend to find errors" then that suggests that they'll find them whether they're there or not. My comment was also slightly more nuanced than simply that, but we don't really need to go through all that again.

      Delete
    10. @Wotts
      That's why data and algorithms should be out in the open.

      I suspect something is wrong with the Cook data. In fact, there are strong indications that that is the case.

      If they make the data available, I'll run tests and see. I may be proven wrong.

      I could of course be proven wrong and try to hide that.

      There'd be nowhere to hide though, because anyone could replicate what I did and reveal my deceit.

      I thus intend to find something wrong but publish whatever I find.

      Delete
    11. Ah yes, Richard is referring to here. That's what I'd call misdirecting and, arguably, misrepresenting both Prof Jones and Wotts Up With That Blog. In any case, I'm with Prof Jones and Wotts Up With That Blog. A good example of why scientists prefer a critic to do the research themselves rather than snipe from the sidelines ignorantly and with obvious malice.

      Insisting researchers hop to immediately and hand over not just data (most of which is available in the case of Cook et al, or at least described sufficiently that anyone can get it) but also all their "code" is a very predictable pattern that climate scientists have learnt to be wary of. Open that door and you find you have only Sophie's Choice.

      BTW "hidden data" has a very big place in many fields of science (think commercial science eg biotech and normal competition between researchers to be the "first" to make a new discovery). However "hidden data" is not a feature of climate science, which is arguably the most transparent and publicly accessible sciences of all.

      Frankly your comment comes across as malice against one of the most polite climate bloggers in cyberspace and should be beneath you, Richard.

      Delete
    12. Richard, are you joining McIntyre's clan in accusing anyone whose results you don't like of fraud? Is that what you mean when you say "I suspect something is wrong with the Cook data"? As soon as you start throwing those kinds of accusations around you lose all credibility.

      Also, sharing data is not the essence of science. In many cases it is impossible. It's only when you marginalize yourself in a small clique of conspiracy theorists that you start to suspect fraud in every result.

      And once again: if you think there is something wrong with it, collect the data yourself using the methods you prefer and show it yourself. If you aren't willing to do that, don't expect other people to share their data with you. Anyone can gather the data that Cook et al used, it's not proprietary. Do it yourself and do it better, or forget it.

      Delete
    13. @Anon
      I first suspected something was wrong with the Cook data when the authors heaped abuse on me for daring to question their data quality and sampling strategy.

      I then investigated. The sample is not representative. There are errors in the data. There are systematic biases in the data. The data are externally invalid.

      All that can be shown with the published data (which is about 13% of the total data).

      The hidden 87% of the data could be used to test hypotheses on the causes of these problems.

      Such tests would take away all concerns about foul play.

      Delete
    14. Richard, quick question. In your test for skewness, were you simply testing the distribution with respect to the ratings? Also what did you assume for the mean (or what was the mean)?

      Delete
    15. Oh yes, I remember, Richard.

      You suspected things were wrong because of something poptech wrote - ROTFL. And you were miffed that "only" ten of your papers were included - ROTFL 2. And that your papers were rated purely on the anonymised abstracts rather than reading your mind - ROTFL; and then you argued "consensus is not relevant to science" -ROTFL; and before you'd even understood how the study was designed and implemented you were already making allegations of it being "a silly idea" and "coming apart".

      At least be honest. You were trying to claim "something was wrong" way before anyone called you on your wild and uninformed allegations.

      Delete
    16. @wott
      Skewness is as defined by Matlab: Third central moment, standardized.
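
      A minimal sketch of that definition, with made-up sample values:

```python
# Skewness as the standardized third central moment (what Matlab's skewness()
# and scipy.stats.skew(..., bias=True) compute). Sample values are made up.
import numpy as np
from scipy.stats import skew

x = np.array([4, 4, 4, 3, 2, 4, 4, 1, 4, 4], dtype=float)

def third_moment_skew(x):
    m = x.mean()
    s = x.std()                      # population (biased) standard deviation
    return np.mean((x - m) ** 3) / s ** 3

print(third_moment_skew(x))          # same value as...
print(skew(x, bias=True))            # ...the library routine
```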

      Delete
    17. Richard, you're doing the same thing as you did when I asked about the chi-squared test. I know what skewness is. What I'm asking you to do is explain what distribution you were testing. With all due respect, your paper isn't very clear in this regard.

      Delete
    18. "I first suspected something was wrong with the Cook data when the authors heaped abuse on me for daring to question their data quality and sampling strategy.

      I then investigated."

      Well, there's ya problem, mate. In science you first investigate before you draw conclusions.

      Marco

      Delete
    19. @Wott
      I don't test for a distribution. I don't assume a distribution either.

      Delete
    20. Richard, maybe I'm thinking of a different type of test then. As far as I'm aware

      "in probability theory and statistics, skewness is a measure of the extent to which a probability distribution of a real-valued random variable "leans" to one side of the mean."

      I didn't say you were testing for a distribution, I asked what distribution you were testing. If you weren't testing a distribution, what were you testing?

      Delete
    21. I am testing for excess skewness in parts of the sample.

      The test reveals two things.
      1. In the first 20% of the sample, there is statistically significant drift towards greater endorsement.

      2. In the remaining 80% of the sample, endorsements are clustered in a way that cannot be explained by chance.

      1 could be explained by the fact that no pre-tests were done. The raters discussed the interpretation of the rating system while rating.

      I cannot explain 2 without the raw data.

      Delete
    22. Okay, I think I see it now (you really didn't make this particularly clear in the paper, but maybe that's just me). So you did a skewness test on a sample of 50, 100 or 500 and then rolled it across the whole sample, hence comparing how the skewness varied as you moved along the sample.

      However, if you've tested the sample presented in the Excel spreadsheet then it is listed in chronological order and hence the relevance of this skewness is not immediately obvious?

      In fact, your result appears entirely consistent with Figure 1 in Cook et al. which shows the endorsement fraction decreasing from over 40% in 1991 to an approximately constant value - of about 30% - from 2000 onwards. If I look at the Excel spreadsheet the first 20% of abstracts were published between 1991 and 2000. So, haven't you just "quantified" something that was presented openly in Cook et al.?
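
      For readers following along, a minimal sketch of a rolling-window skewness calculation of this kind; the ratings vector and its category proportions are synthetic stand-ins, not the Cook et al. data:

```python
# Rolling-window skewness over a list of 1-7 ratings taken in their listed order.
# The ratings and their proportions are synthetic; the window length (500) is one
# of the sizes mentioned in the discussion.
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(1)
ratings = rng.choice(np.arange(1, 8), size=11944,
                     p=[0.005, 0.077, 0.244, 0.667, 0.0045, 0.0018, 0.0007])

window = 500
rolling = np.array([skew(ratings[i:i + window])
                    for i in range(len(ratings) - window + 1)])
print(rolling.min(), rolling.max())
# Because some categories are very rare, the rolling skewness jumps whenever one of
# the rare ratings enters or leaves the window, a point taken up in later comments.
```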

      Delete
    23. Wotts, my (educated) guess is that the consensus has grown over time, which is not at all surprising and can be inferred from the charts provided in the Cook13 paper.

      Richard appears to be jumping to his usual illogical conclusions (no surprises there) but then I don't know what he means by "the first 20%" of the sample. Not that I'm particularly interested in finding out, unless he publishes a paper on the subject.

      On other blogs Richard has been way off the mark and I expect he is still very wide of the mark.

      Delete
    24. Sou, your interpretation of the variation in skewness could well explain the result.

      What I have (and what I think Richard is working from) is an Excel spreadsheet provided by the Cook et al. team that lists all the papers, with their rating, in order of publication year (or date I guess). If this is what Richard has tested then all he has found (I believe - unless he can correct me) is that the skewness changed with time and that this is consistent with Figure 1 in Cook et al. The first 20% in the spreadsheet were published between 1991 and 2000.

      That's fine, but one didn't need to do a skewness test to know that (just look at the original paper). Also, given that the abstracts were distributed randomly to the raters (I believe) this tells you nothing about the consistency of the rating unless, for some reason, the random distribution somehow resulted in a particular set of raters preferentially rating papers published in the 1990s. Given that the paper says they were distributed randomly, this would seem unlikely.

      Delete
    25. @Wott
      You have the data. You have an hypothesis. Now go and test it.

      Delete
    26. Indeed, and in a sense I have. We're going slightly in circles here so I'll make what might be one last comment.

      I've now looked at two of your tests. Your chi-squared test and your skewness test. Your chi-squared test indicated that the author-rated sample is not consistent with the volunteer-rated one. Indeed this is true, but is also obvious from the results presented in Cook et al. I've now had a better look at your skewness test. This indicates that the skewness of the distribution changes with time (endorsement fraction decreases with time). Again, this is presented in Cook et al.

      What I'm getting at is that so far your statistical tests are simply presenting results that are essentially presented openly in Cook et al. Of course, you're welcome to interpret these as indicators of a problem, but Cook et al. have addressed these and either interpreted them differently or concluded that the issue isn't significant.

      The point I'm making is that your paper appeared to suggest that these tests were indicating some as yet unknown issue, when in fact they were simply quantifying what was already presented - and interpreted - in Cook et al.

      Delete
    27. @Wott
      You explain clustering of skewness by a trend in the mean. How does that work?

      Delete
    28. No, I don't think so. If one considers the distribution of the ratings then if the fraction of those that endorse AGW changes with time then one would expect the skewness to change with time. That's all I was saying. Maybe I'm wrong about this, but I wasn't suggesting a trend in the mean.

      Maybe I can ask you the reverse question. If you consider Figure 1 in Cook et al. which shows the fraction of endorsers decreasing with time and the fraction of no position increasing with time, wouldn't your skewness test produce essentially the result you get?

      Delete
    29. @Wott
      A trend in the mean could imply a trend in skewness, but does not have to.

      However, after abstract 2000, there is no trend in skewness. There is clustering.

      Chance cannot explain clustering. A trend cannot explain clustering. Fatigue cannot explain clustering (of skewness).

      Delete
    30. @Richard

      Okay, I've redone your skew calculation using the data in the Excel spreadsheet. I do indeed get the same figures as you present in your paper. My general interpretation doesn't change though. The general change in the skew is roughly consistent with the Figure 1 of Cook et al.

      Okay, so I too see this "clustering" that you mention. You seem to interpret this as clustering in the abstracts that endorse AGW. If you've done this in Excel, here's something you can try. Run down the list of skew numbers until you find the point at which the skew suddenly changes (one of the jumps in your figures). Then look at the list of abstract ratings that are used to calculate the lower of the two skew values on either side of this jump. You'll discover that the last number in this list is 7 (reject with quantification).

      Therefore, the clustering you see in your figures is not clustering of abstracts that endorse AGW, it is simply a consequence of sudden jumps in the skew when the calculation suddenly encounters a 7 (or stops including a 7). I haven't actually counted the number of 7s, but from your figure S8, I would predict that there are about 8.

      To summarise, as far as I can see, the sudden changes in the skew are not clustering of abstracts that endorse AGW; they are simply a consequence of the very small number of papers that reject AGW with quantification. I would argue that this indicates that your skew test does not really indicate any particular problem with the ratings of the abstracts in Cook et al. Maybe you still disagree though.

      Delete
    31. Bravo. Richard's letter has been rejected. You have demonstrated at least one good reason why. Perhaps we can now clap a stopper over this nonsense.

      After all, the damage is done. Nice one, Richard. No doubt they'll be delighted with you at the GWPF.

      Delete
    32. @Wott
      That's exactly what the bootstrap does. You shouldn't fall outside the 95% confidence interval more than 5% of the time.

      Delete
    33. @Richard

      What do you mean? You haven't really responded to my comment. The clustering you see is a consequence of the very small number of quantified rejections, not because of clustering of abstracts that endorse AGW. That seems like a significant criticism of your interpretation of the skew, that you've just responded to by some statement about bootstrapping.

      Delete
    34. Richard is not getting any support from anywhere. Not here, not on Wotts' blog, not on WUWT, not at Lucia's. You'd think he'd have got the message by now. He is seeing patterns where none exist. From what I gather Richard has fixated on the notion that the abstracts were rated in date order, despite being told over and over and over again that he's wrong. Therefore any deductions he makes that depend on date order are meaningless.

      Richard started out from the very beginning wanting to slam the survey. Looking at your blog, Wotts, he was reminded of his early aggro tweets only a day or so ago but he still comes here and repeats the lie that he was goaded into this. He wasn't. Richard set out right from the beginning to try to find flaws. He hasn't found anything worth a cracker.

      Someone complained about getting "personal". Well Richard hasn't demonstrated anything but "personal" in his approach. The study demonstrates clearly that there is an overwhelming consensus in the literature that humans are warming the world.

      If Richard were seriously on a crusade to correct what's wrongly written about climate science, he would start by correcting all the disinformation put out by the GWPF and dissociating himself from them. Then he might move on to all the disinformation put about by Watts and crew and all the other deniers out there. After this ongoing and extended display I'm not convinced he has what it takes to do any meaningful analysis.

      Like they wrote at realclimate.org years ago:

      "However, there is clearly a latent and deeply felt wish in some sectors for the whole problem of global warming to be reduced to a statistical quirk or a mistake. This led to some truly death-defying leaping to conclusions...."

      http://blog.hotwhopper.com/2013/06/a-very-predictable-pattern.html

      Delete
    35. @BBD

      Maybe you're right. Time to put this behind us.

      Delete
    36. @Wott
      You explain clustering by thin tails. Why would the bootstrap be fragile to that? It is precisely designed to avoid such problems.

      Delete
    37. @Richard

      Well, maybe I misunderstand what you're seeing as clustering. However, why don't you do the test I've just done. Replace all the 7s with 6s. There are only about 10, so it really shouldn't affect the result if the clustering is due to clustering of papers that endorse AGW. Rerun your skew calculation and see if the clustering remains. I've done this and I think the clustering is far less evident, but - hey - what do I know :-)
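
      A minimal sketch of that sensitivity check, again on a synthetic ratings list rather than the real data:

```python
# Sensitivity check: recode the handful of 7s as 6s and see how much of the apparent
# "clustering" (the sudden jumps) in the rolling skewness survives. Synthetic data.
import numpy as np
from scipy.stats import skew

def rolling_skew(x, window=500):
    return np.array([skew(x[i:i + window]) for i in range(len(x) - window + 1)])

rng = np.random.default_rng(1)
ratings = rng.choice(np.arange(1, 8), size=11944,
                     p=[0.005, 0.077, 0.244, 0.667, 0.0045, 0.0018, 0.0007])
recoded = np.where(ratings == 7, 6, ratings)

print(np.abs(np.diff(rolling_skew(ratings))).max())
print(np.abs(np.diff(rolling_skew(recoded))).max())
# If the largest step changes shrink sharply after recoding, the jumps were driven by
# the few quantified rejections rather than by clustering of endorsements.
```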

      Delete
    38. @Wott
      Did you re-estimate the confidence interval too?

      Delete
    39. Sou nails it so well I am going to repeat what she just said:

      If Richard were seriously on a crusade to correct what's wrongly written about climate science, he would start by correcting all the disinformation put out by the GWPF and dissociating himself from them.

      End of.

      Delete
    40. @Richard

      No I did not, so maybe doing that will prove me wrong :-) I was simply trying to establish why there were very sudden changes in the skew. These appear to be primarily from the very small number of abstracts rated as 7.

      Delete
    41. @Wott
      The skew is jerked around by the 1s and the 7s.

      The question is not whether this happens, but whether it can be explained by chance.

      Delete
    42. Indeed, and I haven't checked the effect of the 1s. You do agree, however, that the large spikes in S7 and S8 - and probably a lot of the structure in S9 - are quite strongly influenced by the 7s.

      Delete
  7. On the subject of tiredness, I have done online examination marking of hundreds of similar answers. I know when I feel tired and I know when to stop or take a break. I expect, though I can't be certain, that the raters of the abstracts took a similar line. They were volunteers after all. I was under contract with a completion date to meet.

    ReplyDelete
    Replies
    1. Fatigue is an issue. Marks are variable. Markers make errors. Tired markers make more errors. Therefore, fatigue implies that the standard deviation of the marks varies.

      Delete
    2. Fine. Now, please read and answer the second sentence by Anon.
      Then there is a third sentence.

      Delete
    3. @cRR
      When I mark, or second-mark an exam, I always do a consistency test.

      Unfortunately, Cook did not report consistency tests for their data.

      So, I ran a few consistency tests for them. The null hypotheses were rejected.
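
      As an aside, one common consistency check between two raters is Cohen's kappa; a minimal sketch with made-up ratings (this illustrates the general idea, not the tests Tol ran):

```python
# Cohen's kappa: agreement between two raters on the same items, corrected for the
# agreement expected by chance. The two rating lists below are made up.
import numpy as np

def cohens_kappa(a, b, categories=range(1, 8)):
    a, b = np.asarray(a), np.asarray(b)
    p_observed = np.mean(a == b)
    p_chance = sum(np.mean(a == c) * np.mean(b == c) for c in categories)
    return (p_observed - p_chance) / (1 - p_chance)

rater1 = [4, 4, 3, 4, 2, 4, 3, 4, 4, 5]
rater2 = [4, 4, 3, 3, 2, 4, 4, 4, 4, 5]
print(cohens_kappa(rater1, rater2))    # 1.0 = perfect agreement, 0 = chance level
```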

      Delete
  8. Anony
    Repetitive tasks where resolution is clear-cut improve on repetition. Differentiation tasks where resolution is unclear deteriorate to where arbitrary differences make classifying easier on repetition.

    One of the authors classified 4,145 abstracts. The development of a mental shorthand is inevitable under such extreme circumstances. If the shorthand matches actual differences, it is fine. If the shorthand relies on non-existent differences, once you train your brain, it'll see them every time.

    The fatigue effects Tol calculates are likely best visualized in the 67% of the ratings that were non-discrepant. These ratings were retained in the final data and reflect the original rating exercise. In 33% of cases, ratings were discrepant and were changed further, and as such lose meta-information such as the effects of fatigue. The effects Tol computed could be higher.

    ReplyDelete
    Replies
    1. @shub
      Bang on.

      Small inconsistencies in the reported data are indicative of large inconsistencies in the hidden data.

      Delete
    2. There you go again, Richard, this time making allegations of incompetence with not the slightest shred of evidence.

      You tried "tweeting" and that got you some attention from the usual idiots, which you enjoyed. So then you tried writing a foolish and factless "comment" to the journal and that didn't work.

      Don't think you can use my blog to make unfounded and unsupported allegations. Keep this up and I'll have to delete your posts for not complying with the comment policy.

      I suggest you pack up while you can and go back to WUWT. You can write what you like over there. Although even there (like at Lucia's place) I see that you're not getting unanimous support from climate science deniers, given your lack of rigour and too obviously malicious intent.

      The only people who are cheering you on at WUWT are the "I don't understand what you wrote but it's brilliant" brigade.

      (I must say the similarities to "born again" Curry are quite remarkable. All attention-seeking and not caring where it comes from.)

      Delete
    3. This comment has been removed by a blog administrator.

      Delete
  9. Sorry, anonymous, no quotes without links please.

    ReplyDelete
  10. @Marco, Richard can't even present the truth there. Before any author queried Richard on his allegations he first apparently tried to ridicule the study; then a couple of days later he tweeted that the 97% was "a load of nonsense", then it developed from there.

    ReplyDelete
  11. I am sure tired reading all these comments.

    ReplyDelete
    Replies
    1. Here's a summary for you.

      1. Richard Tol sez Cook13 is rubbish and a truism all at the same time. He gets very upset by the Cook13 paper and tweets a lot.

      2. Then he writes a comment paper that sez "I hate Cook13 and it's wrong coz they got tired". His paper gets rejected so he whines some more.

      3. Then he sez "I'll play with some numbers". He plays with some numbers looking at abstracts in date order and sez "something's funny". Someone asks "What can be funny? It's just a list of abstracts in order of publication date. It's not in the order of classification or the order in which they were processed." Richard writes a lot about what he thinks is "funny/odd".

      4. Nobody gets what he's on about and neither does he, but some people enjoy playing with him and his numbers.

      The end.

      Delete
    2. Alternative summary

      1. There are glaring errors in Cook13, conveniently ignored by many.

      2. There are subtle errors in Cook13 too, that require more than an undergrad understanding of statistics. Most of the discussion above is about that.

      3. Advanced statistical methods would be redundant if Cook13 had released their full data.

      Delete
    3. Like I say: you lied about when you first got stuck into Cook13 (on 16 May, again here on 18 May, and you tried to pass the buck on 23 May), you can't even categorise your own abstracts properly, and you haven't identified even one "glaring error" let alone any "subtle errors" - you sure have a lot of hide. You're also not finding too many friends even among hard core deniers, except maybe Benny Peiser and Lord Lawson.

      Time to slink off and lick your wounds IMO. But given your obsession, you'll quite possibly end up a sad excuse blathering on a blog somewhere like poor old McIntyre.

      As you can probably tell, I've had just about enough of your contemptible behaviour. If you had anything other than handwaving and insinuations you would have found it by now.

      The survey is not hard to understand. The data is all there for anyone who wants to check it.

      In case anyone can't tell what I think of his behaviour let me say this. I now view Richard Tol as a contemptible little man whose warped world view and political ideology will likely be the ruin of him academically. He might be better off going into politics where his kind of behaviour is if not tolerated at least not unexpected.

      Delete
    4. As for "full data" you've had access to the paper, the entire database of papers and supplementary info just like everyone else. The database lists the year of publication, the paper's title, the journal, the authors, the broad category of paper and the classification in terms of endorsement of AGW.

      If all that is not enough for you to show these 'glaring errors' then I don't know what is.

      If you are too lazy to design and conduct your own survey of the scientific literature you can use the one that Cook13 provides and has made public. They've made it even easier with their tool on skepticalscience.com where you can look up the abstracts themselves.

      I very much doubt you'll find any different result to that found by Cook et al and all the other similar surveys on the subject. The vast body of evidence supports the overwhelming consensus that humans are causing global warming.

      Time for you to move on and expose all the errors made by Anthony Watts, poptech, Morano, the GWPF and the disinformation constantly being spewed out by your newfound "buddies".

      Delete
  12. Richard Tol strikes me as emblematic of the denier movement at this juncture, in that he's playing defence of what he's said about something he claims not to dispute - the overwhelming consensus. His output, dripping with implicit malicious intent, becomes the comfort-blanket of the wider community.

    The self-centred defensiveness, with an ever-rising whine component, has been very obvious since at least the "misunderstood" Heartland billboard campaign. There's no denier attack left that isn't ridiculous (see Watts's site for examples). So much ado about nothing, as pure distraction from unfolding reality.

    ReplyDelete
  13. There is much confusion about how rolling skewness can reveal clustering. So I ran an alternative test, measuring the distance between observations. I then estimated the expected distance and its 90% confidence interval.

    Rate Obs p05 Exp p95
    1 187 168 182 193
    2 13 13 13 14
    3 4 4 4 4
    4 1 1 1 2
    5 221 221 235 254
    6 796 796 1171 1706
    7 1327 1327 1603 1991

    Weak endorsements are closer together than they should be, as are all types of rejections.

    ReplyDelete
    Replies
    1. What "distance between observations" are you talking about? Distance from where to where?

      Delete
    2. The distance between the first and the second observation is 1. The distance between the first and the last observation is 11943.

      Most observations are 4 (neutral). So the average distance between successive 4s is small. In fact, it is 1.5.

      Few (only 9) observations are 7 (explicit quantified rejection). So the average distance between them is large. The observed distance is 1327 (which equals 11944/9 by coincidence). The expected distance is 1603.
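
      A minimal sketch of that distance measure, on a synthetic ordered ratings list with roughly the category proportions being discussed (not the actual data or code):

```python
# Average gap (in list positions) between successive occurrences of each rating value,
# computed on a synthetic ordered list of 11,944 ratings.
import numpy as np

rng = np.random.default_rng(2)
ratings = rng.choice(np.arange(1, 8), size=11944,
                     p=[0.005, 0.077, 0.244, 0.667, 0.0045, 0.0018, 0.0007])

def mean_gap(x, value):
    idx = np.flatnonzero(x == value)
    return np.diff(idx).mean() if len(idx) > 1 else float("nan")

for v in range(1, 8):
    print(v, round(mean_gap(ratings, v), 1))
# Common values (the 4s) sit roughly 1.5 positions apart; rare values (the 7s) are
# typically separated by gaps on the order of a thousand positions.
```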

      Delete
    3. You haven't answered my question. What distance? For example, do you have an ordered list, and are you talking about the distance between observations going down the list? If so, in what way have you ordered the list? By date? By author? By journal? By title? Is the distance from year to year? From one author's name to another? What?

      Sheesh, no wonder Richard can't get a comment published, he can't even answer a simple question. He was the same with Wotts.

      Delete
    4. I really should know when to give up, but I just can't resist. It's a weakness of mine, I know. For 1, 2, 3, 4 the observed distance - according to your calculation - is entirely consistent with what is expected. For 5, 6, 7 they're within the 5 - 95% range. So, there is a minor discrepancy for abstracts rated as reject, but not that statistically significant. These are, however, quite small numbers (very few fall in the 5, 6, 7 ratings) so more of an inconsistency is not that surprising. I must say, however, if I produced a table like that above where the observed distance for 3 of the categories exactly matched the 5% range, I would double check that something funny wasn't happening.

      What I don't know is how you got those numbers. Presumably the observed is an average of the distance between successive numbers? How did you get the estimate and the range? I know it's probably a stupid question for someone as statistically literate as you are, but humour me.

      Delete
    5. @Sou
      Sorry. Distance is distance between observations in the order of reporting -- that is, order on year first, title second

      @Wotts
      The rejections all hit the fifth percentile. What are the odds of that?

      The expectation and confidence interval are bootstrapped.
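
      One way such an expectation and interval could be obtained is sketched below; it uses shuffling (a permutation reference distribution) and is an assumption on my part, since a bootstrap proper would resample with replacement:

```python
# Reference distribution for the mean gap of one rating value, obtained by repeatedly
# shuffling the order of the ratings and recomputing the gap. Synthetic data only.
import numpy as np

def mean_gap(x, value):
    idx = np.flatnonzero(x == value)
    return np.diff(idx).mean() if len(idx) > 1 else float("nan")

def gap_reference(ratings, value, n_sims=1000, seed=0):
    rng = np.random.default_rng(seed)
    sims = np.array([mean_gap(rng.permutation(ratings), value) for _ in range(n_sims)])
    return np.nanpercentile(sims, 5), np.nanmean(sims), np.nanpercentile(sims, 95)

rng = np.random.default_rng(2)
ratings = rng.choice(np.arange(1, 8), size=11944,
                     p=[0.005, 0.077, 0.244, 0.667, 0.0045, 0.0018, 0.0007])
print(gap_reference(ratings, 7))   # (5th percentile, expected gap, 95th percentile)
```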

      Delete
    6. @Richard

      That was kind of my point. If I had a table like that, I might check that I hadn't made some silly mistake. Also you haven't really answered my question. How did you get the expected distance and the range?

      Delete
    7. Richard, what are you going on about, clustering? Are you trying to argue that the 9 observations of 7 arose in clusters? Clustered in what? Time? Reviewer? And how can you have a single observed distance between the 9 observations of 1327? Do you mean the average distance? Do you understand that this distance will be exponentially distributed, not normally distributed? Why are you presenting 5 and 95 percentiles for a group of 9 observations? How can you reject a null hypothesis of no clustering when there are 9 events out of 11944 trials? Do you understand anything about rare events?

      I guess you are also unaware that the skewness statistic is sensitive to outliers? Do you think it is appropriate to calculate skewness for a variable with only 7 possible values? Tell us, what is the skewness of a binary variable, or a variable with three levels? For a variable with 7 levels, what are the maximum and minimum skewness, and what meaning do you think we can assign to any particular skewness value for such a distribution?

      What's going on here is the classic McIntyre Muddle: you are using statistical methods you don't really understand to produce false results that you can claim provide evidence of a problem in the data, when in fact there is no problem and you could just actually repeat the experiment yourself on the available data if you wanted. It's easier to make underhanded, evidence-free assertions of "something is wrong" than to actually do the analysis and find, oh, that the results are robust.

      It's a little late in the game to be doing the McIntyre Muddle, though, Richard. The world has moved on from silly denialism, and the riches and fame you can get from it are diminishing fast. Better to back-pedal now ...

      Delete
    8. @Richard

      I'll be honest, I don't really understand what you mean. How do you get the expected distances and the ranges from bootstrapping? Did you actually run Monte Carlo sims?

      Delete
    9. The odds of that? I don't know. It could be quite high.

      What fields of research have you compared it to and with what criteria? In my experience, research tends to be a bit like fashion. Topics get favoured and there is a rash of publications then something else emerges as flavour of the decade or three year research cycle, (or five or seven year cycle depending on the source of funds). Even world events can influence what words researchers choose to use in their abstracts so their paper gets noticed.

      All you've done is found what you regard as idiosyncrasies in the gross timeline of research publications. And even that's a bit bamboozled by the fact that within each year papers are ordered alphabetically by title.

      Your numbers mean nothing at all. They say nothing about the extent to which the papers have been classified correctly. To do that you'd have to look at the abstracts themselves.

      But good luck with that, given you couldn't even categorise your own abstracts properly.

      It's time you stepped back from your mathturbations and if you must persist, give some thought to how you would determine the extent to which the abstracts demonstrate an endorsement of the fact that humans are warming the world.

      Delete
    10. @Sou
      5% x 5% x 5% ≈ 0.01% (three categories each hitting the 5th percentile)

      @Wotts
      I bootstrapped. Therefore, I ran a Monte Carlo simulation.

      Delete
    11. Anonymous, Richard can't cluster or order by reviewer. The papers were randomised before they were reviewed.

      I've no idea what Richard thinks he's doing, but he's sure invested a lot of time doing it. That might explain why he can't let it go. Too much personally invested in chasing phantoms. He might see some value in ordering the papers the way he has but I fail to see his logic and he hasn't explained it to anyone as far as I know.

      If you want to see 'trends' over time the more interesting observation is the reduction in stating the obvious as time goes by. Unless you're doing an attribution study, what scientist these days would need to keep reaffirming that it's greenhouse gases that are causing Earth to heat up?

      Could be the opposite for deniers. They seem to be getting more shrill protesting that ooh it can't be greenhouse gases (ie all but one of the 7s is post-2000, though n = only 9 in total). Overall the proportion of deniers is heading downwards - might be a correlation with age :D

      Delete
    12. @Richard

      Okay, that would explain how you got those numbers.

      I must admit that I'm with anonymous above on this. Having discussed numerous aspects of your analysis I have to conclude, unfortunately, that you're rather misusing statistical tests that either you don't understand or are using too literally and aren't checking whether or not they're valid or appropriate.

      On my blog we went through your Chi-squared test. Firstly, there was actually a mistake. This in itself is fine. It happens. But ultimately all your Chi-squared test was showing was that the distribution of the author ratings was inconsistent with the distribution of the abstract ratings. No great surprise there. Obvious from what Cook et al. presented. If this is a problem you could have discussed why it was a problem without doing a Chi-squared test. In fact your claim of a problem appears to be simply that they're inconsistent (from the Chi-squared test) without any attempt to interpret whether or not this Chi-squared test is appropriate or whether or not the result had any significance with respect to the Cook et al. work.

      We've now gone through your skewness test. It's clear that the results are strongly influenced by the outliers (as I pointed out above and as claimed by Anonymous above). In fact, looking at all your figures (autocorrelation, standard deviation, skewness) they all seem to have odd jumps that I would guess are due to the outliers rather than indicators of any kind of clumping or other problem.

      You've now done a distance test and claimed that some are closer than they should be even though they are, strictly speaking, statistically consistent with what you might expect. And, the ones that are closer tend to be the ones for which the sample is very small. For example, there are 9 rated as 7, so you can't really be claiming that your distance calculation hints at clumping of the 7s as none of them are anywhere near each other.

      Also, all the endorsement and no position ones seem perfectly fine. In some sense, this is completely the opposite to what you were claiming from your skewness test. This is a majority of the sample, so do we really care if there is a small hint that the 5, 6, and 7 rated abstracts appear slightly closer than expected (but still within the 95% range).

      So, you've regularly implied that I should be willing to accept some issues with the Cook et al. paper because you've presented some analysis that suggests possible problems. What I've seen from you is a set of statistical calculations that appear to be completely inappropriate for testing the Cook et al. work. Furthermore, whenever we delve deeper into your calculations, we seem to find something wrong or some reason why your interpretation of the test result is flawed.

      I'm not one to suggest that you should back-pedal now or not. That's entirely up to you, but I know what I would do.

      Delete
    13. No, Richard. Your 0.01% is based on an assumption, probably that there should be an even distribution or something. There's no valid reason for making that assumption. Especially with the smaller categories as anon pointed out.

      I'd classify your results as codswallop of the "so what" nature. It says nothing about the study itself nor anything at all about the accuracy of the authors' categorisations.

      Delete
    14. @Wott
      I added the clustering test because people like you do not understand the skewness test.

      The clustering test shows the same as the skewness test: There is a pattern in the responses that cannot be explained by chance.

      Those who claim that the Cook data is fine, may want to put forward some evidence.

      My data and code are freely available to anyone who wants to find fault.

      Delete
    15. Richard, have you ever stopped to ask yourself why no one is taking your number crunching seriously?

      Hint: it's not because you've made wild unfounded allegations or that you've written a sloppy and foolish 'comment' or that you've not been behaving professionally or that your malicious intent is so visible or that you've jumped from one wrong assumption to another over the past few weeks. It's because your 'analysis' is crap.

      Delete
    16. @Richard

      Hopefully, this will be my last comment. I do understand the skewness test and have looked at your data and code. I commend you for making it available. You appear to have acknowledged that the skewness test is strongly influenced by the outliers (the 1s and 7s), although maybe you've reversed your position with regards to that.

      I think the order of what should happen with regards to the Cook et al. paper is different to what you suggest. It's a single piece of work and so, by itself, is not definitive. It, however, adds to other work that produce similar results. That's how the "scientific method" works. We shouldn't rely on a single piece of work, but on a body of work. It may even be that there is insufficient information now to make strong statements about consensus, but there is virtually nothing credible that suggests that a consensus does not exist.

      So, I don't claim that the Cook data is fine. I claim that it is a published piece of work that adds to existing work and gives more credence to the view that a consensus (with regards AGW) exists in the literature. Further work could still be done and there may well be issues with the Cook et al. data. Others may continue this research and further our understanding of the level of consensus (and what this means) in the literature and improve how such surveys should be undertaken

      In my view, it's those who are claiming that there are significant problems with the Cook data that need to make sure that their test are appropriate and suitable and that they aren't over-interpreting their results. I still maintain that none of your tests have convincingly shown there to be an issue with the Cook data. Maybe you think there is, but just because you say so, doesn't make it true.

      Delete
    17. @Wotts
      Let's leave it at that.

      Cook et al. do not have to make sure their tests are appropriate and suitable and that they aren't over-interpreting their results.

      Those who criticize Cook et al, do.

      Delete
    18. This probably provides insight into Tol's otherwise inexplicably unprofessional behaviour since the day Cook13 was released and his more recent efforts to bamboozle the unwary through his dodgy number crunching. It also sheds some light on the fact that he accepts science but bunkers down with deniers. Not enough to explain it though.

      Delete
    19. I have to take back the first comment. There was a coding error. Hat tip to Chris Hope.

      These are the actual results:
      Rating  Observed distance  p05  Expected distance  p95
      1 186.6 155.1 189.5 234.2
      2 13.0 12.3 13.0 13.6
      3 4.1 4.0 4.1 4.2
      4 1.5 1.5 1.5 1.5
      5 221.2 181.0 225.4 284.4
      6 796.2 542.9 854.7 1327.0
      7 1327.0 853.1 1517.9 2985.8

      Cook passes this clustering test without a bother.

      Delete
    20. @Richard

      I was not going to comment again but credit where credit is due. Impressed that you've clarified this. Personally I would have checked these earlier given the rather odd result that so many sat right at the 5% level - hold on, I've already said that.

      Delete
  14. I feel that I have to congratulate Richard Tol on several counts.

    He's documented for all to see the fact that even apparently clever people can make serious errors of analysis, and that such errors are quickly detected by other clever folk. I suspect that Tol's response to Cook et al will be used as a case study in more than one undergraduate course in statistics.

    Tol's notion of rater fatigue raises a parallel idea of ideology fatigue, where respondents become in short order refractory to further confrontations that refute their ideology, in spite of overwhelming evidence to the contrary. This contrasts with the scientific method where conclusions are based on the best evidence and analysis (else why would Tol even be participating in this current exercise?), and it underscores the fact that the denialism of human-caused climate change is based on a process that is at odds with best scientific practice and intellectual rigour.

    Tol's conduct in the current debacle informs greatly the interpretation of his previous commentary on global warming, and thus helps to categorise the credibility of his previous work.

    For these reasons and more I am thankful that Tol has put his cards on the table. A wise person once commented about keeping quiet, but I am pleased that Tol has acted to remove all doubt.


    Bernard J.

    ReplyDelete
    Replies
    1. @Bernard J.

      Elegant and astute.

      Delete
  15. . . . or as I like to say:

    The AGW consensus is NOT formed by scientists.
    The AGW consensus IS compelled by the evidence.
    ~ ~ ~ ~ ~ ~ ~

    One Directional Skepticism Equals Denial

    ReplyDelete
