Sunday, March 29, 2015

The fall and fall of Gish galloping Richard Tol's smear campaign

Sou | 6:42 PM
A short while ago I wrote an article demolishing Richard Tol's latest demonisation of Cook13, the well known 97% consensus paper. (Update: there's still more to the saga - see here.)

"The consensus is of course in the high 90s" - Richard Tol

As you know, Richard agrees that of all the scientific papers that attribute a cause to global warming, the percentage that attribute it to human activity is "in the high 90s". Here is his confirmation at ATTP's blog:

Richard Tol says (my emphasis):
June 14, 2013 at 11:44 am
The consensus is of course in the high nineties. No one ever said it was not. We don’t need Cook’s survey to tell us that.
Cook’s paper tries to put a precise number on something everyone knows. They failed. Their number is not very precise.

So why does he think Cook13 failed, even though it "put a number" that "everyone else knows"? He doesn't say - anywhere.

Richard Tol's smear campaign

Instead, because of an apparent personal grudge with John Cook and his co-authors of the 97% study (I can think of no other reason, apart from a misguided quest "to become rich and famous"), he embarked on a smear campaign. He has been trying, and failing, miserably, for two years, in his attempts to impugn the credibility of the research and the reputation of the researchers.

I won't go over every mistake Richard has made, while flailing about looking for his "something wrong". Many of them have been well documented already. In addition to Friday's HW article, there are more demolitions at HotWhopper (here and here and here and here), at SkepticalScience (here), in a booklet by John Cook and colleagues (here) and in a rebuttal paper to Richard Tol (here) as well as an article in The Guardian by Dana Nuccitelli (here).

Richard's Gish Gallop

I'm writing this because Richard provided an opportunity to demonstrate how Gish gallopers like him operate, and how they respond - or should I say don't respond, as each of the gallops comes to a dead stop.

Signs of a Gish galloper

Gish gallopers are easily recognised. They will usually:
  • admit nothing
  • ignore their failed arguments, and 
  • generate new flawed arguments as soon as their others have been demolished.

Richard didn't bother addressing any of his mistakes to which I drew attention in Friday's article, except for one, where he pointed out I got it wrong. Refusing to acknowledge mistakes and ignoring those errors is a clear sign of a Gish galloper.

Tol gallop number 1 - the sample

Richard's first comment on yesterday's thread was to point out that I misinterpreted a claim he made. But he didn't retract his claim when it was shown to be unsubstantiated.

He claimed that he was unable to replicate the sample database. He claimed to have found an extra 1500 papers. He gave up on that line of argument, when it was pointed out to him that this was most probably because of one or more of the following:

Richard refused to provide his search parameters. He refused to respond to this fairly simple request. He claimed to have run a query with parameters that would return results collected at the same time as Cook13 did their final query run, but didn't indicate:
  • How he knew the date and time of the final query that Cook13 ran
  • Exactly what search tags one can specify that will return papers added into WoS at a particular date and time
  • What his own search parameters were.

Even before Richard gave up on Gallop Number 1, he had moved on to his next in true Gish galloping fashion.

Tol Gallop number 2 - getting tired

In a rare admission, after initially claiming that time stamps were recorded, Richard acknowledged that the Cook13 research only recorded the date of uploads. It did not record the hour and minute when the researchers uploaded their categorisations. (Most were done and uploaded in bulk in any case, so it would have told him little.) Then he slipped in another lie and claimed that John Cook denied the existence of date records. This was a silly and unsubstantiated lie. The time of uploading is irrelevant to the survey results. Researchers were free to categorise abstracts in their own time whenever they chose. They were not working to a clock and even had they been, it would say nothing about the accuracy of the categorisations. That can best be determined by checking against the authors' own assessment, which was very similar to the researchers' 97%.

This is an example of Richard resurrecting a claim that has already been debunked, by the very person whose research he cites to support his silly claim.  Richard wrongly claimed that reviewers would become less accurate in their ratings over time. On the contrary, as described here at HotWhopper, and in Tol's Error 15 in the SkS booklet, "interviewers" typically become more proficient over time, not less. This was confirmed by Dr Biemer himself, the author of the article that Richard cited!

If anything got tired, using Richard's misplaced analogy with market research surveys, it would have been the abstracts, not the researchers. And even if the abstracts were a bit tired, the words written in them wouldn't change :) (See also Tol's Error 6 in the SkS booklet, about how he confuses a literature search with a market research study.)

Richard used this issue as an excuse for more unsubstantiated allegations - that "Cook first did not want outsiders to look at them and later denied their existence". Which is what disinformers and smear merchants do. They don't back up their false claims (they can't) and they impugn nefarious intent.
(There are very good reasons for John Cook withholding these date data. Apart from it being irrelevant to the findings or the methodology, his researchers were assured of anonymity. Individual ratings would not be attributed to any researcher. Because the SkS forum was hacked and private discussions stolen, it would have been possible for unscrupulous people to work out who rated which abstracts, and then attempt to twist that information to discredit the people and the research. Richard knows all this, but decided to attempt to smear John Cook anyway. Elsewhere, Richard also used the stolen discussions out of context, attributing a different meaning, to bolster his flawed arguments. This is described in Tol Error 13 in the SkS booklet.)

Richard didn't pursue that Gish gallop any further. Instead he moved ahead to another.

Tol Gallop number 3 - The sample: why didn't Richard ask John Cook?

In between, Richard tossed in a third Gish gallop. He claimed that "Cook's data have 12,876 papers. Cook's paper mentions 12,465 papers, of which 11,944 were used."

It took a lot of time before Richard responded to people asking what he was referring to. Turns out he didn't have a clue about where his 12,876 number came from. Not even after it was pointed out to him. He flailed about, variously asserting that it came from the ERL website (it didn't), then that he got it from the "paper ID's" - without saying where. He ignored my comment suggesting it came from the SkS download page.

He also ignored various suggestions as to why there could be more Article IDs listed than there were papers in the sample. As it turns out, the reason was simple, as I discovered by going straight to the source:
Being of a curious nature I did some more digging. In addition, I asked John Cook himself about the numbering. He let me know that I wasn't far off track. 
Turns out the IDs were assigned sequentially automatically, as expected. Some duplicates were accidentally added when John re-imported to his database from WoS, so he deleted them. This meant there were gaps in the article IDs.
My own digging supports this. Richard could have done the same if he'd been interested in finding out, instead of just wanting to imply nefarious activity.
I was able to account for all but two of the Abstract IDs in three lots of sequential IDs that have no abstracts attached. This indicates the removal of duplicates, inserted then removed in a batch. It's highly unlikely that there would have been this many sequential non-peer reviewed, for example, or anything else. So that leaves duplicate entries. Here are the numbers of sequential IDs:
  • IDs 5 to 346 inclusive = 342
  • IDs 1001 to 1004 inclusive = 4
  • IDs 2066 to 2128 inclusive = 63
Total = 409 - the other two are probably isolated somewhere.
Bang goes the last of Richard's Gish gallop of protests.
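
For anyone who wants to check the arithmetic in that comment, here is a minimal sketch. It assumes, as the discussion above suggests, that 12,876 is the highest Article ID in the downloadable data and 12,465 is the number of papers mentioned in Cook13. The variable and function names are mine, purely for illustration.

```python
# Gaps in the Article ID sequence, as listed in the comment above.
# Each pair is (first ID, last ID), and both ends are included.
gaps = [(5, 346), (1001, 1004), (2066, 2128)]


def inclusive_count(first, last):
    """Number of IDs from first to last, counting both ends."""
    return last - first + 1


gap_sizes = [inclusive_count(first, last) for first, last in gaps]
ids_in_gaps = sum(gap_sizes)

# Assumption: 12,876 is the highest Article ID in the data download,
# and 12,465 is the number of papers mentioned in the paper.
highest_article_id = 12876
papers_mentioned = 12465
missing_ids = highest_article_id - papers_mentioned

# IDs not explained by the three sequential gaps.
unaccounted = missing_ids - ids_in_gaps

print("Gap sizes:", gap_sizes)
print("IDs in sequential gaps:", ids_in_gaps)
print("Missing IDs overall:", missing_ids)
print("Not yet accounted for:", unaccounted)
```

The point of counting both ends of each range is that it is easy to be off by one when subtracting ID numbers, which is why the first gap is 342 and not 341.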

The numbering of the sequences suggests that there were duplicates in early downloads, which could have been removed even before the ratings commenced. Then duplicates in another batch or two, as the database was updated with the latest. I don't know the exact timing and don't see how it's relevant to anything anyway.

In a personal communication, John Cook has confirmed that no abstracts were deleted. Why didn't Richard ask John Cook?

Why didn't Richard bother to do the same exercise as I did? It only took me about five minutes to isolate the sequential Article IDs. Richard has been banging on about this for months - years.

The answers are obvious.

Tol Gallop Numbers 4 and 5 - A late addition of nuts!

As I said, the answers are obvious. I was about to finish this article when I saw that Richard has now added a new gallop, building on his failed one above.  He took the explanation of the extra Abstract IDs and, instead of apologising or acknowledging that he should have investigated himself, he went as far as saying it "may be" - and then launched into another Gish Gallop:
@Sue (sic)
That may be the explanation. The paper indeed speaks of two data downloads. If you are correct, then Cook did not just remove duplicate abstracts. He removed duplicate abstracts that had already been rated -- thus denying himself another opportunity to test inter-rater reliability.
Furthermore, if you are right, Cook replaced ratings from the earlier rating period with ratings from the later rating period. The two periods are markedly and significantly different.

Notice what Richard's done? He's made two further unsubstantiated claims.

On reliability

First he alleges something about "inter-rater reliability". This is a fixation of Richard's. That is, I presume he is referring to differences between researchers in how they categorise papers. This was explicitly addressed in the paper itself and in the research design:
Each abstract was categorized by two independent, anonymized raters. A team of 12 individuals completed 97.4% (23 061) of the ratings; an additional 12 contributed the remaining 2.6% (607). Initially, 27% of category ratings and 33% of endorsement ratings disagreed. Raters were then allowed to compare and justify or update their rating through the web system, while maintaining anonymity. Following this, 11% of category ratings and 16% of endorsement ratings disagreed; these were then resolved by a third party.

There is no evidence that the duplicate papers had their ratings erased and had to be done again. Richard just made that bit up to raise another flawed argument. Even if that happened, does Richard honestly think that there would have been a difference in ratings of 3% of papers, that have been rated by at least two people, which would have made a difference to the outcome?

That's nuts!

Not satisfied with solely relying on the researchers' categorisations, the research team took it on themselves to ask the authors of these papers to categorise them. The response confirmed the assessment. In fact, the research team's assessment (97.1%) was very slightly more conservative than that of the authors (97.2%, corrected from 98.4%). (The correction is because 98.4% is the percentage of authors, not papers; that is, of people who authored papers that attributed global warming to human activity. A subtle but important distinction that was just pointed out to me.) [Correction made by Sou at 9:49 pm Sunday 29 March 2015.]

Time of ratings

As for his claim that there are differences between early and later ratings - he provides no evidence. Not only that, but as described above, there were checks and balances in the ratings - by having at least two people categorise each abstract and by having the authors categorise their own papers.

Not only that, but how would 3% of papers, even were they rated three to five times instead of two or three times - how would that make any substantive difference to the 97% result? It wouldn't.

The SkS booklet provides further demonstration that Richard is barking up the wrong tree in his fixations. See the analysis in Tol's Error 14 in the SkS booklet. It's not quite the same issue, but it is related.

Tol Gallop number 6 - jumping to wrong conclusions

My goodness. I can't keep up with Richard's Gish Galloping. He is a master at jumping to wrong conclusions, isn't he? Here is his latest comment:

Sou finds that the abstract with lower IDs were removed from the data. Lowest IDs were removed disproportionally. The default data dump from WoS is latest first. Cook's second data dump focused on recent papers.
The date stamps show that the second data dump was done after first and second ratings were completed for the first data dump.

How does Richard know that the first "cleaning out of duplicates" (the earliest duplicates) didn't happen before the ratings started?

Not that it makes any difference - see the Tol Gallop numbers 4 and 5 above.

Where is the apology? Where is the retraction?

Do not expect any acknowledgement or retraction, let alone an apology to John Cook and the Cook13 team. That is not part of the Gish Galloper Handbook. Nor is it part of the Smear and Disinformation Handbook.

I don't know if Richard will try on any more gallops. Just when you think he's run out of steam he comes up with new ideas - all imputing nefarious intent.  That's par for the course with Gish Gallopers and smear merchants.

Continued here.

References and further reading

Cook, John, Dana Nuccitelli, Sarah A. Green, Mark Richardson, Bärbel Winkler, Rob Painting, Robert Way, Peter Jacobs, and Andrew Skuce. "Quantifying the consensus on anthropogenic global warming in the scientific literature." Environmental Research Letters 8, no. 2 (2013): 024024.  doi:10.1088/1748-9326/8/2/024024 (Open access)



  1. I don't know if you've pointed this out already, but this comment by Andrew Gelman would seem to be making the same kind of point that you are.

    1. Very appropriate. It's consistent with how Richard has been behaving here and elsewhere. It demonstrates that he's not a man I'd trust in his own field, let alone when he wanders out of his area of expertise, like here.

  2. >How he knew the date and time of the final query that Cook13 ran?

    From Cook's paper.

    >What his own search parameters were

    The same as Cook's.

    1. Yet you refuse to provide any evidence, Richard. Your claims are empty and unsubstantiated. (All the Cook paper states is the month of the last update (May 2012) - not the precise date.)

      Unlike Cook13, you consistently refuse to list the abstracts and journals returned from your search.

      You also consistently refuse to divulge your search parameters.

      You appear to not have made the slightest effort to see why there is a difference between what you got and the Cook sample. You have jumped right into conspiracy ideation of some sort of nefarious intent.

      Thing is, if the sample size was 100,000 papers, the results would most likely have been very similar. There are virtually no published papers these days that dispute the human cause of global warming.

      It's called a red herring - among other things. And is getting very old.

      You are trying to restart your failed Gish Gallop. What a glutton for punishment.

    2. Captain Flashheart, March 29, 2015 at 8:03 PM

      Once again Richard, please provide Cook's search string, your search string, evidence that your search string replicates the database at the time Cook searched, and evidence that searches are consistent across platforms, locations and institutions.

    3. And the code of course, as well as the makefile. If you run a scripted language, then we need the version of the interpreter as well as the exact OS version. And the random generator seed, if any was used.

      (yes I know, I usually extract a seed from highly volatile places such as /tmp)

  3. Richard,
    Except, IIRC, the Web of Knowledge search engine (search page - whatever you want to call it) has changed a bit since 2012. When I did the search in 2013, I got the same kind of result as Cook et al. If I do a search now using the WoS Core Collection and restrict it to articles only I get 14205. If I then select More Settings and then select Science Citation Index Expanded (SCI-EXPANDED) --1900-present, I get 12603. Given that these databases are updated, one wouldn't expect the number returned today to be the same as in March 2012. So, which search is equivalent to the one done by Cook et al?

    Furthermore, WHY TF does this even matter? It just makes it seem as though you are searching for any reason to find fault in something you've already accepted as returning an answer that no one disputes. Okay, yes, it's obvious that this is what you're doing. The big question is WHY? The honest answer is almost certainly not something that would reflect well on you. Of course, that appears to not be something that bothers you particularly.

    1. The WoS search page did indeed change, some time in the middle or second half of 2014 I think - I'm surprised that Tol's published protocol doesn't detail his accounting of this.

      Oh, that's right...

      Scopus also upgraded in 2014 FWIW, although theirs was much more subtle. The WoS change especially pissed me off somewhat, because the previous format was more user friendly IMHO.

  4. Hey, my Richard Tol’s 97% Scientific Consensus Gremlins didn't get a mention! :P

    The most frustrating part about Tol is that when he claims something, often that is either obviously wrong or makes you wonder where he got it from, he then either refuses to talk about it or just gives cryptic one-liners.

    1. Thanks, Collin. Great article - sorry for the omission.

      Richard is well known for refusing to provide his "data and code", among other transgressions.

    2. Considering how open Cook was, and is, with the data it makes it really odd that he accuses Cook of hiding data.

      Though the most frustrating part of dealing with Tol is that he almost always answers with one-liners to questions that don't answer your questions. I had to hammer it into him once five times that he wasn't making any sense if he didn't provide context before he finally explained what he was on about. This wastes so much time and effort when simply answering a question would move the discussion forward.

    3. It's reprehensible of him to accuse Cook13 of hiding data that's there in plain sight, in spades, and has been from the outset. The hide of him (Richard, that is), when Richard Tol is the one who refuses to provide his data when asked for evidence of his silly (and irrelevant) claims.

  5. It's unwise to cross a Tol !!:

  6. Come on, Richard. If you don't like the results, do your own literature survey using methodologies you think are better. Then publish (share your search parameters this time). If you don't want to do this, find someone who will.

    What's the point of nit-picking methods if you can't show how they altered the results substantially? It just reminds me of the obsession over the original Mann et al paper. Could they have used better stats/methods? Yes. Would that have changed the results? No---and we know that because they, and others, did use better methodologies/different proxies, and very little changed.

    Now go and do likewise. Show everyone that Cook's methods were flawed enough to make a substantial difference. Imagine how vindicated you would feel if you could do that.

  7. This comment has been removed by a blog administrator.

    1. See the comment policy. Links to disinformation websites are not permitted here.

    2. but Sou, you constantly refer to yourself

    3. Weak, Richard.

      This is a blog that demolishes disinformation. You, instead, write disinformation.

      You've misrepresented what I wrote, and falsely attributed notions to me. Your mistakes are too obvious, such that only deniers suffering serious confirmation bias would be fooled. But that's your target audience, I presume.

      A case in point was your link, which is why it was categorised as a disinformation blog.

    4. So Richard has time to post comments in breach of site policy, but no time to present answers to pertinent questions about his search string.

    5. This is absolutely typical of denier behavior. Wouldn't it be far easier to just post the requested data, and let the conclusions stand for themselves? The amount of time and energy it takes to defend the indefensible claims would be better spent on anything else. Rhetoric and sarcasm should not be necessary to make a scientific point. And what is the point of all this? If it's that hard just to say how he did it (as opposed to the paper itself, which fully disclosed what they did) then the nefarious intent falls on Tol.

  8. "He removed duplicate abstracts that had already been rated -- thus denying himself another opportunity to test inter-rater reliability."

    This one especially had me amused and bemused.

    Richard Tol, why was it so important that they retain the duplicate imports and rate them as well, rather than simply using non-replicates distributed more than once between different assessors, or even having individual assessors going back through their catalogues at a later date to reassess, and calibrate their own work that way?

    Why is the deletion of duplicate entries so heinous? What's different about the duplicates that the original entries wouldn't serve as well?

    1. Not only is this a really dumb comment on Richard's part, he provides no evidence the articles had already been rated before the duplicate entries were removed.

      Looking at the ID numbers of the duplicates, they are all below about 3000, which suggests the duplications occurred during the first download batches, before the ratings even started.

      In fact, Richard didn't even know where the Article IDs came from. He just got the number from somewhere but he didn't know where. He kept pointing to the wrong page.

      (BTW - he's talking about 411 out of 11,944 papers. You'd think it was 50%, not 3% the way he's carrying on.)

    2. If they had not cleared duplicate entries, they would be attacked for inflating the number of papers by counting them twice. If they followed the same rating patterns as the rest of the papers, most (if not all) would be found in support of the Cook13 conclusion and deniers would claim that there was nefarious intent in including some papers twice.
      Scouring the list of duplicates shielded them from this accusation. It was due diligence and good intent that took those papers out of the circulation.
      If Tol is that worried about how those papers were rated, he should simply replicate the study from beginning to end (which it has already been pointed out that the results will come in so close to the same that it's not worth the time or money).

    3. This comment has been removed by a blog administrator.

    4. Did you ask Cook about this? It seems that if that happened, he'd be honest enough not to lie about it.
      And IF somehow that did happen after a round of reviews had taken place, are you suggesting that he cherry picked the more supportive result?

    5. Tol, let's sum up: if you are trying to accuse the cook13 team of something, come out and say it. Put all your chips on the table.
      Otherwise, apologize and get off your high horse.
      There can not possibly be enough variation in that small sample to swing the result the way you are implying. So why doggedly make half truths and loose implications that clearly can be shown to be A) false or B) irrelevant.

    6. @Someone else. I think Richard is drawing on snippets of stolen data that he doesn't understand and doesn't know how to interpret. Foolish is an understatement. Unethical is too mild a description for his behaviour.

      He embarked on his smear campaign almost before the ink on the research paper was dry. Before any supplementary data was made available. He's been floundering about ever since. For two years, would you believe.

      You can't take anything he says at face value.

    7. Richard's comment has been reposted by Sou at the HotWhoppery

    8. I apologise to Sou in advance, but Tol's response to me contained statements so against anything that a professional scientist would do in the execution of the work that it can't go unanswered.

      "Removing data is always bad, and removing data without telling is worse."

      You are constructing another strawman, as appears to be one of your signature manoeuvres. This is an egregious thimble-rigging, because you are accusing Cook et al of scientific misconduct when in fact they did nothing of the sort.

      The real, actual fact is that NO RELEVANT DATA WERE REMOVED. The entries removed were duplications, which should explicitly be corrected or they might prejudice the result. Note though that they would almost certainly not affect the result if their collection was random with respect to the harvest algorithm employed. Sou has persistently pointed out to you that the data you claim is "missing" would not affect the result and yet you not only make the claim that they would, but you fail to demonstrate how they would.

      But back to the general point. Using your logic, quality assurance of data would involve creating bifurcating datasets every time an error or other trivial change was made to a dataset, irrespective of whether there is no impact whatsoever to the final version. Also implied is that every bifurcating version should be analysed in toto following the study's methodology, and compared to every other iteration so that it can be demonstrated that there are no alterations to the final result. This is patently ridiculous.

      And the funny thing is that I don't see this happening in any of your work, which is demonstrably riddled with errors. Remember the + 2.5* that should have been a - 2.5? Where's the re-analysis comparing both iterations? For that matter where are all the comparisons of all the other mistakes with which you have littered the literature and the interweb?

      "In this case, data from the first round of rating seem to have been replaced by data from the second round of rating. Ratings were materially and significantly different between the two rounds, so the final results are affected."

      If data only "seem" to have been "replaced", how then can you be explicit and say that "[r]atings were materially and significantly different between the two rounds" (my emphasis) as a result of this 'seeming' replacement, and where is your analysis proving this?

      Argument by assertion is a logical fallacy - but that's apparently your natural habitat...

      Seriously Richard Tol, would this level of "analysis" by you be acceptable in any other area of economics? Would you accept from climate scientists the same sloppiness that you demonstrate yourself, even as you accuse them of sloppiness (and worse) with little or no proof of the existence and/or materiality of the same?

      (*I know that economics can be a rubbery art but seriously, getting things arse-up and effectively not even blinking takes the cake, especially when you're tilting at windmills at every turn.)

    9. Richard take a break, go for a walk, consult with dispassionate and smart people but stay away from that computer. This stuff is very bad for you and you need to dump it today.

    10. I've seen nothing to support Richard's contention that the duplicate abstract entries had previously been categorised. Quite the contrary.

      I'd say he just made that up so he could create a new Gish gallop, after all his others had failed.

    11. See my comment below. I was correct. Richard was wrong.

      The duplicate abstracts were removed *before* the ratings commenced.

  9. This became a matter for the University of Sussex in 2014.

  10. Sou: Kudos on yet another excellent post.

  11. Richard Tol is increasingly reminding me of TV lawyer Saul Goodman, for whom no tactic is unethical, no argument too flimsy. Lawyers of this kind can admit that the main conclusion of the other side is right, while searching for a technicality that can somehow justify declaring a mistrial.

    -If your case is thrown out by one set of editors, shop around for another journal.
    -If your requests for private data get turned down, try repeated FOI requests. When that fails, send off some nastygrams to your opponent’s employer.
    -Make insinuations of dishonesty in the Murdoch press. Odd data sequences and pauses in the ratings process surely cannot have innocent explanations, can they?
    -Never apologize for making a false accusation or a lousy argument. Just move on quickly to the next one.
    -Make a formal complaint about a smear campaign to the Guardian. When they dismiss it, claim victory anyway. When you’re a victim, losing is vindication.
    -Mouth pieties about the sanctity of the scientific process, while using stolen private data to bolster your case. S’all good, man.

    If you need someone to tirelessly defend the indefensible: Better Call Tol!

    1. When you’re a victim, losing is vindication.

      In Tol's case the reverse seems more appropriate: When you're a loser, victimizing is vindication.

  12. Sou: I just posted a link to your OP on the Skeptical Science Facebook page. You will probably see an uptick in visitors as a result.

  13. I can now confirm that the 411 duplicates were removed from the database well before the ratings exercise began. John Cook has clarified this to me privately (and in no uncertain terms).

    This confirms what I myself deduced.

    I will be clarifying this in the main article later today.

  14. Sorry, did I miss the bit where Richard Tol admitted he was wrong and then decided to not embarrass himself further? Because it looks like the first part happened but then he forgot to do the second bit.

