The fall and fall of Gish galloping Richard Tol's smear campaign

HotWhopper

Global warming and climate change. Eavesdropping on the deniosphere, its weird pseudo-science and crazy conspiracy whoppers.

Sunday, March 29, 2015

Sou | 6:42 PM

A short while ago I wrote an article demolishing Richard Tol's latest demonisation of Cook13, the well known 97% consensus paper. (Update: there's still more to the saga - see here.)

"The consensus is of course in the high 90s" - Richard Tol

As you know, Richard agrees that of all the scientific papers that attribute a cause to global warming, the percentage that attribute it to human activity is "in the high 90s". Here is his confirmation at ATTP's blog:

Richard Tol says (my emphasis):

June 14, 2013 at 11:44 am
The consensus is of course in the high nineties. No one ever said it was not. We don’t need Cook’s survey to tell us that.

Cook’s paper tries to put a precise number on something everyone knows. They failed. Their number is not very precise.

So why does he think Cook13 failed, even though its result matches what he says "everyone knows"? He doesn't say - anywhere.

Richard Tol's smear campaign

Instead, because of an apparent personal grudge with John Cook and his co-authors of the 97% study (I can think of no other reason, apart from a misguided quest "to become rich and famous"), he embarked on a smear campaign. For two years he has been trying, and failing miserably, to impugn the credibility of the research and the reputation of the researchers.

I won't go over every mistake Richard has made, while flailing about looking for his "something wrong". Many of them have been well documented already. In addition to Friday's HW article, there are more demolitions at HotWhopper (here and here and here and here), at SkepticalScience (here), in a booklet by John Cook and colleagues (here) and in a rebuttal paper to Richard Tol (here) as well as an article in The Guardian by Dana Nuccitelli (here).

Richard's Gish Gallop

I'm writing this because Richard provided an opportunity to demonstrate how Gish gallopers like him operate, and how they respond - or should I say don't respond, as each of the gallops comes to a dead stop.

Signs of a Gish galloper

Gish gallopers are easily recognised. They will usually:

admit nothing
ignore their failed arguments, and
generate new flawed arguments as soon as their others have been demolished.

Richard didn't bother addressing any of his mistakes to which I drew attention in Friday's article, except for one, where he pointed out I got it wrong. Refusing to acknowledge mistakes and ignoring those errors is a clear sign of a Gish galloper.

Tol gallop number 1 - the sample

Richard's first comment on yesterday's thread was to point out that I had misinterpreted a claim he made. But he didn't retract his claim when it was shown to be unsubstantiated.

He claimed that he was unable to replicate the sample database. He claimed to have found an extra 1500 papers. He gave up on that line of argument, when it was pointed out to him that this was most probably because of one or more of the following:

WoS returns may depend on subscriptions "to which you are entitled"
WoS is a dynamic database, constantly updating entries including older papers. Returns will vary, depending on when you run the query.

Richard refused to provide his search parameters. He refused to respond to this fairly simple request. He claimed to have run a query with parameters that would return results collected at the same time as Cook13 did their final query run, but didn't indicate:

How he knew the date and time of the final query that Cook13 ran
Exactly what search tags one can specify that will return papers added into WoS at a particular date and time
What his own search parameters were.

Even before Richard gave up on Gallop Number 1, he had moved on to his next in true Gish galloping fashion.

Tol Gallop number 2 - getting tired

In a rare admission, after initially claiming that time stamps were recorded, Richard acknowledged that the Cook13 research only recorded the date of uploads. It did not record the hour and minute when the researchers uploaded their categorisations. (Most were done and uploaded in bulk in any case, so it would have told him little.) Then he slipped in another lie and claimed that John Cook denied the existence of date records. This was a silly and unsubstantiated lie. The time of uploading is irrelevant to the survey results. Researchers were free to categorise abstracts in their own time whenever they chose. They were not working to a clock and even had they been, it would say nothing about the accuracy of the categorisations. That can best be determined by checking against the authors' own assessments, which were very similar to the researchers' 97%.

This is an example of Richard resurrecting a claim that has already been debunked, by the very person whose research he cites to support his silly claim. Richard wrongly claimed that reviewers would become less accurate in their ratings over time. On the contrary, as described here at HotWhopper, and in Tol's Error 15 in the SkS booklet, "interviewers" typically become more proficient over time, not less. This was confirmed by Dr Biemer himself, the author of the article that Richard cited!

If anything got tired, using Richard's misplaced analogy with market research surveys, it would have been the abstracts, not the researchers. And even if the abstracts were a bit tired, the words written in them wouldn't change :) (See also Tol's Error 6 in the SkS booklet, about how he confuses a literature search with a market research study.)

Richard used this issue as an excuse for more unsubstantiated allegations - that "Cook first did not want outsiders to look at them and later denied their existence". Which is what disinformers and smear merchants do. They don't back up their false claims (they can't) and they impute nefarious intent.

(There are very good reasons for John Cook withholding these date data. Apart from it being irrelevant to the findings or the methodology, his researchers were assured of anonymity. Individual ratings would not be attributed to any researcher. Because the SkS forum was hacked and private discussions stolen, it would have been possible for unscrupulous people to work out who rated which abstracts, and then attempt to twist that information to discredit the people and the research. Richard knows all this, but decided to attempt to smear John Cook anyway. Elsewhere, Richard also used the stolen discussions out of context, attributing a different meaning, to bolster his flawed arguments. This is described in Tol Error 13 in the SkS booklet.)

Richard didn't pursue that Gish gallop any further. Instead he moved ahead to another.

Tol Gallop number 3 - The sample: why didn't Richard ask John Cook?

In between, Richard tossed in a third Gish gallop. He claimed that "Cook's data have 12,876 papers. Cook's paper mentions 12,465 papers, of which 11,944 were used."

It took a lot of time before Richard responded to people asking what he was referring to. Turns out he didn't have a clue about where his 12,876 number came from. Not even after it was pointed out to him. He flailed about, variously asserting that it came from the ERL website (it didn't), then that he got it from the "paper ID's" - without saying where. He ignored my comment suggesting it came from the SkS download page.

He also ignored various suggestions as to why there could be more Article IDs listed than there were papers in the sample. As it turns out, the reason was simple, as I discovered by going straight to the source:

Being of a curious nature I did some more digging. In addition, I asked John Cook himself about the numbering. He let me know that I wasn't far off track.

Turns out the IDs were assigned sequentially automatically, as expected. Some duplicates were accidentally added when John re-imported to his database from WoS, so he deleted them. This meant there were gaps in the article IDs.

My own digging supports this. Richard could have done the same if he'd been interested in finding out, instead of just wanting to imply nefarious activity.

I was able to account for all but two of the missing Abstract IDs in three runs of sequential IDs that have no abstracts attached. This indicates the removal of duplicates, inserted then removed in a batch. It's highly unlikely that this many sequential entries would have been removed for being non-peer-reviewed, for example, or for any other reason. So that leaves duplicate entries. Here are the numbers of sequential IDs:

IDs 5 to 346 inclusive = 342

IDs 1001 to 1004 inclusive = 4

IDs 2066 to 2128 inclusive = 63

Total = 409 - the other two are probably isolated somewhere.
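The tally above is easy to re-run. Here is a minimal sketch of the arithmetic, assuming (as Richard's own figures imply) that 12,876 is the highest article ID and 12,465 is the number of papers in the sample. The variable and function names are mine, purely for illustration.

```python
# Runs of sequential article IDs with no abstract attached, as listed above.
# Each pair is (first ID, last ID), both inclusive.
id_gaps = [(5, 346), (1001, 1004), (2066, 2128)]


def inclusive_count(first, last):
    """Number of IDs in a range that includes both endpoints."""
    return last - first + 1


gap_sizes = [inclusive_count(first, last) for first, last in id_gaps]
ids_in_gaps = sum(gap_sizes)

# Assumption: 12,876 is the highest article ID and 12,465 the number of papers.
highest_id = 12_876
papers_in_sample = 12_465
missing_ids = highest_id - papers_in_sample

# IDs that are missing but not covered by the three runs above.
unaccounted = missing_ids - ids_in_gaps

print("Sizes of the three runs:", gap_sizes)
print("IDs covered by the runs:", ids_in_gaps)
print("IDs missing in total:", missing_ids)
print("Still unaccounted for:", unaccounted)
```

The three runs cover 409 of the 411 missing IDs, which is where "all but two" comes from.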

Bang goes the last of Richard's Gish gallop of protests.

The numbering of the sequences suggests that there were duplicates in early downloads, which could have been removed even before the ratings commenced. Then duplicates in another batch or two, as the database was updated with the latest. I don't know the exact timing and don't see how it's relevant to anything anyway.

In a personal communication, John Cook has confirmed that no abstracts were deleted. Why didn't Richard ask John Cook?

Why didn't Richard bother to do the same exercise as I did? It only took me about five minutes to isolate the sequential Article IDs. Richard has been banging on about this for months - years.

The answers are obvious.

Tol Gallop Numbers 4 and 5 - A late addition of nuts!

As I said, the answers are obvious. I was about to finish this article when I saw that Richard has now added a new gallop, building on his failed one above. He took the explanation of the extra Abstract IDs and, instead of apologising or acknowledging that he should have investigated himself, he went as far as saying it "may be" - and then launched into another Gish Gallop:

@Sue (sic)
That may be the explanation. The paper indeed speaks of two data downloads. If you are correct, then Cook did not just remove duplicate abstracts. He removed duplicate abstracts that had already been rated -- thus denying himself another opportunity to test inter-rater reliability.

Furthermore, if you are right, Cook replaced ratings from the earlier rating period with ratings from the later rating period. The two periods are markedly and significantly different.

Notice what Richard's done? He's made two further unsubstantiated claims.

On reliability

First he alleges something about "inter-rater reliability". This is a fixation of Richard's. That is, I presume he is referring to differences between researchers in how they categorise papers. This was explicitly addressed in the paper itself and in the research design:

Each abstract was categorized by two independent, anonymized raters. A team of 12 individuals completed 97.4% (23 061) of the ratings; an additional 12 contributed the remaining 2.6% (607). Initially, 27% of category ratings and 33% of endorsement ratings disagreed. Raters were then allowed to compare and justify or update their rating through the web system, while maintaining anonymity. Following this, 11% of category ratings and 16% of endorsement ratings disagreed; these were then resolved by a third party.

There is no evidence that the duplicate papers had their ratings erased and had to be done again. Richard just made that bit up to raise another flawed argument. Even if that happened, does Richard honestly think that there would have been a difference in the ratings of 3% of papers, each rated by at least two people, that would have changed the outcome?

That's nuts!

Not satisfied with relying solely on the researchers' categorisations, the research team took it on themselves to ask the authors of these papers to categorise them. The response confirmed the assessment. In fact, the research team's assessment (97.1%) was very slightly more conservative than that of the authors (~~98.4~~ 97.2%). (The correction is because 98.4% is the percentage of authors, not papers. That is, people who authored papers that attributed global warming to human activity. A subtle but important distinction that was just pointed out to me.) [Correction made by Sou at 9:49 pm Sunday 29 March 2015.]

Time of ratings

As for his claim that there are differences between early and later ratings - he provides no evidence. Not only that, but as described above, there were checks and balances in the ratings - by having at least two people categorise each abstract and by having the authors categorise their own papers.

Not only that, but how would 3% of papers, even were they rated three to five times instead of two or three times - how would that make any substantive difference to the 97% result? It wouldn't.
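For anyone wondering where the 3% comes from, here is a rough sketch using only the figures quoted earlier in this article. It assumes the share in question is the gap between the 12,876 article IDs and the 12,465 papers, taken as a fraction of the papers. The names are mine, for illustration only.

```python
# Figures quoted earlier in the article.
highest_id = 12_876        # assumption: highest article ID in the data
papers_in_sample = 12_465  # papers mentioned in Cook13

# Missing IDs, which the article attributes to removed duplicates.
missing_ids = highest_id - papers_in_sample

# Share of the sample that the missing IDs would represent.
share = missing_ids / papers_in_sample

print(f"{missing_ids} missing IDs is {share:.1%} of {papers_in_sample} papers")
```

That is a small slice of a sample in which each abstract was rated by at least two people.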

The SkS booklet provides further demonstration that Richard is barking up the wrong tree in his fixations. See the analysis in Tol's Error 14 in the SkS booklet. It's not quite the same issue, but it is related.

Tol Gallop number 6 - jumping to wrong conclusions

My goodness. I can't keep up with Richard's Gish Galloping. He is a master at jumping to wrong conclusions, isn't he? Here is his latest comment:

Sou finds that the abstract with lower IDs were removed from the data. Lowest IDs were removed disproportionally. The default data dump from WoS is latest first. Cook's second data dump focused on recent papers.

The date stamps show that the second data dump was done after first and second ratings were completed for the first data dump.

How does Richard know that the first "cleaning out of duplicates" (the earliest duplicates) didn't happen before the ratings started?

Not that it makes any difference - see the Tol Gallop numbers 4 and 5 above.

Where is the apology? Where is the retraction?

Do not expect any acknowledgement or retraction, let alone an apology to John Cook and the Cook13 team. That is not part of the Gish Galloper Handbook. Nor is it part of the Smear and Disinformation Handbook.

I don't know if Richard will try on any more gallops. Just when you think he's run out of steam he comes up with new ideas - all imputing nefarious intent. That's par for the course with Gish Gallopers and smear merchants.

Continued here.

References and further reading

Cook, John, Dana Nuccitelli, Sarah A. Green, Mark Richardson, Bärbel Winkler, Rob Painting, Robert Way, Peter Jacobs, and Andrew Skuce. "Quantifying the consensus on anthropogenic global warming in the scientific literature." Environmental Research Letters 8, no. 2 (2013): 024024. doi:10.1088/1748-9326/8/2/024024 (Open access)

From the HotWhopper archives

Settled science: there is a scientific consensus that humans are causing climate change - April 2016
Deconstructing the 97% self-destructed Richard Tol - March 2015
The Evolution of a 97% Conspiracy Theory - The Case of the Abstract IDs - March 2015
BUSTED: How Ridiculous Richard Tol makes myriad bloopers and a big fool of himself and proves the 97% consensus - June 2014
Ridiculous Richard Tol sez 12,000 is a strange number... - June 2014
Denier Weirdness: Don't count climate science papers to "prove" there's no consensus! - June 2013

40 comments:

Anonymous | March 29, 2015 at 6:49 PM
I don't know if you've pointed this out already, but this comment by Andrew Gelman would seem to be making the same kind of point that you are.

richardtol | March 29, 2015 at 7:01 PM
>How he knew the date and time of the final query that Cook13 ran?

From Cook's paper.

>What his own search parameters were

The same as Cook's.

Anonymous | March 29, 2015 at 7:23 PM
Richard,
Except, IIRC, the Web of Knowledge search engine (search page - whatever you want to call it) has changed a bit since 2012. When I did the search in 2013, I got the same kind of result as Cook et al. If I do a search now using the WoS Core Collection and restrict it to articles only I get 14205. If I then select More Settings and then select Science Citation Index Expanded (SCI-EXPANDED) --1900-present, I get 12603. Given that these databases are updated, one wouldn't expect the number returned today to be the same as in March 2012. So, which search is equivalent to the one done by Cook et al?

Furthermore, WHY TF does this even matter? It just makes it seem as though you are searching for any reason to find fault in something you've already accepted as returning an answer that no one disputes. Okay, yes, it's obvious that this is what you're doing. The big question is WHY? The honest answer is almost certainly not something that would reflect well on you. Of course, that appears to not be something that bothers you particularly.

Collin Maessen | March 29, 2015 at 7:38 PM
Hey, my Richard Tol’s 97% Scientific Consensus Gremlins didn't get a mention! :P

The most frustrating part about Tol is that when he claims something, often that is either obviously wrong or makes you wonder where he got it from, he then either refuses to talk about it or just gives cryptic one-liners.

thefordprefect | March 29, 2015 at 9:51 PM
It's unwise to cross a Tol !!:
http://frankackerman.com/tol-controversy/

Dan Andrews | March 29, 2015 at 11:34 PM
Come on, Richard. If you don't like the results, do your own literature survey using methodologies you think are better. Then publish (share your search parameters this time). If you don't want to do this, find someone who will.

What's the point of nit-picking methods if you can't show how they altered the results substantially? It just reminds me of the obsession over the original Mann et al paper. Could they have used better stats/methods? Yes. Would that have changed the results? No - and we know that because they, and others, did use better methodologies/different proxies, and very little changed.

Now go and do likewise. Show everyone that Cook's methods were flawed enough to make a substantial difference. Imagine how vindicated you would feel if you could do that.

richardtol | March 29, 2015 at 11:44 PM
This comment has been removed by a blog administrator.

Bernard J. | March 30, 2015 at 1:59 AM
"He removed duplicate abstracts that had already been rated -- thus denying himself another opportunity to test inter-rater reliability."

This one especially had me amused and bemused.

Richard Tol, why was it so important that they retain the duplicate imports and rate them as well, rather than simply using non-replicates distributed more than once between different assessors, or even having individual assessors going back through their catalogues at a later date to reassess, and calibrate their own work that way?

Why is the deletion of duplicate entries so heinous? What's different about the duplicates that the original entries wouldn't serve as well?

PG | March 30, 2015 at 5:40 AM
This became a matter for the University of Sussex in 2014.

John Hartz | March 30, 2015 at 5:55 AM
Sou: Kudos on yet another excellent post.

Andy Skuce | March 30, 2015 at 8:12 AM
Richard Tol is increasingly reminding me of TV lawyer Saul Goodman, for whom no tactic is unethical, no argument too flimsy. Lawyers of this kind can admit that the main conclusion of the other side is right, while searching for a technicality that can somehow justify declaring a mistrial.

- If your case is thrown out by one set of editors, shop around for another journal.
- If your requests for private data get turned down, try repeated FOI requests. When that fails, send off some nastygrams to your opponent’s employer.
- Make insinuations of dishonesty in the Murdoch press. Odd data sequences and pauses in the ratings process surely cannot have innocent explanations, can they?
- Never apologize for making a false accusation or a lousy argument. Just move on quickly to the next one.
- Make a formal complaint about a smear campaign to the Guardian. When they dismiss it, claim victory anyway. When you’re a victim, losing is vindication.
- Mouth pieties about the sanctity of the scientific process, while using stolen private data to bolster your case. S’all good, man.

If you need someone to tirelessly defend the indefensible: Better Call Tol!

John Hartz | March 30, 2015 at 9:43 AM
Sou: I just posted a link to your OP on the Skeptical Science Facebook page. You will probably see an uptick in visitors as a result.

Sou | March 30, 2015 at 1:58 PM
I can now confirm that the 411 duplicates were removed from the database well before the ratings exercise began. John Cook has clarified this to me privately (and in no uncertain terms).

This confirms what I myself deduced.

I will be clarifying this in the main article later today.

Tyson Adams | March 30, 2015 at 9:55 PM
Sorry, did I miss the bit where Richard Tol admitted he was wrong and then decided to not embarrass himself further? Because it looks like the first part happened but then he forgot to do the second bit.

This work is licensed under a Creative Commons Attribution 4.0 International License.
