Update: OMG! If you can believe it, even after all this, Richard Tol, in the comments below, is still indulging in a Recursive Fury of Gish gallops. He's taken the new (to him) facts and, instead of letting go of his wacky ideas as he should, he's gone and woven still more new conspiracy theories. (Has Richard not got any friends to have a quiet word in his ear? No-one who cares for him? How sad.)
If you've been following this blog for the past few days, you'll have noticed the fine illustration of denier-weird in action, including a Gish gallop evolving live (here and here).
This article is more by way of a post-script. An extraordinarily long post-script. As you probably know, I don't normally repeat a theme over consecutive days. The reason I'm writing this up as a separate article is that it's a wonderful chance to see how a conspiratorial notion developed over a few short hours, at the tail end of a Gish gallop.
[Caption: "It's a conspiracy. I just know it!"]
The What: The case of the Abstract IDs
Before I show how the conspiratorial thinking evolved, let me fill you in on the background. Richard Tol initially raised the issue under another guise; it turned out to be "the case of the missing Abstract IDs" in the Cook13 data.
Cook13 is the famous paper that documented the 97% consensus. A team of researchers examined 11,944 abstracts of peer-reviewed papers relating to climate - a collation of two decades of research papers, from 1991 to 2011. They found that 97% of the papers that attributed a cause to recent warming attributed it to human activity.
After trying on various conspiracy ideations and implying nefarious intent, or at least incompetence, on the part of the Cook13 research team, Richard Tol pointed to missing "data". Well, as you'll soon see, that's not what it was. However, with a bit of detective work I was able to figure out what Richard was referring to.
I soon worked out that Richard's "missing data" wasn't data at all. It was that there were 411 more Article IDs (unique identifier numbers) than were accounted for by the papers in the Cook13 research. I figured the missing IDs were related to automatic numbering and the removal of some abstracts for one reason or another. It turns out that was correct.
The How: The Uniqueness of Identifiers
After a bit of digging, I discovered that there were three main sequences of ID numbers that had no matching abstracts. In other words, there were gaps in the sequence of ID numbers. I looked at the gaps themselves. There were three lots of sequential numbers, all of which were in the first couple of thousand IDs. Adding them together accounted for all but two of the "missing numbers". This to me was a signal that the sequences represented duplicate entries that had been deleted, or perhaps abstracts from a journal that should have been excluded from the search.
I wasn't too wide of the mark.
I did what any reasonable person would do and asked John Cook how the gaps came about. John Cook kindly got back to me fairly quickly. He said that he downloaded the abstracts for the Cook13 study from the main repository, the Web of Science (WoS). He downloaded the records in smallish batches and loaded them into his own database. In doing so, he inadvertently entered some of them into his database twice.
The database assigned IDs automatically to each new record (abstract), so when John deleted the duplicates, the IDs disappeared too, leaving gaps in the numbering. (Anyone who's worked with an SQL database knows this is normal when you assign automatic unique identifiers to individual records.)
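For anyone who hasn't seen this behaviour first-hand, here's a minimal sketch of the effect using Python's sqlite3 module. The table and column names are invented for illustration - this is not Cook's actual schema - but the behaviour of auto-assigned IDs is the same in any SQL database:

```python
import sqlite3

# Throwaway in-memory database. Table and column names are invented for
# illustration; they are not Cook's actual schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE abstracts (id INTEGER PRIMARY KEY AUTOINCREMENT, title TEXT)"
)

# Import a "batch" that accidentally contains the same abstract twice.
titles = ["Paper A", "Paper B", "Paper B", "Paper C"]
conn.executemany("INSERT INTO abstracts (title) VALUES (?)", [(t,) for t in titles])

# Delete the duplicates, keeping the lowest id for each title.
conn.execute(
    "DELETE FROM abstracts "
    "WHERE id NOT IN (SELECT MIN(id) FROM abstracts GROUP BY title)"
)

# The surviving ids are 1, 2 and 4. The deleted id 3 is simply gone,
# leaving a gap - the database does not renumber the remaining records.
print([row[0] for row in conn.execute("SELECT id FROM abstracts ORDER BY id")])
# [1, 2, 4]
```

Any further inserts just carry on from the next number, so the gaps stay put - which is exactly the pattern in the Cook13 ID sequence.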
As John Cook said to me - and I'm paraphrasing here (John's language was a tad more colourful):
What programmer would bother filling gaps? Who on earth would obsess about unique identifiers?
And he's right. Unique identifiers are, well, unique. There is one for every record and no two are the same. That's all you need to build relationships and construct queries. They serve no other purpose.
The When: Well Before any Rating Commenced
I've since got back to John and asked him to clarify when this happened. It turns out it all took place several weeks before the research proper began. He had deleted the duplicate entries and had the full database set up well before the researchers began to categorise the abstracts. All the abstracts were in the database before the ratings commenced, except for the late additions to the Web of Science database itself. (WoS continues to add papers over time, sometimes going back to prior years.) The final additions were made to the database in May 2012 - as described in the Cook13 paper.
This was consistent with the highest of the missing IDs being 2128, which is low relative to the full range and indicates the duplication happened in one of the earlier of the smallish batches he imported into his database. (The highest Abstract ID number is 12,876.)
The Evolution of a Wild Conspiracy Theory
Which leads me to the evolution of the smear/conspiratorial ideation. The evolution took place over a few hours. It's classic Recursive Fury: as new facts emerged, rather than discard the conspiracy, it was adapted. It evolved.
As I indicated above, John Cook has clarified that the 411 duplicates were removed well before the researchers began to categorise ratings. This is as I thought, and as the numbers suggest. Richard Tol took my investigative work, which resolved one of his insinuating questions, and turned it into another of his Gish gallops. It's hard to believe that he would do this, all over 411 missing Abstract IDs. A mere 3% of the total number of abstracts.
Below is a list of the main stages in the development of Richard's conspiracy theory.
In the Beginning: A "Minority" Consensus
As a precursor to the evolution, there was this comment from Richard Tol, which he used as a lead in. Richard wrote in an early comment, talking about his WoS search purportedly yielding 1500 more papers than did Cook13, for which he refused to provide any evidence (which is another story, already told):
The number of papers omitted is sufficiently high to reduce the consensus (as defined by Cook) to a minority.
Step 1: "Cook's papers"
While HotWhopper readers were variously picking themselves up off the floor and wiping tears of laughter from their eyes, or cleaning their keyboards, trying to explain to their families, between guffaws, what ailed them so, Richard wrote another comment:
Repeating their query, I found 13,458 papers. Repeating their historical query, I found 13,431 papers.
Cook's data have 12,876 papers. Cook's paper mentions 12,465 papers, of which 11,944 were used.
That is, up to 11% of Cook's observations are unaccounted for.
11% unaccounted for? This provoked further gales of mirth all over. Animals around the world, from the Emperor Penguins down south to the polar bears up north, pricked up their ears in wonder at the sound of so much earthly laughter. (You didn't know that ears of Emperor Penguins and polar bears can prick?)
If you do the sums, you'll see Richard is claiming that between his own "search" and his alleged "Cook's data", there would be 1377 "observations unaccounted for". (Remember that number.)
What a let down. This grand conspiracy, which began with the consensus being reduced to a minority, was quickly reduced to eleven per cent of papers "missing". (How often can a minority be classed as a consensus, I wonder?)
Step 2: A conspiratorial suggestion
Richard expanded further, with a hint that "something was wrong", inexplicably increasing his 1377 to 1500 and saying:
There are some 1500 missing abstracts. This is a large number relative to
Cat 1: 64
Cat 2: 922
Cat 5: 54
Cat 6: 15
Cat 7: 9
It is middling number relative to
Cat 3: 2910
It is a small number relative to
Cat 4: 7970
Now I don't know what those 1500 papers said. I just know they're missing.
See what he has casually done? First, he dipped into his magician's hat and magically produced an extra 123 papers out of thin air. Second, he hasn't assumed that the additional papers would be apportioned across categories in the same proportions as all the other papers. He's hinting that maybe they are all in one or two categories.
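To make that concrete, here's a quick back-of-the-envelope sketch in Python using the category counts Richard himself quoted (categories 1-3 are endorsements, 4 is no position, 5-7 are rejections). Note that this simple endorse/reject ratio isn't exactly Cook13's headline calculation, which also counts "uncertain" abstracts in the denominator; the only point is that papers apportioned in the same proportions cannot move the number at all:

```python
# Category counts as quoted by Richard (1-3 endorse, 4 no position, 5-7 reject).
counts = {1: 64, 2: 922, 3: 2910, 4: 7970, 5: 54, 6: 15, 7: 9}
total = sum(counts.values())  # 11,944 rated abstracts

extra = 1500  # Richard's hypothetical "missing" papers

# Apportion the extras across categories in the same proportions as the rest.
with_extras = {cat: n + extra * n / total for cat, n in counts.items()}

def endorse_share(c):
    """Share of endorsements among papers taking a position (cats 1-3 vs 5-7)."""
    endorse = c[1] + c[2] + c[3]
    reject = c[5] + c[6] + c[7]
    return endorse / (endorse + reject)

print(round(endorse_share(counts), 4))       # 0.9804
print(round(endorse_share(with_extras), 4))  # 0.9804 - unchanged, by construction
```

To shift the result, you have to assume the hypothetical papers pile up in particular categories - which is exactly the hint Richard was dropping.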
There followed more to-ing and fro-ing, with Richard finally agreeing to let go of his "search" conspiracy - at least for the time being. He wrote:
@CFH
You're welcome to disregard my search.
That still leaves the 12,876 papers in Cook's data versus the 12,465 papers in Cook's paper versus the 11,944 papers that were used in Cook's analysis.
Of the 1500 missing papers, 900 (60%) are missing in Cook's account.
As you can see, he still claimed that there were "12,876 papers in Cook's data".
Report 1 from The HotWhopper Detective Agency
Well, I disregarded Richard's "search" for reasons described elsewhere, and focused on his "12,876 papers". The number rang a bell. I'd come across that number before and remembered that it had something to do with the unique identifiers assigned to abstracts.
So I went back to the table I recalled having Abstract IDs. There was only one file available that included Abstract IDs, and it was hiding in plain sight on the SkepticalScience (SkS) web page for The Consensus Project. It was the file listed as: "All the articles listed by Id number (Article Id #, Year of Publication and Paper Title)".
I wrote a comment to Richard, letting him know where his number came from, and that although the file had Abstract IDs going to 12,876, there were only 11,944 unique abstracts in the file itself:
Now all he has left is his 12,876 number. That probably came from a text file on the SkS page. The downloadable file described as: All the articles listed by Id number (Article Id #, Year of Publication and Paper Title), has some sort of article ID. It doesn't seem to mean anything. The difference could be journals removed from the search as being irrelevant (eg social science journals). There are only the 11,944 unique abstracts listed in the file.
I also quoted from Cook13, demonstrating Richard was mistaken. The paper showed that there weren't, as he claimed, 12,876 papers. The full data set used in the research comprised 12,465 papers, some of which were eliminated:
The ISI search generated 12 465 papers. Eliminating papers that were not peer-reviewed (186), not climate-related (288) or without an abstract (47) reduced the analysis to 11 944 papers written by 29 083 authors and published in 1980 journals.
Step 3: Ignoring the evidence
Richard ignored what I wrote about his number relating only to IDs, not papers, and repeated what he'd said earlier, leaving out anything about his "search". He wrote:
Again, there are 12,876 paper in Cook's data.
There are 12,465 papers in Cook's paper.
The 12,876 is from Cook's data. You can get the file here: http://iopscience.iop.org/1748-9326/8/2/024024/media
Step 4: A small step back
Finally, Richard appeared to backtrack, and for the first time started talking about IDs, instead of "papers", and wrote (my emphasis):
Yes, the number of lines is 11,944. Paper IDs run up to 12,876, however, rather than 12,465 as suggested by Cook.
If the max ID would have been 12,576 the number in the paper may have been a typo, but 12,876 is unlikely.
Report 2 from the HotWhopper Detective Agency
Some time later I provided more results from my detective work, and explained that all that had happened was some records had inadvertently been duplicated, and subsequently deleted. I wrote it out in some detail:
I asked John Cook himself about the numbering. He let me know that I wasn't far off track.
Turns out the IDs were assigned sequentially automatically, as expected. Some duplicates were accidentally added when John re-imported to his database from WoS, so he deleted them. This meant there were gaps in the article IDs.
My own digging supports this. Richard could have done the same if he'd been interested in finding out, instead of just wanting to imply nefarious activity.
I was able to account for all but two of the Abstract IDs in three lots of sequential IDs that have no abstracts attached. This indicates the removal of duplicates, inserted and then removed in a batch. It's highly unlikely that there would have been this many sequential non-peer-reviewed papers, for example, or anything else. So that leaves duplicate entries. Here are the numbers of sequential IDs:
- IDs 5 to 346 inclusive = 342
- IDs 1001 to 1004 inclusive = 4
- IDs 2066 to 2128 inclusive = 63
- Total = 409 - the other two are probably isolated somewhere.
Bang goes the last of Richard's gish gallop of protests.
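For the curious, the gap-hunting itself is trivial to reproduce. Here's a minimal Python sketch that collapses missing IDs into consecutive runs; the list of IDs below is a toy stand-in, not the actual column from the released file:

```python
def missing_runs(ids):
    """Return (first, last, length) for each run of IDs missing from the sequence."""
    present = set(ids)
    runs = []
    start = None
    for n in range(1, max(present) + 1):
        if n not in present:
            if start is None:
                start = n          # a new gap begins
        elif start is not None:
            runs.append((start, n - 1, n - start))  # the gap just ended
            start = None
    return runs

# Toy example: IDs 3-5 and 9 are missing from the sequence 1..10.
print(missing_runs([1, 2, 6, 7, 8, 10]))
# [(3, 5, 3), (9, 9, 1)]
```

Run over the Article IDs in the released file, the same approach turns up the three runs listed above.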
Step 5: A new conspiracy theory is born
You'd think that would have put an end to the matter. The mystery of the missing IDs was solved.
Not on your life. Richard got up a new head of steam. He manufactured a new conspiracy. He manufactured more than that. On his own blog, he documented it in an article, and misrepresented what I wrote. Here is what he wrote at HotWhopper:
That may be the explanation. The paper indeed speaks of two data downloads. If you are correct, then Cook did not just remove duplicate abstracts. He removed duplicate abstracts that had already been rated -- thus denying himself another opportunity to test inter-rater reliability.
Furthermore, if you are right, Cook replaced ratings from the earlier rating period with ratings from the later rating period. The two periods are markedly and significantly different.
As you can see, he's merged his new conspiracy theory into an old meme of his, that "something must be wrong". He also reckons that ratings changed in some manner over the course of the exercise. Given Richard's known difficulty with numbers, it's extremely likely that he got that part wrong as well. That's neither here nor there. The point is that in true recursive fury style, Richard took new information and wove it into a reshaped conspiracy theory.
You could hardly find a better example of Recursive Fury in a single discussion thread, could you.
Saved by the bell - not!
It develops from there. This is how Richard saw it. He refused to accept that the duplicate records were deleted way back when (which they were). He decided that it could only have happened after later records had been added. In other words, he was convinced that 411 abstracts had been rated twice. He wrote:
Cook added abstracts later; see his paper and data.
Later abstracts have higher IDs.
Cook told Sou that there was overlap between the earlier and later set of abstracts.
Sou finds that the abstract with lower IDs were removed from the data. Lowest IDs were removed disproportionally. The default data dump from WoS is latest first. Cook's second data dump focused on recent papers.
The date stamps show that the second data dump was done after first and second ratings were completed for the first data dump.
Now "Cook told Sou" nothing of the sort. He didn't say that there was overlap between earlier and later sets of abstracts. What John Cook said was that he inadvertently added duplicate entries when he added the Web of Science data to his own data base. And that he subsequently deleted these entries.
When he answered my first query, John made no comment about exactly when this took place. It was Richard, and no-one else, who decided it must have happened after the cataloguing was under way, after the first and second ratings had been completed. He was wrong.
Richard's interpretation fitted his two-year smear campaign of flawed methodology (as if) and nefarious intent (as if). Richard was heavily invested in finding "something wrong". Having lost on every point so far, he was determined to find something, anything. He must have thought to himself "saved by the bell".
Richard didn't ask anyone; he proceeded to weave an ever more intricate conspiracy web. He wrote, arguing with one of the people who worked on Cook13:
Here's the timeline:
Download first batch of abstracts
Rate abstracts from first batch
Download second batch of abstracts
Remove duplicates from FIRST batch
Rate abstracts from second batch
No last chances
But that's not at all how it went down. As John Cook explained to me initially, he downloaded the first full set of abstracts and then, in a second step, proceeded to load them into his database. In smaller batches. That was when the duplication occurred. I didn't bother trying to explain all that initially. Why would I? Surely it was sufficient to explain that in the process of building the SQL database, duplicate entries were inadvertently added and then deleted.
But I didn't count on an uber conspiracy theorist, Richard Tol. Someone who has invested a huge amount of time and energy over two years, trying to find "something wrong". Anything at all. And failing dismally. This was a survey with whose results Richard agreed, mind you.
Richard had run out of options and must have seen this as his last chance at redemption.
Admitting to being a receiver of stolen property
Richard was so desperate that he finally admitted where he got his 12,876 number from. It wasn't from the only file released by Cook13 that had Abstract IDs listed. No. Why would Richard look at official material when there was stolen information he could play with?
The comment below explains why Richard wasn't previously able to say where he got his number from. He was understandably reluctant to admit he was unethical. (It's not a good look for an academic to say they are relying on misinterpretations of stolen snippets to build a smear campaign.) Richard wrote:
Recall that the date stamps were released.
I quickly responded, letting everyone, including Richard, know that there were no files with date stamps released. And for very good reasons (see here). And that I noted his admission. (Richard was referring to files he got from some script kiddie who hacked SkS.)
The inevitable (?) smear
Richard kept reaching beyond the information available to him, to try to justify his increasingly complex conspiracy theory. He finally presented his "theory" in all its glory, smear and all. I relegated it to the HotWhoppery for obvious reasons. He wrote:
Removing data is always bad, and removing data without telling is worse.
In this case, data from the first round of rating seem to have been replaced by data from the second round of rating. Ratings were materially and significantly different between the two rounds, so the final results are affected.
So all his conspiracy theorising was leading to that one point. Smear. Unadulterated, unjustified smear.
The real story is quite a let down, after all that
For all you conspiracy theorists out there, you are going to be sadly disappointed. Richard's theory is without any foundation whatever.
Here again are the simple facts. Just a repeat of what I've already written. Since I've led you through all of Richard's conspiratorial hoops, your mind is probably reeling. So here is the prosaic tale once more.
It turns out that all the abstracts available at the time were loaded into the SQL database several weeks before the start of the Cook13 categorisations. After getting the data from WoS, John Cook loaded it into his SQL database in batches. In the first few batches he accidentally added some duplicate abstracts, and so he deleted them.
All this took place well before the research proper began. The full SQL database was prepared in advance of the categorisation. Well, that's sensible, I hear you saying. You wouldn't ask people to rate non-existent abstracts in an empty database, would you. You'd set it up and make sure it's all working first.
So it was some weeks before the exercise began that John deleted the duplicate entries. All the abstracts were in the database long before the ratings commenced, except for the late additions to the Web of Science database itself. (WoS continues to add papers over time, sometimes going back to prior years.) The final additions were made to the database in May 2012 - as described in the Cook13 paper.
A final word
You may recall the number of duplicated abstract IDs - it's a mere 411. It's only three per cent of all the 11,944 abstracts in the final set. If they hadn't been picked up by the astute project leader, they would scarcely have made a difference to the main result, even were the sample skewed slightly one way or another.
Being astute, you'll have noticed the downs and downs of Richard's conspiracy, which he couldn't quite let go of altogether.
- It started with his "minority" comment - that he could show that more than half of all papers that attributed a cause to global warming would attribute it to anything but human causes. He let that notion drop without a word.
- This quickly shrank to a claim that eleven per cent of "papers" were missing - which, by Richard's conspiratorial thinking, meant the consensus was less than 97%.
- It soon shrank again, to just 3% - in Richard's weird mind, this represented papers that were "missing".
- Finally it all but disappeared altogether as Richard accepted the 3% were duplicate papers. That there were no "missing" papers at all.
- He wasn't willing to let it go quite as far as zero, however. He decided that the duplicate IDs meant that the ratings were skewed somehow (though he didn't say how).
What a strange tale.
The fuss that university professor Richard Tol made about this would not be believed if the evidence weren't here in black and white. You wouldn't credit it, but Richard Tol isn't just a run-of-the-mill denier at WUWT; he actually holds down a job at The University of Sussex. He is a Professor of Economics. Hard to accept, I know. He was even a lead author in one of the working groups (not the physical science one) of the latest IPCC report, until he quit - mostly. He's also on the advisory board, with a whole bunch of ratbags, of the UK denier organisation, the Global Warming Policy Foundation.
The First Law of Tol:
Related HotWhopper articles
- Deconstructing the 97% self-destructed Richard Tol - 27 March 2015
- The fall and fall of Gish galloping Richard Tol's smear campaign - 29 March 2015
- BUSTED: How Ridiculous Richard Tol makes myriad bloopers and a big fool of himself and proves the 97% consensus - 5 June 2014
- Ridiculous Richard Tol sez 12,000 is a strange number... - 7 June 2014
- Denier Weirdness: Don't count climate science papers to "prove" there's no consensus! - 3 June 2013