Behe Blows It (in other news, dog bites man)

The iconic image of the Kitzmiller v. Dover trial in Pennsylvania was Michael Behe sitting on the witness stand, a pile of papers and book chapters on the evolution of the vertebrate immune system on his lap, steadfastly denying the existence of research on the evolution of the vertebrate immune system. In his new book, The Edge of Evolution, Behe continues his practiced denial, minimizing or ignoring a pile of research in order to maintain his claim that evolution can’t produce this or that biological structure because it is “irreducibly complex”. While I didn’t get a review copy, I know a friendly bookstore owner who encourages customers to read in the store.

I will leave it to others to evaluate Behe’s claims about the various specific biological systems (and many have: see Science after Sunclipse for a complete listing). I will return to a piece of research that demonstrates that Behe’s conception of how evolutionary processes produce complicated structures is amazingly over-simplified and empirically false, and that his conception of what evolutionary processes are capable of is pure caricature.

Mark Chu-Carroll has already dissected Behe’s misuse of probability and his utter ignorance of the properties of high-dimensioned and plastic fitness landscapes, and ERV nicely illustrates the point. As Nick Matzke has remarked,

My first take is that The Edge of Evolution is basically an incompetent attempt to provide a biological foundation for the silly assumptions that were made in Behe and Snoke’s (2004) mathematical modeling paper in Protein Science.

Mark analyzed Behe’s argument from teeny-weeny numbers, showing that it is based on fundamentally flawed assumptions about the topography of fitness landscapes and the supposed inability of evolving populations to escape from local maxima. I’ll show that Mark’s analysis has empirical corroboration – Behe’s probability model generates wholly absurd results. In addition, I’ll describe data, some of it from new analyses, that flatly contradict Behe’s claims about what evolutionary mechanisms allegedly cannot do. To put it in the most direct terms possible, Behe is either ignorant of the evidence that contradicts his fundamental assumption about what evolutionary mechanisms can do, or he actively ignores it. I’ll show that computer models of evolution not only produce entities that are irreducibly complex by Behe’s original Darwin’s Black Box definition, but also produce them via evolutionary pathways that are irreducibly complex by Behe’s second, so-called “evolutionary” definition: pathways that contain multiple unselected mutational steps. Neutrality lives! Then I’ll comment on Behe’s probability calculations in the light of computer evolutionary simulation runs, and on his notion of fitness landscapes.


The research to which I return is Lenski, et al.’s The Evolutionary Origin of Complex Features, the 2003 Nature paper that used Avida, an evolutionary simulator, to show the evolution of IC structures. I have previously written on that research (see here for one post). In that previous post I described how the Avida research showed that structures that meet Behe’s first definition of “irreducible complexity” evolve by standard Darwinian mechanisms – random mutations and selection. In the course of an interminable thread on Internet Infidels I did some additional analyses of the Lenski, et al., data to show that evolution can produce structures via pathways that are IC by Behe’s second definition.

I. Evolving irreducibly complex structures

Behe’s original definition of irreducible complexity in Darwin’s Black Box defined IC in terms of the structure of the object under analysis:

A single system composed of several well-matched, interacting parts that contribute to the basic function of the system, wherein the removal of any one of the parts causes the system to effectively cease functioning. (Darwin’s Black Box, 39)

That implies a simple operational procedure to determine whether a structure is IC: Do knockout experiments. Knock out components one by one and determine whether the structure retains its function. By that operational criterion, the evolution of Avida critters capable of performing EQU (bitwise logical equality on two inputs, the most complex of the logic functions rewarded in the experiment) produced IC structures. See here for the results of Lenski, et al.’s knockout analysis. Case closed on that version of IC.
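The knockout procedure is easy to mechanize. Here is a minimal sketch in Python on a hypothetical toy register machine (not Avida's actual CPU or instruction set): a genome is a list of NAND instructions, the "function" is computing bitwise EQU on two inputs, and each instruction is knocked out in turn by replacing it with a no-op. Instructions whose knockout abolishes EQU are the necessary parts; if every part of the EQU-computing core is necessary, that core is IC by the Darwin's Black Box definition.

```python
# Toy register machine, NOT Avida's instruction set (an assumption for
# illustration): each instruction is ("nand", dst, src1, src2) or ("nop",).
MASK = 0xF  # 4-bit values, so a few test inputs cover every bit pattern

def run(genome, a, b):
    """Execute the genome on inputs a and b; return the 'out' register."""
    regs = {"a": a, "b": b}
    for instr in genome:
        if instr[0] == "nand":
            _, dst, x, y = instr
            regs[dst] = ~(regs.get(x, 0) & regs.get(y, 0)) & MASK
    return regs.get("out", 0)

def performs_equ(genome):
    """True if the genome computes bitwise EQU (NOT XOR) on all test inputs."""
    tests = [(0b1100, 0b1010), (0b0000, 0b1111), (0b0110, 0b0110)]
    return all(run(genome, a, b) == (~(a ^ b) & MASK) for a, b in tests)

def knockout_analysis(genome):
    """Replace each instruction with a no-op in turn; return the indices
    whose removal abolishes EQU, i.e. the necessary components."""
    necessary = []
    for i in range(len(genome)):
        mutant = genome[:i] + [("nop",)] + genome[i + 1:]
        if not performs_equ(mutant):
            necessary.append(i)
    return necessary

genome = [
    ("nand", "junk", "a", "a"),   # result never used
    ("nand", "t1", "a", "b"),
    ("nop",),
    ("nand", "t2", "a", "t1"),
    ("nand", "t3", "b", "t1"),
    ("nand", "t4", "t2", "t3"),   # t4 = a XOR b
    ("nand", "out", "t4", "t4"),  # out = NOT(a XOR b) = EQU
]

print(performs_equ(genome))       # True
print(knockout_analysis(genome))  # [1, 3, 4, 5, 6]
```

The unused instruction and the no-op can be knocked out without loss, while removing any one of the five NAND instructions that build EQU destroys the function.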

II. Evolving irreducibly complex structures via irreducibly complex pathways

Following DBB, Behe later offered a second, “evolutionary” definition of irreducible complexity:

An irreducibly complex evolutionary pathway is one that contains one or more unselected steps (that is, one or more necessary-but-unselected mutations). The degree of irreducible complexity is the number of unselected steps in the pathway. (p. 17)

Behe & Snoke’s 2004 Protein Science paper was an attempt to show that the time and population sizes required to produce structures via irreducibly complex evolutionary pathways are prohibitively huge. See here for a rebuttal.

The operational definition of an irreducibly complex evolutionary pathway is simple in principle: determine the “necessary-but-unselected” mutational steps in an evolutionary history and count them. Unfortunately, in biological systems we typically know neither the actual evolutionary history nor the nature of the selective environments through that history at the level of detail necessary to determine whether a given mutation was selectively neutral or deleterious when it first occurred, and hence unselected. However, in the Avida evolutionary simulator it’s easy to do that: the selective environment is controlled by the experimenter and the complete evolutionary history of a lineage can be dumped to disk for analysis. In fact, Lenski, et al. made available the evolutionary history of one of the lineages that evolved to perform EQU and that was IC by the first definition. Since the knockout procedure tells us which instructions in the first EQU-performing critter were necessary, we can trace each of them back through the evolutionary history to determine whether it was selected when it first occurred. The saved evolutionary history includes the fitness value of the critter associated with each mutational step. Hence one can classify every instruction in the final IC structure according to whether on its first appearance it was beneficial, neutral, or deleterious.

In the case study lineage there were 111 mutational steps leading to the first critter capable of performing EQU. That critter had a genome of 60 instructions. Of those 60 instructions, 28 were necessary to perform EQU. I traced back all of those 28 instructions in the evolutionary history of that critter, determining for each mutation whether on its first appearance it was selectively beneficial, neutral, or deleterious. Of the 28 essential instructions, 7 produced by insertion or substitution mutations were either neutral (6) or deleterious (1) on their first appearance. (It’s also noteworthy that three other EQU IC instructions were part of the original Ancestor’s replication code that acquired an additional role in performing EQU. Two simultaneously retained their original role in replication – knocking them out abolished both replication and EQU – but one was no longer used in replication but only in EQU, a nice example of change of function through evolution. Behe, of course, studiously ignores changes of function.)
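Once each necessary instruction has been traced to its first appearance, the bookkeeping for Behe's second definition is a simple tally. The sketch below assumes a hypothetical record, already traced and keyed by genome position, holding the parent's and offspring's fitness at the step where each necessary instruction first appeared; the numbers are made up for illustration and are not the Lenski, et al. data. Neutral and deleterious first appearances are the "unselected" steps, and their count is the pathway's degree of irreducible complexity in Behe's sense.

```python
import math
from collections import Counter

def classify(parent_fitness, child_fitness, rel_tol=1e-9):
    """Label a mutation by its immediate effect on fitness."""
    if math.isclose(parent_fitness, child_fitness, rel_tol=rel_tol):
        return "neutral"
    return "beneficial" if child_fitness > parent_fitness else "deleterious"

def pathway_ic_degree(first_appearances):
    """Classify each necessary instruction at its first appearance and count
    the unselected (neutral or deleterious) ones: the degree of IC under
    Behe's second definition."""
    labels = {pos: classify(parent, child)
              for pos, (parent, child) in first_appearances.items()}
    counts = Counter(labels.values())
    degree = counts["neutral"] + counts["deleterious"]
    return labels, counts, degree

# Hypothetical toy history: position -> (parent fitness, offspring fitness)
# at the step where that necessary instruction first appeared.
toy_history = {
    3: (1.00, 1.00),
    8: (1.00, 2.00),
    12: (2.00, 1.90),
    17: (1.90, 1.90),
    21: (1.90, 3.80),
    30: (3.80, 7.60),
}

labels, counts, degree = pathway_ic_degree(toy_history)
print(counts["beneficial"], counts["neutral"], counts["deleterious"], degree)  # 3 2 1 3
```

For the case study lineage, the same tally over the 28 essential instructions counts 6 neutral plus 1 deleterious first appearances: seven necessary-but-unselected mutations.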

III. How does Behe address Lenski, et al.? He doesn’t.

And how does Behe’s book actually address the Lenski, et al., paper? Behe’s initial 2003 reaction to the paper, quoted in the PT critique of Behe & Snoke, was

“There’s precious little real biology in this project,” Mr. Behe said. For example, he said, the results might be more persuasive if the simulations had operated on genetic sequences rather than fictitious computer programs.

This from a man whose iconic metaphor is a mousetrap, and who in Behe & Snoke constructed a computer model that eliminated virtually all evolutionary mechanisms and then imagined that he was testing an evolutionary hypothesis! In The Edge of Evolution Behe again declines to address the substance and specifics of the Lenski, et al. study, but instead focuses on an irrelevant side issue. In Appendix D he refers to the Avida experiments without describing them. Rather, he focuses on just one property of the simulation. In the Lenski, et al., study, genome length was rendered selectively neutral by apportioning computer cycles in proportion to length. The effect of that is to make genome length invisible to selection. Describing that, Behe wrote

Let’s look at just one example to illustrate the point. In Avida, acquiring new abilities is only one way for an organism to get computer food. Another way is by simply acquiring surplus instructions, whether or not they do anything. In fact, instructions that aren’t ever executed—making them utterly useless for performing tasks – are beneficial in Avida because they provide additional food without requiring any additional consumption. It’s survival of the fattest! (p. 276; italics original)

But it’s false that “instructions that aren’t ever executed … are beneficial”. They’re selectively neutral. Either Behe didn’t read the paper or he didn’t understand it or he consciously misrepresents it. Adding instructions does increase “food” consumption – it requires cycles to reproduce the added instructions. Hence, apportioning computer cycles in proportion to genome length neutralizes length.
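A bit of toy arithmetic shows why scaling the cycle allotment to genome length makes length invisible to selection. The sketch assumes, purely for illustration, that copying each instruction costs a fixed number of cycles, that a critter's allotment is a fixed multiple of its length, and that there is no other fixed overhead; the constants are invented, not Avida's settings. Under those assumptions the reproduction rate is the same at any length, whereas with a fixed allotment every added instruction, executed or not, would slow reproduction.

```python
from fractions import Fraction

# Hypothetical constants, chosen only for illustration.
CYCLES_ALLOTTED_PER_INSTRUCTION = 10  # allotment scales with genome length
CYCLES_TO_COPY_INSTRUCTION = 5        # every instruction is copied, used or not
FIXED_ALLOTMENT = 1000                # alternative: same cycles for everyone

def copies_per_update_scaled(length):
    """Allotment proportional to length: length cancels out."""
    allotted = CYCLES_ALLOTTED_PER_INSTRUCTION * length
    needed = CYCLES_TO_COPY_INSTRUCTION * length
    return Fraction(allotted, needed)

def copies_per_update_fixed(length):
    """Fixed allotment: each extra instruction costs copying cycles."""
    return Fraction(FIXED_ALLOTMENT, CYCLES_TO_COPY_INSTRUCTION * length)

for length in (50, 60, 100):
    print(length, copies_per_update_scaled(length), copies_per_update_fixed(length))
# 50 2 4
# 60 2 10/3
# 100 2 2
```

The scaled scheme is neutral with respect to length; it is the fixed scheme, not the one Lenski, et al. used, in which surplus instructions would carry a cost.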

Now it’s still worth asking whether making genome length selectively neutral invalidates the simulation. Several lines of research are relevant. Genomes in biological critters vary enormously in length. For example, in animals the C-values (on present data) vary by 5 orders of magnitude. In Archaea and Bacteria the number of protein-coding genes varies by at least an order of magnitude. The major metabolic cost of DNA apparently lies in its expression – transcription and translation – and not in its reproduction during cell division. Added DNA that is not expressed is apparently not strongly selected against; the metabolic cost of replicating DNA during cell division appears to be down in the noise in a cell’s energy budget. Corroboration comes from the persistence of pseudogenes and other non-coding DNA in genomes. Their ubiquity argues against strong selective pressures to slim down the genome. Also note T. Ryan Gregory’s remark here:

I make this statement because there are several different sorts of DNA sequences in the genome whose presence can be explained even if they do not benefit (and indeed, even if they slightly harm) the organism carrying them. Pseudogenes, satellite DNA, transposable elements (45% of our genome), and other non-coding sequences may or may not be functional – that requires evidence – and some may exist as a result of accidental duplication or even due to selection at the level of the elements themselves (by “intragenomic selection”). The old assumption that all non-coding DNA must be beneficial to the organism or it would have been deleted by now ignores genome-specific processes by which non-coding DNA evolves.

As a consequence, Behe’s claim that

It’s also very unrealistic. Biological organisms show the opposite behavior—genes that are useless in the real world are not rewarded; the genes are rapidly lost or degraded by mutation.

is false. Degraded, yes, but “rapidly lost” in the sense of disappearing altogether? The data argue that’s not the case. So rendering genome length selectively neutral is not a disabling flaw in the Avida work, and the results stand as refutations of Behe’s claims.

IV. Calculating Behe’s probabilities for an Avida run

One can use Behe’s probability model in Chapter 3, “The Mathematical Limits of Darwinism” to analyze the Lenski, et al. data. Behe treats the initial appearance of components of a structure as statistically independent and calculates the probability of occurrence of the necessary mutations as the product of the probabilities of occurrence of each mutation individually:

Recall that the odds against getting two necessary, independent mutations are the multiplied odds for getting each mutation individually. What if a problem arose during the course of life on earth that required a cluster of mutations that was twice as complex as a CCC? (Let’s call it a double CCC.) For example, what if instead of the several amino acid changes needed for chloroquine resistance in malaria, twice that number were needed? In that case the odds would be that for a CCC times itself. Instead of 10^20 cells to solve the evolutionary problem, we would need 10^40 cells. (italics added; pp. 62-63)

So for Behe it’s simple: Multiply the probabilities of the (allegedly) independent events to get the probability of the joint event. Let’s do that for the Avida case study lineage. Again, seven of the mutations that produced components of the final irreducibly complex structure were unselected when they first appeared.

There are 26 instructions in the Avida instruction set, so the probability of any given instruction occurring via a random insertion or substitution mutation is 1/25. On average the genome of the lineage that produced the first critter to perform EQU had fewer than 60 instructions through the course of most of its history – it started at 50 instructions in the Ancestor, increased to 61 instructions by step 92, and then shrank back to 60 at step 111 when the EQU-performing critter appeared. Call it an average of 55 instructions. Since sequence is important in the Avida genome – it is after all an opcode program – the probability that a given mutation will appear via insertion or substitution in a given position is then about 1/55 * 1/25, or 0.000727. On Behe’s assumptions (mainly independence), the probability that the seven specific unselected mutations would occur in the specific positions in which they did is 0.000727^7, or 1.076 × 10^-22.
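Behe's arithmetic for the case study is easy to reproduce. The sketch below simply plugs in the figures from the text (a 1/25 chance of the right instruction, an average length of 55, seven unselected mutations) and multiplies, which is exactly the independence assumption under discussion; exact fractions avoid any rounding until the final printout.

```python
from fractions import Fraction

P_RIGHT_INSTRUCTION = Fraction(1, 25)  # figure used in the text
P_RIGHT_POSITION = Fraction(1, 55)     # average genome length of about 55
N_UNSELECTED = 7                       # unselected mutations in the case study

p_single = P_RIGHT_POSITION * P_RIGHT_INSTRUCTION   # 1/1375
p_joint = p_single ** N_UNSELECTED   # Behe's move: treat the events as independent

print(f"{float(p_single):.6f}")   # 0.000727
print(f"{float(p_joint):.3e}")    # 1.076e-22
```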

Now, 1.076 × 10^-22 is a pretty small probability, but it gets worse. Critters capable of performing EQU evolved in 23 of the 50 runs in Lenski, et al.’s main experiment. If the other 22 lineages were comparable to the case study run (and there’s no reason to suppose they weren’t), then the joint probability of all 23 runs evolving via pathways including seven unselected mutations is (1.076 × 10^-22)^23, or a number so small (on the order of 10^-506) that Excel calls it zero. Excel poops out at (1.076 × 10^-22)^14. Clearly it couldn’t have happened.
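Extending the multiplication to all 23 EQU-evolving runs shows why Excel gives up: the product falls below the smallest positive number a double-precision float can hold. Working with base-10 logarithms sidesteps the underflow. A minimal Python sketch, reusing the per-run figure from the text (Python's floats bottom out in the same neighborhood as Excel's):

```python
import math

P_RUN = (1 / 1375) ** 7   # per-run figure from the text, about 1.076e-22
N_RUNS = 23               # runs in which EQU evolved

# Multiplying 23 such numbers directly underflows, so add logarithms instead.
log10_joint = N_RUNS * math.log10(P_RUN)
print(f"{log10_joint:.1f}")   # -505.3, i.e. roughly 5 x 10^-506

# Direct floating-point powers hit the same wall Excel does:
print(P_RUN ** 14 > 0.0)      # True: about 2.8e-308, still representable
print(P_RUN ** 15)            # 0.0: below the smallest positive double
```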

Of course, that’s a ridiculous set of numbers, since the probability calculations do not veridically map the phenomena. Behe’s calculations are directly parallel to those of young-earth creationist Henry Morris in his 1974 Scientific Creationism and they are equally ludicrous. Unfortunately for Behe, they are informative only about the ignorance of the person doing the calculating. They ignore the role of neutral mutations in making multiple potential evolutionary pathways available and the probability amplification of cumulative selection – see Jerry Coyne’s review and this recent Nature brief review of the reconstruction of selectable pathways in molecular evolution. Of particular interest in the latter review is a description of one study in which 120 potential evolutionary pathways from a precursor to a ‘final’ structure were identified. Of the 120 potential pathways, 18 were composed of selectable steps all the way. Now, if one picked out one of the pathways with unselectable steps, one could marvel at how an intelligent designer was necessary to bridge the gaps. On the other hand, if one knows there are pathways in which all steps are selectable, no designer is necessary. Behe consistently picks out just one path and marvels at the gaps.
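The difference between marveling at one path and surveying all of them can be sketched by brute force. 120 is exactly the number of orders in which five mutations can be acquired (5! = 120), so the sketch enumerates every order over a hypothetical toy fitness landscape (invented values, not the data from the review) in which mutation C is harmful unless A is already present, and counts the orders in which every single step raises fitness.

```python
from itertools import permutations

MUTATIONS = ("A", "B", "C", "D", "E")

def fitness(present):
    """Hypothetical toy landscape: each mutation adds 1, except that C is
    harmful (net -1 instead of +1) when A is not already present."""
    f = len(present)
    if "C" in present and "A" not in present:
        f -= 2
    return f

def selectable_all_the_way(order):
    """True if every step along this order strictly increases fitness."""
    present = set()
    for m in order:
        before = fitness(present)
        present.add(m)
        if fitness(present) <= before:
            return False
    return True

paths = list(permutations(MUTATIONS))
accessible = [p for p in paths if selectable_all_the_way(p)]
print(len(paths), len(accessible))  # 120 60
```

In this toy case half the orders are selectable at every step. Anyone who examines only an order with C ahead of A sees an "unbridgeable" gap, even though many fully selectable routes reach the same endpoint.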

V. The topography of fitness landscapes

As noted above, Mark Chu-Carroll analyzed Behe’s argument from teeny-weeny numbers, showing that it is based on fundamentally flawed assumptions about the topography of fitness landscapes and the notion that evolving populations cannot escape from local maxima. Mark and ERV have already done the heavy lifting on the fitness landscape issue. My remarks corroborate theirs.

That 7 of the 28 mutations (25%) that produced essential EQU instructions in the Lenski, et al. case study lineage were unselected when they occurred is consistent with the dispersal of populations along selectively nearly-equal ‘ridges’ in fitness space. Those ridges enable subpopulations to escape apparent local maxima, especially in high-dimensioned genotype spaces where ‘local’ maxima are maxima only in a subset of the dimensions. That’s an empirical demonstration of Gavrilets’ argument that the topography of fitness landscapes in high-dimensioned spaces is akin to a highly interconnected network with genotypes connected by traversable ‘ridges’ of nearly-equal fitness, rather than like a one- or two-dimensional surface with inescapable local maxima as in Behe’s conception. As John Wilkins described it,

Gavrilets uses a different metaphor, and I must point out that it is indeed a metaphor not a model, of a holey landscape, in which there is a high region of fitness in the landscape, interspersed with regions of low fitness (see figure). In the high fitness region, a random walk can take you all over the place. Of course, one thing Gavrilets insists upon is the high dimensionality of the space of fitness combinations - the 2-dimensional space here is a necessary limitation of paper.

Put differently, high-dimensioned fitness spaces are like Swiss cheese, with the (relatively) high fitness cheese surrounding local low-fitness ‘holes’. Behe actually used a figure from Gavrilets’ book, but as Mark Chu-Carroll pointed out, Behe confines his consideration to low-dimensioned spaces. Even worse, Behe footnotes Gavrilets’ book in his discussion of rugged fitness landscapes, but completely ignores Gavrilets’ main argument! Thus while Behe is demonstrably aware of Gavrilets’ work (after all, he uses Gavrilets’ figure), he ignores Gavrilets’ analysis. Maybe he just looked at the pictures.
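The Swiss-cheese picture can be made concrete with a toy holey landscape. In the sketch below (parameters hypothetical), every binary genotype with n_loci sites is independently viable with probability p_viable, and a breadth-first search collects every viable genotype reachable from a starting genotype by single-site mutations that never step into a hole. Even with p_viable = 0.3, so that most single mutations from any genotype land in a hole, the reachable network can cover a large share of the viable genotypes as the number of loci grows; the exact counts depend on the random seed.

```python
import random
from collections import deque

def holey_landscape(n_loci, p_viable, seed=1):
    """Toy holey landscape: each of the 2**n_loci binary genotypes is
    independently viable with probability p_viable. Genotype 0 is forced
    to be viable so there is somewhere to start (an assumption)."""
    rng = random.Random(seed)
    viable = [rng.random() < p_viable for _ in range(2 ** n_loci)]
    viable[0] = True
    return viable

def neutral_network(viable, n_loci, start):
    """Breadth-first search over single-locus mutations that stay on viable
    genotypes; returns the set of genotypes reachable from start."""
    seen = {start}
    queue = deque([start])
    while queue:
        genotype = queue.popleft()
        for locus in range(n_loci):
            neighbor = genotype ^ (1 << locus)  # flip one locus
            if viable[neighbor] and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen

for n in (4, 8, 12):
    viable = holey_landscape(n, p_viable=0.3)
    network = neutral_network(viable, n, start=0)
    print(f"n={n}: {len(network)} of {sum(viable)} viable genotypes reachable")
```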

VI. From dogs to cats?

Finally, I can’t resist quoting from a radio interview Behe recently gave plugging his book. On Michael Medved’s program there was this interchange at about 23:30:

Medved: “What you’re talking about really is the leaps, aren’t you. I mean the kind of random mutations, or allegedly random mutations, who (sic) create a new species.”

Behe: “Yeah, well I wouldn’t call it species. I’d go a little higher, maybe genus or something in biology. Biology has a number of levels and you might be able to get, say, from a wolf to a dog using random mutation and natural selection. But I don’t think you can get from a dog to a cat or a precursor organism and get from a dog to a cat or certainly to an elephant.”

That’s pure creationist drivel. I half expected Behe to start telling us about how all those kinds fit on the ark.

VII. Take-home summary

Behe studiously ignores contradictory data, is ignorant of relevant properties of fitness landscapes, misrepresents evolutionary processes and research, and misuses probabilities in exactly the fashion of young-earth creationists. His book is an extended sham and will appeal only to a pre-committed and biologically ignorant audience. It’s a damn shame that its appeal depends on such egregiously poor scholarship, flat ignorance, and apparently purposeful misrepresentations.