The Revenge of Calvin and Hobbes
In “Darwin’s Black Box” (DBB), ID’s arch-biochemist Behe glibly labeled evolutionary hypotheses for the origin of “irreducibly complex” systems as “hops into the box of Calvin and Hobbes” (for those who don’t know what the heck this refers to, go here to learn about Calvin and Hobbes, and here for info on their box, or even better go spend some time here, and come back tomorrow). This overconfidence has come back to haunt him as more and more evidence has accumulated in support of the evolutionary origin of his various IC systems, from the flagellum to the complement and clotting cascades.
The topic where the idea of unevolvability of IC systems has probably taken the most beating is the vertebrate adaptive immune system, where not only has evidence for evolution accumulated at a steady pace but, even more embarrassingly for Behe, it has developed exactly along the lines predicted by those “Calvin and Hobbes jumps” he originally dismissed. A recent paper in the journal PLoS Biology [1] is the latest turn in the death spiral of irreducible complexity of the immune system, and I think it provides a good opportunity to take a look at how science works, as opposed to ID navel-gazing.
Let me start with a brief description of the issue. Basically, almost every organism faces the problem of pathogens and parasites, and most of them solve it, in broad terms, using molecular detection systems that can discriminate “self” from “non-self”, allowing the elimination of the latter. Of course, the problem of discriminating you from them is that there is only one you, and an almost infinite variety of them, so any effective discrimination system has to be flexible enough to recognize very many different forms of non-self, and able to do so in a parsimonious way in terms of use of molecular components (no organism can possibly hope to have a genome large enough to encode even one single specific detector molecule for every possible pathogen). In evolutionary terms, any successful immune system therefore has to satisfactorily juggle, among other things, these two contrasting selective pressures: diversity of effective targets vs metabolic/genetic parsimony. Jawed vertebrates (which include cartilaginous and bony fish, amphibians, reptiles, birds and mammals) happen to have hit on a solution that is, frankly, way cool. (And I am not saying it just because I work on this.)
In all jawed vertebrates, the adaptive immune system “detectors” (receptors) are encoded not by single, stable genes, but by families of gene segments that change their conformation on DNA (rearrange) during the development of immune cells. By randomly joining one segment from each of the different families (called “V”, “D” and “J”, each containing from a few to a few hundred members) into a single coding sequence, the immune system can generate many thousands of different genetic combinations, each encoding a different receptor capable of recognizing a different molecular target. The proteins that mediate this DNA rearrangement process (“VDJ recombination”) are called RAG1 and RAG2, and they act on specific DNA sequences, recombination signal sequences, or “RSSs”, which flank the rearranging V, D and J segments. (For those who want to know more, Matt Inlay’s excellent summary, especially its “IC system II” section, serves as a very good primer.)
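To make the arithmetic of combinatorial joining concrete, here is a minimal sketch. The segment names and family sizes (40 V, 25 D, 6 J) are made-up placeholders chosen only for illustration, and the functions `count_combinations` and `rearrange` are hypothetical helpers, not a model of any real locus.

```python
import random
from math import prod

# Hypothetical segment pools: names and counts are illustrative assumptions.
SEGMENTS = {
    "V": [f"V{i}" for i in range(1, 41)],  # 40 made-up V segments
    "D": [f"D{i}" for i in range(1, 26)],  # 25 made-up D segments
    "J": [f"J{i}" for i in range(1, 7)],   # 6 made-up J segments
}

def count_combinations(segments):
    """Number of distinct joins when one segment is picked from each family."""
    return prod(len(pool) for pool in segments.values())

def rearrange(segments, rng=random):
    """Pick one segment per family at random and join them in V-D-J order."""
    return "-".join(rng.choice(segments[family]) for family in ("V", "D", "J"))

print(count_combinations(SEGMENTS))  # 40 * 25 * 6 = 6000
print(rearrange(SEGMENTS))           # one randomly chosen V-D-J combination
```

With only 71 segments in this toy genome, 6000 distinct receptors come out, which is the parsimony the paragraph above describes.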
Those of you who are used to the ID approach to science, i.e. giving up on it, can probably already see where the problem lies: this is a complex system of functionally inter-related components that, looked at superficially, simply cannot work in isolation. Behe was absolutely certain of this in 1996:
In the absence of the machine [RAG1/RAG2], the parts [V, D and J gene segments] never get cut out and joined. In the absence of the signals [RSSs], it’s like expecting a machine that’s randomly cutting paper to make a paper doll. And, of course, in the absence of the message for the antibody itself, the other components would be pointless.
DBB, p. 130
Now, in evolutionary terms the obvious question to ask is indeed what function any precursor of this system could have had before the evolution of the adaptive immune system. Some ideas were already around at the time of DBB’s publication, and had been for a while. Already in 1979, Sakano, Tonegawa (who would later win a Nobel Prize for his discovery of VDJ recombination) and colleagues identified the RSSs and noticed that they shared features with the recombination sequences of certain mobile DNA elements called transposons [2].
Transposons are odd fellows in the DNA world, who spend their time physically hopping from genomic site to genomic site, and replicating themselves, pretty much as “molecular parasites”. They do this via a number of mechanisms, but the kind we are interested in here is a class of DNA transposons which carry within their own sequences genes encoding the necessary enzymes (“transposases”) for cutting themselves out of the genomic DNA (“excision”), and re-inserting somewhere else (“integration”). At the very end of each transposon element is a characteristic sequence, which is recognized by the specific transposase (I am sure you are already seeing the parallel with VDJ recombination).
A decade after the discovery of VDJ recombination, the responsible enzymes, RAG1 and RAG2, were identified, and lo and behold, their genes had a funny look about them: just like transposases, they were almost devoid of introns, and mapped right next to each other in the genome (transposons need to “travel light”, and cannot carry excess DNA as they hop around). This is when David Baltimore, in whose lab the RAG genes were discovered, and others wrote the review in the Proceedings of the National Academy of Sciences USA [3] that Behe mocked as proposing a “hop in the box of Calvin and Hobbes” for openly stating the transposon hypothesis: that RAGs/RSSs were the remnants of some sort of transposon system that integrated itself into a non-rearranging immune receptor gene and became “enslaved” to it, causing the integrated portion to “pop out” whenever the gene became active, and in so doing generating useful diversity for immune target recognition.
To be fair, at the time the hypothesis was indeed quite a stretch, but still, a stretch that made some specific predictions. No sooner had Behe’s words been put to print than those predictions started coming true. What follows is a short timeline, with the major milestones:
1996: DBB published. In it, Behe says:
… the complexity of the [VDJ recombination] system dooms all Darwinian explanations to frustration. [my emphasis] DBB, p. 139
1996: The same year, the Gellert lab, which had developed a system to study VDJ recombination in a test tube with purified proteins, discovers a striking similarity between the RAG-mediated reaction and that of known transposases and integrases: both proceed through a characteristic intermediate in which the DNA takes an unusual hairpin-like shape [4]. This was really a breakthrough finding, the first solid piece of evidence in favor of the transposon hypothesis. But are RAGs actually a transposase?
1998: The Gellert and Schatz labs independently discover that RAGs can, under certain in vitro conditions, mediate actual transposition reactions by inserting cleaved DNA ends containing RSSs into double-stranded target DNA [5, 6]. This is the other side of the transposon “life cycle”, the insertion phase. In other words, although the RAGs’ physiological activity requires them only to cut DNA out of the genome, they bear, buried inside their structure, the ability to insert RSSs into DNA, a sort of molecular vestigial structure. This makes sense if RAGs are indeed an evolved transposase, but is harder to justify from a design perspective, because transposition events, by potentially disrupting genes, can actually be quite deleterious, for instance by causing cancer.
2000: Indirect evidence is identified that suggests that transposition reactions mediated by RAGs can occur not just in vitro, but within mammalian cells [7].
2003: Direct evidence of RAG-mediated transposition in yeast and mammalian cells is uncovered [8, 9]. In other words, RAGs are a transposase in eukaryotic cells.
2004: Molecular evidence uncovers a class of transposons called hAT that use a transposition mechanism essentially identical to that used by RAG proteins, and, in addition, shows that their enzymes share some basic similarity with RAGs in their active site [10]. (This paper was discussed by Matt here on PT a few months ago.)
2005: RAGs find their long-lost family of transposases [1]. In this paper, Kapitonov and Jurka take a fully evolutionary, biocomputational approach to figure out where RAGs may have come from. Let’s look at what they did, and how.
They started from the observation of a low, borderline significant sequence similarity between a portion of RAG1 and certain transposases of the Transib family. They then applied a different algorithm for protein similarity searches, which uses information from a similarity search to “hone” successive iterations of the same search, by assigning position-specific scores to amino acid residues. In other words, it searches for “deep” homologies that are reflected by the presence of specific sequence motifs within proteins that may otherwise have diverged significantly (and thus yield poor scores in a direct alignment). What they found was that 10 motifs were very highly conserved between RAG1 proteins from various species and Transib transposases. Figure 1 below shows the alignment of these motifs, where every letter corresponds to an amino acid, and color patterns indicate amino acids which have similar physico-chemical properties (and can therefore often replace each other without much disruption of structure and function). The similarities are statistically highly significant.
Figure 1: Alignment of conserved regions in Transib transposases and RAG1
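The idea of “honing” a search with position-specific scores can be sketched in a few lines. This is a toy illustration of the general profile-search principle, not the program or parameters used in the paper: the five-letter “motifs”, the uniform background frequencies, the pseudocount and the score threshold are all assumptions made up for the example.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids

def build_profile(aligned, pseudocount=1.0):
    """Return one {residue: log-odds score} dict per alignment column."""
    background = 1.0 / len(ALPHABET)  # assumption: uniform background
    profile = []
    for column in zip(*aligned):
        counts = Counter(column)
        total = len(column) + pseudocount * len(ALPHABET)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in ALPHABET
        })
    return profile

def score(profile, candidate):
    """Sum the position-specific scores of a candidate of matching length."""
    return sum(col[aa] for col, aa in zip(profile, candidate))

def iterative_search(seeds, database, threshold, rounds=3):
    """Rebuild the profile from all hits so far, then search again."""
    hits = list(seeds)
    for _ in range(rounds):
        profile = build_profile(hits)
        new_hits = [s for s in database
                    if s not in hits and score(profile, s) >= threshold]
        if not new_hits:
            break
        hits.extend(new_hits)
    return hits

# Invented toy motifs, not real RAG1 or Transib sequences.
SEEDS = ["DKWLE", "DRWLE", "DKWIE"]
DATABASE = ["EKWLE", "DKFLE", "GPAST", "ERFIE"]

print(iterative_search(SEEDS, DATABASE, threshold=4.0))
```

In this toy run the divergent sequence ERFIE scores below the threshold against the profile built from the seeds alone, but passes once the first-round hits have been folded into the profile, while the unrelated GPAST is never picked up. That is the sense in which each iteration sharpens the next.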
They next compared the sequence of the RSSs to those of known Transib transposon signal sequences (terminal inverted repeats, or TIRs), and they found another striking correlation: all the positions that are strongly conserved in TIRs are also strongly conserved in RSSs. This is shown in Figure 2.
Figure 2: Alignment of TIRs and RSSs. Panel A shows a graph of the nucleotide sequence conservation (with 1.0 = absolutely conserved) at different positions in a large panel of TIR families (sequences shown in panel B). Under the graph is the “consensus” TIR sequence (representing the most common nucleotide at each position), aligned with the consensus RSS, which consists of a conserved 7-nucleotide sequence, followed by a spacer, and another conserved 9-nucleotide sequence. Boxed nucleotides are those that show the highest conservation, and are absolutely required for the mechanism. Panel C shows that not only are the crucial positions in the sequences conserved, but so is their overall structure. Just as RSSs require “spacer” elements of either 12 or 23 nucleotides to pair for efficient rearrangement, the TIRs at the two ends of each transposon have different, specific lengths, which correspond to a 12- or 23-nucleotide distance between the conserved sequences (the reason for these numbers is that each turn of the DNA double helix is about 10.5 bp, so sequences 12 or 23 bp apart will be on roughly the same side of the helix, about one or two turns apart, and simultaneously accessible to recognition by any binding factor).
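For readers who want to see how a consensus sequence and a per-position conservation score like those in panel A can be derived from aligned sequences, here is a minimal sketch. The four 7-nucleotide sequences are invented for illustration (they are not taken from panel B), and measuring conservation as the frequency of the most common nucleotide is an assumption; the paper may have used a different measure.

```python
from collections import Counter

def column_stats(aligned):
    """Return (consensus, conservation) for equal-length aligned sequences.

    Conservation is taken here as the fraction of sequences carrying the
    most common nucleotide at that position (1.0 = absolutely conserved).
    """
    consensus, conservation = [], []
    for column in zip(*aligned):
        base, count = Counter(column).most_common(1)[0]
        consensus.append(base)
        conservation.append(count / len(column))
    return "".join(consensus), conservation

# Invented toy sequences, for illustration only.
TOY_SITES = ["CACAGTG", "CACTGTG", "CACAATG", "CACAGTC"]

consensus, conservation = column_stats(TOY_SITES)
print(consensus)     # CACAGTG
print(conservation)  # [1.0, 1.0, 1.0, 0.75, 0.75, 1.0, 0.75]
```

Positions scoring 1.0 in such a table are the ones a figure would box as invariant; comparing two such tables position by position is the kind of check that reveals whether the same sites are conserved in both families.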
What this means is that a simple system exists, with both a RAG1-like gene and RSSs, as an independent functional unit: what we would expect for a direct, “reduced” predecessor to the supposedly irreducible VDJ recombinase system. But there’s more: while extending their search to the genome databases from various organisms, Kapitonov and Jurka found a number of other RAG1 homologues, including some in which the similarity extended beyond the protein “core” they had originally searched for, all the way to the so-called N-terminal region of the protein. There is therefore a family of closely RAG1-related proteins in various organisms. The distribution of the various homologues in different lineages is shown below.
Figure 3: RAG-like proteins and Transib transposons in various organisms. Red circles represent Transib transposons, orange and blue ellipses RAG1 core and N-terminus homologues, and gray rectangles RAG2 proteins.
Note that these new RAG1-related proteins are not known to be associated with any Transib-like transposons. Some are clearly pseudogenes, and the function of others is unknown. The overall picture that emerges is that of a complex, diffuse and diverse family, which has accompanied metazoan evolution for a while, with multiple instances of horizontal gene transfer (quite common for mobile DNA elements) and of independent “adoption” by the host genomes of family members. This is exactly the picture one would expect to facilitate the random integration of a transposable element within a primordial antigen receptor gene, causing junctional diversification (that is, protein variation at the excision site), and therefore an increase in target binding ability: the “transposon hypothesis” for adaptive immune system evolution.
Which brings me to the last item in the story. At the time Behe wrote, no known potential precursor of the immune system receptors existed outside jawed vertebrates. Many proteins belong to the same structural family as antigen receptors, but none carried the same exact sequence hallmarks. That has changed too: at least 3 protein families have now been identified in protochordates and jawless vertebrates which have non-rearranging V-like segments of the same kind as antigen receptors [11-13]. Each comprises multiple, closely related members, suggesting that selective pressure exists for their diversification, and some may even be involved in “innate” immune responses. Although it is almost impossible to say whether any of these proteins is in fact the direct descendant of the ancestral receptor of the adaptive immune system, their existence suggests a rich evolutionary history of non-rearranging immune receptors predating adaptive immunity based on VDJ recombination.
Let’s summarize: where once Behe saw an “irreducibly complex” system made of a) a receptor gene, b) a RAG recombinase, and c) RSSs, we now know that a) whole families of non-rearranging receptors and b) a whole family of functional RAG1 homologues acting on c) RSS-like sequences already existed before the emergence of the vertebrate adaptive immune system. This is exactly what we would expect to see if the adaptive immune system did arise via an evolutionary process, as opposed to poofing into existence in its complete form.
So, what next? Well, for one, we still don’t know where RAG2 came from. So far, no RAG2-like genes have been found, inside or outside transposons. However, the lack of introns and chromosomal location of RAG2, right next to RAG1, are too strong a hint to dismiss, so I think the prediction still remains that a RAG2 ancestor will be found in association with a mobile DNA element, along with a RAG1-like transposase. In the context of VDJ recombination, RAG2 seems to play mostly a regulatory role, so it would not be surprising if its ancestor did something similar. However, it is possible that considerable sequence divergence may have occurred for this protein, since mechanisms for transposon regulation may be significantly different from those required for VDJ regulation. Thankfully, much work remains to be done - that’s what scientists are for.
Is Behe going to concede that evolutionary models for the origin of VDJ recombination are gaining more and more support by the day? Probably not, frankly. No matter how many predictions get verified, how many plausible precursors are identified, Behe and the ID advocates will retreat further and further into impossible demands, such as asking for mutation-by-mutation accounts of specific evolutionary pathways, as if one could meaningfully recreate in the lab the precise evolutionary conditions which some mud-dwelling lamprey-like critter experienced some time in the Cambrian. Too much has been invested by ID advocates in the “irreducible complexity” concept for them to recognize that its significance (assuming it ever had any, given its recurrent reformulations) has essentially collapsed.
For the rest of us, the lesson to be learned is that even wild hypotheses, if rational, consistent with available evidence, predictive and testable, are worth considering and pursuing. Behe said:
We can look high or we can look low, the result is the same. The scientific literature has no answer to the question of the origin of the immune system. DBB, p. 138
Yet the answer was there all along, in the only place where Behe refused to look: in the box of Calvin and Hobbes.
Acknowledgements Thanks to Matt Inlay and the rest of the PT crew for info, comments and suggestions.
References 1. Kapitonov VV, Jurka J. RAG1 Core and V(D)J Recombination Signal Sequences Were Derived from Transib Transposons. PLoS Biol. 2005;3: e181 [Epub ahead of print]
2. Sakano H, Huppi K, Heinrich G, Tonegawa S. Sequences at the somatic recombination sites of immunoglobulin light-chain genes. Nature. 1979; 280: 288-94.
3. Bartl S, Baltimore D, Weissman IL. Molecular evolution of the vertebrate immune system. Proc Natl Acad Sci U S A. 1994; 91: 10769-70.
4. van Gent DC, Mizuuchi K, Gellert M. Similarities between initiation of V(D)J recombination and retroviral integration. Science. 1996; 271: 1592-4.
5. Hiom K, Melek M, Gellert M. DNA transposition by the RAG1 and RAG2 proteins: a possible source of oncogenic translocations. Cell. 1998; 94: 463-70.
6. Agrawal A, Eastman QM, Schatz DG. Transposition mediated by RAG1 and RAG2 and its implications for the evolution of the immune system. Nature. 1998; 394: 744-51.
7. Vaandrager JW, Schuuring E, Philippo K, Kluin PM. V(D)J recombinase-mediated transposition of the BCL2 gene to the IGH locus in follicular lymphoma. Blood. 2000; 96: 1947-52.
8. Clatworthy AE, Valencia MA, Haber JE, Oettinger MA. V(D)J recombination and RAG-mediated transposition in yeast. Mol Cell. 2003; 12: 489-99.
9. Messier TL, O’Neill JP, Hou SM, Nicklas JA, Finette BA. In vivo transposition mediated by V(D)J recombinase in human T lymphocytes. EMBO J. 2003; 22: 1381-8.
10. Zhou L, Mitra R, Atkinson PW, Hickman AB, Dyda F, Craig NL. Transposition of hAT elements links transposable elements and V(D)J recombination. Nature. 2004; 432: 995-1001.
11. Cannon JP, Haire RN, Litman GW. Identification of diversified genes that contain immunoglobulin-like variable regions in a protochordate. Nat Immunol. 2002; 3: 1200-7.
12. Cannon JP, Haire RN, Pancer Z, Mueller MG, Skapura D, Cooper MD, Litman GW. Variable domains and a VpreB-like molecule are present in a jawless vertebrate. Immunogenetics. 2005; 56: 924-9.
13. Suzuki T, Shin-I T, Fujiyama A, Kohara Y, Kasahara M. Hagfish leukocytes express a paired receptor family with a variable domain resembling those of antigen receptors. J Immunol. 2005; 174: 2885-91.