Gene duplication versus ID


Gene duplication is often cited as one way in which information and complexity in the genome can increase, leading to biochemical novelty and even to irreducibly complex systems. Despite this, ID proponents object to gene duplication as a relevant mechanism. I will explore some of their objections and show how science has addressed, and continues to address, them. I intend to show that the objections raised by ID proponents are mostly without merit.


Behe mentions in this 2003 interview that

My current work is an attempt to model the evolution of new protein functions through gene duplication. Gene duplication is purported to be a major pathway for the Darwinian evolution of biochemical novelty. However, as in other areas, Darwinists have not closely examined whether gene duplication can realistically do all that they ascribe to it. I hope to help them out in this area by asking those questions.

It seems that Behe may not be familiar with the research on gene duplication when he states that

The hitch, as always, is that Darwinists virtually never explain in any detail how natural selection would actually get from protein A to protein B after the gene for protein A duplicated. After all, gene duplication just leaves you with a second copy of the same gene — nothing different. The problem is, as everyone agrees, that the duplicated gene is much more likely to suffer a deleterious mutation than a beneficial one. Nonetheless, Darwinists hope that the occasional beneficial mutation just might come along. However, they never look very deeply into the matter. It turns out that to acquire some new functions, such as the capacity to bind a new molecule, multiple mutations would be expected to be needed, not just a single mutation. The requirement for multiple mutations would quickly render gene duplication an untenable explanation, since a duplicated gene would be riddled with deleterious mutations before acquiring several positive ones.

The scientific evidence

Force, A., M. Lynch, F.B. Pickett, A. Amores, Y.-L. Yan, and J. Postlethwait. The preservation of duplicate genes by complementary degenerative mutations. Genetics 151:1531-1545. 1999.

Gene duplication is commonly given as the explanation for increases in complexity via the acquisition of new functions. This paper addresses the standard scenario, in which duplication is followed either by an adaptive mutation that leads to the preservation of both genes or by degeneration of one of the copies. Since detrimental mutations are more likely than beneficial mutations, the classical model predicts that one of the duplicated genes will usually become a pseudogene. Actual data seem to indicate that the number of functional copies is larger than expected under the classical model, and the authors present an interesting alternative. The alternative explains duplicate gene preservation by the fixation of degenerative mutations rather than of rarer beneficial mutations. The authors also present data from the zebrafish consistent with this new model.

ABSTRACT The origin of organismal complexity is generally thought to be tightly coupled to the evolution of new gene functions arising subsequent to gene duplication. Under the classical model for the evolution of duplicate genes, one member of the duplicated pair usually degenerates within a few million years by accumulating deleterious mutations, while the other duplicate retains the original function. This model further predicts that on rare occasions, one duplicate may acquire a new adaptive function, resulting in the preservation of both members of the pair, one with the new function and the other retaining the old. However, empirical data suggest that a much greater proportion of gene duplicates is preserved than predicted by the classical model. Here we present a new conceptual framework for understanding the evolution of duplicate genes that may help explain this conundrum. Focusing on the regulatory complexity of eukaryotic genes, we show how complementary degenerative mutations in different regulatory elements of duplicated genes can facilitate the preservation of both duplicates, thereby increasing long-term opportunities for the evolution of new gene functions. The duplication-degeneration-complementation (DDC) model predicts that (1) degenerative mutations in regulatory elements can increase rather than reduce the probability of duplicate gene preservation and (2) the usual mechanism of duplicate gene preservation is the partitioning of ancestral functions rather than the evolution of new functions. We present several examples (including analysis of a new engrailed gene in zebrafish) that appear to be consistent with the DDC model, and we suggest several analytical and experimental approaches for determining whether the complementary loss of gene subfunctions or the acquisition of novel functions are likely to be the primary mechanisms for the preservation of gene duplicates.

The authors distinguish three fates for a duplicate gene: nonfunctionalization, neofunctionalization, and subfunctionalization. Subfunctionalization occurs when both copies are preserved by complementary degenerative mutations.

Figure 1. Three potential fates of duplicate gene pairs with multiple regulatory regions. The small boxes denote regulatory elements with unique functions, and the large boxes denote transcribed regions. Solid boxes denote intact regions of a gene, while open boxes denote null mutations, and triangles denote the evolution of a new function. Because the model focuses on mutations fixed in populations, the diagram shows the state of a single gamete. In the first two steps, one of the copies acquires null mutations in each of two regulatory regions. On the left, the next fixed mutation results in the absence of a functional protein product from the upper copy. Because this gene is now a nonfunctional pseudogene, the remaining regulatory regions associated with this copy eventually accumulate degenerative mutations. On the right, the lower copy acquires a null mutation in a regulatory region that is intact in the upper copy. Because both copies are now essential for complete gene expression, this third mutational event permanently preserves both members of the gene pair from future nonfunctionalization. The fourth regulatory region, however, may still eventually acquire a null mutation in one copy or the other. In the center, a regulatory region acquires a new function that preserves that copy. If the beneficial mutation occurs at the expense of an otherwise essential function, then the duplicate copy is preserved because it retains the original function.
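To make the logic of the figure concrete, here is a minimal simulation sketch of two of the three fates (nonfunctionalization and subfunctionalization). It is my own toy version, not the authors' analysis: it ignores neofunctionalization, treats every fixed mutation as a null mutation, and assumes a mutation is never fixed if it would remove an ancestral subfunction from both copies. The names `simulate_fate`, `fate_frequencies`, `n_elements` and `coding_weight` are hypothetical.

```python
import random
from collections import Counter


def simulate_fate(n_elements, coding_weight, rng):
    """Follow one duplicate pair until its fate is decided (n_elements >= 1)."""
    # intact[c][i] is True while regulatory element i of copy c still works.
    intact = [[True] * n_elements for _ in range(2)]
    while True:
        copy = rng.randrange(2)
        other = 1 - copy
        # Pick the mutational target: the coding region or a regulatory element.
        if rng.random() < coding_weight / (coding_weight + n_elements):
            # A coding null silences the whole copy. It can only be fixed if
            # the other copy still drives every ancestral subfunction.
            if all(intact[other]):
                return "nonfunctionalization"
            continue
        i = rng.randrange(n_elements)
        if not intact[copy][i] or not intact[other][i]:
            # Already lost here, or the other copy lacks this element, so the
            # loss would remove an essential subfunction: nothing is fixed.
            continue
        intact[copy][i] = False
        if not any(intact[copy]):
            return "nonfunctionalization"  # no regulatory element left
        if not all(intact[0]) and not all(intact[1]):
            # Each copy now lacks an element the other retains: both are needed.
            return "subfunctionalization"


def fate_frequencies(n_elements=4, coding_weight=1.0, trials=2000, seed=0):
    """Estimate how often each fate occurs for a given regulatory complexity."""
    rng = random.Random(seed)
    counts = Counter(
        simulate_fate(n_elements, coding_weight, rng) for _ in range(trials)
    )
    return {
        fate: counts[fate] / trials
        for fate in ("nonfunctionalization", "subfunctionalization")
    }


if __name__ == "__main__":
    for k in (1, 2, 4, 8):
        print(k, fate_frequencies(n_elements=k))
```

With a single regulatory element the toy model can only end in a pseudogene, while adding elements opens the subfunctionalization route. That is the qualitative point of the DDC model, although the exact frequencies here depend entirely on my assumed mutation weights.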

See also

Lynch, M., and A. Force. The probability of duplicate-gene preservation by subfunctionalization. Genetics 154: 459-473. 2000

Lynch, M., and A. Force. Gene duplication and the origin of interspecific genomic incompatibility. American Naturalist 156: 590-605. 2000.

Michael Lynch, Martin O’Hely, Bruce Walsh, and Allan Force. The probability of preservation of a newly arisen gene duplicate. Genetics 2001 159: 1789-1804.

Force, A. G., Cresko, W. A., and F. B. Pickett. Informational accretion, gene duplication, and the mechanisms of genetic module parcellation. In Modularity in Development and Evolution (in press), G. Schlosser and G. Wagner. 2002

Examples of gene duplication and innovative functions

antifreeze protein

Figure 4. Likely mechanism by which an ancestral trypsinogen gene was transformed into an AFGP gene. The 5′ end (E1, I1, and small segment of E2) and the 3′ end (I5 3′ splice site and E6) of trypsinogen gene were recruited and linked, and the remainder of the gene deleted (dashed lines and boxes). The Thr-Ala-Ala coding element was duplicated, presumably via slippage at the repetitive (gt)n sequence during replication. The recruited E1 provided the 5′ UTR and signal peptide sequences for the new AFGP gene. The deletion, linking, and amplification events led to a 1-nt frameshift resulting in a termination codon (tga) at the start of the recruited trypsinogen E6 and converting it into the 3′ flanking sequence of the AFGP gene. The spacer sequence (bars filled with zigzagged lines) and additional I1 sequence might be existing sequence in the trypsinogen progenitor gene or acquired through recombinatory events. The Thr-Ala-Ala coding duplicants plus a spacer became amplified de novo to form the new AFGP polyprotein coding region. The regions of identity are illustrated as in Fig. 1. Splice sites in trypsinogen gene are given in italics.

“Origin of antifreeze protein genes: A cool tale in molecular evolution”, John M. Logsdon Jr. and W. Ford Doolittle Proc. Natl. Acad. Sci. USA Vol. 94, pp. 3485-3487, April 1997

“Evolution of anti-freeze glycoprotein from a trypsinogen gene in Antarctic notothenioid fish”, Chen L, DeVries AL, Cheng CC, Proceedings of the National Academy of Sciences 94:3811-16, April 1997

Abstract: Freezing avoidance conferred by different types of antifreeze proteins in various polar and subpolar fishes represents a remarkable example of cold adaptation, but how these unique proteins arose is unknown. We have found that the antifreeze glycoproteins (AFGPs) of the predominant Antarctic fish taxon, the notothenioids, evolved from a pancreatic trypsinogen. We have determined the likely evolutionary process by which this occurred through characterization and analyses of notothenioid AFGP and trypsinogen genes. The primordial AFGP gene apparently arose through recruitment of the 5′ and 3′ ends of an ancestral trypsinogen gene, which provided the secretory signal and the 3′ untranslated region, respectively, plus de novo amplification of a 9-nt Thr-Ala-Ala coding element from the trypsinogen progenitor to create a new protein coding region for the repetitive tripeptide backbone of the antifreeze protein. The small sequence divergence (4-7%) between notothenioid AFGP and trypsinogen genes indicates that the transformation of the proteinase gene into the novel ice-binding protein gene occurred quite recently, about 5-14 million years ago (mya), which is highly consistent with the estimated times of the freezing of the Antarctic Ocean at 10-14 mya, and of the main phyletic divergence of the AFGP-bearing notothenioid families at 7-15 mya. The notothenioid trypsinogen to AFGP conversion is the first clear example of how an old protein gene spawned a new gene for an entirely new protein with a new function. It also represents a rare instance in which protein evolution, organismal adaptation, and environmental conditions can be linked directly.

“Functional Antifreeze Glycoprotein Genes in Temperate-Water New Zealand Nototheniid Fish Infer an Antarctic Evolutionary Origin”, Chi-Hing C. Cheng, Liangbiao Chen, Thomas J. Near, and Yumi Jin, Mol. Biol. Evol. 20(11):1897-1908. 2003

Abstract:The fish fauna of the Antarctic Ocean is dominated by five endemic families of the Perciform suborder Notothenioidei, thought to have arisen in situ within the Antarctic through adaptive radiation of an ancestral stock that evolved antifreeze glycoproteins (AFGPs) enabling survival as the ocean chilled to subzero temperatures. The endemism results from geographic confinement imposed by a massive oceanographic barrier, the Antarctic Circumpolar Current, which also thermally isolated Antarctica over geologic time, leading to its current frigid condition. Despite this voluminous barrier to fish dispersal, a number of species from the Antarctic family Nototheniidae now inhabit the nonfreezing cool temperate coasts of the southern continents. The origin of these temperate-water nototheniids is not completely understood. Since the AFGP gene apparently evolved only once, before the Antarctic notothenioid radiation, the presence of AFGP genes in extant temperate-water nototheniids can be used to infer an Antarctic evolutionary origin. Genomic Southern analysis, PCR amplification of AFGP genes, and sequencing showed that Notothenia angustata and Notothenia microlepidota endemic to southern New Zealand have two to three AFGP genes, structurally the same as those of the Antarctic nototheniids. At least one of these genes is still functional, as AFGP cDNAs were obtained and low levels of mature AFGPs were detected in the blood. A phylogenetic tree based on complete ND2 coding sequences showed monophyly of these two New Zealand nototheniids and their inclusion in the monophyletic Nototheniidae consisted of mostly AFGP-bearing taxa. These analyses support an Antarctic ancestry for the New Zealand nototheniids. 
A divergence time of approximately 11 Myr was estimated for the two New Zealand nototheniids, approximating the upper Miocene northern advance of the Antarctic Convergence over New Zealand, which might have served as the vicariant event that lead to the northward dispersal of their most recent common ancestor. Similar secondary northward dispersal likely applies to the South American nototheniid Paranotothenia magellanica, which has four AFGP genes in its DNA, but not to the sympatric nototheniid Patagonotothen tessellata, which does not appear to have any AFGP sequences in its genome at all.

Caenorhabditis elegans

“The Structure and Early Evolution of Recently Arisen Gene Duplicates in the Caenorhabditis elegans Genome”, Vaishali Katju and Michael Lynch, Genetics, Vol. 165, 1793-1803, December 2003

Abstract: The significance of gene duplication in provisioning raw materials for the evolution of genomic diversity is widely recognized, but the early evolutionary dynamics of duplicate genes remain obscure. To elucidate the structural characteristics of newly arisen gene duplicates at infancy and their subsequent evolutionary properties, we analyzed gene pairs with ≤10% divergence at synonymous sites within the genome of Caenorhabditis elegans. Structural heterogeneity between duplicate copies is present very early in their evolutionary history and is maintained over longer evolutionary timescales, suggesting that duplications across gene boundaries in conjunction with shuffling events have at least as much potential to contribute to long-term evolution as do fully redundant (complete) duplicates. The median duplication span of 1.4 kb falls short of the average gene length in C. elegans (2.5 kb), suggesting that partial gene duplications are frequent. Most gene duplicates reside close to the parent copy at inception, often as tandem inverted loci, and appear to disperse in the genome as they age, as a result of reduced survivorship of duplicates located in proximity to the ancestral copy. We propose that illegitimate recombination events leading to inverted duplications play a disproportionately large role in gene duplication within this genome in comparison with other mechanisms.

Tubulin genes

“Evolution, Organization, and Expression of α-Tubulin Genes in the Antarctic Fish Notothenia coriiceps: ADAPTIVE EXPANSION OF A GENE FAMILY BY RECENT GENE DUPLICATION, INVERSION, AND DIVERGENCE”, Sandra K. Parker and H. William Detrich III, J Biol Chem, Vol. 273, Issue 51, 34358-34369, December 18, 1998

To assess the organization and expression of tubulin genes in ectothermic vertebrates, we have chosen the Antarctic yellowbelly rockcod, Notothenia coriiceps, as a model system. The genome of N. coriiceps contains ~15 distinct DNA fragments complementary to α-tubulin cDNA probes, which suggests that the α-tubulins of this cold-adapted fish are encoded by a substantial multigene family. From an N. coriiceps testicular DNA library, we isolated a 13.8-kilobase pair genomic clone that contains a tightly linked cluster of three α-tubulin genes, designated NcGTba, NcGTbb, and NcGTbc. Two of these genes, NcGTba and NcGTbb, are linked in head-to-head (5’ to 5’) orientation with ~500 bp separating their start codons, whereas NcGTba and NcGTbc are linked tail-to-tail (3’ to 3’) with ~2.5 kilobase pairs between their stop codons. The exons, introns, and untranslated regions of the three α-tubulin genes are strikingly similar in sequence, and the intergenic region between the a and b genes is significantly palindromic. Thus, this cluster probably evolved by duplication, inversion, and divergence of a common ancestral α-tubulin gene. Expression of the NcGTbc gene is cosmopolitan, with its mRNA most abundant in hematopoietic, neural, and testicular tissues, whereas NcGTba and NcGTbb transcripts accumulate primarily in brain. The differential expression of the three genes is consistent with distinct suites of putative promoter and enhancer elements. We propose that cold adaptation of the microtubule system of Antarctic fishes is based in part on expansion of the α- and β-tubulin gene families to ensure efficient synthesis of tubulin polypeptides.

“Tandem sequence duplications functionally complement deletions in the D1 protein of Photosystem II”, Kless H, Vermaas W, J Biol Chem 270(28): 16536-165451, July 1995

Obligate photoheterotrophic mutants of the cyanobacterium Synechocystis sp. PCC 6803 that carry deletions of conserved residues in the plastoquinone-binding niche of the D1 protein were used to select for spontaneous mutations that restore photoautotrophic growth. Spontaneous pseudorevertants emerged from two deletion mutants, YNIV and NN, when the cultures were maintained long after the carbon source (glucose) had been depleted from the medium and cells had reached stationary phase. Most pseudorevertants were found to contain tandem duplications of 6-45-base pair DNA sequences located close to the domain carrying the deletion; none of them restored the wild-type sequence. Three pseudorevertants isolated from the YNIV mutant contained a duplication (7-15 codons) of the DNA sequence immediately downstream of the deletion; the protein region encoded by this DNA may include part of the putative de helix, an important constituent of the plastoquinone-binding niche. Three pseudorevertants isolated from the NN mutant contained duplications corresponding to 2-8 amino acid residues adjacent to the site of the deletion. In all six pseudorevertants carrying duplications, the length of the D1 protein in the modified regions was restored to at least the length present in wild type, suggesting that a minimal length of these protein domains may be required for functional integrity. In another photoautotrophic strain isolated from NN, no secondary mutations could be identified in the gene coding for the D1 protein; such mutations apparently reside on another protein subunit of the photosystem II complex. Photosystem II function in the pseudorevertants was altered as compared with wild type in terms of growth and oxygen evolution rates, photosystem II concentration, the semiquinone equilibrium at the acceptor side, and thermostability. A mechanism leading to tandem sequence duplication may involve DNA damage followed by DNA synthesis, strand displacement, and ligation.

“Transposable elements are found in a large number of human protein-coding genes”, Nekrutenko A, Li W-H, Trends in Genetics 17(11):619-621 Nov ‘01

To study the genome-wide impact of transposable elements (TEs) on the evolution of protein-coding regions, we examined 13 799 human genes and found 533 (approximately 4%) cases of TEs within protein-coding regions. The majority of these TEs (approximately 89.5%) reside within ‘introns’ and were recruited into coding regions as novel exons. We found that TE integration often has an effect on gene function. In particular, there were two mouse genes whose coding regions consist largely of TEs, suggesting that TE insertion might create new genes. Thus, there is increasing evidence for an important role of TEs in gene evolution. Because many TEs are taxon-specific, their integration into coding regions could accelerate species divergence.

“Positive Darwinian selection after gene duplication in primate ribonuclease genes”, Zhang J, Rosenberg HF, Nei M, PNAS 95: 3708-3713, Mar ‘98

Evolutionary mechanisms of origins of new gene function have been a subject of long-standing debate. Here we report a convincing case in which positive Darwinian selection operated at the molecular level during the evolution of novel function by gene duplication. The genes for eosinophil cationic protein (ECP) and eosinophil-derived neurotoxin (EDN) in primates belong to the ribonuclease gene family, and the ECP gene, whose product has an anti-pathogen function not displayed by EDN, was generated by duplication of the EDN gene about 31 million years ago. Using inferred nucleotide sequences of ancestral organisms, we showed that the rate of nonsynonymous nucleotide substitution was significantly higher than that of synonymous substitution for the ECP gene. This strongly suggests that positive Darwinian selection operated in the early stage of evolution of the ECP gene. It was also found that the number of arginine residues increased substantially in a short period of evolutionary time after gene duplication, and these amino acid changes probably produced the novel anti-pathogen function of ECP.
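The comparison of nonsynonymous and synonymous substitution rates in the abstract above can be illustrated with a small sketch. This is not the method used in the paper, which worked from inferred ancestral sequences and proper statistical tests. It is a simplified count in the spirit of the Nei-Gojobori approach, and it assumes two aligned, in-frame sequences, skips codons that differ at more than one position, applies no multiple-hit correction, and treats stop codons as ordinary symbols. The function names are my own.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built from the conventional TCAG ordering.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def synonymous_sites(codon):
    """Number of synonymous sites in a codon (between 0 and 3)."""
    amino = CODON_TABLE[codon]
    sites = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODON_TABLE[mutant] == amino:
                    sites += 1 / 3  # one of three possible changes at this site
    return sites


def pn_ps(seq_a, seq_b):
    """Return (pN, pS): differences per nonsynonymous and per synonymous site."""
    syn_sites = nonsyn_sites = 0.0
    syn_diffs = nonsyn_diffs = 0
    for i in range(0, min(len(seq_a), len(seq_b)) - 2, 3):
        codon_a, codon_b = seq_a[i:i + 3], seq_b[i:i + 3]
        s = (synonymous_sites(codon_a) + synonymous_sites(codon_b)) / 2
        syn_sites += s
        nonsyn_sites += 3 - s
        n_diff = sum(x != y for x, y in zip(codon_a, codon_b))
        if n_diff == 1:  # codons with 2 or 3 differences are skipped here
            if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
                syn_diffs += 1
            else:
                nonsyn_diffs += 1
    p_n = nonsyn_diffs / nonsyn_sites if nonsyn_sites else 0.0
    p_s = syn_diffs / syn_sites if syn_sites else 0.0
    return p_n, p_s
```

A pN clearly above pS is the kind of pattern that gets read as a hint of positive selection. A real analysis would need far more care than this sketch provides.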

“Adaptive evolution of a duplicated pancreatic ribonuclease gene in a leaf-eating monkey”, Zhang J, Zhang Y-P, Rosenberg HF, Nature Genetics 30:411-415, April ‘02

Although the complete genome sequences of over 50 representative species have revealed the many duplicated genes in all three domains of life, the roles of gene duplication in organismal adaptation and biodiversity are poorly understood. In addition, the evolutionary forces behind the functional divergence of duplicated genes are often unknown, leading to disagreement on the relative importance of positive Darwinian selection versus relaxation of functional constraints in this process. The methodology of earlier studies relied largely on DNA sequence analysis but lacked functional assays of duplicated genes, frequently generating contentious results. Here we use both computational and experimental approaches to address these questions in a study of the pancreatic ribonuclease gene (RNASE1) and its duplicate gene (RNASE1B) in a leaf-eating colobine monkey, douc langur. We show that RNASE1B has evolved rapidly under positive selection for enhanced ribonucleolytic activity in an altered microenvironment, a response to increased demands for the enzyme for digesting bacterial RNA. At the same time, the ability to degrade double-stranded RNA, a non-digestive activity characteristic of primate RNASE1, has been lost in RNASE1B, indicating functional specialization and relaxation of purifying selection. Our findings demonstrate the contribution of gene duplication to organismal adaptation and show the power of combining sequence analysis and functional assays in delineating the molecular basis of adaptive evolution.

“Origin of new genes and source for N-terminal domain of the chimerical gene, jingwei, in Drosophila”, Long M, Wang W, Zhang J, Gene 238: 135-141, Sep 99

This paper deals with a general question posed by the origin of new processed chimerical genes: when a new retrosequence inserts into a new genome position, how does it become activated and acquire novel protein function by recruiting new functional domains and regulatory elements? Jingwei (jgw), a newly evolved functional gene with a chimerical structure in Drosophila, provides an opportunity to examine such questions. The source of its exon encoding C-terminal peptide has been identified as an Adh retrosequence, which extends the concept of exon shuffling from recombination to retroposition as a general molecular mechanism for the origin of a new gene. However, the origin of 5’ exons remains unclear. We examined two hypotheses concerning the origin of these non-Adh-derived jgw exons: (i) these exons might originate from a unique genomic sequence that fortuitously evolved a standard intron-exon structure and regulatory sequence for jgw; (ii) these exons might be a duplicate of an unrelated previously existing gene. Genomic Southern analysis, in conjunction with construction and screening of a genomic bookshelf (sub-library), was conducted in a group of Drosophila species. The results demonstrated that there are duplicate genes containing the same structure as the recruited portion of jgw. We name this duplicate gene in Drosophila teissieri and Drosophila yakuba and its orthologous gene in Drosophila melanogaster as yellow-emperor (ymp). Thus, the 5’ exons/introns originated from a previously existing gene that provided new modules with specific sub-function to create jgw.

Links adapted from Here

Scale free networks

In addition, protein networks and RNA networks can be characterized as ‘scale free’. Remarkably, a simple model involving gene duplication can explain the nature of these networks.

“Scale free” was a term first coined by Albert-László Barabási.

Powerlaw website


Barabási is a professor of physics and director of the Study of Self-Organized Networks at Notre Dame.

See also their Cellular networks publication page.

Barabasi, A. and Albert, R. Emergence of scaling in random networks. Science 286, 509-512. 1999.

Barabasi, A. and Bonabeau, E. Scale-Free Networks. Scientific American 288, 60-69. 2003.

Various papers explore how simple models based on gene duplication can lead to networks with statistics similar to those found in nature.

Bhan A, Galas DJ, Dewey TG. A duplication growth model of gene expression networks. Bioinformatics. 2002 Nov;18(11):1486-93.

The overall structure of these biological networks is distinctly different from that of other recently studied networks such as the Internet or social networks. These biological networks show hierarchical, hub-like structures that have some properties similar to a class of graphs known as small world graphs. Small world networks exhibit local cliquishness while exhibiting strong global connectivity. In addition to the small world properties, the biological networks show a power law or scale free distribution of connectivities. An inverse power law, N(k) ~ k^(-3/2), for the number of vertices (genes) with k connections was observed for three different data sets from yeast. We propose network growth models based on gene duplication events. Simulations of these models yield networks with the same combination of global graphical properties that we inferred from the expression data.
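As a rough illustration of what a growth model based on gene duplication can look like, here is a generic duplication-divergence sketch. It is not the specific model of Bhan et al. The parameters `p_keep` (chance that the copy retains each of the parent's links) and `p_link_parent` (chance that the copy is linked to its parent) are my own assumptions, as are the function names.

```python
import random
from collections import Counter


def grow_by_duplication(n_nodes=500, p_keep=0.4, p_link_parent=0.1, seed=0):
    """Grow an undirected network by repeatedly duplicating a random node."""
    rng = random.Random(seed)
    adj = {0: {1}, 1: {0}}  # start from two connected genes
    while len(adj) < n_nodes:
        parent = rng.randrange(len(adj))
        child = len(adj)
        # The duplicate inherits each of the parent's links with prob. p_keep
        # (divergence removes the rest).
        kept = {nb for nb in sorted(adj[parent]) if rng.random() < p_keep}
        if rng.random() < p_link_parent:
            kept.add(parent)
        adj[child] = kept
        for nb in kept:
            adj[nb].add(child)
    return adj


def degree_distribution(adj):
    """Map each degree k to the number of nodes N(k) with that degree."""
    return Counter(len(neighbours) for neighbours in adj.values())


if __name__ == "__main__":
    dist = degree_distribution(grow_by_duplication())
    for k in sorted(dist):
        print(k, dist[k])
```

Plotting N(k) against k on log-log axes is the usual way to see whether the tail looks like a power law. I make no claim that these parameter values reproduce the exponent reported above.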

V. van Noort, B. Snel, and M. A. Huynen. The yeast coexpression network has a small-world, scale-free architecture and can be explained by a simple model. EMBO Rep., March 1, 2004; 5(3): 280-284.

Evolutionary model of transcription regulation. The evolutionary model consists of a few simple mechanisms. (A) A genome is initiated with 25 genes with random TFBSs, represented by the small coloured shapes. (B) Possible events are as follows: (1) Gene A is duplicated, gene A’ has the same TFBS as its duplicate gene A; the duplicates are coexpressed. (2) Gene deletion. (3) Gene A acquires a new TFBS from gene B. The probability of obtaining a specific TFBS is proportional to its frequency in the genome. The probability of a novel TFBS is (150 - total number of different TFBSs present)/(150+total number of TFBSs). (4) One of the TFBSs of gene A is deleted. (C) A network is constructed by connecting genes that share TFBSs.
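The events listed in the caption translate fairly directly into code. The sketch below follows the caption for the initial 25 genes, the four event types, and the novel-TFBS probability. Everything else is my assumption and is labeled as such: the event probabilities `p_dup`, `p_del` and `p_gain`, one initial TFBS per gene, at most one copy of a TFBS type per gene, and the number of steps.

```python
import random
from collections import Counter
from itertools import combinations

MAX_TFBS_TYPES = 150  # the constant that appears in the caption's formula


def acquire_tfbs(genome, gene, rng):
    """Event 3: the gene gains a TFBS, either novel or copied from the genome."""
    counts = Counter(t for sites in genome for t in sites)
    total = sum(counts.values())
    p_novel = (MAX_TFBS_TYPES - len(counts)) / (MAX_TFBS_TYPES + total)
    if rng.random() < p_novel:
        unused = [t for t in range(MAX_TFBS_TYPES) if t not in counts]
        new_site = rng.choice(unused)
    else:
        # Existing TFBS types are drawn in proportion to their frequency.
        types = sorted(counts)
        new_site = rng.choices(types, weights=[counts[t] for t in types])[0]
    genome[gene].add(new_site)


def evolve(steps=2000, seed=0, p_dup=0.3, p_del=0.2, p_gain=0.25):
    """Run the four events; the event probabilities are assumed, not sourced."""
    rng = random.Random(seed)
    # 25 starting genes, each with one random TFBS (the count is an assumption).
    genome = [{rng.randrange(MAX_TFBS_TYPES)} for _ in range(25)]
    for _ in range(steps):
        gene = rng.randrange(len(genome))
        r = rng.random()
        if r < p_dup:                      # (1) duplication: same TFBSs
            genome.append(set(genome[gene]))
        elif r < p_dup + p_del:            # (2) gene deletion
            if len(genome) > 2:
                genome.pop(gene)
        elif r < p_dup + p_del + p_gain:   # (3) TFBS acquisition
            acquire_tfbs(genome, gene, rng)
        elif genome[gene]:                 # (4) TFBS deletion
            genome[gene].remove(rng.choice(sorted(genome[gene])))
    return genome


def coexpression_edges(genome):
    """(C) Connect every pair of genes that share at least one TFBS."""
    return {
        (a, b)
        for a, b in combinations(range(len(genome)), 2)
        if genome[a] & genome[b]
    }
```

Calling `coexpression_edges(evolve())` gives an edge set whose degree distribution can then be compared with the yeast data. How close the match is will depend on the assumed event probabilities.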

Additionally, such scale-free networks can help explain the modularity, robustness, evolvability, and degeneracy found in nature.

For instance:

Modularity “for free” in genome architecture? by Ricard V. Sole and Pau Fernandez

Recent models of genome-proteome evolution have shown that some of the key traits displayed by the global structure of cellular networks might be a natural result of a duplication-diversification (DD) process. One of the consequences of such evolution is the emergence of a small world architecture together with a scale-free distribution of interactions. Here we show that the domain of parameter space where such structure emerges is related to a phase transition phenomenon. At this transition point, modular architecture spontaneously emerges as a byproduct of the DD process. Although DD models lack any functionality and are thus free from meeting functional constraints, they show the observed features displayed by the real proteome maps when tuned close to a sharp transition point separating a highly connected graph from a disconnected system. Close to such a boundary, the maps are shown to display scale-free hierarchical organization, behave as small worlds, and exhibit modularity. It is conjectured that natural selection tuned the average connectivity in such a way that the network reaches a sparse graph of connections. One consequence of such a scenario is that the scaling laws and the essential ingredients for building a modular net emerge for free close to such a transition.

Yaneer Bar-Yam and Irving R. Epstein. Response of complex networks to stimuli. PNAS March 30, 2004 vol. 101 no. 13 4341-4345

We consider the response of complex systems to stimuli and argue for the importance of both sensitivity, the possibility of large response to small stimuli, and robustness, the possibility of small response to large stimuli. Using a dynamic attractor network model for switching of patterns of behavior, we show that the scale-free topologies often found in nature enable more sensitive response to specific changes than do random networks. This property may be essential in networks where appropriate response to environmental change is critical and may, in such systems, be more important than features, such as connectivity, often used to characterize network topologies. Phenomenologically observed exponents for functional scale-free networks fall in a range corresponding to the onset of particularly high sensitivities, while still retaining robustness.

These data show that claims from ID proponents that (Darwinian) evolutionary mechanisms cannot explain information, complexity, and innovation are without much merit.

In this 2002 paper, Lynch explores gene duplication and evolution, based on the paper by Jeffrey A. Bailey, Zhiping Gu, Royden A. Clark, Knut Reinert, Rhea V. Samonte, Stuart Schwartz, Mark D. Adams, Eugene W. Myers, Peter W. Li, and Evan E. Eichler, Science 2002 297: 1003-1007.

Co-option and gene duplication appear to be quite important evolutionary mechanisms.

Co-option occurs when natural selection finds new uses for existing traits, including genes, organs, and other body structures. Genes can be co-opted to generate developmental and physiological novelties by changing their patterns of regulation, by changing the functions of the proteins they encode, or both. This often involves gene duplication followed by specialization of the resulting paralogous genes into particular functions. A major role for gene co-option in the evolution of development has long been assumed, and many recent comparative developmental and genomic studies have lent support to this idea. Although there is relatively less known about the molecular basis of co-option events involving developmental pathways, much can be drawn from well-studied examples of the co-option of structural proteins. Here, we summarize several case studies of both structural gene and developmental genetic circuit co-option and discuss how co-option may underlie major episodes of adaptive change in multicellular organisms. We also examine the phenomenon of intraspecific variability in gene expression patterns, which we propose to be one form of material for the co-option process. We integrate this information with recent models of gene family evolution to provide a framework for understanding the origin of co-optive evolution and the mechanisms by which natural selection promotes evolutionary novelty by inventing new uses for the genetic toolkit

Gene co-option in physiological and morphological evolution. True JR, Carroll SB. Annu Rev Cell Dev Biol. 2002;18:53-80.


For example, it is becoming clear that co-option has played a critical role in evolution and the homeotic genes are not exempt in this regard. To demonstrate this point we can point to the expression pattern of the homeotic gene Ubx in various arthropod groups and the most basic morphology of the segments within its expression domains.

Understanding the genetic basis of morphological evolution: the role of homeotic genes in the diversification of the arthropod bauplan. ALEKSANDAR POPADIC, ARHAT ABZHANOV, DOUGLAS RUSCH and THOMAS C. KAUFMAN Int. J. Dev. Biol. 42: 453-461 (1998)


Like Dembski, Johnson is very fond of information theory. He is quite emphatic that natural selection acting on chance variations can not significantly increase the information content of the genome. Johnson offers a crude caricature of the arguments made in Dawkins’ article (41), but offers no explanation of why gene duplication with subsequent divergence can not account for the growth in genetic information. The closest he comes to addressing the subject is the following quote, in which he recounts a discussion with mathematical physicist Paul Davies:

“When I asked Davies about this, his reply gave me the impression that he thinks that natural selection increases genetic information by preserving copies that are made in the reproductive process. I am afraid this misses the point. When two rabbits reproduce there are more rabbits, but there is not any increase in information in the relevant sense. If you need to write out the full text of the encyclopedia and have only page one, you cannot make progress toward your goal by copying page one twenty times.” (59)

In reply I will simply quote John Maynard Smith and Eors Szathmary, from their book The Major Transitions in Evolution: “The mere duplication of a gene adds no new information, but the divergence of the two copies does so.”

Design detectives
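The Maynard Smith and Szathmary point quoted above can be illustrated with a rough sketch. It assumes that compressed size is an acceptable crude proxy for information content, which is a simplification, and the gene length, mutation count and helper names (random_gene, diverge, compressed_size) are hypothetical choices made only for this illustration.

```python
import random
import zlib

BASES = "ACGT"

def random_gene(length, rng):
    """Build a hypothetical 'gene' as a random string of bases."""
    return "".join(rng.choice(BASES) for _ in range(length))

def diverge(gene, n_mutations, rng):
    """Return a copy in which n_mutations distinct sites hold a different base."""
    seq = list(gene)
    for site in rng.sample(range(len(seq)), n_mutations):
        seq[site] = rng.choice([b for b in BASES if b != seq[site]])
    return "".join(seq)

def compressed_size(seq):
    """Compressed length in bytes: a crude stand-in for information content."""
    return len(zlib.compress(seq.encode("ascii"), 9))

rng = random.Random(0)
gene = random_gene(300, rng)
duplicated = gene + gene                   # exact duplicate: nothing new to describe
diverged = gene + diverge(gene, 30, rng)   # duplicate that has since diverged

size_single = compressed_size(gene)
size_duplicated = compressed_size(duplicated)
size_diverged = compressed_size(diverged)
print(size_single, size_duplicated, size_diverged)
```

The exact duplicate should compress to little more than the single gene, because the second copy can be described as "repeat the first". The diverged pair needs more bytes, which is the sense in which divergence, not duplication alone, adds information.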

Creationist arguments

Some websites which seem to be unaware of the scientific data

Fourth, we see the apparent inability of mutations to truly contribute to the origin of new structures. The theory of gene duplication in its present form is unsuitable to account for the origin of new genetic information that is a must for any theory of evolutionary mechanism.

Rebuttals to Common Criticisms of the Book Darwin’s Black Box Robert DiSilvestro, Ph.D. (also found on various other websites)

Rebuttal to criticism # 2. To develop the specialized functions, the duplicated genes still had to evolve structural changes. What drove the changes? In all likelihood, a number of specializations would have had to develop simultaneously to have any value. This brings everything back to the mouse trap analogy. The only refinement is that some parts of the mouse trap would have some structural similarities.

An additional concern here is the high probability of the evolving genes messing up the original system. This is very likely with an abundance of structurally similar gene products. If one of these gene products becomes nonfunctional, it could get in the way of the function of original gene product. This phenomena is readily observed today. For instance, the chemotherapy drug methotrexate looks like the B-vitamin folacin, but does not work like it. The drug will compete with the real vitamin for binding to functional sites, but will not actually function. This action kills cancer cells.

Another problem with gene duplication is that it doesn’t account for all, or even most, of the complexity in many systems. For example, the complexities of oxygen transport involve many genes which are not structurally similar. This is obvious when one considers anemia, a breakdown in oxygen transport. When I teach nutrition courses, I sometimes ask: how many different mechanisms can cause anemia? There are many causes which involve molecules with little or no overlap in structure.

DiSilvestro argues that not all complexity seems to have arisen through gene duplication, but that is irrelevant, since the argument was never that all complexity arises this way. He also raises “an additional concern” about “the high probability of the evolving genes messing up the original system” but offers little to establish the relevance of this claim beyond a reference to a chemotherapy drug. It should be obvious that natural selection would quickly deal with such cases.



Pim wrote:

In addition, protein networks, RNA networks can be characterized by a ‘scale free’ nature. Remarkably a simple model involving gene duplication can explain the nature of these networks.

I’ve addressed Barabasi’s work in other locations. Here is a sample of what I said:

The use of the term “self-organizing” by Barabasi and his group is misguided. Correlating biological evolution with technological advancement is a false analogy unless you are willing to admit that the same principles apply to both types of systems. This type of argument has been made in the past with respect to the automobile and the airplane. The reason the analogy is false is because at no point in the development of the automobile, the airplane or the World Wide Web or any other complex network was any element of the design achieved by chance. Only by the most strict application of the rules of engineering and aerodynamics and the intelligent input from human designers were these results obtained. There is no way that a random search could ever have discovered the design of the internal combustion engine or the jet engine or the internet. In all cases, the search for function is intelligently guided. Evolution by random mutation is analogous to problem solving without any intelligent guidance. In the case of every kind of complex, functional system, the total magnitude of all combinational possibilities is nearly infinite. Meaningful islands of function are so isolated that to find even one by chance would be truly a miracle. But the analogy may indeed be valid. If these complex networks, processes and structures that are the product of human design all require a higher intelligence to create, why would one not imagine that the most complex system of all, the living cell, must not have had a similar design component to achieve its function?

And further…

I am familiar with many of the aspects of “self-organizing networks” and such a tutorial is not necessary. In fact, I know enough about these systems to state unequivocally that they have nothing whatsoever to do with biological evolution. It therefore puzzles me why you brought this up here in the first place, if not to subtly suggest that the concept of “self-organization” may somehow have significance in the evolution of biological systems. News flash: it does not! The mathematics is not the reason why I characterized their use of the word as “misguided”. The reason was because of the fact that self-organizing networks are irrelevant wrt biological evolution. They are not, however, irrelevant wrt the living cell. Living cells function with a number of similarities and their topology is important in carrying out their developmental and regulatory processes. But the important point to remember is that these self-organizing networks are the result of intelligent guidance, and any similarities that these networks have with living cells only enhances the notion that there is a component of intelligent guidance involved in their processes, structures and functions as well.

Hi Charlie, thanks for ignoring everything I said and creating your own strawman approach. What must I have been thinking? That ID proponents would admit that their claims about gene duplication are misinformed?

Wagner: Correlating biological evolution with technological advancement is a false analogy unless you are willing to admit that the same principles apply to both types of systems.

What Barabasi and others have done is show that these networks are NOT really intelligently designed or planned.

Why Charlie raises the strawman of the car or airplane is beyond me. However, his claim that these networks do not arise by chance ignores the obvious, namely that a process of gene duplication can explain these networks and that no intelligent design is needed. In fact, the WWW is a good example not of a planned network but of one with haphazard additions, mimicking the processes found in the genome.

Charlie’s final strawman: Evolution by random mutation is analogous to problem solving without any intelligent guidance.

Which is why evolution is not just by random mutation, Charlie. Until you admit that evolution contains two separate processes, your claims have to be once again rejected as strawmen.

Charlie: In fact, I know enough about these systems to state unequivocally that they have nothing whatsoever to do with biological evolution.

Well, I can accept that you are willing to close your eyes to the scientific evidence, but as I have shown, these networks are intricately linked to evolution.

Charlie, your claims, while valiant in their attempt to defend ID, fail to address the real issue, namely that simple processes guided by selection are sufficient to explain the protein, RNA and other networks found in biology.

I suggest you read up on the excellent thread on ISCID by Deanne Taylor and Dembski

I understand that Charlie rejects evolutionary pathways a priori and has accepted the need for intelligent guidance without any evidence nor has Charlie presented any mechanisms. Once again ID fails to compete in any scientific manner with evolutionary theory. It does show how ID relies strongly on personal incredulity.


For those interested in seeing Lilith rebut Charlie’s arguments see the following thread on

Such as Charlie’s claim that the use of the term is misguided

The term “self-organizing network” is a technical term used in computer science and mathematics. Such networks exhibit statistical mechanics that are borne out of simple rules applied to the assembly and growth of the network. Your misunderstanding of the underlying mathematics does not provide a valid argument against the terminology as “misguided” since the mathematics applies equally well to all the examples the Barabasi group offers. I suggest a google search on “self organizing networks” and a few days of reading.

Charlie Wrote:

There is no way that a random search could ever have discovered the design of the internal combustion engine or the jet engine or the internet.

Lilith Wrote:

Except self-organizing networks do not result from “random searches” (one thing I’ve noticed about Charlie and some of the other creationists is the eternal call to incredulity).

Living organisms are never randomly “searching”. Cellular networks were not “designed”, nor do they necessarily arise from “random searches”. Gene duplication would add to an existing network, but that is not a random search. Gene duplication occurs through natural processes, but those are not random searches. Selection is not random, it operates by choosing viable living systems to sustain in certain contexts. But actually, one might say that only the living systems that CAN operate in certain contexts will move into those contexts.

Biological noise – mutation in all its various forms as well as functional variation – is able to supply diversity by which selection can operate. It’s a very simple concept based on observed phenomena. That diversity is not infinite. It is limited by the evolutionary history of the organism. Only those contexts that are within reach of the organism’s diversity will be able to support that organism or its sub-population that can establish the first foothold.

Living systems provide a wide playing field (diversity), and can enter different contexts by selection (natural selection) of a subset of the available choices so that a population results with a different distribution of alleles (evolution).

Read the thread, it provides for some good insight in Charlie’s thinking and what’s wrong with his arguments.
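Lilith's point that selection acting on existing variation yields "a population ... with a different distribution of alleles" can be sketched with the standard deterministic recurrence for haploid selection. This is a minimal illustration, not anything taken from the thread: the fitness values, starting frequency and function names are assumptions, and drift, mutation and diploidy are ignored.

```python
def next_frequency(p, w_a, w_b):
    """Frequency of allele A after one generation of haploid selection.

    p is the current frequency of A; w_a and w_b are the relative
    fitnesses of alleles A and B (hypothetical values supplied by the caller).
    """
    mean_fitness = p * w_a + (1.0 - p) * w_b
    return p * w_a / mean_fitness

def trajectory(p0, w_a, w_b, generations):
    """Allele A frequency at generation 0, 1, ..., generations."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_frequency(freqs[-1], w_a, w_b))
    return freqs

# A rare variant (1%) with an assumed 5% fitness advantage.
freqs = trajectory(p0=0.01, w_a=1.05, w_b=1.00, generations=200)
print(round(freqs[0], 3), round(freqs[-1], 3))
```

Nothing in the recurrence searches for anything. The variant is already present in the population, and differential reproduction alone shifts the distribution of alleles.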

Pim wrote:

Hi Charlie, thanks for ignoring everything I said…

Well, you blinded me…with SCIENCE! ;-)*

There was just too much there to absorb, so I picked out something I was familiar with and used some boilerplate to respond. I have a long flight tonight, so I’ll read it on the plane and reply when I get home.

(* an 80’s song by Thomas Dolby)

Pim wrote:

I suggest you read up on the excellent thread on ISCID by Deanne Taylor and Dembski

You probably know that Deanne Taylor is Lilith, so you also know that I’ve engaged her on She’ll have an easier time with Dembski than she had with me ;-)

Jerry Wrote:

You probably know that Deanne Taylor is Lilith, so you also know that I’ve engaged her on She’ll have an easier time with Dembski than she had with me

Interesting prediction which so far does not seem to be supported by the evidence. Lilith is doing an excellent job at showing what is wrong with Charlie’s ideas about self organizing networks.

I will be posting some additional resources soon where I will show how hierarchical scale free networks can arise through simple processes of duplication and preferential attachment.
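In the meantime, here is a minimal sketch of the kind of duplication model referred to above. It is not Barabasi's own model or code. It assumes a simple duplication-divergence rule: a randomly chosen node is copied, the copy keeps each of the parent's links with probability retain, and it occasionally links to the parent with probability link_parent. Both parameter names and their default values are hypothetical.

```python
import random

def duplication_divergence(n_nodes, retain=0.4, link_parent=0.1, seed=0):
    """Grow an undirected network by repeated node duplication.

    Returns a dict mapping each node to the set of its neighbours.
    """
    rng = random.Random(seed)
    neighbours = {0: {1}, 1: {0}}          # start from a single interacting pair
    while len(neighbours) < n_nodes:
        child = len(neighbours)            # label for the new (duplicated) node
        parent = rng.randrange(child)      # node chosen for duplication
        # Divergence: the copy keeps only some of the parent's interactions.
        inherited = {n for n in neighbours[parent] if rng.random() < retain}
        if rng.random() < link_parent:
            inherited.add(parent)          # copy may also interact with its parent
        neighbours[child] = inherited
        for n in inherited:
            neighbours[n].add(child)
    return neighbours

network = duplication_divergence(2000)
degrees = sorted((len(v) for v in network.values()), reverse=True)
mean_degree = sum(degrees) / len(degrees)
print("largest degrees:", degrees[:5])
print("mean degree:", round(mean_degree, 2))
```

No step in this rule explicitly favours well-connected nodes, yet a node with many neighbours is more likely to be a neighbour of whichever node gets duplicated, so it tends to gain links faster. That is how preferential attachment can emerge from duplication alone. Whether the resulting degree distribution approximates a power law depends on the parameters chosen.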

Then there are the issues of degeneracy and redundancy. The former one is more often found in biological systems while the latter one is a more common engineering solution (backup systems).


I understand that Charlie rejects evolutionary pathways a priori and has accepted the need for intelligent guidance without any evidence nor has Charlie presented any mechanisms. Once again ID fails to compete in any scientific manner with evolutionary theory. It does show how ID relies strongly on personal incredulity.

There is a big difference between an argument from personal incredulity and an argument from mathematical improbability. Evolutionists accuse ID of the former whereas IDists like Charlie, Dembski and Behe consistently propose the latter. In fact the improbability calculations produce results which indicate “impossibility”. Yes, Behe expresses wonder at the extent and complexity of it all but so does Dawkins. Francis Crick seems impressed by the “impossibility” calculations. I would assume the opinion of someone of such paramount importance in genetics would mean something. Oh, an argument from authority.…

The impossibility calculations correspond with what we observe in every chemistry lab in existence. Complex chemical reactions on the order of functional RNA or protein synthesis never occur spontaneously. Or have I missed it? Is this not scientific proof? How long do we wait for it to happen before we admit it can’t happen?

Should we observe spontaneously synthesized RNA -the bytes- what then distributes the components in the precise manner required for a viable, living creature to emerge -the program? The random accumulation of many thousands of packets of information cannot produce an ordered system. Natural selection will have nothing to work on.

Jack Shea Wrote:

There is a big difference between an argument from personal incredulity and an argument from mathematical improbability. Evolutionists accuse ID of the former whereas IDists like Charlie, Dembski and Behe consistently propose the latter. In fact the improbability calculations produce results which indicate “impossibility”.

Unfortunately for you, all such calculations anyone has proposed to date are meaningless because the inputs are unrealistic and/or just plain wrong. Including the very few (I only know of one) calculations Dembski has proposed for real-world issues. GIGO.


Don’t forget the extensive inventory of complex organic molecules that form in deep space. These even have a head start on chirality.[…]809EC588F2D7


Fred Hoyle had this to say:

‘Now imagine 10^50 blind persons [that’s 100,000 billion billion billion billion billion people; standing shoulder to shoulder, they would more than fill our entire planetary system] each with a scrambled Rubik cube and try to conceive of the chance of them all simultaneously arriving at the solved form. You then have the chance of arriving by random shuffling [random variation] of just one of the many biopolymers on which life depends. The notion that not only the biopolymers but the operating program of a living cell could be arrived at by chance in a primordial soup here on Earth is evidently nonsense of a high order.’

Not arguing from authority but arguing from the unusual intelligence of the quoted person, I think we have to give some credit to the improbability argument. Well I know you won’t, but I will. While a young radar engineer Fred Hoyle correctly predicted that “the only way enough carbon could be made was if there existed a very specific match of nuclear energy levels, or resonance, between helium, beryllium, and carbon under precisely the conditions thought to prevail in the cores of stars at this stage in their evolution. Experiments promptly confirmed Hoyle’s deduction—there was indeed a previously unsuspected resonance, very close to the energy value he gave.”

Fred was a smart guy and a scientist. He was talking out of his field and the “in-field” boys he proposed his idea to thought he was nuts at first and took some persuading to do the experiment. Even Watson and Crick drew back from their original proclamations that the self-assembly of proteins was a piece of cake once it became clear just how complex proteins were. It may be that the improbability argument is incorrect but at the moment it holds a lot of water. Nit-picking over the procedures does not diminish the awesome challenge faced by little molecules in the primordial soup who would like to grow up one day into big strong DNA.

Jack Shea, quoting Hoyle, Wrote:

The notion that not only the biopolymers but the operating program of a living cell could be arrived at by chance in a primordial soup here on Earth is evidently nonsense of a high order.

And I agree! I don’t know of anyone who contends that a genome of anything like the complexity of a modern organism came together in one step, which is what Hoyle’s (and Dembski’s, etc) calculations reflect. If that were likely, we wouldn’t need evolution to explain anything. But it isn’t, and we do.
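A toy calculation shows how much the one-step assumption drives the size of such numbers. It is only a sketch under loudly stated assumptions: a hypothetical polymer of 100 residues drawn from 20 amino acids, a single target sequence, and a stepwise process in which each correct residue is retained once found. It is not a model of abiogenesis or of how selection actually works. It only shows that the same target gives wildly different figures depending on whether steps are allowed.

```python
ALPHABET = 20   # assumed number of residue types (amino acids)
LENGTH = 100    # assumed length of the hypothetical polymer

# One step: every draw assembles a whole random sequence, so the expected
# number of draws before hitting one specific sequence is ALPHABET ** LENGTH.
one_step_trials = ALPHABET ** LENGTH

# Stepwise with retention: each site takes ALPHABET draws on average and is
# then kept, so the expected total is ALPHABET * LENGTH.
stepwise_trials = ALPHABET * LENGTH

print("one step: a number with", len(str(one_step_trials)), "digits")
print("stepwise:", stepwise_trials)
```

The first figure is the kind of number that "impossibility" calculations report. The second differs only in dropping the assumption that everything must come together at once.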

I made some remarks on calculations like those being discussed here on ARN. The first paragraph is

A topic that often arises in these discussions is the question of how improbable something or other is. Calculating the probability of abiogenesis – the emergence of self-replicating organic entities – is a favorite. Let me describe some of the problems associated with calculating a plausible (or at least defensible) probability of the emergence of life. In my view, there are at least six questions that must be addressed in constructing an equation from which the probability calculation can be made:

and a paragraph following the list of 6 is this:

It is enormously important that the last three factors described above – physical and chemical variables and the role of the inorganic context – do not merely affect the overall probability of the emergence of self-replicating molecules, they preferentially bias the process by changing the odds against certain molecules and/or biochemical pathways. That is, they alter the shape of the probability distribution across molecules, favoring some over others. This differential favoring, this change in the shape of the probability distribution, is an ordering process. It is due solely to natural causes, and in sculpting the probabilistic landscape it begins the ordering that leads to life.


Jack Shea: It may be that the improbability argument is incorrect but at the moment it holds a lot of water.

First of all the actual facts do not suggest that abiogenesis is improbable or impossible and thus any such argument, especially when based on appeal to ignorance or strawmen should be rejected.


And I agree! I don’t know of anyone who contends that a genome of anything like the complexity of a modern organism came together in one step, which is what Hoyle’s (and Dembski’s, etc) calculations reflect.

I think Fred was considering the evolutionary bootstrap theory in toto. He was too vast a mind to miss the procedural argument. He does say “…the operating program of a living cell” which would apply to ancient bacteria. Interestingly Fred the genius was whacked for his panspermia hypothesis when first espoused but it’s now very much back on the agenda with a small percentage of left-handed amino acids pitching in the ninth inning, score tied 1-1.

Dembski and his critics are beyond my feeble comprehension of mathematics and I can only claim to support his conclusions …which I have reached by other means. So you might be right.

This version of probability, based on random combinations, has no bearing on the actual likelihood of anything, unless that thing actually formed that way. This type of probability is simply a measure of ignorance; the ‘random’ assumption amounts to basing the conclusion on maximum ignorance, so of course you get a small number.

This has nothing to do with biology, both because it is not the way biology works and because it is a general property of particulars; everything’s improbable in the usual way. How about a fluffy white cloud in a blue sky? Or the exact arrangement of atoms in a thimble of air?

And by the way, the impressive inventory of organic molecules that form in space, and would have been included in the comets that contributed much of the outermost layer of the young earth, support the origin of life itself right here on earth.

Jack: I think Fred was considering the evolutionary bootstrap theory in toto. He was too vast a mind to miss the procedural argument

Me: You’ll have to show me where he relates his “calculations” to a realistic or even feasible scenario.

In the meantime, I’ll continue to think that, as a biologist, Hoyle makes a great astrophysicist.

Do you know you’re a googlewhack?


About this Entry

This page contains a single entry by PvM published on May 23, 2004 8:09 PM.
