Origin of “junk” DNA


A paper published online in Molecular Biology and Evolution claims to have "rigorous proof that [junk DNA was] added to DNA 'late' in the evolution of life on earth--after the formation of modern-sized genes, which contain instructions for making proteins," according to a press release from the National Institute of Standards and Technology, whose Center for Advanced Research in Biotechnology (CARB) was involved with the research.

The press release states:

Research from the CARB group appears to resolve a debate over the "early versus late" timing of the appearance of introns. Since introns were discovered in 1978, scientists have debated whether genes were born split (the "introns-early" view), or whether they became split after eukaryotic cells (the ones that gave rise to animals and their relatives) diverged from bacteria roughly 2 billion years ago (the "introns-late" view). Bacterial genomes lack introns. Although the study did not attempt to propose a function for introns, or determine whether they are beneficial or harmful, the results appear to rule out the "introns-early" view.

The CARB analysis shows that the probability of a modern intron's presence in an ancestral gene common to the genes studied is roughly 1 percent, indicating that the vast majority of today's introns appeared subsequent to the origin of the genes. This conclusion is supported by the findings regarding placement patterns for introns within genes. It long has been observed that, in the sequences of nitrogen-containing compounds that make up our DNA genomes, introns prefer some sites more than others. The CARB study indicates that these preferences are side effects of late-stage intron gain, rather than side effects of intron-mediated gene formation.

Ref: Wei-Gang Qiu, Nick Schisler, and Arlin Stoltzfus, "The Evolutionary Gain of Spliceosomal Introns: Sequence and Phase Preferences" MBE Advance Access published March 10, 2004, 10.1093/molbev/msh120

24 Comments

Following up on the theme of defending microbes (see comments to post below), there is an error in the press release. Bacteria do in fact have introns (they are called Group I and II introns). They are rare, but seem to be important, particularly in horizontal transfer of bacterial DNA. For more info, see the paper below (or just search NCBI for “bacterial introns”).

http://www.pubmedcentral.gov/articl[…]artid=307953

Nucleic Acids Res. 1994 April 11; 22 (7): 1167–1171 “Evidence for a group II intron in Escherichia coli inserted into a highly conserved reading frame associated with mobile DNA sequences.”

Note: not meant to dispute the interesting findings of the paper, although I would point out that the identification of introns in eubacteria suggests an even earlier origin than postulated by the authors.

I would be wary of calling introns “junk”. While their individual length and sequence certainly vary and are probably under no or very loose selection (except when they contain regulatory elements, of course), it has certainly been argued that the existence of introns, and their position within genes, could have significant evolutionary implications. For instance, the presence of introns could result in more frequent productive reshuffling of protein structural modules (“domains”) by intragenic recombination, transposition, etc., leading to new functional combinations. This in turn would have an impact on long-term “evolvability”. I am a little baffled by the use of the term “junk” in the NIST press release; they should know better.

An interesting point: in ‘artificial evolution’, specifically Genetic Programming, the presence of what amounts to introns can have a noticeable effect on the rate of improvement in the population. Basically, the presence of unused subtrees allows useful sections to be formed in a way that doesn’t harm the fitness of the individual being changed.

The presence of ‘junk’ DNA may well be advantageous in that it allows mutation to produce possibly useful genes ‘offsite’, if you will. All of which means… little to nothing. But it’s a thought =)

Thanks for posting that, John. I’ve long been a fan of “introns-late”, and it’s good to see that the evil “introns-early” people have been taken down a notch yet again. Last I checked, the consensus, to the extent that there was one, was that introns are a mixed bag of early and late. Seems now, if these conclusions are correct, that very few introns were early.

I’m with Andrea about how misleading it is to call introns “junk”. And I would also add that it can give the false impression that all non-functional (or non-transcribed) DNA consists of introns, whereas in reality it consists of a diverse variety of retroelements, pseudogenes, microsatellites, and so forth. A favorite IDist thing to do is to pretend that it’s all the same.

Steve:

No problem :)

I too agree that designating introns as “junk” is a problem (and indeed with the point you make about pseudogenes etc.).

Hey Aardvark, do you have any specific references for that GP comment? I’d be real interested in them. Thanks!

RBH

Thanks for your comments on this work. Yes, the word “junk” is objectionable, isn’t it? There are some words that one strives to avoid in one’s own writing (“junk” doesn’t appear in our paper), but one can’t stop these words from appearing in press releases.

Having said that, the notion that introns or so-called “junk DNA” (more generally) confer an advantage of some type does not represent an evolutionary hypothesis unless one is proposing that this advantage is the *cause* of something. Usually this is not clear, and thus ideas about benefits are not distinguishable from natural theology, i.e., introns are good thus god (nature) is good, which is the way that I would read (for instance) the famous Gilbert (1978) paper about how introns “speed evolution”. If anyone here wants to propose a specific hypothesis to the effect that some attribute or consequence of introns is the evolutionary cause of their existence, their persistence, or their properties, then this would be a good way to stimulate discussion.

With respect to introns in bacteria: yes, they exist. This is another one of those simplifications that just can’t be kept out of press releases. However, these are different sorts of intron than the ones that are found in the eukaryotic nucleus, and that are the subject of the Qiu, et al. paper and the historic ‘introns-early’ vs. ‘introns-late’ debate.

Arlin

The May 23, 2003 issue of Science contained an article titled:

“Not Junk After All”

“So, the question is, “Why do we need so much DNA?” Most researchers have assumed that repetitive DNA elements do not have any function: They are simply useless, selfish DNA sequences that proliferate in our genome, making as many copies as possible. The late Susumu Ohno coined the term “junk DNA” to describe these repetitive elements.

Although catchy, the term “junk DNA” for many years repelled mainstream researchers from studying noncoding DNA. Who, except a small number of genomic clochards, would like to dig through genomic garbage? However, in science as in normal life, there are some clochards who, at the risk of being ridiculed, explore unpopular territories. Because of them, the view of junk DNA, especially repetitive elements, began to change in the early 1990s. Now, more and more biologists regard repetitive elements as a genomic treasure. Genomes are dynamic entities: New functional elements appear and old ones become extinct. It appears that transposable elements are not useless DNA. They interact with the surrounding genomic environment and increase the ability of the organism to evolve. They do this by serving as recombination hotspots, and providing a mechanism for genomic shuffling and a source of “ready-to-use” motifs for new transcriptional regulatory elements, polyadenylation signals, and protein-coding sequences. The last of these is especially exciting because it has a direct influence on protein evolution.”

The article concludes:

“Risking personification of biological processes, we can say that evolution is too wise to waste this valuable information. Therefore, repetitive DNA should be called not junk DNA but a genomic scrapyard, because it is a reservoir of ready-to-use segments for nature’s evolutionary experiments.”

Could the tag “junk DNA” actually have stifled research into non-coding DNA?

This is an example of the difficulty mentioned earlier. The quotations from the _Science_ article don’t specify an evolutionary hypothesis, do they? They are mostly just playing on the aphorism “one man’s junk is another man’s treasure” (and radically rewriting the history of evolutionary genomics!). The scientific issue is utterly unclear. The author asks “why do we need so much DNA?”, and goes on to associate the idea of “junk DNA” with an absence of “function”. Later the author expresses excitement that introns and repetitive DNA “influence” protein evolution. If we take this seriously and try to roll it into an evolutionary hypothesis, it would be something like this: the genome “needs” secondary DNA to “influence” evolution or to “increase the ability of the organism to evolve”. But there is no such need in evolution, as proved conclusively by the existence of prokaryotes.

Arlin

Hi Arlin: great to have you here!

First of all, a comment about the “junk” label. It has always been my understanding that the possibility of “re-functionalization” (ugh, what a horrible word, let’s call it “recycling”, to stay in theme) of “junk” DNA had already been discussed very early on. Certainly, when I started studying biology in the early 80s, I remember being instructed about the clear difference between the words “junk” and “garbage” in English (we don’t have directly corresponding words in Italian).

Now, to the fun part, related to Arlin’s challenge: “If anyone here wants to propose a specific hypothesis to the effect that some attribute or consequence of introns is the evolutionary cause of their existence, their persistence, or their properties, then this would be a good way to stimulate discussion.”

I think there are three questions here: 1. where did introns come from? 2. why did they persist/expand? 3. what kind of advantage (if any) did they bring to the carrier organisms?

I am not an intronologist, so what I am about to say may easily be entirely trivial, or entirely idiotic, based on existing evidence in the field.

Let’s say that introns were originally some form of selfish mobile DNA element (which seems to be the consensus in the field, as far as I understand it), struggling to stay one step ahead of the deletional bias in their target organisms’ genomes.

Where is the best place to “hide” for one of these things? Inside genes, of course, provided that a mechanism is in place not to alter their functionality (otherwise integration would be counter-selected). In other words, a mobile element capable of accurately splicing itself out from an mRNA (therefore retaining gene function) would be very hard for deletional bias to eliminate, because deletions would more likely than not alter the coding sequence.

So, let’s say that a mobile genetic element acquired a ribozyme activity that allowed the self-splicing of its RNA sequence from primary mRNA. This element would spread relatively safely: its insertions would be neutral, and it would be relatively protected from deletional bias.

Does this make any sense? Also, I am not sure to what extent this is testable, but perhaps one would expect to find some mobile genetic element that included a similar ribozyme activity still going around in some bizarre organism, or maybe a ribozyme activity molecularly related to introns, but performing some other function (e.g, cutting and pasting specific mRNAs?) which could be the precursor of the activity in introns.

That’s as far as I can venture about where introns came from and why they spread. Next is the question of what good are introns to us, trying to avoid general platitudes. I would put it this way: once a set of organisms had acquired, through replicative spreading and drift, a hefty dose of these new, selectively neutral intron-like mobile elements as described above, their presence resulted in higher chances to generate new functional hybrid proteins by genetic recombination, compared to their intron-less counterparts. The reason for this is of course that intron sequences are much more tolerant of recombination, since they can’t cause new frameshifts (other than those intrinsic to intron phase) and internal deletions. So, it’s not so wrong to say that introns allow “accelerated evolution”, as long as one turns the sentence around: once introns have accumulated for some other reason (eg, selfish replication), they do end up conferring an advantage to the organisms which bear them, by allowing new kinds of adaptation and innovation. Thus, these are the organisms we would expect to be more likely to grow in complexity, both genetically (more genes, more functions) and, as a consequence, morphologically. Thus, the selective advantage is not in the presence of introns in and of itself, but in what the widespread presence of introns allows.

Now, I know this hypothesis already has more holes than Swiss cheese, but go ahead and poke some more.

A

Andrea, thanks again for your thoughtful comments. First, you are right that “recycling” junk was an early idea. In fact, the fellow who coined the term “junk DNA”, Susumu Ohno, is otherwise famous for writing a book about the overwhelming importance of gene duplication in evolution (Ohno, 1972). Ohno believed that extra copies of genes created by duplication degenerate into a non-functional form, and are then brought back into service, and that without this kind of process, evolution would be too conservative because natural selection is such an “effective policeman”. In other words, the fellow who coined “junk DNA” believed that in order to account for the complexity of jaguars or flowers, “junk” must be recycled into “treasure”.

In your comments, you offer two kinds of hypotheses. One is the explanation for a singular historical event, the origin of spliceosomal introns as a class. This is the kind of thing that is rarely testable in a meaningful sense, though it can be fun to speculate about (I have done it myself). A historical event is a ‘particular’. We can have a theory about games or tables, but not a theory about one particular game or table. Well, to clarify, if we have a theory about the game “poker”, it is a theory about *any* game of poker that might be played, and not about one particular game of poker that was played last Friday night. So I won’t respond to those comments.

The other kind of hypothesis potentially involves recurrent processes, and this gives a basis for testing. For instance, if intron gains are neutral or deleterious, then (other things being equal) they should occur more often in species with smaller population sizes, because selection is a less effective policeman in a small population. Your suggestion about selfish introns can be tested by looking for this kind of correlation. Others, such as Michael Lynch (see Lynch & Conery, recent _Science_ article), and I are interested in doing this sort of thing. Likewise, the rate of spread of a selfish element should be higher in outcrossing populations. Both of these implications are based on fundamental principles of population genetics.
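The population-size argument above can be illustrated with a rough sketch. It assumes Kimura’s diffusion approximation for the fixation probability of a new mutant in a diploid population whose effective size equals its census size N. The function names, the selection coefficient and the mutation rate are hypothetical values chosen for illustration; they are not taken from the paper or from Lynch & Conery.

```python
import math


def fixation_probability(s: float, n: int) -> float:
    """Fixation probability of a new mutant at initial frequency 1/(2N).

    Assumption: Kimura's diffusion approximation, diploid population,
    effective size equal to census size N.
    """
    if s == 0.0:
        # Neutral limit: a new mutant fixes with probability 1/(2N).
        return 1.0 / (2 * n)
    # (1 - exp(-2s)) / (1 - exp(-4Ns)), written with expm1 for accuracy.
    return math.expm1(-2.0 * s) / math.expm1(-4.0 * n * s)


def substitution_rate(u: float, s: float, n: int) -> float:
    """Rate at which mutants of effect s arise (2N * u) and then fix."""
    return 2 * n * u * fixation_probability(s, n)


# Hypothetical numbers: a slightly deleterious intron gain.
u = 1e-9      # assumed gain rate per generation
s = -1e-4     # assumed selection coefficient (slightly deleterious)
small = substitution_rate(u, s, 1_000)
large = substitution_rate(u, s, 100_000)
print(small / u, large / u)
```

Under these assumptions the slightly deleterious gain fixes at close to the neutral rate in the small population and essentially never in the large one, which is the correlation with population size described above.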

Then in your last major paragraph, you raise the issue of an advantage in your suggestion that introns increase the chance of forming new genes. During most of the paragraph it seems clear that you are NOT proposing this as a “function”, as a _raison d’etre_ for introns, but as a consequence, i.e., once the introns spread selfishly, they have the consequence of increasing the formation of new genes. We could test this by simply asking whether the rate of formation of new genes correlates with the density of introns. But it would not be clear what this hypothesis signifies about “junk DNA” or about evolution. Then in the last sentence, you say that this is a “selective advantage”. This suggests a different hypothesis. Are you suggesting that organisms with more introns have succeeded better, i.e., have increased their relative representation in the earth’s biota, as a consequence of having introns?

Arlin

Arlin, let me give you some short answers, for the time being.

“Then in your last major paragraph, you raise the issue of an advantage in your suggestion that introns increase the chance of forming new genes. During most of the paragraph it seems clear that you are NOT proposing this as a “function”, as a _raison d’etre_ for introns, but as a consequence, i.e., once the introns spread selfishly, they have the consequence of increasing the formation of new genes.” Correct.

“We could test this by simply asking whether the rate of formation of new genes correlates with the density of introns.” Or, whether multi-domain proteins, that combine multiple activities associated with well-defined structural “modules”, are more common for intron-bearing organisms than for non-. For instance, just in my field, I can think of entire families of receptors and transcription factors involved in immune system function whose members differ for the presence of identifiable functional modules (eg, for specific protein-protein interactions, phosphorylation sites, etc). I am sure this is probably the rule rather than the exception for multigene families with significant functional diversification. This kind of “modularity”, in which members of a family appear at some evolutionary point to have acquired the coding sequences for specific protein domains from unrelated genes, would intuitively seem more difficult for intronless genes than for genes with introns. [Incidentally, this may have some relationship with phase bias, but we can talk about this later, if you wish.]

“It would not be clear what this hypothesis signifies about “junk DNA” or about evolution.” Not really about “junk”, but about introns and evolution - see below.

“Then in the last sentence, you say that this is a “selective advantage”. This suggests a different hypothesis. Are you suggesting that organisms with more introns have succeeded better, i.e., have increased their relative representation in the earth’s biota, as a consequence of having introns?” Well, one can’t really say that, since bacteria are still the dominant organisms. However, assuming that the ability to generate new functional combinations of protein domains correlates with “evolvability” (gosh, I hate that word), then yes, perhaps organisms that were riddled with essentially selectively neutral (at the individual level) introns ended up reaping, in the evolutionary long term, the unintended benefit of being able to more readily evolve new protein functions from those introns’ presence. This may have correlated with the potential to increase in complexity and/or colonize peripheral (to the bacterial world) ecological niches, which is after all what eukaryotes have been doing quite successfully. Does it make any sense?

A

Andrea, you wrote “I am sure this is probably the rule rather than the exception for multigene families with significant functional diversification.”

Actually, the empirical issue is not so clear. Gene families often diversify without obvious chimaerism (e.g., globins). Recognizably chimaeric proteins typically are chimaeras of globular domains, and not of “modules” of some other sort. Such proteins are abundant in both prokaryotes and eukaryotes, and are typically involved in operations that involve localization, as one might expect. An example case of prokaryotic chimaeras would be phosphotransferase system proteins (search for pts AND saier-mh [auth] in PubMed). I will try to find an answer to the question of whether multi-domain proteins are more frequent in prokaryotes or in eukaryotes.

Also, you wrote that “assuming that the ability to generate new functional combinations of protein domains correlates with “evolvability” (gosh, I hate that word), then yes, perhaps organisms that were riddled with essentially selectively neutral (at the individual level) introns ended up reaping, in the evolutionary long term, the unintended benefit of being able to more readily evolve new protein functions from those introns’ presence.”

If you were saying that introns increase long-term reproductive success, this is something we would know how to measure, but you are saying that it increases a quantity called “new protein functions”. Do you just mean “proteins”, or perhaps, “loci”? Where does “function” come into it?

With respect to “evolvability”, the concept is inert within neoDarwinism, which does not provide the means to consider cases in which the course of evolution may be caused by specific propensities of the variation-producing process (i.e., the heresy of “orthogenesis”). Let’s begin instead from the sort of mutationist perspective (“mutation proposes, selection disposes”) that is common among molecular evolutionists, in which evolution is seen as a proposal-acceptance process with a rate R_ij = u_ij * N * p_ij, where u_ij is the rate of mutation from the current state i to some alternative state j, N is the population size (thus u_ij * N is the rate of introduction of new mutants into the population), and p_ij is the probability that a newly introduced j allele will be fixed in a population of i’s (e.g., for beneficial mutants, p ~ 2s). In this framework, the natural way to define short-term “evolvability” is in terms of a sum of R_ij over different values of j. A change in evolvability is understood in terms of its effects on the distribution of u’s and p’s. I’ll stop here before going further.
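As a minimal sketch of the origin-fixation bookkeeping described above, the rate R_ij = u_ij * N * p_ij and its sum over alternative states j can be written out directly. The function names and all numerical values below are hypothetical, and p ~ 2s is used only for beneficial mutants, as in the text.

```python
def fixation_probability_beneficial(s: float) -> float:
    """Approximate fixation probability of a new beneficial mutant, p ~ 2s."""
    return 2.0 * s


def origin_fixation_rate(u_ij: float, n: float, p_ij: float) -> float:
    """R_ij = u_ij * N * p_ij: origination rate times fixation probability."""
    return u_ij * n * p_ij


def evolvability(n: float, alternatives) -> float:
    """Short-term 'evolvability' as the sum of R_ij over alternative states j.

    `alternatives` is an iterable of (u_ij, p_ij) pairs, one per state j.
    """
    return sum(origin_fixation_rate(u, n, p) for u, p in alternatives)


# Hypothetical example: two beneficial alternatives reachable from state i.
N = 1e6
alternatives = [
    (1e-8, fixation_probability_beneficial(0.01)),   # u = 1e-8, s = 0.01
    (2e-8, fixation_probability_beneficial(0.005)),  # u = 2e-8, s = 0.005
]
total = evolvability(N, alternatives)
print(total)
```

In this framing, anything that changes “evolvability” must do so by shifting the u values, the p values, or both, which is the question taken up in the rest of the thread.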

Arlin

Toussaint, one of the authors of the No Free Lunch Theorems, has written a thesis and several papers on the topic of evolvability. A fascinating topic which helps us understand why neutrality is such an important factor in the genetic code, RNA and DNA sequence space. See Toussaint for additional information. A nice extension of Darwinian ideas in a mathematical format. We may now understand the scale-free properties of RNA and protein space much better.

Pim,

Toussaint has contributed to the literature on “No Free Lunch” theorems, but the originators of the theorems are, precisely, David H. Wolpert and William G. Macready.

I stand corrected; Toussaint’s contribution follows the original work by Wolpert and Macready.

Recent Results on No-Free-Lunch Theorems for Optimization

and

On Classes of Functions for which No Free Lunch Results Hold

With Wolpert commenting on Dembski’s NFL, it would be interesting to hear from Toussaint as well.

Arlin: you make all good points.

Again, this is not my field, but I always thought that the evidence for exon shuffling in evolution was quite substantial. For instance, I believe exon shuffling is the main current explanation for the widespread distribution of certain conserved functional motifs and domains among otherwise structurally unrelated proteins (eg, ITIMs and ITAMs, SH2/SH3, PEST, etc). To me, this seems like an obvious way by which functional diversification of proteins can occur, by adding or replacing “modules” which have already been selected for functional significance in another context. This kind of rearrangement would seem more likely in an intron-containing organism than in an intronless one.

To reprise the “evolvability” argument (to which I give no orthogenetic implications, though I realize the term is ambiguous and has a checkered history) in your formalism, I think the presence of introns increases both u (the rate of mutations in terms of actual “shuffling” events) and p (in terms of the chances that a shuffling event would generate a new protein with a selectable functional property).

Again, it is not actually the presence of introns in and of itself that would be an advantage, but the generation of new functional chimeric proteins that the presence of introns allows.

So, imagine a protein-protein interaction or regulatory network in an organism, in which interactions are mediated through specific protein motifs. To add a new node to the network, or connect two separate networks, one would need a new member protein to acquire a specific interaction motif. This can occur by amino acid residue substitution of the protein to eventually match the motif (but that’s unlikely, especially for complex motifs), or by wholesale acquisition of an existing domain through some gene rearrangement, or “shuffling”, process. Intron-bearing organisms would be more likely to undergo this kind of rearrangement, and thus to eventually develop more complex and “layered” interaction and regulatory networks.

First, I was not disputing that, in eukaryotes, chimaeric genes may arise by illegitimate recombination bringing together blocks of exons from different sources, i.e., “exon shuffling”. The evidence for this sort of thing has been very clear for maybe 15 years (for more info, find a review by Laszlo Patthy, the world’s foremost expert, e.g., search Patthy-L [auth] on PubMed). However, this fact tells us little, since chimaeric genes also arise in prokaryotes without the use of introns.

Second, I don’t see why (in the context of our formalism of defining ‘evolvability’ as a sum over origin-fixation terms R_ij = u_ij * N * p_ij) you think that “the presence of introns increases both u (the rate of mutations in terms of actual “shuffling” events) and p (in terms of the chances that a shuffling event would generate a new protein with a selectable functional property)”. ‘Exon shuffling’ means rearranging the exonic parts of the gene by recombination within introns. In exon shuffling, the presence of an intron between sites n and n+1 is operationally no different than the presence of a recombination hotspot at the phosphodiester bond between n and n+1. So, don’t the introns just increase (relatively) the value of u_ij for some events, and decrease it for others? How can they affect the p’s?

Arlin

“However, this fact tells us little, since chimaeric genes also arise in prokaryotes without the use of introns.” Correct, but the question is, can they occur more frequently if introns are present? I don’t know much about the bacterial proteome, but in mammals “shuffled” proteins with multiple domains, apparently picked here and there, are all over the place.

“In exon shuffling, the presence of an intron between sites n and n+1 is operationally no different than the presence of a recombination hotspot at the phosphodiester bond between n and n+1. So, don’t the introns just increase (relatively) the value of u_ij for some events, and decrease it for others?” Uhm… If the chance of recombination between 2 adjacent base pairs is roughly equal throughout the genome, introns would increase the intragenic recombination frequency simply by increasing the number of potential recombination sites. However, we know that recombination frequencies are not uniformly distributed, because of hotspots. But hotspots are often associated with specific sequence motifs, such as repetitive elements, which are more common in intronic sequences than within coding sequences. So, assuming this can be tested, I would not be surprised if recombination events occurred more frequently in introns than in exons, and in intron-bearing genes more than in intronless genes.

“How can they affect the p’s?” I can perhaps see 3 potential mechanisms.

1. Illegitimate recombination events are often associated with loss/duplications of nucleotides at the recombination sites. If a new exon is plopped within an intron, assuming it is in frame (see below), the structural change to the resulting protein will only consist of the new aa sequence encoded by the exon. On the other hand, if a recombination event falls within the coding sequence, indels at the recombination site would add to that the risk of disruption of the original protein’s existing domain structures, affecting folding, original activities etc.

2. If exons tend to - more or less - follow domain/motif organization (i.e., if exons encoding one or few discrete domains/motifs were more frequent than “hybrid” exons encoding partial sequences for one or two domains)(*), then exon shuffling would mostly result in addition/replacement of domains/motifs leaving existing neighboring domains/motifs intact. On the other hand, insertion of partial domains/motifs within truncated/split domains/motifs is certainly more likely to be disruptive.

3. Recombination within intronless genes results in 2/3 of rearrangements being out-of-frame. In organisms in which, one way or another, intron phase preferences are established, this frequency would be somewhat lower. For instance, given the 5:3:2 intron phase bias from de novo insertion, as mentioned in your paper, in-frame events would be 38%, instead of 33%. A small increase, but who knows.

(*) I don’t know if this is the case in general, though it obviously is for some proteins, eg, immunoglobulin superfamily members. Note also that this would be a self-reinforcing rule: exons which contain discrete domains would be more likely to be functional after “transplant” into other genes, and thus would potentially become more common (if their selectable function is widely useful).

One possible mechanism that can distinguish introns from run-of-the-mill rearrangements as far as promoting things like exon shuffling is described by Miriami et al. (Regulation of splicing: the importance of being translatable. RNA 10, 1-4, 2004). Briefly (and simplistically), it seems that, when given a choice of splice site consensus sequences, splice site choice is affected by the ultimate effect on the open reading frame: splice sites that result in premature termination codons are not selected, while those that preserve the ORF are. I think the phenomenon is still pretty poorly understood, and, to my knowledge, no one has explored the evolutionary ramifications of it. But it seems (naively) to me that some mechanistic linkage between splice site choice and ORF preservation could buffer the deleterious effects that would accompany the joining of exons with otherwise different reading frame registers.

Let me make this simpler. Here is an event of intron-mediated gene fusion represented symbolically, where each letter indicates a nucleotide or a block of nucleotides, and the lower-case i’s are the introns:

EEEiiiFFFF X GGGiiiiiiHHHH = EEEiiiiHHHH + GGGiiiiiFFFF

The “X” indicates a break-and-rejoin event, i.e., an event of ‘conservative’ recombination. Now, if we can arbitrarily break and rejoin sequences, we can also do this:

EEEFFFF X GGGHHHH = EEEHHHH + GGGFFFF

That is, we can break and rejoin an intronless gene. The difference is in how likely this outcome is. To address this quantitative issue, we may begin with the assumption that, whatever the mechanism of break-and-rejoin recombination, its rate is an increasing function of L1 and L2, where these are the lengths (in inter-nucleotide sites) of the regions to be recombined.

Let's consider a numeric example. Assume that we are going to break and rejoin the following two sequences, where each letter is a single nucleotide:

AAAiiiiiiiiiiiiiiiiiiiBBBiiiCCC
DDDiiiiiiiiiEEEiiiFFF

The top gene has 3 + 3 + 3 = 9 exon nucleotides and 19 + 3 = 22 intron nucleotides. The bottom gene has 9 exon nucleotides and 9 + 3 = 12 intron nucleotides. We can break a gene at any inter-nucleotide site, so there are 30 places to break the first gene, and 20 places to break the second one, or 20 * 30 = 600 possible break-and-rejoin events.

Of these events, the largest equivalence class is the class of events that produces AAAEEEFFF and DDDBBBCCC, which can be achieved by breaking the first gene at any of the 20 inter-nucleotide sites of intron 1, and by breaking the second gene at any of the 10 inter-nucleotide sites in intron 1. This event then has a relative probability of 20*10/600 = 0.33. That is, one-third of all break-and-rejoin events will produce this outcome.

By contrast, in the intronless case, we have 8 * 8 = 64 possible events, and the event described above is only one of these, thus it has a probability of only 1 in 64, or 0.016.
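The counts above can be checked by brute force. The following is only a sketch under the assumptions stated in the text: a break may fall at any inter-nucleotide site, every pair of break sites is equally likely, the top gene has introns of 19 and 3 nucleotides, and the bottom gene has introns of 9 and 3 nucleotides. The function names (`spliced`, `recombinants`) are made up for this illustration.

```python
from collections import Counter

def spliced(gene):
    """Drop intron characters (lower-case 'i') to get the spliced sequence."""
    return "".join(ch for ch in gene if ch != "i")

def recombinants(top, bottom):
    """Tally the pair of spliced products for every break-and-rejoin event.

    A break index a means the gene is cut between positions a-1 and a,
    so a gene of length n has n-1 possible break sites.
    """
    outcomes = Counter()
    for a in range(1, len(top)):
        for b in range(1, len(bottom)):
            left = spliced(top[:a] + bottom[b:])
            right = spliced(bottom[:b] + top[a:])
            outcomes[(left, right)] += 1
    return outcomes

# The two genes of the numeric example (assumed intron lengths: 19, 3 and 9, 3).
top = "AAA" + "i" * 19 + "BBB" + "iii" + "CCC"
bottom = "DDD" + "i" * 9 + "EEE" + "iii" + "FFF"
target = ("AAAEEEFFF", "DDDBBBCCC")

with_introns = recombinants(top, bottom)
total = sum(with_introns.values())
print(total, with_introns[target], round(with_introns[target] / total, 2))

# Same genes with the introns removed before recombination.
without = recombinants(spliced(top), spliced(bottom))
total_without = sum(without.values())
print(total_without, without[target], round(without[target] / total_without, 3))
```

Under these assumptions the target outcome accounts for 200 of 600 events with introns, versus 1 of 64 without them.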

Arlin

Hi Arlin,

I may be missing something, but it does not seem as if your simple example takes into account the possibility that different exons will have different reading frame registers.

Pat, you are right that my simple example does not address different reading frame registers, or what we often call the “phase” of the intron. In crossing these two genes:

AAAiiiiiiiiiiiiiiiiiiiBBBiiiCCC
DDDiiiiiiiiiEEEiiiFFF

we do not have to worry about reading frames (when breaking-and-rejoining within introns), because each exon has one complete codon, and all the introns are between codons (“phase 0” introns). But if the location of the first intron in the first gene is shifted 1 bp upstream, like this:

AAiiiiiiiiiiiiiiiiiiiABBBiiiCCC
DDDiiiiiiiiiEEEiiiFFF

then a break-and-rejoin recombination between the first introns of each gene results in mRNAs with the sequences AAE EEF FF and DDD ABB BCC C (putting spaces between the triplets for clarity).
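A small sketch of the same point in code, assuming the usual convention that an intron's phase is the number of exon nucleotides upstream of it, modulo 3 (phase 0 falls between codons). The helper names and the particular break positions are illustrative choices, not anything taken from the paper.

```python
def spliced(gene):
    """Drop intron characters (lower-case 'i') to get the mRNA sequence."""
    return "".join(ch for ch in gene if ch != "i")

def intron_phases(gene):
    """Phase of each intron: exon nucleotides upstream of it, modulo 3."""
    phases, exon_nt, in_intron = [], 0, False
    for ch in gene:
        if ch == "i":
            if not in_intron:          # first nucleotide of a new intron
                phases.append(exon_nt % 3)
                in_intron = True
        else:
            exon_nt += 1
            in_intron = False
    return phases

def codons(mrna):
    """Insert spaces between successive triplets."""
    return " ".join(mrna[i:i + 3] for i in range(0, len(mrna), 3))

# First intron of the top gene shifted 1 bp upstream, as in the text.
top = "AA" + "i" * 19 + "ABBB" + "iii" + "CCC"
bottom = "DDD" + "i" * 9 + "EEE" + "iii" + "FFF"

top_phases = intron_phases(top)        # first intron is no longer phase 0
bottom_phases = intron_phases(bottom)  # both introns are phase 0
print(top_phases, bottom_phases)

# Break each gene somewhere inside its first intron (positions chosen arbitrarily).
a, b = 10, 6
left = codons(spliced(top[:a] + bottom[b:]))
right = codons(spliced(bottom[:b] + top[a:]))
print(left)
print(right)
```

Because the two first introns differ in phase, neither product has a length that is a multiple of three, so the downstream reading frame is shifted in both.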

The probability that the reading frame register will be preserved is easy to calculate. For break-and-rejoin of intronless genes, it is simply 1/3. If we are limiting ourselves to the case in which the breaks occur within introns in intron-containing genes, then the chance of maintaining the reading frame is the sum of squares of phase frequencies for all three phases, that is, Sum_p( freq_p^2 ).

Interestingly, 1/3 is the minimum value of this function, i.e., the minimum occurs when phase frequencies are uniform. The more non-uniform the phase frequencies, the greater the probability that a randomly chosen pair of introns has the same phase. For eukaryotic protein-coding genes, the phase frequencies are about 0.5, 0.3, and 0.2, so the chance of a random match, computed as the sum of squares, is 0.25 + 0.09 + 0.04 = 0.38, slightly higher than 1/3.
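For anyone who wants to check the arithmetic, here is a minimal sketch. It assumes the two introns are drawn independently from the same phase distribution; the grid scan is only a numerical sanity check of the minimum, not a proof, and the function name is made up for this example.

```python
def same_phase_probability(freqs):
    """Chance that two independently drawn introns share a phase."""
    return sum(f * f for f in freqs)

observed = (0.5, 0.3, 0.2)        # approximate phase 0, 1, 2 frequencies from the text
uniform = (1 / 3, 1 / 3, 1 / 3)

p_observed = same_phase_probability(observed)
p_uniform = same_phase_probability(uniform)
print(round(p_observed, 2), round(p_uniform, 4))

# Scan all frequency triples on a 1% grid; none should fall below 1/3.
grid_min = min(
    same_phase_probability((i / 100, j / 100, (100 - i - j) / 100))
    for i in range(101)
    for j in range(101 - i)
)
print(grid_min >= 1 / 3)
```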

Arlin

About this Entry

This page contains a single entry by John M. Lynch published on March 25, 2004 9:20 AM.
