More on the origination of new protein-coding genes

Recently, we learned of an instance of the de novo origination of a new protein-coding gene in yeasts. This instance involved a pathway that some find hard to accept, namely the chance appearance of an open reading frame in an otherwise noncoding segment of DNA through the fortuitous appearance of translation start and stop codons. The question naturally arises as to the relevance of such a pathway to real-life biology: is this a rare event that contributes little to protein evolution, or is it a common means by which the protein-coding capacity of a genome is augmented?
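To make the mechanism a little more concrete, here is a toy sketch in Python of how a single point mutation can create an open reading frame where none existed. Everything in it is an illustrative assumption rather than anything taken from Cai et al. or Zhou et al.: the function name `find_orfs`, the `min_codons` threshold, and the two short sequences are made up, only the forward strand is scanned, and the standard genetic code's ATG start and TAA/TAG/TGA stops are assumed. A real gene also needs to be transcribed and translated, which this sketch ignores entirely.

```python
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code


def find_orfs(seq, min_codons=3):
    """Return (start, end, frame) tuples for forward-strand ORFs.

    An ORF here runs from an ATG to the first in-frame stop codon.
    `end` is exclusive and includes the stop codon. `min_codons`
    counts the codons before the stop (hypothetical threshold).
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Walk the sequence one codon at a time in this reading frame.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START:
                start = i
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ORF in this frame
    return orfs


# Made-up "noncoding" sequence: a stop codon is present, but no ATG.
noncoding = "ACGGCATTTGGGCCCTAA"
# The same sequence after a single C-to-T change at the second base.
mutated = "ATGGCATTTGGGCCCTAA"

print(find_orfs(noncoding))
print(find_orfs(mutated))
```

In this toy case the original sequence contains no ORF in any forward frame, while the mutated one contains a single ORF spanning the whole sequence. The point is only that the step from "noncoding" to "has an open reading frame" can be as small as one base change.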

A paper that is in press in Genome Research (Zhou et al., “On the origin of new genes in Drosophila”) gives us some insight into this question. The abstract of this paper summarizes things as well as I can:

Several mechanisms have been proposed to account for the origination of new genes. Despite extensive case studies, the general principles governing this fundamental process are still unclear at the whole genome level. Here we unveil genome-wide patterns for the mutational mechanisms leading to new genes, and their subsequent lineage-specific evolution at different time nodes in the D. melanogaster species subgroup. We find that, 1) tandem gene duplication has generated about 80% of the nascent duplicates that are limited to single species (D. melanogaster or D. yakuba); 2) the most abundant new genes shared by multiple species (44.1%) are dispersed duplicates, and are more likely to be retained and be functional; 3) de novo gene origination from non-coding sequences plays an unexpectedly important role during the origin of new genes, and is responsible for 11.9% of the new genes; 4) retroposition is also an important mechanism, and had generated approximately 10% new genes; 5) about 30% of the new genes in the D. melanogaster species complex recruited various genomic sequences and formed chimeric gene structures, suggesting structure innovation as an important way to help fixation of new genes; and 6) the rate of the origin of new functional genes is estimated to be 5 to 11 genes per million years in the D. melanogaster subgroup. Finally, we survey gene frequencies among 19 strains from all over the world for D. melanogaster-specific new genes, and reveal that 44.4% of them show copy number polymorphisms within population. In conclusion, we provide a panoramic picture for origin of new genes in Drosophila species.

To be brief, I’d point out two things:

  • First, with regard to the earlier essay, the mechanism for gene origination described by Cai et al. would seem to be a significant contributor to new genes in the course of evolution, being responsible for almost 12% of the new genes identified by Zhou et al. This answers the question I posed above: the mechanism is not impossibly rare, but a significant way by which new genes arise.
  • Second, about 30% of the new genes identified by Zhou et al. are chimeras that involve the cobbling together of different sequences, much as occurred in the origination of T-urf13. This gives greater prominence to that example of de novo gene origination.

There is much more to Zhou et al. than these points, and I would encourage readers to read the paper (in preprint form, as I have done, or in its final form once it is processed by Genome Research). This is the best way to appreciate that one pillar of ID thought, the claim that new protein-coding genes cannot arise by “natural” means, is an illusion.

(This essay may also be found here.)