Phyloseminar: Locating protein-coding sequences under selection for additional, overlapping functions in 29 mammalian genomes

Next up on phyloseminar:

Mike Lin speaks Tuesday, April 26th at 12pm PDT on “Locating protein-coding sequences under selection for additional, overlapping functions in 29 mammalian genomes”

The degeneracy of the genetic code allows protein-coding DNA and RNA sequences to simultaneously encode additional, overlapping functional elements. A sequence in which both protein-coding and additional overlapping functions have evolved under purifying selection should show increased evolutionary conservation compared to typical protein-coding genes—especially at synonymous sites. We developed a method to systematically locate short regions within known ORFs that show conspicuously low estimated rates of synonymous substitution, based on phylogenetic codon rate models and likelihood ratio tests.

We applied this method to genome alignments of 29 placental mammals, resulting in more than 10,000 “synonymous constraint elements” (SCEs) with resolution down to nine-codon windows. These are found within more than a quarter of all human protein-coding genes and contain ~2% of their synonymous sites. We collected numerous lines of evidence that the observed synonymous constraint in these regions reflects selection on overlapping functional elements including splicing regulatory elements, dual-coding genes, RNA secondary structures, microRNA target sites, and developmental enhancers. We also ruled out certain alternative explanations such as codon usage bias and neutral rate variation.
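To give a feel for a windowed likelihood ratio test at nine-codon resolution, here is a minimal sketch. It is not the method from the talk, which uses phylogenetic codon rate models. As a simplified stand-in, it assumes per-codon counts of synonymous substitutions that are Poisson-distributed around an expected count scaled by a window-specific factor `lam`, and tests `lam = 1` against a free `lam`. The function names, the input lists, and the `alpha` threshold are all hypothetical, and no multiple-testing correction is applied.

```python
import math

def window_lrt(observed, expected):
    """Poisson likelihood ratio test of a synonymous-rate scaling factor.

    observed: synonymous substitution counts per codon (hypothetical input)
    expected: expected counts per codon under the null rate (hypothetical input)
    Returns (lam_hat, LRT statistic, p-value from chi-square with 1 df).
    """
    K = sum(observed)
    E = sum(expected)
    lam = K / E  # maximum-likelihood estimate of the scaling factor
    # 2 * [logL(lam_hat) - logL(1)]; the K*log(lam) term vanishes when K == 0
    log_term = K * math.log(lam) if K > 0 else 0.0
    stat = max(2.0 * (log_term - (lam - 1.0) * E), 0.0)
    # Survival function of chi-square with 1 df is erfc(sqrt(x / 2))
    p = math.erfc(math.sqrt(stat / 2.0))
    return lam, stat, p

def scan(observed, expected, width=9, alpha=0.01):
    """Slide a window of `width` codons and keep windows with a reduced rate."""
    hits = []
    for start in range(len(observed) - width + 1):
        lam, stat, p = window_lrt(observed[start:start + width],
                                  expected[start:start + width])
        if lam < 1.0 and p < alpha:
            hits.append((start, lam, p))
    return hits

# Toy ORF: 29 codons, with a stretch of nine codons showing no synonymous changes
observed = [1] * 10 + [0] * 9 + [1] * 10
expected = [1.0] * 29
for start, lam, p in scan(observed, expected):
    print(f"window at codon {start}: lam = {lam:.2f}, p = {p:.2g}")
```

A real analysis would take the expected rates from a codon model fitted to the whole alignment and tree rather than from fixed per-codon values.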

Our initial results show that overlapping functional elements are common in mammalian genes, despite the vast genomic landscape. Furthermore, anticipating the future availability of additional mammalian and vertebrate genomes, we are currently developing Bayesian codon modeling methods to measure synonymous rates at even higher resolutions, perhaps eventually allowing the detection of individual regulator binding sites embedded in protein-coding ORFs.

Japan 04:00 (04:00 AM) on Wednesday, April 27
New Zealand 07:00 (07:00 AM) on Wednesday, April 27
West Coast USA 12:00 (12:00 PM) on Tuesday, April 26
East Coast USA 15:00 (03:00 PM) on Tuesday, April 26
England 20:00 (08:00 PM) on Tuesday, April 26
France 21:00 (09:00 PM) on Tuesday, April 26
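
The local times above follow from the West Coast start time. For anyone who wants to check them or add another location, here is a small sketch. It makes two assumptions that the announcement does not state: the year is 2011 (a year in which April 26 falls on a Tuesday), and the fixed UTC offsets are those in effect in late April. The `OFFSETS_HOURS` table and the `local_times` function are illustrative names.

```python
from datetime import datetime, timedelta, timezone

# Assumed UTC offsets in late April: daylight time in the US and Europe,
# standard time in New Zealand and Japan.
OFFSETS_HOURS = {
    "Japan": 9,
    "New Zealand": 12,
    "West Coast USA": -7,
    "East Coast USA": -4,
    "England": 1,
    "France": 2,
}

def local_times(year=2011):
    """Convert the 12:00 West Coast start time into each listed location."""
    start = datetime(year, 4, 26, 12, 0, tzinfo=timezone(timedelta(hours=-7)))
    return {
        place: start.astimezone(timezone(timedelta(hours=hours)))
        for place, hours in OFFSETS_HOURS.items()
    }

for place, t in local_times().items():
    print(f"{place} {t:%H:%M} on {t:%A, %B %d}")
```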