(The following is a slight adaptation of this essay. Readers may post questions and/or comments there as well as here.) As this series of essays has explained, the polyadenylation of messenger RNAs is a vital aspect of gene expression in eukaryotic cells (and a not-so-unimportant facet of RNA metabolism in other contexts). Polyadenylation is mediated by a sizeable complex that includes various RNA-binding proteins, nucleases, and other interesting activities. Genetic studies in yeast indicate that virtually every subunit of the core complex is essential - for viability and for pre-mRNA processing and polyadenylation in vitro and in vivo. (This review is freely available and serves as a good starting point for readers who wish to explore the subject further.) Biochemical and/or immunological depletion studies reveal a similar scenario in mammals, and a less-expansive set of studies suggests that a similar rule of thumb will apply in plants. The bottom line of all of this is that almost all of the subunits of the polyadenylation complex seem to be essential - remove one, and the complex cannot function. In the vernacular of a proponent of intelligent design, the polyadenylation complex would seem to be irreducibly complex.
It is in this context that the recently-completed genome of the parasitic organism Giardia lamblia enters the fray. Last year, the complete sequence of G. lamblia, some 12 million base pairs, was determined and analyzed. The authors of the study published in Science noted a number of interesting things - a preponderance of genes encoding protein kinases, evidence for substantial horizontal gene flow from bacteria and archaebacteria, and a streamlined core gene expression machinery (transcription and RNA processing). This streamlining is especially notable in the case of the polyadenylation machinery. Remarkably, of all the subunits in the yeast complex, genes for only three* can be found in G. lamblia (see the figure that follows this paragraph - adapted from Fig. 1 of Morrison et al.).
Naturally enough, one of these is the poly(A) polymerase (PAP). The other two polyadenylation-related proteins encoded by the G. lamblia genome correspond to Ysh1 and Yth1 (whose mammalian counterparts are CPSF73 and CPSF30, respectively). Interestingly, as summarized here, these two subunits are the two to which nuclease activity has been ascribed. Also interestingly, the only RNA binding subunit amongst those seen is Yth1 (=CPSF30). Other subunits are missing. Thus, no other RNA binding subunits are apparent, none of the scaffolds (CPSF160/Yhh1, CstF77/Rna14, Fip1, symplekin/Pta1) are seen, and most of the subunits that have been shown to interact with the transcription complex (CPSF100/Ydh1, CstF50, and Pcf11, to name three) are absent. Indeed, entire complexes (CstF, CFmI, CFmII) appear to be missing.
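For readers who like to see such things tallied, the inventory in the preceding paragraph can be written out as a small Python sketch. It covers only the factors named in that paragraph, not the full yeast complex, and the role labels ("scaffold", "transcription link", and so on) are shorthand chosen for this illustration rather than categories taken from Morrison et al.

```python
# Tally of the polyadenylation factors named in the paragraph above.
# Names and groupings follow the text; the role labels are shorthand
# invented for this sketch, not terms from Morrison et al.
FACTORS = [
    # (name as written in the text, role label, gene found in G. lamblia?)
    ("PAP", "polymerase", True),
    ("Ysh1/CPSF73", "nuclease", True),
    ("Yth1/CPSF30", "nuclease, RNA binding", True),
    ("CPSF160/Yhh1", "scaffold", False),
    ("CstF77/Rna14", "scaffold", False),
    ("Fip1", "scaffold", False),
    ("symplekin/Pta1", "scaffold", False),
    ("CPSF100/Ydh1", "transcription link", False),
    ("CstF50", "transcription link", False),
    ("Pcf11", "transcription link", False),
]


def names_where(found):
    """Return the factor names whose G. lamblia status equals `found`."""
    return [name for name, _role, present in FACTORS if present == found]


present = names_where(True)
absent = names_where(False)

print("Found in G. lamblia:", ", ".join(present))
print("Not found:", ", ".join(absent))
```

Laid out this way, the pattern is hard to miss: everything that remains is either the polymerase or a nuclease, and every scaffold or transcription-linked factor named above is gone.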
What might these startling omissions mean? One possibility is that functional counterparts for most of these proteins exist, but that they have diverged so extensively as to be unrecognizable. This might be the case for some of the missing proteins, but many of them are so highly conserved between plants and animals that this seems an unlikely explanation.
Another possibility is that mRNAs are in fact not polyadenylated in G. lamblia. This is apparently not the case, as cDNAs can be prepared using the usual methods (priming reverse transcription with oligo-dT). Moreover, these cDNAs have untemplated poly(A) tracts, and some limited sequence-gazing can identify a putative polyadenylation signal.
Neither of these possibilities seems likely, which leaves us with the remarkable likelihood that mRNA polyadenylation in Giardia is mediated by a highly reduced complex of but three proteins. This in turn brings us to some fascinating discussion, about both function and evolution.
First, about function. Absent some studies dedicated to polyadenylation mechanisms in Giardia, it’s hard to make sense of the absence of so many essential components of the polyadenylation apparatus. But the fact that the Giardia complex consists of the two known endonucleases is interesting, as it suggests that the very core of the complex in eukaryotes is an endonucleolytic one. It also suggests that, as we peer ever more closely into the complex in other organisms, these two subunits will attract more attention. Other questions about RNA recognition and about links with transcription and splicing also come to mind. For example, might the RNA-binding activity of Yth1/CPSF30 play a more prominent role in polyadenylation signal recognition than has been assumed? Is there an obligatory link between transcription and polyadenylation? If so, what is the link in Giardia, and what might this suspected mechanism tell us about the analogous link in other eukaryotes? Etc., etc., etc.
Which brings us to the evolution of the complex. Giardia has gained some notoriety of sorts, having been identified at times as a very primitive, pre-mitochondrial eukaryote, or as a still-primitive eukaryote that lost its mitochondria. These two scenarios regarding the mitochondria of Giardia give us a similar set of contrasting pathways regarding the evolution of the polyadenylation complex. One scenario would be that the Giardia polyadenylation complex resembles the primordial eukaryotic complex, that the first polyadenylation apparatus consisted of little more than a nuclease and a polymerase. The complex we see in other eukaryotes would be derived from a series of co-options, recruitments, and duplication events, all building on this simple beginning. Of course, the most exciting aspect of this scenario is that it gives us a remarkably clear link to nucleolytic activities in bacteria; this follows from the structural and functional similarities between CPSF73/Ysh1 and RNase J (noted here).
The alternative is that the Giardia complex has lost most of the subunits that we see in other organisms. This seems unlikely, given the essential nature of most of the subunits in yeast. However, some differences in this regard exist between yeast and other eukaryotes; thus, Yth1 is essential in yeast, but its Arabidopsis counterpart is dispensable for viability. In any case, this alternative would provide us with a clear example of how extensively an irreducibly complex mechanism can evolve.
Hopefully, this essay has taught readers a thing or two. More importantly, in the best of cases, it has raised a number of questions. There may be some answers, but for many of these questions much experimentation and exploration still awaits.
Morrison, H.G., McArthur, A.G., Gillin, F.D., Aley, S.B., Adam, R.D., Olsen, G.J., Best, A.A., Cande, W.Z., Chen, F., Cipriano, M.J., Davids, B.J., Dawson, S.C., Elmendorf, H.G., Hehl, A.B., Holder, M.E., Huse, S.M., Kim, U.U., Lasek-Nesselquist, E., Manning, G., Nigam, A., Nixon, J.E., Palm, D., Passamaneck, N.E., Prabhu, A., Reich, C.I., Reiner, D.S., Samuelson, J., Svard, S.G., Sogin, M.L. (2007). Genomic Minimalism in the Early Diverging Intestinal Parasite Giardia lamblia. Science, 317(5846), 1921-1926. DOI: 10.1126/science.1143837
* - in the figure from Morrison et al., Pab1, RNA polymerase II and Glc7 are also noted as polyadenylation factor subunits. Pab1 is one of two poly(A)-binding proteins that play roles in polyadenylation in yeast; this protein, as well as the G. lamblia protein identified in this study, is distinct in sequence and domain composition from the nuclear poly(A) binding proteins seen in mammals and plants. RNAP II is so considered because it is a scaffold of sorts, upon which numerous other polyadenylation factors assemble; this function is needed for efficient polyadenylation. Glc7 is a protein that consistently purifies with the yeast polyadenylation complex. As it is not considered historically to be a core polyadenylation complex subunit, I have not elaborated on it in this essay.
[updated on Aug 30 - Carl Zimmer noted, if more briefly, the curious reduction of the Giardia polyadenylation complex in this essay. I didn’t know of this until today, but thought it appropriate to mention.]