A few months ago, we were looking at the concept of a fitness landscape and how new technologies are creating opportunities for biologists to look in detail at relationships between genetics and fitness. The first post discussed the concepts of fitness landscapes and adaptive walks, with some focus on the limitations of the metaphor. The second post summarized some recent work on bacterial fitness and mutation rates, with the concept of a fitness landscape as a theme, and the third post reviewed another recent paper, one that described techniques for studying fitness landscapes in detail by linking protein function (which can be screened and/or selected) and genetic information. Here we’ll look at yet another approach to the problem, in which the subject of the analysis is not an organism (as in the first paper) or a protein (as in the second paper) but an RNA molecule.
Recall that Fowler et al. set out to design a system in which one can study a protein’s function (its “fitness”) as it varies in sequence. The idea is to look at all (or at least nearly all) of the variants of a particular protein to see how well each one works, and then to map this measure of fitness onto the sequence space of the protein. Such a map would be a form of fitness landscape. Fowler and colleagues (henceforth called the UW group) used a previously known technique (protein display) to link each variant of the protein to its gene sequence, then used next-generation gene-sequencing technology to rapidly determine the gene sequences of millions of variants.
Last October, a group at the Fred Hutchinson Cancer Research Center in Seattle reported the results of a somewhat similar experimental effort. Jason Pitt and Adrian Ferré-D’Amaré co-authored the paper in Science, and their title identified their research objective: “Rapid Construction of Empirical RNA Fitness Landscapes.” The first couple of sentences of the abstract should sound familiar by now. (The genotype is the gene sequence. The phenotype is the function.)
Evolution is an adaptive walk through a hypothetical fitness landscape, which depicts the relationship between genotypes and the fitness of each corresponding phenotype. We constructed an empirical fitness landscape for a catalytic RNA by combining next-generation sequencing, computational analysis, and “serial depletion,” an in vitro selection protocol.
And they identify two major challenges, both of which we have already discussed:
First, even for macromolecules of modest length, the sequence space is vast; a 20-mer RNA or protein has ~10^12 or ~10^26 possible sequences, respectively. Second, to characterize the landscape, the phenotypic fitness of each individual genotype needs to be measured, or an indirect measure of fitness needs to be validated.
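Those two figures are easy to check. Here is a minimal sketch of the arithmetic, assuming an alphabet of 4 bases for RNA and 20 standard amino acids for protein; the function name `sequence_space_size` is just an illustrative label, not anything from the paper.

```python
# Count the possible sequences of a given length over a given alphabet.
# Each position can hold any letter independently, so the count is
# alphabet_size raised to the power of the sequence length.
def sequence_space_size(alphabet_size: int, length: int) -> int:
    return alphabet_size ** length


rna_20mer = sequence_space_size(4, 20)       # 4 bases: A, C, G, U
protein_20mer = sequence_space_size(20, 20)  # 20 standard amino acids

print(f"20-mer RNA:     {rna_20mer:.2e} possible sequences")
print(f"20-mer protein: {protein_20mer:.2e} possible sequences")
```

Both numbers come out at roughly the orders of magnitude quoted above, which is why exhaustively testing every variant of even a short molecule is out of reach and sampling millions of variants at once is such a big deal.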
The authors tackled the challenges using a strategy very similar to that of the UW group: first they designed a functional screen, a way to subject an enormous population of variants to a gauntlet of selection, so that the population would be altered in structure after each round of selection. Think of it as evolution in a test tube. But the UW group had a problem that Pitt and Ferré-D’Amaré didn’t have to worry about: the linkage of protein function with the underlying gene sequence. Why the difference? Pitt and Ferré-D’Amaré didn’t study a protein. They studied RNA. Specifically, they analyzed the function of a ribozyme, which is a molecule of RNA that is capable of catalyzing chemical reactions the way protein enzymes do. This means that there was no translation problem for them, since the gene sequence (the base sequence of the RNA) is itself the sequence of the molecule that is being functionally assessed.
So, like the UW group, they took a known molecule and made zillions of variants, through the use of random mutation. Then they assessed the function of the variants by putting them into pools (huge groups) and forcing them to compete with each other. (The competition involved binding to a specific target; the UW group used a similar approach.) Each round of competition (selection) led to the pool being enriched for “functional” molecules. And, importantly, they demonstrated that the binding competition really does select for function; that is, the selection process is enriching for higher “fitness.” After selection, they saw the enrichment that they expected: random sequences (added as a control) were depleted, whereas sequences very similar to the known “normal” sequence were enriched. And, interestingly, lots of other sequences were intermediate between those.
Now, how can we graphically depict this? Pitt and Ferré-D’Amaré decided to plot the rate of change in frequency over time for each genotype (i.e., for each variant as identified by sequencing) against a representation of genotype space. The challenge of representing genotype space, or sequence space, is daunting: it will hardly do to put every sequence onto the axis of a graph. So the authors devised a similarity score as an indicator of sequence space, with the known normal sequence as the standard for comparison. Their empirical fitness landscape, from Figure 2B, is on the right.
Each dot is a single sequence. (Actually, each dot is a whole set of sequences that have the same similarity to the reference sequence. In Figures 1C and 1E the authors introduce another dimension to show the spread that each dot represents.) The green dots show enrichment of sequences after 1 minute of competition; the reference sequence is on the far right, such that the steeply sloping peak on the far right represents sequences that are similar to that reference sequence. As we might expect, the more similar a sequence is to the reference sequence, the more “fit” it is (in general). Fitness is indicated by extent of enrichment, which the authors term “fecundity.” The magenta dots represent not enrichment, but depletion; in a reciprocal experiment, the investigators removed the most fit molecules by subtracting the best-binding population from the pool. Notice that the depletion landscape is basically a mirror image of the enrichment landscape, as we would expect if the process is truly selecting based on binding activity.
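To make the two axes of that landscape concrete, here is a rough sketch of how one might compute them from sequencing reads. Everything in it is an assumption for illustration: the similarity score is a plain fraction of matching positions (the paper's actual score may well differ), the enrichment is a log2 ratio of frequencies between two time points with a pseudocount, and the sequences and read counts are made-up toy data.

```python
import math
from collections import Counter


def similarity(seq: str, reference: str) -> float:
    """Fraction of positions that match the reference (equal lengths assumed)."""
    if len(seq) != len(reference):
        raise ValueError("this toy score only compares sequences of equal length")
    matches = sum(a == b for a, b in zip(seq, reference))
    return matches / len(reference)


def log2_enrichment(before_reads, after_reads, pseudocount=1.0):
    """log2 change in frequency of each genotype between two read pools.

    The pseudocount keeps genotypes that vanish from one pool from
    producing a division by zero or a log of zero.
    """
    before, after = Counter(before_reads), Counter(after_reads)
    genotypes = set(before) | set(after)
    total_before = len(before_reads) + pseudocount * len(genotypes)
    total_after = len(after_reads) + pseudocount * len(genotypes)
    return {
        g: math.log2(
            ((after[g] + pseudocount) / total_after)
            / ((before[g] + pseudocount) / total_before)
        )
        for g in genotypes
    }


# Hypothetical toy pools: a reference, a one-base variant, and an unrelated control.
reference = "GGACUUCG"
before = [reference] * 10 + ["GGACUUCA"] * 10 + ["AUAUAUAU"] * 10
after = [reference] * 20 + ["GGACUUCA"] * 8 + ["AUAUAUAU"] * 2

enrichment = log2_enrichment(before, after)

# Each point of the "landscape": (similarity to reference, enrichment).
landscape = sorted(
    (similarity(g, reference), enrichment[g], g) for g in enrichment
)
for sim, enr, genotype in landscape:
    print(f"{genotype}  similarity={sim:.3f}  log2 enrichment={enr:+.2f}")
```

In this toy pool the reference is enriched, the unrelated control is strongly depleted, and the one-base variant falls in between, which is the qualitative pattern described above. A real analysis would involve millions of reads and would group genotypes that share the same similarity score.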
There’s a lot of data in that graph. Here’s how the authors describe the result:
…the fecundity of any individual sequence provides a metric of its fitness, and we can create an experimental fitness landscape composed of ~10^7 different RNA genotypes in a single experiment.
And yet the picture is a vast oversimplification of that huge data set. For one thing, the graph provides no specific sequence information even though the sequence of every one of those 10 million variants is known. Pitt and Ferré-D’Amaré write:
The empirical fitness landscape we generated is a high-dimensional object. We visualized it by computing the information content per residue of the master sequence, in essence projecting the landscape onto the ribozyme sequence.
The resulting visualization (in Figure 4A) is a heat map of the actual structure of the catalytic RNA. It’s simpler than it seems: each base in the RNA is colored according to information content as indicated on the color scale. More information means more diversity at that position; low information content means that there is little diversity at that position, indicating strong conservation due to functional constraint. The graph seems utterly unlike the topographical landscape that Sewall Wright sketched, but it’s a fitness landscape nonetheless, made possible by the creativity of Pitt and Ferré-D’Amaré and by the power of next-generation sequencing.
So, we’ve looked at three significant articles in the last year or so on fitness landscapes, in which talented scientists explored the relationships between genotype and phenotype, on scales barely imaginable just a decade ago. All three studies were carried out in Seattle, Washington, within just a few miles of Biologic Institute, where the scientists of the intelligent design movement work on questions of the same ilk. If those scientists really want to be taken seriously, if they really seek to understand how structure and function and evolution are related, they’ll have to understand fitness landscapes and their experimental applications. Fortunately, they can find some of the world’s experts on that very subject right in their own backyard. Whether that amounts to tragic irony or a golden opportunity is a choice for the intelligent design apologists of the Seattle area. May they choose wisely.
[Cross-posted at Quintessence of Dust.]
Pitt, J., & Ferré-D’Amaré, A. (2010). Rapid Construction of Empirical RNA Fitness Landscapes. Science, 330(6002), 376-379. DOI: 10.1126/science.1192001