Evolution of “Hello World” using random mutation and selection


Sometimes serendipity presents you with an opportunity to educate those who are confused by the claims of Intelligent Design and somewhat unfamiliar with evolutionary theory. So let me start with the answer and then look at the question.

Genetic Programming (GP) has a proven capability to routinely evolve software that provides a solution function for the specified problem. Prior work in this area has been based upon the use of relatively small sets of pre-defined operators and terminals germane to the problem domain. This paper reports on GP experiments involving a large set of general purpose operators and terminals. Specifically, a microprocessor architecture with 660 instructions and 255 bytes of memory provides the operators and terminals for a GP environment. Using this environment, GP is applied to the beginning programmer problem of generating a desired string output, e.g., “Hello World”. Results are presented on: the feasibility of using this large operator set and architectural representation; and, the computations required to breed string outputting programs vs. the size of the string and the GP parameters employed.

Genetic Evolution of Machine Language Software Ronald L. Crepeau NCCOSC RDTE Division San Diego, CA 92152-5000 July 1995

The Results?

From Figure 5 it can be seen that this run achieved a correct output (fitness = 352) at about 150,000 spawnings (100 to 1200 generations). By about 450,000 spawnings, the agent was composed of less than 100 instructions. Ultimately, the agent size reduced to 58 instructions before the process was terminated.

Next question?

Of course, the question posed at Uncommon Descent is flawed for many reasons. In fact, unfamiliarity with evolutionary theory combined with a false analogy quickly results in what is known as a strawman argument.

GilDodgen Wrote:

What is the probability of arriving at our Hello World program by random mutation and natural selection?

Source

Pretty darn good as I have shown.

GilDodgen Wrote:

How many simpler precursors are functional, what gaps must be crossed to arrive at those islands of function, and how many simultaneous random changes must be made to cross those gaps? How many random variants of these 66 characters will compile? How many will link and execute at all, or execute without fatal errors? Assuming that our program has already been written, what is the chance of evolving it into another, more complex program that will compile, link, execute and produce meaningful output?

Source

I’d love to see some research in this area; an appeal to ignorance is not really that appealing to me. These are excellent questions, and they should be answered before the plausibility of evolving such programs is rejected in an ad hoc fashion. It’s time to abandon these ‘just so stories’ and do some real scientific work.

Of course, one may object to my choice of method, or raise a myriad of objections based on the (unjustified) claim that the method required significant intelligent design, or on the fact that the fitness function is smooth, and so on. But the example shows that, under ‘reasonable assumptions’, natural selection and variation can indeed create the required output string. In fact, this is hardly surprising given the state of knowledge about evolutionary computing.
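To make those ‘reasonable assumptions’ concrete, here is a minimal sketch of cumulative selection acting on a character string. It is not Crepeau’s GEMS system, which breeds machine-code programs over a 660-opcode instruction set; the alphabet, population size, mutation rate, and selection scheme below are illustrative choices, not values from the paper.

```python
import random
import string

TARGET = "Hello World"
ALPHABET = string.ascii_letters + " "   # 53 characters
POP_SIZE = 100        # illustrative parameter, not from the paper
MUTATION_RATE = 0.05  # per-character mutation probability

def fitness(candidate):
    """Count characters that match the target (output-only scoring)."""
    return sum(c == t for c, t in zip(candidate, TARGET))

def mutate(candidate):
    """Copy a string, occasionally replacing characters at random."""
    return "".join(
        random.choice(ALPHABET) if random.random() < MUTATION_RATE else c
        for c in candidate
    )

def evolve():
    # Start from random strings, not from anything resembling the target.
    population = ["".join(random.choice(ALPHABET) for _ in TARGET)
                  for _ in range(POP_SIZE)]
    generation = 0
    while True:
        best = max(population, key=fitness)
        if best == TARGET:
            return generation, best
        # Selection: the fitter half breeds; variation: offspring are mutated copies.
        survivors = sorted(population, key=fitness, reverse=True)[: POP_SIZE // 2]
        population = [best] + [mutate(random.choice(survivors))
                               for _ in range(POP_SIZE - 1)]
        generation += 1

if __name__ == "__main__":
    gen, result = evolve()
    print(f"Reached {result!r} after {gen} generations")
```

Blind sampling over the same 53-character alphabet would need on the order of 53^11 (roughly 10^19) draws to hit an 11-character target; with variation and selection retained between generations, runs like this typically finish within a few hundred generations.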

Perhaps it would be better if ID activists presented an argument based on an analogy that shows at least some minimal similarity to evolution, such as a redundant genotype-phenotype mapping, self-replication, and a way to introduce selection into the process in an acceptable form, or that replaces a single fixed goal with a more realistic evolutionary goal. As Lenski and others have shown, however, the processes of variation and selection can indeed generate complexity and even irreducibly complex systems.

In the end the question is not much different from Dawkins’s “Weasel” example, and thus all known limitations apply. So what have we learned from this example?

  • That a quick Google search can once again answer many of the questions
  • Intelligent Design once again excels at creating strawmen
  • Intelligent Design once again lacks scientific relevance
  • In fact, most of the argument was an argument from ignorance
  • GilDodgen Wrote:

    I can’t answer these questions, but this example should give you a feel for the unfathomable probabilistic hurdles that must be overcome to produce the simplest of all computer programs by Darwinian mechanisms.

While the hurdles may indeed appear intuitively unfathomable, it quickly becomes clear that a combined process of variation AND selection can be very efficient at overcoming them. Dawkins already showed this several decades ago.

145 Comments

I was struck by the relationship between this posting and the previous one on Kirschner’s work. If I understand Kirschner correctly, he is proposing that “random” variation might be biased towards mutations at the subroutine or object level rather than at the level of the individual instruction, which I guess would make the evolution of quite complex programs much quicker.

GilDodgen muddies the waters further when he asks “How many simpler precursors are functional?” Translated to biology that can be seen as asking “what was the first self-replicator (or other-replicator plus symbiosis)?”

This changes the problem into a question of abiogenesis rather than of evolution with a given replicator. Genetic algorithms all assume self-replication in their modeling of biology.

Here’s another distortion: “what gaps must be crossed to arrive at those islands of function,…?”

The term “islands of function” is also misleading. Drain his pool a little and instead of islands you get mountain ranges and hilly valleys that let a “hill-climbing” algorithm do its work.

Island hopping algorithms might need foresight, hill climbers can be blind.

http://gaul.sourceforge.net/tutoria[…]limbing.html
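To make the distinction concrete, here is a minimal blind hill climber of the kind that tutorial describes. The single-peaked fitness function and step size are arbitrary illustrations, not anything taken from the linked code.

```python
import random

def hill_climb(fitness, mutate, start, steps=10_000):
    """Blind hill climbing: propose a random change and keep it only if
    fitness does not decrease. No foresight about distant peaks is used."""
    current, current_fit = start, fitness(start)
    for _ in range(steps):
        candidate = mutate(current)
        candidate_fit = fitness(candidate)
        if candidate_fit >= current_fit:   # accept neutral or better moves
            current, current_fit = candidate, candidate_fit
    return current, current_fit

if __name__ == "__main__":
    peak = 42.0                                   # illustrative optimum
    fitness = lambda z: -(z - peak) ** 2          # smooth, single-peaked landscape
    mutate = lambda z: z + random.gauss(0, 0.5)   # small random variation
    trait, fit = hill_climb(fitness, mutate, start=0.0)
    print(f"climbed to trait ≈ {trait:.2f}")
```

The climber never looks ahead to the peak; it only compares each mutant with the current state, which is all that “hill climbing” requires.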

GilDodgen (apparently a member of the increasingly desperate Dembski personality cult) wrote:

What is the probability of arriving at our Hello World program by random mutation and natural selection?

Once again this demonstrates the failure of the infinite monkey theorem as an argument:

The theorem graphically illustrates the perils of reasoning about infinity by imagining a vast but finite number. If every atom in the Universe were a monkey producing a billion keystrokes a second from the Big Bang until today, it is still very unlikely that any monkey would get as far as “slings and arrows” in Hamlet’s most famous soliloquy.

It does appear, however, that a finite number of atoms (plus or minus a few on a daily basis) arranged in an evolved ape called William Shakespeare had very little trouble producing that line, and no amount of evolving will allow some people to see that.

Gil continues and ponders the enormity of death before enlightenment:

I can’t answer these questions, but this example should give you a feel for the unfathomable probabilistic hurdles that must be overcome to produce the simplest of all computer programs by Darwinian mechanisms.

“The fault, dear Brutus, is not in our stars, But in ourselves, that we are underlings.”

GilDodgen muddies the waters further when he asks “How many simpler precursors are functional?” Translated to biology that can be seen as asking “what was the first self-replicator (or other-replicator plus symbiosis)?”

I think that Gil did not fully think through his example and failed to appreciate the requirement for a self-replicating system to arise before evolution can take over, but I think that can be relatively easily rectified.

In the end it all comes down to the shape of the “fitness landscape”, and without looking at actual examples it is hard to make any specific claims about such landscapes. For RNA, for instance, it seems that the landscape is quite open to evolution. For proteins, the concept of ‘holey landscapes’ makes evolution almost inevitable (Gavrilets).

staring, intrigued, at the description “Dembski personality cult”. can anybody think of an example of a personality cult which revolved around a smaller amount of personality?

I fail to see how this ‘hello world’ program can be taken as an example of evolution by mutations and natural selection. AFAIK, this process is not supposed to reach any objective or solution.

The probability of producing Homo sapiens from Homo habilis with RM + NS is almost zero, of course. This number is also completely irrelevant.

???

one may raise a myriad of objections based on the (unjustified) claim that the method required significant intelligent design

I’m always amused by that claim. It’s like saying that, by the mere act of painting a bunch of concentric circles on the wall, you’ve given someone all they need to be a master archer.

Good to see that GP research is feeding back into the biology field. This kind of experiment is directly testing Darwin’s hypothesis in the true scientific spirit and method: predict (there exists a system that, implementing Darwin’s evolutionary rules, can produce a behavior meaningful to us but not to the system), design, build, and run the experiment so that all factors other than those operated on by the process under test are known and fixed, and see it produce the hypothesized result. Very nice.

Somewhat irrelevant are the questions about the logical level chosen for the experiment - it could be any level, or all of them, from atoms through processor architecture and instructions up to algorithmic entities. In life all levels are simultaneously undergoing selection and testing. But an experiment is only useful when the independent variables in question are the only ones manipulated, in order to show the relationships. So choosing the highest level while keeping the lower levels fixed and known is the way to go. Point of interest on scale: this result was obtained on a very small collection of parallel processes and spawnings. Scale up several trillion trillion times in both the number of parallel experiments and the run time and you begin to approach what actually happened here on Earth.

Now I’ll bet the virtual machine itself underwent evolutionary changes along the path to its final design - that is the normal creative process at work, and anyone whose life work involves the development of ideas is familiar with this, though most probably do not (yet) relate their intellectual experience to its being a direct experience of the evolutionary process in fact, not merely a passing fancy.

Jeannot says:

“I fail to see how this ‘hello world’ program can be taken as an example of evolution by mutations and natural selection.”

“GEMS employs mutation of two types.” (Crepeau, p5.)

“As each off-spring is bred, it is evaluated for insertion in the pool using a modification of the process which [Altenberg 1994] calls “upward mobility” selection.” (Crepeau, p4.)

“AFAIK, this process is not supposed to reach any objective or solution.”

The immediate objective is to increase fitness for survival and reproduction. What that means varies as the species and environment change, so there is no overall objective.

“The probability of producing Homo sapiens from Homo habilis with RM + NS is almost zero, of course. This number is also completely irrelevant.”

Exactly. We know it happened, however unlikely it was. We also know evolution happens, so there is no probability associated with that either. What was your point in raising this straw man?

“???”

:| :| :| !!!

Mark, re Kirschner: Crepeau says that he thinks his complicated multiple-instruction environment succeeds in outputting strings because he intentionally specified a minimum number of input and output instructions. He also speculates that his phase 2 of shortening the successful programs will be better if he intentionally specifies a minimum number of halt instructions at this phase.

Corkscrew, A nice view and metaphor.

The claim also misunderstands experiments. All our experiments, instruments and data handling are designed in some manner, or they wouldn’t give results. That doesn’t make them illustrations of creationist theory. It is ‘thinking’ analogous to their requirement for supernatural explanations. It is also wrong to single out biology.

You are right, it is laughable.

Torbjörn, you didn’t understand me. But my English might not be perfect. Evolution doesn’t have any specific objective or problem to solve. A fitness increase is not a specific objective, contrary to the program described here.

Using this environment, GP is applied to the beginning programmer problem of generating a desired string output, e.g., “Hello World”.

Therefore, I concluded that this computer program is totally irrelevant to evolutionary biology. Calculating the chance to reach the state “hello world” is as useless as calculating the probability for Homo habilis to become H. sapiens.

Or did I miss something?

staring, intrigued, at the description “Dembski personality cult”. can anybody think of an example of a personality cult which revolved around a smaller amount of personality?

In the country of the blind, the one-eyed man is king – Erasmus

Evolution doesn’t have any specific objective or problem to solve. A fitness increase is not a specific objective, contrary to the program described here.

Fitness, although difficult for humans to evaluate, is an objective concept.

A strain of bacteria that lives in a nutrient-poor but nylon-rich pool is not given a goal of evolving a nylon-digesting enzyme by some higher authority. Nevertheless, we can show that deriving energy from nylon compounds is indeed a selection criterion.

You are indeed missing something.

Caledonian, I think you missed something in what I said. But we have the same ideas. What if I told you I am a graduate student in evolutionary biology? ;-)

I said that a fitness increase was not a specific objective (a goal, if you prefer) like that of the ‘hello world’ program; I didn’t say it wasn’t an objective concept (that’s another matter).

I totally agree that nylonase never was an objective. Thus, it would be useless to calculate the probability of producing the nylonase gene in an initial bacterial population.

Just to make myself clear:

talkorigins Wrote:

Genetic algorithms are not perfect evolutionary simulations in that they have a predefined goal which is used to compute fitness. They demonstrate the power of random variation, recombination, and selection to produce novel solutions to problems, but they are not a full simulation of evolution (and are not intended to be). In simulations of biological evolution, fitness is evaluated only locally; survival and reproduction is based only on information about local conditions, not on ultimate goals. However, the simulations demonstrate that distant fitness peaks will be reached if there are conditions of intermediate fitness (Lenski et al. 2003). Evolutionary processes do not “search.” They respond to local fitness topography only. The fact that evolution (occasionally) reaches fitness peaks is a by-product of evolving on correlated fitness landscapes using purely local fitness evaluation, not an intended outcome.

http://www.talkorigins.org/indexcc/CF/CF011.html

Jeannot, I think you raise an important distinction, but I also think it’s a little harsh to call GAs/GP “totally irrelevant” to evolutionary biology. When we plate bacteria on nylon-rich media, we effectively define a fitness function which favors the ability to produce nylonase. When, lo and behold, nylonase-producing bacteria arise and flourish, should we not attribute this to RM/NS because we (effectively) set an artificial objective? After all, this objective is not one that would arise in nature (without manmade nylon).

I think it can be argued that in the GP example, the objective is similarly contained within the fitness function, which is this time explicitly defined; the “organisms” are not “aware” of it and cannot “consciously” work towards it. Of course, there are a lot of other issues in drawing any direct comparisons, but I think GAs can be helpful to evolutionary biologists insofar as they illustrate the general computational principles underlying RM/NS.

On the other point, I also can’t believe that at this point anyone is still parading around “Sequence length n, alphabet length k, probability 1/k^n, HA!”. It’s a total embarrassment.

Yes, I may have been a little harsh with my ‘totally irrelevant’ if these simulations can be likened to a climb toward an adaptive peak. But at most, regarding evolutionary biology (not computer science), these simulations are rather useless, since the fundamental theorem of natural selection was demonstrated by Fisher 76 years ago.

jeannot Wrote:

Evolution doesn’t have any specific objective or problem to solve. A fitness increase is not a specific objective, contrary to the program described here.

It may not have a specific objective, but evolution often does have a specific problem to be solved. For instance, say a prey animal hides in narrow gaps. In order to get more food, a predator would gain a selective advantage from getting the food out of the gap. This is a specific problem to be solved. The exact biological solution to the problem varies (some birds use sticks, others use narrow beaks, chimpanzees use sticks, octopi have narrow tentacles and a highly deformable body). There are often many possible solutions to a given problem. You can also easily set artificial problems to be solved. For instance, surviving an antibiotic would be a problem for bacteria to solve. There may be multiple different mutations that could allow for a solution to that problem. What could be said here is that they are supplying a specific problem to be solved. Such specific problems are extremely common in nature. Organisms that are better able to solve a specific problem, even if they are not perfect at it, would still have an advantage. Animals that can only get food from shallow or fairly wide gaps would still have an advantage over those that can’t get food out of gaps at all. Bacteria that are only partially immune to antibiotics often have a selective advantage, especially if the antibiotic treatment is ended early.

The problem they want the program to solve here may be arbitrary, but it is fundamentally no different from any other specific problem in nature that organisms have evolved to solve. In fact, if there were no problem to be solved, then natural selection could not occur, because the organism would already be perfectly suited to its environment. I say this is directly analogous to real-world evolution. So-called “filling an ecological niche” means solving a specific biological problem, even if the ecological niche is artificially induced. There is no purpose in it, just the selective advantage that comes from solving a problem that the competitors cannot solve, or solving it better or more efficiently than they do. The solution may not be pre-ordained; in fact the whole point of evolutionary algorithms is to find a new solution to a problem. But there nevertheless are specific problems in nature that organisms have evolved to solve.

Another point, quite apart from genetic algorithms, may be worth mentioning. The beginning programmer quite likely did not get his “Hello world” program correct the first time. Then he had to make modifications to it and select the version which worked best. In other words, his program evolved. This is certainly the case in more complicated computer programs. The programmers of these take much of their code from previous programs, and even then most of their work goes into fixing bugs which were inadvertently introduced. The process is different in important ways from biological evolution, to be sure, but the process still embodies the basics of evolution: modification and selection of existing forms. In short, design is a kind of evolution. To accept design is to accept evolution, and to reject evolution is to reject design.

DaveScot responds:

DaveScot Wrote:

The response is empty. Pim Van Meurs cites a program (written by intelligent agents I presume) that can create a “Hello World” program from some unspecified genetic algorithm.

Although the actual code isn’t given, the algorithm is described in detail. On the other hand, the process used by ID’s designer remains a mystery.

DaveScot Wrote:

The way this is accomplished is not disclosed and if it were disclosed I’m sure we’d find the program is cheating by sneaking information in via the filter which ranks the “fitness” of the intermediate outputs.

Did you read the paper, Dave? It specifically states that the fitness scoring is based on the correctness of the output string. There’s nothing sneaky about it. If the fitness criterion were generated randomly and turned out to be based on closeness to the string “ewij fopsdajf ifjsofjeij”, then the end result would be an agent that outputs “ewij fopsdajf ifjsofjeij”, and the point of the paper would be unchanged.
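For the record, a fitness function of the kind the paper describes needs nothing more than the output string itself. The sketch below is an illustrative stand-in, not Crepeau’s actual scoring formula (the paper only tells us that a correct output scores 352); the point is that no information about how to produce the string enters the score.

```python
def string_fitness(output: str, target: str) -> int:
    """Score an agent purely by comparing its output to the target string.
    Hypothetical scoring rule, not the formula used in the paper."""
    score = 0
    for produced, wanted in zip(output, target):
        if produced == wanted:
            score += 10                    # right character in the right position
        else:
            # small partial credit for being numerically close in character code
            score += max(0, 5 - abs(ord(produced) - ord(wanted)))
    score -= 2 * abs(len(output) - len(target))   # penalize missing or extra output
    return score

print(string_fitness("hello world", "hello world"))   # maximum score under this rule
print(string_fitness("hello wxrld", "hello world"))   # slightly lower
```

Swap in any target you like, including the random string Dave imagines, and the same scorer breeds agents that print that target instead.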

DaveScot Wrote:

Trial and error and the choosing of partially successful intermediate solutions requires purpose and direction. These are supplied by the programmers of the so-called genetic algorithms.

And in nature they’re supplied by environmental selection.

The frequency of structures that reproduce more efficiently increases, independently of the notions of problem and solution. These notions only exist in intelligent minds that can identify them.

Similarly, adaptive peaks don’t exist before they are reached.

But this is just a problem of semantics.

Several things pop out immediately from this data.

The first is the incredible power of a little bit of natural selection pressure.

The odds against generating the final 58-instruction program by random chance are truly huge (there are 660^58 possible programs of that length). That kind of number has the flavor of the improbability numbers that the ID proponents like to throw out every day. I doubt that there’s enough storage space on the planet to hold all those permutations.

Yet throw in a little survival-of-the-fittest pressure and you can get answers like that in a few thousand generations.

Secondly, look at the graph of fitness over time. Damned if there wasn’t a little mini “Cambrian Explosion” right at generation 400, where the program suddenly became much, much more fit in a very short span.

Oh, and the third thing is how quickly a primitive precursor of “Hello World” established itself.
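For scale (taking 660^58, as the comment above does, as the naive count of 58-instruction programs over a 660-opcode instruction set):

```python
from math import log10

# Naive size of the space of 58-instruction programs over 660 opcodes.
print(f"660^58 ≈ 10^{58 * log10(660):.1f}")            # about 10^163.5
# Spawnings the paper's run needed before the agent shrank below 100 instructions.
print(f"450,000 spawnings ≈ 10^{log10(450_000):.1f}")  # about 10^5.7
```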

And GilDodgen responds:

GilDodgen Wrote:

I would be curious to see the intimate details of the Panda’s Thumb program. I’ll bet dollars to donuts that the programmer cheated by defining intermediate fitness goals with the Hello World program in mind.

I’ll take that bet. Want to put your money (or donuts) where your mouth is, Gil?

At the very least, I think an apology is in order unless you have actual evidence that the programmer cheated.

DaveScot wrote:

Trial and error and the choosing of partially successful intermediate solutions requires purpose and direction. These are supplied by the programmers of the so-called genetic algorithms.

In these simulations, the least fit programs are removed by filters, and the more fit programs are left to breed, improving the program pool.

In the real world, the least fit animals are removed by snow leopards and the more fit animals are left to breed, improving the gene pool.

The rules are simple and no intelligence required. Deal with it.

Wrote:

The frequency of structures that reproduce more efficiently increases, independently of the notions of problem and solution. These notions only exist in intelligent minds that can identify them.

The notions of problem and solution may only exist in intelligent minds, and the goal of solving the problem may exist only in intelligent minds, but the problem itself exists in nature. It is a fact that in an environment with a high concentration of an antibiotic, bacteria will die unless they evolve resistance to it. This is a natural, objective problem, outside of any intelligent mind. It was a problem bacteria had to overcome since long before humans evolved, and it will continue to be one as long as bacteria exist. The actual understanding of the problem and the ability to identify it as such is a trait of intelligent minds, but the problem itself exists independently of any intelligence. Similarly, food does exist in crevices or holes. That is a natural problem, and has been a problem since long before land animals even evolved. The ability to conceptualize it as a problem is unique to an intelligence, and the desire to solve it may be unique to an intelligence, but the problem itself nevertheless exists in nature. Notions exist only in intelligent minds, but the notions can still reflect objective natural phenomena.

In some cases, evolution may be directed towards solving a specific problem. Bacteria that do not develop resistance to a high concentration of antibiotics do not survive, period. When forest turns to grassland, tree-dwelling apes that do not evolve the ability to move around on the grassland do not survive. Other times, it is more open. There could be a great many problems an organism could solve. There are any number of possible ways an organism could get food. They could look for food in crevices, or perhaps dig it out of the ground, or catch it on the run or on the wing. Nevertheless, getting hold of a given type of prey is a real problem that exists in nature outside of an intelligent mind. Insects do live under bark; getting hold of them is a problem. It is by no means the only problem, and in most cases the organisms are not specifically directed to one particular problem above all others (although many organisms end up evolving to solve one particular problem in a given area, like finding food), but the problems are still there. The organisms may not be specifically trying to solve the problem, in most cases they are not even aware of it, and there is nothing making them orient themselves towards one problem except perhaps too much competition for another, but that does not mean the problem does not really exist.

I think this particular experiment would be more akin to an environment where an antibiotic starts off at low concentration and slowly increases in concentration in step with the evolution of more successful antibiotic resistance. Any bacteria that do not keep up are killed off. This sort of thing could happen if a bacterial species evolves a new antibiotic and begins to slowly spread through the environment (this is apparently a major issue for soil-dwelling bacteria). The bacteria must get better and better at coping with the antibiotic or they will not survive. Shortening the code would be analogous to antibiotic-resistant bacteria competing with each other by developing more and more efficient antibiotic resistance genes that do not require as many resources or do not negatively impact the bacteria as much.

Circuits can be and are designed both by versions of the genetic algorithm and by conscious human design. While both kinds of solutions work, those produced by artificial natural selection tend to be more robust in actual use than those made rationally. Like living systems such as metabolic pathways, artificially evolved systems go on functioning even when some of their parts are damaged while designed systems tend to be much more brittle. Andreas Wagner discusses this contrast in his book Robustness and Evolvability in Living Systems. It is the zillionth reason to believe that living things were produced by chance and selection rather than conscious design.

Jim Harrison wrote:

Circuits can be and are designed both by versions of the genetic algorithm and by conscious human design. While both kinds of solutions work, those produced by artificial natural selection tend to be more robust in actual use than those made rationally. Like living systems such as metabolic pathways, artificially evolved systems go on functioning even when some of their parts are damaged while designed systems tend to be much more brittle. Andreas Wagner discusses this contrast in his book Robustness and Evolvability in Living Systems. It is the zillionth reason to believe that living things were produced by chance and selection rather than conscious design.

Give us a break, man. We’re only working with about three pounds of jello-like cerebral grey matter!

Maybe we should call these genetic algorithms intelligent and realize that we’ve already passed into Vernor Vinge’s technological singularity?

http://www-rohan.sdsu.edu/faculty/v[…]ularity.html

Henry J Wrote:

I wonder though if recipe might be a better analogy than computer code (and never mind that my idea of “cooking” is punching buttons on a microwave), since at least a large part of the function seems to be adding of “ingredients” when they’re called for.

I don’t think so. A recipe is in effect nothing but a (timer and event driven) sequential program processed by a cook, to wrap it in computer terms.

Now, after thinking it over, I think the best computer science analogy for a genome is hardware design. Actually that goes more into electrical engineering, but still. If you’ve ever dabbled in hardware design (the stuff you commonly write in Verilog or VHDL), you will know that it is very different from computer programs.

With that, a gene with its triggers has an equivalent in a logic gate: It has one or more inputs on which it constantly performs a logic operation which determines the output signal - a voltage in hardware, an RNA transcription in genes.

The main difference is that the output in hardware is very simple (one voltage representing “1” and one representing “0”) but the wiring is complex - each output must be connected to all required inputs and must not be connected to another output. In the genome the output is complex - a more or less long string of RNA - but the connectivity is trivial. After all, the whole genome is floating in the same contained liquid, so everything can drift anywhere. It’s enough for a gene to have a trigger receptor for some molecule; it doesn’t need an explicit connection path from the source of those molecules. That makes the connection network very malleable compared to wire-connected networks as in hardware.

The other main difference is of course that the genome isn’t a binary digital operation. A single gene viewed in isolation may be, but the connections between them aren’t and are probabilistic instead.

To take it further, the genome is equivalent to a chip. A chip has lots of interconnected logic gates, some input pins delivering signals from the outside to some of these gates and some output pins delivering the outputs of some gates to the outside world.
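As a toy sketch of that analogy (every gene name, molecule, and threshold below is invented for the example, and the on/off behaviour is a deliberately binary caricature of what is really a probabilistic process):

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class Gene:
    """A gene treated as a 'gate': it reads concentrations of trigger molecules
    from the shared cellular soup and decides whether to emit its transcript."""
    name: str
    transcript: str
    activators: Dict[str, float] = field(default_factory=dict)  # molecule -> level needed to switch on
    repressors: Dict[str, float] = field(default_factory=dict)  # molecule -> level that silences it

    def express(self, soup: Dict[str, float]) -> Optional[str]:
        switched_on = all(soup.get(m, 0.0) >= lvl for m, lvl in self.activators.items())
        silenced = any(soup.get(m, 0.0) >= lvl for m, lvl in self.repressors.items())
        return self.transcript if switched_on and not silenced else None

# "Connectivity" is simply whatever happens to be floating in the same soup.
soup = {"signal_A": 0.8, "signal_B": 0.1}
gene_x = Gene("geneX", "mRNA_X", activators={"signal_A": 0.5}, repressors={"signal_B": 0.4})
print(gene_x.express(soup))   # -> mRNA_X (signal_B is too dilute to repress it)
```

Rewiring here means changing a threshold or a trigger molecule, not rerouting a physical wire, which is the malleability the comment above points to.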

Rilke's Granddaughter Wrote:

The local optima; not the optima for the entire space.

Then where is the global optimum of the fitness function defined by Eq. (16), if not at z = kt?

Just a quick reminder of what’s necessary for Darwinian evolution to take place:

PZ Myers Wrote:

1) Darwinian logic is quite simple and clear. Here’s a short summary:

* If heritable variation exists, (which, of course, it does)
* if excess reproduction occurs, (also obviously true, or we’d be up to our ears in mice)
* if variants differ in their likelihood of survival and reproduction, (a little trickier, but still fairly obvious)
* then the relative frequencies of the variants must change.
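Those premises translate directly into a toy simulation; all numbers below are arbitrary, chosen only so that the three premises hold.

```python
import random

SURVIVAL = {"A": 0.6, "B": 0.5}     # variants differ in their likelihood of survival
OFFSPRING_PER_SURVIVOR = 3          # excess reproduction
CAPACITY = 1000                     # the environment cannot hold everyone

population = ["A"] * 500 + ["B"] * 500   # heritable variation exists

for generation in range(1, 31):
    survivors = [v for v in population if random.random() < SURVIVAL[v]]
    offspring = [v for v in survivors for _ in range(OFFSPRING_PER_SURVIVOR)]
    population = random.sample(offspring, min(CAPACITY, len(offspring)))
    if generation % 10 == 0:
        freq_a = population.count("A") / len(population)
        print(f"generation {generation}: frequency of variant A = {freq_a:.2f}")

# The relative frequency of the fitter variant rises, with no goal specified anywhere.
```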

Andreas,

Re “the genome is equivalent to a chip.” Except that evolution of the gene pool can rewire that “chip” a whole lot easier than a hardware chip can be rewired. ;)

Come to think of it, I guess some aspects of that “chip” would get rewired during development of the organism, as well.

Henry

The fact of the matter remains: Random mutation and natural selection as an explanation for all of life’s complexity, functionally integrated machinery, and information content is wishful speculation, unsupported by convincing hard evidence. This should simply be admitted.

Oh, don’t worry, Gil. In a week or so, Paul Nelson’s going to be presenting Ontogenetic Depth v 2.0 at the Society of Developmental Biology meeting, and I’m sure that will obliterate Darwinism, you know, like the Explanatory Filter did, and the NFL theorems, and your analogies to computers, and Irreducible Complexity, and Sal’s plane anecdotes, and the last 400-500 dumb things you guys have said, and Intelligent Evolution will in the future, &c, &c, &c.…

Dear Steve,

I appreciate your intellectually satisfying refutation of my thesis.

The fact of the matter remains: Random mutation and natural selection as an explanation for all of life’s complexity, functionally integrated machinery, and information content is wishful speculation, unsupported by convincing hard evidence.

Says you. (shrug)

Glad you simply admitted it.

And I look forward to all the analogies I’m sure you’ll present in the future, and the concomitant incredulity.

Gil Wrote:

The fact of the matter remains: Random mutation and natural selection as an explanation for all of life’s complexity, functionally integrated machinery, and information content is wishful speculation, unsupported by convincing hard evidence. This should simply be admitted.

False. In the first place, the theory of evolution is far more than ‘random mutation and natural selection’. In the second place, the Avida experiments (among others) demonstrate that you’re wrong. In the third place, PvM has demolished your contention that “Hello World” couldn’t be produced via variation and selection.

Either be sufficiently mature to admit that you’re wrong; or provide actual evidence that you are right. Your opinions do not an argument make.

I suppose this is old-fashioned of me, but

optimum == the best

hence, ‘global optimum’ is redundant whilst ‘local optima’ is at best confusing. For the latter, ‘local maxima’ is certainly to be preferred.

Erik Wrote:

Then where is the global optimum of the fitness function defined by Eq. (16), if not at z = kt?

There isn’t one. There is an optimum at that particular node, because one can determine based on the fitness algorithm suggested/black boxed that such an optimum must exist.

Evolutionary fitness in the real world is determinable only from the actual factors that affect a specific individual member of a population.

jeannot Wrote:

To me, the use of an optimum in order to determine an adaptive landscape and/or simulate evolution is based on two assumptions:
- that an optimum exists
- that it alone can define the fitness of suboptimal replicators (individuals, alleles…).

The first assumption is not related to the use of an optimum as reference point in order to compute fitness. An optimum exists for any reasonable fitness function. Thus, the first assumption is implicit in the use of a fitness function, regardless of how it is calculated.

The second assumption is not actually made. It is not an optimum genotype/phenotype, but rather the difference between it and a genotype/phenotype of interest, that is assumed to determine fitness. The number of free parameters in a genotype/phenotype is the same as the number of free parameters in the difference between the genotype/phenotype and an optimal genotype/phenotype. Therefore the assumption actually made is equivalent to assuming the existence of a fitness function in the first place.

Evolution by natural selection is not based on these assumptions, at least not in the books or papers I’ve read.

It’s not clear what you mean by this, but “evolution by natural selection” is of course a much more general notion than any specific population genetics model. For example, the model in the Burger & Lande paper, cited above as a counter-example to the claims that fitness cannot “veridically” be computed in a way that references the global optimum, is only intended to capture some aspects of stabilizing and directional selection. It is not intended to capture, say, frequency dependent selection.

For instance, suppose you want to simulate the evolution of beak size in a population of finches by setting the optimum as the size best adapted for consuming the most frequent seeds: you make those two assumptions. But you’re not sure the optimum is correct or even exists. As the population evolves, different (and extreme) beak sizes may be favored by intra-specific competition for resources (or whatever), resulting in an unstable polymorphism or even speciation. Can this be modeled with a program à la METHINKS?

Sure, in principle it can. You just need a more general form of the fitness function. In Dawkins’s METHINKS program the fitness is a function of a single variable, namely the genotype to be evaluated. In the case of your finches, one would probably need a fitness function that depends not only on the genotype to be evaluated, but also on the composition of the rest of the population. That would allow for frequency dependent selection.

David B. Benson Wrote:

I suppose this is old-fashioned of me, but

optimum == the best

hence, ‘global optimum’ is redundant whilst ‘local optima’ is at best confusing. For the latter, ‘local maxima’ is certainly to be preferred.

In standard terminology, a function has a local maximum (minimum) at a particular point if it is the largest (smallest) function value within some local region around the point. A function has a global maximum (minimum) at a particular point if the value at that point is equal to the highest (lowest) value that function attains. A function has a global (local) optimum at a point if that point is either a global (local) minimum or a global (local) maximum.

Naturally, if something is a global optimum, then it is also a local optimum.
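A one-variable example of that terminology, with an arbitrarily chosen function:

```python
import numpy as np

# An arbitrary function with several peaks on the interval [0, 16].
z = np.linspace(0.0, 16.0, 160_001)
w = np.sin(z) + 0.05 * z

print(f"global maximum (hence also a local one) near z ≈ {z[np.argmax(w)]:.2f}")
# Each peak of sin(z) + 0.05*z is a local maximum, i.e. a local optimum;
# only the highest of them, near z ≈ 14.2, is the global optimum on this interval.
```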

Erik Wrote:

Then where is the global optimum of the fitness function defined by Eq. (16), if not at z = kt?

Rilke's Granddaughter Wrote:

There isn’t one. There is an optimum at that particular node, because one can determine based on the fitness algorithm suggested/black boxed that such an optimum must exist.

Since z=kt is so obviously what in standard terminology would be called a “global optimum”, I assume that you have your own private terminology in which “global optimum” means something different. It would help if you explained how your private notion of a “global optimum” differs from the conventional meaning of the term.

Evolutionary fitness in the real world is determinable only from the actual factors that affect a specific individual member of a population.

Maybe what you want to say is that you believe that fitness depends on so many things that it isn’t possible in practice to accurately calculate the fitness of real-world genotypes/phenotypes without simulating the life-histories of their carriers?

I believe what he’s trying to say is that any perceived difference between the model and real life, no matter how trivial or irrelevant, will be seized upon as rhetorical evidence that evolutionary theory is false.

In previous comments by others, a few partly overlapping concerns about fitness functions have been suggested. I would identify and summarize them like this:

* The algorithm used to evaluate fitness matters in some important way. In particular, two algorithms that give identical fitness values need not be equally veridical.

* There is an important distinction between calculations of fitness that refer to a reference genotype/phenotype (typically the optimal one) and calculations of fitness that do not. The former kind is completely unrealistic.

* Fitness should not depend on genotypes/phenotypes that are not represented in the evolving population.

* Dawkins’s WEASEL program, while perhaps good for demonstrating the difference between cumulative selection and independent random sampling, is a prime example of the above mentioned objectionable ways of evaluating fitness.

I regard the first, second, and fourth of these concerns as wrong. The third I agree with provided that a proper interpretation of the word “depend” is made.

What is the function of a fitness function?

For the purposes of modelling evolution and, in particular, how genotype/phenotype frequencies change over time, one highly relevant type of quantity is a measure of how many offspring are produced by carriers of a particular genotype/phenotype. The task done by a fitness function is to provide us with such a measure for every individual. By summarizing the entire life-histories of carriers of a genotype/phenotype in a single number—the fitness value—population geneticists and like-minded scientists can simplify their models, perhaps at the expense of some accuracy, by avoiding any explicit modelling of individuals’ lives.

One very direct way of calculating fitness values can of course be to nevertheless try to simulate the lives of individuals in a way that gives us much more than just a fitness value, e.g. in addition we might get some kind of “movie” showing us how the individual developed and/or competed with others. But such extra bonus information that results as a by-product of the fitness calculation is going to be ignored by the model. If one really wants to take life-histories into account, one should not model the dynamics via a fitness function.

Distance-to-optimum calculations of fitness

Dawkins’s WEASEL program is not the only case of a fitness calculation that refers to an optimal genotype/phenotype. A look at the population genetics literature reveals that completely analogous fitness functions are not uncommon there. Anyone who thinks that there is something objectionable in principle about Dawkins’s choice of fitness function is therefore put in the awkward position of having to advance the same objections against the works of many famous population geneticists. Here are two clear examples just to drive this point home:

“Fitness is taken to be determined entirely by Gaussian stabilizing viability selection on the phenotypic value of the trait. The relative fitness of individuals of genotypic value G arises from an average of viability over environmental effects (see e.g. Turelli, 1984 or Bulmer, 1989) and is given by

w(G) = exp[-(G - Zopt)^2 / (2 Vs)],

where Vs^{-1} (>0) is a direct measure of the intensity of selection on genotypic values of the trait and Zopt is the optimal phenotypic value (and also the optimal genotypic value).”

quoted from Y. Bello and D. Waxman, Near-periodic substitution and the genetic variance induced by environmental change, Journal of Theoretical Biology, 239(2):152-160, 2006 http://dx.doi.org/10.1016/j.jtbi.2005.08.044

“In Fisher’s and Kimura’s analyses, it was assumed that all traits are under stabilizing selection of identical intensity. In particular, it was assumed that the fitness of a phenotype is a monotonically decreasing function of its Euclidean distance from the optimal phenotype. Geometrically, this corresponds to a “fitness landscape” that is spherically symmetric. “Surfaces” of constant fitness are hyperspheres (i.e., circles when n = 2, spheres when n = 3, …) that are centered on the optimal phenotype. If we choose to measure each trait in such a way that its optimal value is 0, then the optimal phenotype will lie at the coordinate origin, z = 0 = (0,0,…,0). Fitness is then a function of ||z|| = (z_1^2+z_2^2+…+z_n^2)^{1/2}.”

quoted from D. Waxman and J.J. Welch, Fisher’s Microscope and Haldane’s Ellipse, American Naturalist, 166:447-457, 2005, http://www.lifesci.sussex.ac.uk/hom[…]croscope.pdf

One can have several reasonable concerns about the use of these specific fitness functions. For example, they are very idealized (the first takes into account only one trait while the second is very symmetrical). Another example is that we might prefer to think of what is “global” in the model as actually just a small part of some bigger real fitness landscape, most of which is left unmodeled. Or we might think that the fitness functions provided model different parts of the fitness landscape with different accuracy.

However the proponents of the view that there’s something fundamentally objectionable about Dawkins’s WEASEL program might want to understand the fact that completely analogous fitness functions are often used in population genetics, it is inescapable that the objections to WEASEL are at least as applicable to population genetics.

There is also the problem that there are always many equivalent ways of expressing a fitness function, although some are easier for mere humans to analyze and handle than others. Some of these expressions will refer to an optimum genotype/phenotype and some will not. As a simple example, consider quadratic stabilizing selection for a single trait z, which, with some suitable choice of units, can be written

w(z) = 1 - (z - 0.17)^2

When we write the function in this way it is obvious that z = 0.17 is the optimum and that the function is expressed in terms of the distance to this optimum. But expanding the squared parenthesis, we obtain this completely equivalent expression

w(z) = 0.9711 + 0.34 z - z^2.

In this form there is no longer a reference to the optimum (0.17).
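That the two ways of writing w(z) are the same function, and therefore assign every genotype the same fitness, is easy to check numerically:

```python
import numpy as np

z = np.linspace(-2.0, 2.0, 1001)
w_distance_form = 1 - (z - 0.17) ** 2          # written relative to the optimum 0.17
w_expanded_form = 0.9711 + 0.34 * z - z ** 2   # same polynomial with the square expanded

assert np.allclose(w_distance_form, w_expanded_form)
print("identical fitness values; only the notation mentions the optimum")
```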

“Depend”

Obviously, the fitness of a genotype/phenotype shouldn’t be influenced by things that don’t exist; in particular it shouldn’t be influenced by genotypes/phenotypes that aren’t carried by any individual in the population. Well, some reservations are needed, because it is entirely legitimate to normalize the fitness scale by taking a non-existent genotype/phenotype and declaring that “my system of units is such that this genotype/phenotype has fitness 1”. Apart from reservations like that, it should be clear that fitness isn’t influenced by things that don’t exist.

But how is that desideratum expressed and verified mathematically? Is it enough to check if the expression used for calculating fitness refers to some potentially non-existent genotype/phenotype (such as the optimal one)?

The answer to the second of these questions is no.

I apologize in advance for what will surely turn out to be a totally clueless question:

since in reality organisms do not appear to be “assessed for fitness” by reference to an abstract Platonic ideal, isn’t there something fundamentally wrong with doing so in modelling?

Aureola Nominee Wrote:

since in reality organisms do not appear to be “assessed for fitness” by reference to an abstract Platonic ideal, isn’t there something fundamentally wrong with doing so in modelling?

Firstly, what looks to you like an abstract Platonic ideal might look to modellers like just a particularly convenient, but ultimately arbitrary, reference point. In many cases it is natural to assume that there is an optimum somewhere (fitness functions that lack optima are too pathological to be reasonable anyway). Having made that assumption, one can, without making any further assumptions about the fitness function, agree on the convention to measure certain quantities relative to the position of this optimum. Such practices are common in modelling (e.g. optima are popular expansion points in Taylor expansions, because the first-order term will then be zero), but to onlookers they might appear to reflect abstract Platonic ideals.

Secondly, what difference do you think it makes? How could it possibly be significant whether we use, say, the fitness function

w(z) = 1 - (z - 0.17)^2

or the completely equivalent fitness function

w(z) = 1 - (z - 100.27 + 100.1)^2 ?

Thirdly, computer calculations do many things that reality does not. How is that in any way significant? For example, reality surely does not determine the trajectory of a projectile by numerically integrating some mathematical model.

Thank you, Erik.

First, I don’t see those two equations as different; I see them as the same equation, written in two different ways. Therefore they both suffer from the same fundamental flaw (if it is a flaw), or neither does (if it isn’t).

Second, my point is that “reproductive fitness”, in my layman’s eyes, does not depend on how close or how far a given organism is from a theoretical optimum, because it is relative, not absolute.

In other words, if I have a population of organisms, the reproductive success of each individual depends on how much better or worse than the others it is, not on how much worse than the local optimum it is.

So, for instance, if I take the function you mention

w(z) = 1 - (z - 0.17)^2

and use it to calculate a relative fitness

w(z1) - w(z2) = (z2 - 0.17)^2 - (z1 - 0.17)^2

I obtain

w(z1) - w(z2) = (z2)^2 - (z1)^2 - (0.34 * (z2 - z1))

which seems to me to be a very different kettle of fish!

As I said, I do not presume to correct people who have devoted their careers to this stuff; but I would really like to understand why, instead of using relative fitness (which would avoid the whole problem of “comparing to a non-existing ideal”), we seem to be using absolute fitness. Where is my mistake?

P.S. Your remark on modelling trajectories seems to me not to address this aspect.

Erik Wrote:

It is not an optimum genotype/phenotype, but rather the difference between it and a genotype/phenotype of interest, that is assumed to determine fitness.

Sure. I didn’t mean that fitness was the optimum.

Erik Wrote:

You just need a more general form of the fitness function. In Dawkins’s METHINKS program the fitness is a function of a single variable, namely the genotype to be evaluated. In the case of your finches, one would probably need a fitness function that depends not only on the genotype to be evaluated, but also on the composition of the rest of the population. That would allow for frequency dependent selection.

Yes, but in my example, the ‘optimal beak size’ in the model doesn’t turn out to be optimal at all. To me, this parameter is not the optimum, but the selective pressure at the beginning of the simulation (the size of the more frequent seeds). In fact, your model will involve local fitness calculations.

Aureola Nominee Wrote:

First, I don’t see those two equations as different; I see them as the same equation, written in two different ways. Therefore they both suffer from the same fundamental flaw (if it is a flaw), or neither does (if it isn’t).

Good.

So, for instance, if I take the function you mention

w(z) = 1 - (z - 0.17)^2

and use it to calculate a relative fitness

w(z1) - w(z2) = (z2 - 0.17)^2 - (z1 - 0.17)^2

I obtain

w(z1) - w (z2) = (z2)^2 - (z1)^2 - (0.34 * (z2 - z1))

which seems to me to be a very different kettle of fish!

The first expression for w(z1) - w(z2) does contain a reference to the global optimum (0.17). If rewriting w(z) cannot remove this flaw (if indeed it is a flaw), then why would rewriting be able to remove the same flaw in w(z1) - w(z2)?

As for the merits of using fitness differences like w(z1) - w(z2) instead of w(z), there’s no reason to refrain from doing that in those cases when it happens to simplify the treatment. But, being equivalent to the use of w(z), it cannot remove any flaws in w(z) (of course, I don’t agree that what others here have claimed as flaws really are flaws).

Mathematical equations … now my head hurts. Owwwwwwwwwww.

(grin)

Sorry, but I’ve always been mathematically-challenged. It’s why I was an English major and not a science major.

Erik:

My point is precisely that using the difference is not equivalent to using the absolute value. I’m not entirely convinced that the effects of modelling absolute fitness vs. relative fitness are negligible; however, not being a professional in this field, I’ll defer to expert opinion.

Aureola Nominee Wrote:

My point is precisely that using the difference is not equivalent to using the absolute value.

But the argument you advance in favour of this point doesn’t seem to be any more valid for fitness differences than for fitness.

I’m not entirely convinced that the effects of modelling absolute fitness vs. relative fitness are negligible; however, not being a professional in this field, I’ll defer to expert opinion.

OK. For the record, I am not myself an expert in this field. (But I have supplemented my own non-authoritative arguments by citing a few examples of experts who seem to have no trouble computing fitness by reference to global optima.)

About this Entry

This page contains a single entry by PvM published on June 10, 2006 10:59 PM.
