Icons of ID: No Free Lunch Theorems

In this installment of Icons of ID I will explore the evolution of the reliance of Dembski’s ID arguments on the No Free Lunch Theorems. While originally Dembski was suggesting that the “No Free Lunch Theorems” play a major role in his arguments against Darwinian pathways, he seems to have changed his tune when it was pointed out that his application of these theorems was flawed or as Wolpert, who is one of the original authors of the these theorems mentioned “written in jello” and “fatally informal and imprecise”.

1999

Dembski referred to the results being essentially a corollary of the No Free Lunch Theorems

It follows that the vast majority of fitness functions on the phase space that coincide with our original fitness function on the target but reshuffle the function on the partition elements outside the target will not land the evolutionary algorithm in the target (this result is essentially a corollary of the No Free Lunch theorems by Wolpert and Macready). Simply put, the vast majority of fitness functions will not guide E into the target even if they coincide with our original fitness function on the target (see Appendix 8).

This result refutes the claim that evolutionary algorithms can generate specified complexity, for it means that they can yield specified complexity only if such algorithms along with their fitness functions are carefully adapted to the complex specified targets they are meant to attain. In other words, all the specified complexity we get out of an evolutionary algorithm has first to be put into the construction of the evolutionary algorithm and into the fitness function that guides the algorithm. Evolutionary algorithms therefore do not generate or create specified complexity, but merely harness already existing specified complexity. Like a bump under a rug, the specified complexity problem has been shifted around, but it has not been eliminated.

Why Evolutionary Algorithms Cannot Generate Specified Complexity, Nov 1999

Let’s look at the definition of corollary

An immediate consequence of a result already proved. Corollaries usually state more complicated theorems in a language simpler to use and apply

• A proposition that follows with little or no proof required from one already proven.
• A deduction or an inference.
• A natural consequence or effect; a result.

2001

Let’s look at the introduction of Dembski’s book “No Free Lunch”. Dembski not only introduced the book but also the origin of the title of the book as well as the relevance of these theorems to his arguments.

The title of this book, No Free Lunch, refers to a collection of mathematical theorems proved in the past five years about evolutionary algorithms. The upshot of these theorems is that evolutionary algorithms, far from being universal problem solvers, are in fact quite limited problem solvers that depend crucially on additional information not inherent in the algorithms before they are able to solve any interesting problems. This additional information needs to be carefully specified and fine-tuned, and such specification and fine-tuning is always thoroughly teleological. Consequently, evolutionary algorithms are incapable of providing a computational justification for the Darwinian mechanism of natural selection and random variation as the primary creative force in biology. The subtitle, Why Specified Complexity Cannot Be Purchased without Intelligence, refers to that form of information, known as specified complexity or complex specified information, that is increasingly coming to be regarded as a reliable empirical marker of purpose, intelligence, and design.

and

The Design Inference laid the groundwork. This book demonstrates the inadequacy of the Darwinian mechanism to generate specified complexity. Darwinists themselves have made possible such a refutation. By assimilating the Darwinian mechanism to evolutionary algorithms, they have invited a mathematical assessment of the power of the Darwinian mechanism to generate life’s diversity. Such an assessment, begun with the No Free Lunch theorems of David Wolpert and William Macready (see section 4.6), will in this book be taken to its logical conclusion. The conclusion is that Darwinian mechanisms of any kind, whether in nature or in silico, are in principle incapable of generating specified complexity. Coupled with the growing evidence in cosmology and biology that nature is chock-full of specified complexity (cf. the fine-tuning of cosmological constants and the irreducible complexity of biochemical systems), this conclusion implies that naturalistic explanations are incomplete and that design constitutes a legitimate and fundamental mode of scientific explanation.

Since then various authors have pointed out the flaws in Dembski’s arguments and his reliance on the “No Free Lunch Theorems”, starting with Wesley Elsberry, Richard Wein, Mark Perakh and culminating with Wolpert, one of the original authors of these theorems.

Wesley Elsberry wrote on June 2001 on Talk.Origins

It’s my opinion that Dembski misconstrues or misunderstands what the NFL theorems say. I’ve passed word along that Dembski’s choice of “No Free Lunch” for the title of a book that is due out this fall sets him up for embarrassment. That’s still the title, so far as I know. It will be interesting to see how the reviews turn out.

2002

Now compare this with Dembski’s reversal

Given my title, it’s not surprising that critics see my book No Free Lunch as depending crucially on the No Free Lunch theorems of Wolpert and Macready. But in fact, my key point concerns displacement, and the NFL theorems merely exemplify one instance (not the general case). The basic idea behind displacement is this: Suppose you need to search a space of possibilities. The space is so large and the possibilities individually so improbable that an exhaustive search is not feasible and a random search is highly unlikely to conclude the search successfully. As a consequence, you need some constraints on the search – some information to help guide the search to a solution (think of an Easter egg hunt where you either have to go it cold or where someone guides you by saying “warm” and “warmer”). All such information that assists your search, however, resides in a search space of its own – an informational space. So the search of the original space gets displaced to a search of an informational space in which the crucial information that constrains the search of the original space resides. I then argue that this higher-order informational space (“higher” with respect to the original search space) is always at least as big and hard to search as the original space

Evolution’s Logic of Credulity: An Unfettered Response to Allen Orr by William Dembski

Or let’s look at the book description on ARN

But by employing powerful recent results from the No Free Lunch Theory, Dembski addresses and decisively refutes such claims.

No Free Lunch announcement on ARN

Or Koons’s review of “No Free Lunch”

Chapter 4, on evolutionary algorithms, is in some ways the crux of the book. It is here that Dembski discusses the No Free Lunch theorems that give the book its title. Dembski demonstrates that so-called “evolutionary” (or trial-and-error) algorithms cannot generate CSI. Instead the problem of generating CSI is simply pushed back from a high CSI result to a very high CSI fitness function. A fitness function is a function that differentially “rewards” or “selects” different trials, whether these trials be algorithms, behavior patterns, neural-net configurations, or biological genotypes. To solve a problem using an evolutionary algorithm, the designer must find a fitness function that can enable the sort of evolution likely to lead to an acceptable solution to the problem. Where there is a large amount of CSI required to solve a problem directly, the NFL theorems entail that an equally large quantity of CSI is required to find a suitable fitness function. I hope that committed Darwinists will accept this brilliant and illuminating analysis. Even if you are convinced that Dembski is completely wrong about Darwinism, he has made an indispensable contribution to the development of the information-theoretic analysis of evolution. I hope that relatively few commit the genetic fallacy of rejecting Dembski’s work on the grounds that he holds “scientifically incorrect” opinions on other maters.

Koons reviews NFL

Or from the Conservative BookClub, we find What are the “No Free Lunch” theorems, and how do they undercut Darwinian theory and support Intelligent Design?

35. Displacement and the No Free Lunch Principle How do the No Free Lunch theorems undercut Darwinian theory and support intelligent design?

On ISCID Iain Strachan

Many critics of Dembski’s use of the “No Free Lunch” theorems take issue with it by pointing out that these theorems are based on averaging over all possible fitness landscapes, the vast majority of which are completely irrelevant, and represent problems that are not interesting to solve, because the fitness landscape looks like a random array of spikes.

I would have to say that I also share this worry. I was aware of the No Free Lunch theorems long before I was aware that Dembski had written a book with the same name and I initially wondered if this might provide a theoretical underpinning to the idea that design cannot be achieved without intelligent input. But then precisely the same objection occurred to me (that the generality of the result severely limited its relevance to practical problem solving), and I felt that one should be cautious in citing the NFL theorems to justify the design hypothesis.

It may be that I have misunderstood Dembski’s argument, and I would certainly like to see him respond to this criticism.

Dembski did not respond on this thread. Nor on this thread where Iain raised the same issue.

Information Theory, Evolutionary Computation, and Dembski’s Complex Specified Information by Wesley Elsberry and Jeffrey Shallit

Dembski mistakenly invokes the “No Free Lunch” theorems of [95] as justification for his view that evolutionary computation cannot generate CSI. The NFL theorems compare efficiency between algorithms and characterize averaged performance over all cost functions, but Dembski’s claim is not about average performance: Dembski makes the universal claim that for all evolutionary computation, no instance of CSI can be attributed to any such computation. Dembski’s initial summary of the NFL theorems characterize them as establishing a problem of “regress” to a “higher-order phase space.” This does not appear to reflect what is actually discussed in the papers Dembski cites on NFL. (Search for “regress” in the text of “No Free Lunch theorems for search”, “No Free Lunch theorems for optimization”, and “On the futility of blind search” comes up empty.) The point made by Wolpert and Macready is not that a search of the space of fitness functions must be conducted, but rather that one must examine the fitness function of interest in order to select an algorithm which will perform more efficiently given that fitness function. This is a far cry from Dembski’s assertion of some “regress” in finding the source of information seen in the output of an algorithm.

Wesley Elsberry collected a number of good essays

Orr on Boston Review: Book Review: No Free Lunch

Dembski responds to Orr Sheer versus Real Possibilities

While essentially Orr’s critical remarks are well taken, their formulation was insufficiently careful from a mathematician’s viewpoint so it may create a distorted impression of what Orr had in mind. Indeed, in a private communication Orr, who is a biologist, provided an explanation of his actual views on the subject of the NFL theorems’ application to Darwinian evolution, and these views are essentially in accord with a proper interpretation of the NFL theorems. Since, however, the rendition of that problem in Orr’s review of Dembski’s book may be confusing for unprepared readers, I will try to clarify the problem by first explaining in way as simple as possible the NFL theorems, and second, briefly discussing their misapplication by Dembski.

Yin-Yang: No-Free-Lunch Theorems for Search at the 1995 International Conference on Genetic Algorithms (ICGA-95)

Recent Results on No-Free-Lunch Theorems for Optimization by Christian Igel, Marc Toussaint

On Classes of Functions for which No Free Lunch Results Hold by Christian Igel, Marc Toussaint

In a recent paper it was shown that No Free Lunch results hold for any subset F of the set of all possible functions from a finite set X to a finite set Y iff F is closed under permutation of X. In this article, we prove that the number of those subsets can be neglected compared to the overall number of possible subsets. Further, we present some arguments why problem classes relevant in practice are not likely to be closed under permutation.

No Free Lunch Theorems for Search (1995) by David H. Wolpert, William G. Macready

No Free Lunch Theorems for Optimization (1996) by David H. Wolpert, William G. Macready

No Free Lunch Theorems: Many good links by Martin Sewell

Critiques and reviews of the works of William Dembski at infidels.org

NO FREE LUNCH: Why Specified Complexity Cannot Be Purchased without Intelligence From the Book – WATCH FOR IT!

I should serve you notice that “No Free Lunch” will prove a bad title for Bill’s new book. Question: Did Dembski confer with David Wolpert over his deployment of NFL principles in “Can Evolutionary Algorithms Generate Specified Complexity?”? I already know that the answer is, “No.” I’ve tried to point out on at least two prior occasions that Bill’s use of NFL has major problems. Bill is entitled to ignore my opinions on NFL, but there are others who might be less easily dismissed when they weigh in. If Bill insists on going so far as putting Wolpert and Macready’s theorem on the cover of his book, I think that there is going to be a high likelihood of major embarrassment over that. Fair warning.

Source: email from me to Paul Nelson, October 4, 2000

Perhaps it is appropriate to point out that the anthology titled Why Intelligent Design Fails: A Scientific Critique of the New Creationism (edited by Matt Young and Taner Edis) which is coming this month from Rutgers University Press, contains, among other chapters, my chapter titled There Is a Free Lunch After All: Dembski’s Wrong Answers to Irrelevant Questions. This chapter is mainly about Dembski’s misuse of the No Free Lunch theorems and his so called displacement problem.

It looks like changing the tune after having encountered critique (especially from Wolpert, against whom Dembski had no chance as far as the NFL theorems were in question)Dembski made a very awkward attempt to alleviate embarrassment at being rebuffed by the author of the theorems in question - he tried to play down the role of the NFL theorems in his theory. Instead, he stressed the significance of what he calls displacement problem. Unfortunately for Bill Dembski, his appeal to the displacement problem has not improved his position at all but rather should add more embarrassment to him as the problem in point is a phantom, as discussed in detail in the above mentioned chapter.

Koons’s accolade and his cry to Darwinists to accept Dembski’s allegedly brilliant and illuminating analysis is especially funny given Koons’s evident lack of familiarity with information theory and with the essence of the NFL theorems: in a letter to Science Insights (the organ of the National Association of Scholars -see vol. 7, No 5, 2003, at www.nas.org ) Koons wrote that the NFL theorems are part of information theory! This is news for all those having a minimal knowledge of either information theory or optimization theory.

Koons is in general a good comedian. In a blurb to Dembski’s Intelligent Design he wrote that Dembski’s law of conservation of information was a great breakthrough in information theory. However, after Dembaki alleged law was substantially dismantled by a number of critics, in his (more recent) mentioned letter to Science Insights Koons asserted that Dembski in fact did not suggest any new law but only showed how to apply the NFL theorems to biology, etc. He seemed not to realize that Dembski’s pseudo-law and his attempts to apply the NFL theorems stand in no relation to each other. What a circus! With friends like Koons Dembski needs no adversaries.

Hey, Dr. Bill: What about admitting at least a single error for change? After all, you have a place to go - the barbecue stand in Riesel Tx is not a bad backup, is it.

Particularly ironic is the following statement by Dembski

Perhaps most significantly, however, we can admit our mistakes and receive instruction. The evolutionists cannot. Indeed, the moment they admit that we might have a point, they let the genie out of the bottle. To open evolutionary theory to critical scrutiny would destroy their monopoly over the study of biological origins.

DEALING WITH THE BACKLASH AGAINST INTELLIGENT DESIGN

To open ID arguments to self correction, to admit errors, would let the genie out of the bottle. In fact, I would argue that the reluctance to do so can be well understood in terms of ID not being interested in scientific contributions but more in socio-political progress. As such one cannot disappoint one’s supporters which distractions such as admission of error.

In 1996 I showed that NFL is a symptom of conservation of information in search. Repeating a quote of Dembski above:

The upshot of these theorems is that evolutionary algorithms, far from being universal problem solvers, are in fact quite limited problem solvers that depend crucially on additional information not inherent in the algorithms before they are able to solve any interesting problems. This additional information needs to be carefully specified and fine-tuned, and such specification and fine-tuning is always thoroughly teleological.

Under the theorems’ assumption of a uniform distribution of problems, an uninformed optimizer is optimal. To be 99.99% sure of getting a solution better than 99.999% of all candidate solutions, it suffices to draw a uniform sample of just 921,029 solutions. Optimization is a benign problem with rare instances that are hard. Dembski increases the incidence of difficult instances by stipulating “interesting problems.” At that point it is no longer clear which NFL theorems he believes apply. Incidentally, an optimizer cannot tune itself to the problem instance while solving it, but its parameters can be tuned to the problem distribution from run to run. It is possible to automate adaptation of an optimizer to the problem distribution without teleology.

Iain Strachan (quoted above):

Many critics of Dembski’s use of the “No Free Lunch” theorems take issue with it by pointing out that these theorems are based on averaging over all possible fitness landscapes, the vast majority of which are completely irrelevant, and represent problems that are not interesting to solve, because the fitness landscape looks like a random array of spikes.

I hope critics will drop the averaging complaint. The proof of the main theorem demonstrates identical distributions which have, of course, identical means. Those uninteresting instances are algorithmically random, and for all but toy problems are too big to fit in the world. Since 1995 some have said that this suggests that NFL does not hold in the world, and I recently proved it. But it remains possible that the “real-world” distribution approximates an NFL distribution closely.

Tom English wrote

In 1996 I showed that NFL is a symptom of conservation of information in search.

and

Since 1995 some have said that this suggests that NFL does not hold in the world, and I recently proved it. But it remains possible that the ?real-world? distribution approximates an NFL distribution closely.

RBH

Never mind. Found ‘em.

RBH

All,

Sorry for the duplicate post. My browser said the connection had timed out on the first attempt, and I didn’t realize the post might have succeeded.

RBH,

I have some papers, c.v., etc. on my home page, http://www.TomEnglishProject.com. Here’s the one where I obtain NFL in terms of conservation of information:

English, T. M. “Evaluation of Evolutionary and Genetic Optimizers: No Free Lunch,” in Evolutionary Programming V: Proceedings of the Fifth Annual Conference on Evolutionary Programming, pp. 163-169, 1996.

The following shows that NFL does not hold in the world, and treats search algorithms as operators on probability distributions on problems. The geometry of search in the space of distributions is quite interesting.

English, T. “No More Lunch: Analysis of Sequential Search,” to appear.

Tom English:

Thanks for your comment and the reference to your articles. A few questions:

1. I am confused by the dates you indicated. As far as I am aware, Wolpert’s NFL theorems for supervised learning appeared in 1996, and Wolpert-Macready’s NFL theorems for search, in 1997, but you refer to 1995 and 1996 in regard to the theorems for search.

2. Have you communicated with Dave Wolpert regarding your proof that the NFL theorems are invalid for the real world problems, and if you did, what his reaction was? (I believe, Wolpert and Macready themselves admitted that their theorems are not very instrumental for solving real world problems, but your approach seems to be even more radical, is it?)

3. Regardless of what Wolpert or other experts may think of it, your discourse is beyond of what non-experts in optimization theory can comprehend. Therefore, IMHO, referring to the averaging as a point negating Dembski’s use of the NFL theorems as alegedly proving the impossibility of evolution (which can be easily understood by laymen) seems to be justified. I discussed this point with Dave Wolpert and he did not offer objections to the use of the averaging argument. Cheers, Mark

More on references

NFL might have become something useless to Dembski if we had been more open to an unusual perspective. The first discussion of algorithmic complexity and NFL came soon after Wolpert and Macready released their tech report. Search for Kihong Park in the 1995 archive at http://www.aic.nrl.navy.mil/galist/. His comments were outstanding, but Wolpert dismissed them, and I was scurrying too much to contemplate learning new math.

I don’t know a publication that clearly justifies the claim that run-to-run optimizer adaptation can be immunized against teleology. But consider uniformly sampling the space of optimizer parameters and evaluating the different parameter vectors on a sample of instances drawn according to the problem distribution. The problem distribution is given, and the sample of possible parameter vectors is uninformed, so there can be no teleology. Dembski would claim, perhaps, that it would take a great deal of time to get good results with this approach, but I predict otherwise. To be 99% sure of obtaining a parameter vector with performance ranked in the top percentile requires only a sample of 458 vectors. Massively parallel computers support simultaneous evaluation of many more vectors than that. Thus it may take no longer to discover good optimizer parameters than to evaluate one instance of the optimizer.

Is this invocation of brute-force parallel computation cheating? I’m no biologist, but I hear that nature is plagued with it.

Hi, Mark.

1. There has been a lot of confusion as to where ideas originated. Your timeline should begin with

C. Schaffer. “A conservation law for generalization performance,” in Proc. of the 1994 Int’l Conference on Machine Learning, San Mateo, Ca.: Morgan Kaufmann, 1994.

In January 1995, Wolpert and Macready disseminated over the Internet a tech report proving a conservation law for optimizer performance. But they referred to “no free lunch.” They knew of Schaffer’s work when they prepared their 1997 TEC article, but said nothing of it. They also did not cite any of the NFL work that had been published in the two years following the tech report. Thus most people are unaware that researchers other than Wolpert and Macready did substantive work prior to 1997. I supplied my 1996 paper to Bill Macready, and you will find my sufficient condition for NFL in the 1997 article.

2. No contact with Wolpert. But when he attended a 2001 talk in which I emphasized algorithmic complexity, he criticized the proliferation of complexity measures. Regarding real-world applicability, I wrote in my 1996 paper:

Do the arguments of this paper contradict the evidence of remarkable adaptive mechanisms in biota? The question is meaningful only if one regards evolutionary adaptation as function optimization. Unfortunately, that model has not been validated. It is well known that biota are components of complex, dynamical ecosystems. Adaptive forces can change rapidly and nonlinearly, due in part to the fact that evolutionary adaptation is itself ecological change. In terms of function optimization, evaluation of points changes the fitness function. The Conservation Lemma [from which NFL derives] clearly does not apply to such a process.

Wolpert writes in his 2003 review of Dembski’s NFL book:

Indeed, throughout there is a marked elision of the formal details of the biological processes under consideration. Perhaps the most glaring example of this is that neo-Darwinian evolution of ecosystems does not involve a set of genomes all searching the same, fixed fitness function, the situation considered by the NFL theorems. Rather it is a co-evolutionary process. Roughly speaking, as each genome changes from one generation to the next, it modifies the surfaces that the other genomes are searching. And recent results indicate that NFL results do not hold in co-evolution.

I’d say we agree.

3. I think you need a loose rendition of an unexpurgated theorem.

To permute a test function means to shuffle the values associated with domain elements.

In a block uniform distribution of test functions, each function has the same probability as all of its permutations.

A search algorithm evaluates all points in the domain of a given test function exactly once, and outputs the sequence of observed values.

Theorem. The distribution of value sequences is identical for all search algorithms if and only if the distribution of test functions is block uniform.

A simpler characterization is that all search algorithms have identically distributed behavior. If there is no statistical distinction in behavior there is no statistical distinction in performance (or mean performance) by any measure.

I would like to make this more intuitive. Tell me what you think.

Mark,

Most people find my theoretical writing difficult, and I am here because I want to explain things in plain language. If my replies are too detailed, give me a nudge in the ribs.

I suppose you have “practical significance” with respect to biological evolution in mind. In nature the fitness function is nonstationary, the search is random, points are revisited, and the domain of fitness functions cannot be defined as a finite set. These are four substantial violations of the suppositions of the original NFL theorems.

I don’t have a coevolution proof. And precisely what to prove is less than obvious. I will give it some thought. My NFL theorem applies to infinitely many distributions of test functions, while Wolpert and Macready’s applies to just one. Mine refers simply to sequences of observed values, while theirs refers to histograms of values. Sequences are simpler for various reasons, and some important aspects of NFL are easy to miss when working with histograms. As you say, “although identical distributions translate also into identical means, the opposite is not necessarily true.”

Would someone mind explaining to me (here or with a link) what the NFL theorems actually state? Specifically the if…then part.

Would someone mind explaining to me (here or with a link) what the NFL theorems actually state?

FWIW, you can find my layperson’s explanation of Wolpert & Macready’s theorem at http://www.talkorigins.org/design/faqs/nfl/#nflt

Tom English sums it up like this: “[Wolpert & Macready’s] main theorem states, loosely, that the average performance of all optimizers is identical if the distribution of functions is average.” (http://www.tomenglishproject.com/EP96.pdf)

I think it can also be crudely explained like this: in a purely random fitness landscape it doesn’t matter how you search, because any point is as likely to be a good one as any other.

Tom, I’m having difficulty understanding your paper “Evaluation of Evolutionary and Genetic Optimizers: No Free Lunch” (http://www.tomenglishproject.com/EP96.pdf). I can’t even work out where you actually state your NFL theorem! Could you please help.

Richard,

I don’t think that’s particular crude. In 1994 I served on the thesis committee of a student addressing function optimization. In his defense he said the ultimate goal was to obtain an optimizer that was superior for all functions, and I pointed out that optimizing a uniformly drawn function was equivalent to “optimizing” the value of a uniform random variable. In the following year, a student who had attended the defense learned of the NFL tech report before I did, and brought me a copy.

Various people understood that no optimizer could be generally superior on “purely random” fitness landscapes prior to 1995. Wolpert and Macready were the first to work out the details, and they did so extremely well, though not without errors.

Something I should have said earlier, and that fits reasonably well here, is that the main NFL results directly address neither evolutionary computation nor biological evolution. Take away the flamboyant attack on fuzzy thinking in EC, and what remains is an outstanding treatment of deterministic sequential search. To debunk the notion that an evolution-inspired algorithm could be generally superior to others, Wolpert and Macready reason about a deterministic sequential simulation of the algorithm. Unfortunately, most of the NFL results cannot be related to EC, and certainly not to evolution, through simulation.

Hi, Richard.

The interesting thing about it, from the “Icons of ID” perspective, is that the central result is the Conservation Lemma, in which I show that information is conserved in search. NFL follows easily from that. Incidentally, that is the first theoretical paper I ever wrote, so forgive the rough spots.

I wonder if there is a connection between conservation of information in my sense and conservation of complex specified information. I tried to grasp an answer for myself, but the Jello went squish. I would love to hear what someone who has gotten information through a side channel has to say.

Tom

Richard Wein Wrote:

I think it can also be crudely explained like this: in a purely random fitness landscape it doesn’t matter how you search, because any point is as likely to be a good one as any other.

My take on the crude explanation would be a little different: (1) random fitness landscapes are the most common sort and (2) in a random fitness landscape it doesn’t matter how you search, because each sampled point provides no useful information concerning other points.

Search algorithms attempt to choose points to sample based on expectations derived from previously sampled points. If one cannot derive an expectation from the previously sampled points, then any search algorithm cannot do better than blind search.

Hi, Richard.

The interesting thing about it, from the “Icons of ID” perspective, is that the central result is the Conservation Lemma, in which I show that information is conserved in search. NFL follows easily from that. Incidentally, that is the first theoretical paper I ever wrote, so forgive the rough spots.

I wonder if there is a connection between conservation of information in my sense and conservation of complex specified information. I tried to grasp an answer for myself, but the Jello went squish. I would love to hear what someone who has gotten information through a side channel has to say.

Tom

(replying to the bit I bolded)

Hi Tom,

As far as I can tell, Dembski has several importantly different definitions of “complex specified information”/”specified complexity” and alternates between them regularly depending on what is convenient. You might be interested in this page on Definitional Complexity.

This is part of a more general problem, which is that creationists have no decent definition of “information” in biology. When they claim that “evolution can’t produce new information” they generally mean things like new genes. However, “information” in this colloquial sense is easy to evolve (duplication and divergence of genes under natural selection), so they have to come up with something abstruse and intimidating to make this claim appear defensible. This results in massive confusion (which often appears to be the goal) since there is no very useful correspondance between information in “new genes” sense and the various definitions of information in the information-theory senses (IMO). Dembski’s “complex specified information” is a case in point.

Wes,

In theory, almost all fitness landscapes are random. This means they are incompressible, and too big to occur in practice.

If one cannot derive an expectation from the previously sampled points, then any search algorithm cannot do better than blind search.

I’ll take that a step further. There may be NFL even if the random fitness values are mutually informative. Put simply, whatever the observed values say about one unobserved value, they say about all unobserved values. Blind search is still optimal, because there is never a basis for choosing one point over another.

Hello, Nick.

As far as I can tell, Dembski has several importantly different definitions of “complex specified information”/”specified complexity” and alternates between them regularly depending on what is convenient.� You might be interested in this page on Definitional Complexity.

Dembski gets my vote for all-time master of equivocation. Enjoyed “definitional complexity,” thanks.

When they claim that “evolution can’t produce new information” they generally mean things like new genes.� However, “information” in this colloquial sense is easy to evolve (duplication and divergence of genes under natural selection), so they have to come up with something abstruse and intimidating to make this claim appear defensible.

This is a topic that interests me greatly, and I will discuss recent work some other time. For now, I’ll say that “definitional complexity” means that the preacher is sure to keep the choir and most of the congregation. Biology needs compelling demonstrations for everyone else, and I think evolutionary computation can help with this.

Back to the topic: Gene duplication constitutes a violation of one of the assumptions (fixed finite search space) of NFL, and my proof that information is conserved in search goes poof. And that is as things should be, for it is trivial to demonstrate information gain in a population of genomes with gene duplication and divergence.

Let me see if I have this right.

If the search space is random, then no specific optimizer can do better on average than a random search can.

How does this pertain to optimizing strategies? In other words, instead of picking a specific optimizer, pick a strategy and tune that strategy to each search space encountered. I think this has been answered, but I want to make sure.

Speaking of information gain, I got around to reading Kimura’s paper on it about a month ago. It really is an elegant proof.

Under no selection the probability that a new allele goes to fixation is 1/N. If instead, that allele is selected for the probability that it goes to fixation is about 1. Therefore, the “information” gained through selection is equivilent to the uncertianity lost: H(1/N)-H(1) = -log(1/N)/N = log(N)/N. This holds better for larger N.

Reed,

If you present a test function to a search algorithm and it responds with the sequence of values it observed, there are many ways to measure the quality of the sequence. What we consider the search to be doing is a matter of how quality is measured. For instance, in optimization the objective is sometimes to locate the minimum of the function with as few evaluations as possible. The quality measure, then, is the index of the first occurrence of a minimum in the sequence of observed values.

When there is NFL in search, all search algorithms are indistinguishable by all measures of quality, including measures of optimization quality. That is a very strong statement. There are some non-NFL distributions of test functions for which all search algorithms are indistinguishable under some measure of quality.

Now it happens that there is serious misunderstanding of optimization and NFL. Wolpert and Macready make many comments about optimization speed in their 1997 article, but their theorems do not support that. There is no way to look at a sequence of observed values and tell how long it took the search algorithm to generate it. A fast algorithm and a slow one can visit the domain points in the same order, giving the same value sequence. Optimization quality can be a count of function evaluations, as I illustrated above, but optimization speed cannot be made into a quality of the value sequence.

Wolpert and Macready seem to make counts into speed by assuming that the running time of all search algorithms is linearly proportional to the number of function evaluations. But almost all search algorithms running in linear time are too big to exist in the world. Most of those that are impractically large are not functionally equivalent to smaller algorithms (i.e., they are incompressible). Some are functionally equivalent to smaller, slower search algorithms. The slower algorithm not only does the search, but spends time decompressing itself.

Thus there are intrinsically slower search algorithms. This means that counting function evaluations is a bad way to measure time. The NFL reasoning of Wolpert and Macready does not lead to the conclusion that all algorithms have identical average optimization speed. On the contrary, no algorithm has better average optimization speed than blind search, but some have worse. You may get the impression from some of Wolpert and Macready’s writing that everything “balances out” but their argument is asymmetric, and is very much geared toward debunking naive beliefs about evolutionary computation.

The situation with NFL and optimization is much more complicated than Dembski seems to have realized.

Tom English Wrote:

Under the theorems’ assumption of a uniform distribution of problems, an uninformed optimizer is optimal. To be 99.99% sure of getting a solution better than 99.999% of all candidate solutions, it suffices to draw a uniform sample of just 921,029 solutions. Optimization is a benign problem with rare instances that are hard.

This is a very important point that it won’t hurt to repeat again. Long ago, I tried to call attention to this fact by, among other things, referring to English’s papers.

Tom English Wrote:

I hope critics will drop the averaging complaint. The proof of the main theorem demonstrates identical distributions which have, of course, identical means.

The uniform averaging over literally all cost functions works to the critic’s advantage, since it makes optimization benign. Nonetheless, we should complain about it, because it is highly unrealistic (and it was not intended to be realistic).

Tom English Wrote:

In January 1995, Wolpert and Macready disseminated over the Internet a tech report proving a conservation law for optimizer performance. But they referred to “no free lunch.” They knew of Schaffer’s work when they prepared their 1997 TEC article, but said nothing of it. They also did not cite any of the NFL work that had been published in the two years following the tech report. Thus most people are unaware that researchers other than Wolpert and Macready did substantive work prior to 1997. I supplied my 1996 paper to Bill Macready, and you will find my sufficient condition for NFL in the 1997 article.

For the record, Wolpert did cite work by Cullen Schaffer in his article on the lack of (and existence of) a priori distinction between learning algorithms – the NFL theorem for learning/inference. I have not read Schaffer’s article, but perhaps it’s emphasis is on learning/inference rather than optimization. In any case, Wolpert & Macready also did not cite the previous work by Wolpert in their 1997 article.

Tom English Wrote:

I suppose you have “practical significance” with respect to biological evolution in mind. In nature the fitness function is nonstationary, the search is random, points are revisited, and the domain of fitness functions cannot be defined as a finite set. These are four substantial violations of the suppositions of the original NFL theorems.

Wolpert & Macready’s 1997 article contains a NFL theorem for time-dependent cost functions and the domain of a continuous cost function can be binned into, say, 10^100000 bins and replaced by the (finite) set of center points of the bins. The NFL theorem is also not restricted to non-random search. Thus, of the four objections listed, only the revisiting of points is a valid objection (albeit not one I would recommend).

The main problem with using the NFL theorem as an argument against evolution is that there is no obvious cost function. The evolutionary process may be thought of as local optimization of a (changing) function that measures the reproductive success, but this function is not what interests creationists. Creationists are more interested in functions that measure the impressiveness of biological features. Such a function depends on peculiarities of the human mind and is not intrinsic to the evolutionary process. Once such a function has been defined, it remains to justify why the optimization (by the evolutionary process) of this function should be considered as a NFL scenario and why the impressiveness observed in real-world biology is different than is expected from the NFL scenario. This has never been done and until someone provides these justifications the invocation of the NFL theorem as an argument against evolution is unsubstantiated.

Unless we specify which quantity we consider as the quantity being optimized, there can be no reason to accept the NFL scenario nor to accept that observed values are unexpected w.r.t. the NFL scenario in a specific case.

Tom English Wrote:

Back to the topic: Gene duplication constitutes a violation of one of the assumptions (fixed finite search space) of NFL, and my proof that information is conserved in search goes poof. And that is as things should be, for it is trivial to demonstrate information gain in a population of genomes with gene duplication and divergence.

Why won’t the set of all genomes of length less than 10^1000000 suffice to treat gene duplication? It is finite and since real-world genomes will never attain the maximum length we can formally stipulate that (e.g.) genomes which are made longer than the maximum length by gene duplication are truncated.

Reed A. Cartwright Wrote:

Under no selection the probability that a new allele goes to fixation is 1/N. If instead, that allele is selected for the probability that it goes to fixation is about 1. Therefore, the “information” gained through selection is equivilent to the uncertianity lost: H(1/N)-H(1) = -log(1/N)/N = log(N)/N. This holds better for larger N.

“Information” in Shannon’s sense is always associated with a stochastic variable – the “information” could take any value depending on which stochastic variable we study, which is why it is important to specify exactly what the “information” is about. Which stochastic variable do you have in mind here? What are it’s possible values? Does the probability distribution over the possible values have an empirical interpretation, an epistemological interpretation, or is it a purely formal invention?

Also, I wonder if you didn’t mean to write H(1/N) = log(N) instead of H(1/N) = log(N) / N.

Erik Wrote:

“Information” in Shannon’s sense is always associated with a stochastic variable — the “information” could take any value depending on which stochastic variable we study, which is why it is important to specify exactly what the “information” is about. Which stochastic variable do you have in mind here?

The probability of the substitution of a new allele.

Also, I wonder if you didn’t mean to write H(1/N) = log(N) instead of H(1/N) = log(N) / N.

No, since H(p)= -p * log(p), H(1/N) = -1/N * log(1/N) = log(N)/N.

Tom English: you have shed a new light on Wolpert-Macready’s paper, and I need some time to digest it. Until now, I had some general idea about the limitations of the NFL theorems, but now you have provided some specifics which I would like to think about. I still think it’d be interesting if you discussed it with David. As to Dembski, there is not much to discuss as he is confused on an elementary level despite his PhD in math. Cheers, Mark

It looks like Igel and Toissant have proven that for the overwhelming majority of fitness functions encountered in the real world the NFL theorems do not hold. Ha! I am really curious what Wolpert thinks about it. I’ll ask him. Most of the other links given in the post that started this thread do not work.

Hello, Erik.

I have characterized the 1997 TEC article as generally outstanding. But authors, reviewers, associate editors, editors-in-chief, and, therefore, journals are not perfect. The paper is very slim on references, and I do not know why. Some of my results appear without citation, and I do not know why. The extension of the formal results to optimization speed is incorrect in some regards. Yet I am not attempting to malign the authors or anyone at TEC.

The uniform averaging over literally all cost functions works to the critic’s advantage, since it makes optimization benign. Nonetheless, we should complain about it, because it is highly unrealistic (and it was not intended to be realistic).

But you had just indicated how I had made optimization benign in a statistical sense without reduction to a scalar figure of merit. Almost every cost function is individually benign, and I clarified this in later work when I argued that almost all cost functions are algorithmically random. Whether random cost functions are realistic or not is irrelevant in treatment of the “microeconomy” (or geometry) of search. My observation that NFL does not hold precisely in the real world is not a criticism of Wolpert and Macready’s work. And some real-world distributions may be approximately NFL.

I have not read Schaffer’s article, but perhaps it’s emphasis is on learning/inference rather than optimization. In any case, Wolpert & Macready also did not cite the previous work by Wolpert in their 1997 article.

But the NFL theorems are conservation theorems, and there is a close logical relationship. It is so close, in fact, that I have treated conservation of generalization accuracy and conservation of search quality together in a paper and in a tutorial. And the “no free lunch” has turned out to be a royal pain to use.

Wolpert & Macready’s 1997 article contains a NFL theorem for time-dependent cost functions and the domain of a continuous cost function can be binned into, say, 10^100000 bins and replaced by the (finite) set of center points of the bins. The NFL theorem is also not restricted to non-random search. Thus, of the four objections listed, only the revisiting of points is a valid objection (albeit not one I would recommend).

Sorry, I know my “original NFL theorems” was ambiguous. I was talking about the main theorems toward the beginning of the paper, but you are correct with regard to nonstationary cost. My point with the impossibility of defining a finite domain is that the genome grows through duplication of genes and other phenomena. In NFL the search must visit every point exactly once. So if you say that biological evolution visits every genome in a finite set, you imply that it is possible that it visits one longer than any in the set, a contradiction. Part of the problem here is that we have no reason to believe that biological evolution ever halts. If it is not algorithmic, then NFL results do not apply. The NFL theorems indeed apply only to deterministic search. An evolutionary computation is instantiated by supplying its pseudo-random number generator with a seed. The seeded EC is a deterministic algorithm. There is no corresponding trick for biological evolution, which is really random, whatever that really means.

I agree entirely that there is no obvious fitness function. Some people I respect persist in seeing biological evolution as optimization, but I have trouble seeing optimization in culling of the less fit.

Tom

Mark,

Until now, I had some general idea about the limitations of the NFL theorems, but now you have provided some specifics which I would like to think about.

Glad to know I’ve managed to do some redecorating. Or was that the nudge in the ribs I asked you to give me?

It looks like Igel and Toissant have proven that for the overwhelming majority of fitness functions encountered in the real world the NFL theorems do not hold.

Unfortunately, “On Classes of Functions for which No Free Lunch Results Hold” proceeds from a false “theorem” that was circulating in the literature, namely that there is NFL for a function distribution if and only if it is uniform on a set of functions closed under permutation (c.u.p.). Their dismissal of NFL hinges on the fact that the number of c.u.p. sets is finite and relatively small. But I had proved in 1996 that the NFL distributions are uncountable. Oops. I doubt there’s much to salvage in the paper.

Fortunately, the authors recently gave one of the three independent proofs that a block uniform distribution is necessary and sufficient for NFL. The set of block uniform distributions is uncountable, as it should be, and includes uniform distributions on c.u.p. sets as a minute subset.

Tom English Wrote:

Unfortunately, “On Classes of Functions for which No Free Lunch Results Hold” proceeds from a false “theorem” that was circulating in the literature, namely that there is NFL for a function distribution if and only if it is uniform on a set of functions closed under permutation (c.u.p.). Their dismissal of NFL hinges on the fact that the number of c.u.p. sets is finite and relatively small. But I had proved in 1996 that the NFL distributions are uncountable. Oops. I doubt there’s much to salvage in the paper.

There Igel & Toussaint’s paper is perfectly fine, but your description of it is not. When you state the theorem in question the position of the word “uniform” is very important.

Your mistaken formulation: “there is NFL for a function distribution if and only if it is uniform on a set of functions closed under permutation”.

Igel & Toussaint’s actual theorem: the NFL result holds for a uniform probability distribution on a set of cost functions if and only if the set of functions is closed under permutation.

The theorem is only a statement about uniform probability distributions and the word “uniform” is thus part of the assumptions, not the conclusion of the theorem. Igel & Toussaint subsequently generalized the theorem to a statement about all probability distributions.

Syntax Error: mismatched tag at line 11, column 67, byte 719 at /usr/local/lib/perl5/site_perl/mach/5.18/XML/Parser.pm line 187.

Syntax Error: mismatched tag at line 11, column 67, byte 719 at /usr/local/lib/perl5/site_perl/mach/5.18/XML/Parser.pm line 187.

Syntax Error: not well-formed (invalid token) at line 18, column 59, byte 1628 at /usr/local/lib/perl5/site_perl/mach/5.18/XML/Parser.pm line 187.

Mark,

No need for apology. I have worked on NFL from 1995 to the present. Wolpert published the opus magnum in 1997 and turned to other research topics. You perceive him as the authority because he wrote the seminal paper, but I think that’s a poor criterion.

Most people love to see others extend their work. Others suffer serious cases of NIH (Not Invented Here). There is a reason you have never managed to get Wolpert to respond favorably to work other than his own. There is a reason you have gotten no coevolution work out of him.

As I suggested above, I have emphasized that conservation (unfortunately renamed “no free lunch”) is a matter of degree, and will never be absolute in the real world. One of the major points of my latest paper is that search transforms one probability distribution on the functions into another, and the degree of divergence from a precise NFL distriction is preserved in the transformation. The question of whether there could be precise conservation in the real world has been open since 1995, and I resolved it this year arguing in terms of Kolmogorov complexity.

These things have a way of working themselves out in time. Other researchers and I have recently made great progress in conservation. There will be a lot to emerge over the next couple years. And this may be very disturbing for someone who wants everything to stay put.

Tom. I’ve tried to work through your paper, but I just can’t get it. Is there any way you can state your result in simplified terms? I’m used to thinking of NFL theorems in the form “NFL results hold for function distributions meeting such-and-such conditions”. Can your theorem be expressed in this way?

Erik Wrote:

The main problem with using the NFL theorem as an argument against evolution is that there is no obvious cost function. The evolutionary process may be thought of as local optimization of a (changing) function that measures the reproductive success, but this function is not what interests creationists. Creationists are more interested in functions that measure the impressiveness of biological features. Such a function depends on peculiarities of the human mind and is not intrinsic to the evolutionary process.

Why can’t creationists take the fitness function to be reproductive success, and the performance measure to be increase in functional complexity (in some suitable sense) over a given time (i.e. number of evaluations)? The concept of functional complexity may not be intrinsic to the evolutionary process, but that doesn’t mean that creationists can’t use it as their performance measure, does it?

I agree with your other points (at least those I feel able to judge).

Tom English, I think we agree on much, but not this particular point:

Tom English Wrote:

My point with the impossibility of defining a finite domain is that the genome grows through duplication of genes and other phenomena. In NFL the search must visit every point exactly once. So if you say that biological evolution visits every genome in a finite set, you imply that it is possible that it visits one longer than any in the set, a contradiction. Part of the problem here is that we have no reason to believe that biological evolution ever halts.

1. It is not necessary that every point is visited once. What is (formally) necessary is that no point is visited twice or more*, but this leaves the possibility that some points are not visited at all. Note, e.g., that Theorem 1 in Wolpert & Macready’s 1997 paper holds for arbitrary m (smaller than the number of search space points), i.e. for samples of arbitrary size. If the theorem required every point to be sampled it would be very trivial and uninteresting!!

2. The set of all genomes is in principle infinite, but in practice we can safely stipulate that no genome is longer 1010000000 nucleotides and that gene duplications that would make a genome longer than this are replaced by the identity operation on the genome. This way we reduce the search space to a finite search space that is closed under gene duplication.

Tom English Wrote:

I agree entirely that there is no obvious fitness function. Some people I respect persist in seeing biological evolution as optimization, but I have trouble seeing optimization in culling of the less fit.

I’m not sure what you mean by “fitness function” (see my reply to Richard Wein immediately below). I wrote that there no obvious cost/objective function, but there is an obvious quantity subject to (local) optimization: Reproductive success. This quantity is locally maximized in roughly the same sense as energy is locally minimized, entropy is maximized, and action extremized in physics. Of course, the evolutionary process does not always find the optima, but I find it very reasonable to think of the dynamics as a local search for them.

———– * This restriction is only necessary because otherwise a search algorithm that finds a good point in the search space could continue to sample this point and in this way obtain a good performance. This way of “circumventing” the NFL theorem is not interesting.

Richard Wein Wrote:

Why can’t creationists take the fitness function to be reproductive success, and the performance measure to be increase in functional complexity (in some suitable sense) over a given time (i.e. number of evaluations)? The concept of functional complexity may not be intrinsic to the evolutionary process, but that doesn’t mean that creationists can’t use it as their performance measure, does it?

The paragraph you quoted was probably poorly formulated – it was meant to be an introduction to the problem, not a statement of the actual problem.

We must be careful not to equivocate between “fitness function” in the evolutionary biologist’s sense of reproductive success and “fitness function” in the optimization theorist’s sense of “the quantity we desire to maximize”. Because the idea behind evolutionary optimization algorithms is to force the reproductive success to be equal to (a function of) the quantity we desire to maximize, the distinction is somewhat blurred within the evolutionary optimization framework, but we should make a careful distinction in the present context. For this reason I will not use the term “fitness function” – I’ll use the term “reproductive success” for… well, reproductive success, and the term “objective function” for the quantity we desire to maximize.

Why “can’t creationists take the fitness function to be reproductive success […]”? If you mean that creationists should take the objective function to be whatever the empirical reproductive success turns out to be, and try to evaluate how well biological evolution could maximize this objective function, then I don’t think it makes much sense. One could, however, take the objective function to be what you call “functional complexity” (provided that this quantity is a function of a single search space point) and the perfomance measure to be, e.g, the average or maximum “functional complexity” realized by biological evolution (the performance measure must be a function of only the sequence of the values of the objective function that the search algorithm encounters).

The actual problem I was trying to point is that in order to invoke the NFL theorem one must first: (i) Define which quantity we take the objective function to represent, i.e. define the physical meaning of the objective function. (My point about there being no intrinsic objective function is simply that one must be supplied, not that extrinsic objective functions cannot be used.) (ii) Justify (in English’s terminology) a block-uniform probability distribution over a set of functions, which we take to be candidates for the way quantity chosen in (i) depends on points in the search space. (iii) Make sure that the interpretation chosen in (i) makes sense for all functions that were assigned a non-zero probability in (ii). (For instance, suppose, for the sake of the argument, that we choose the quantity “number of heads” and wish to use the NFL theorem to study how well biological evolution could optimize this quantity. Then you must make sure that the objective functions that are assigned a non-zero probability in (ii) really can be interpreted as a number of heads — e.g., you’ll find it difficult to self-consistently interpret an objective function that assigns the value 17 to a giraffe, when a giraffe only has one head.) (iv) For quantity chosen in (i), verify that there is a difference between the empirically observed values and the values expected from biological evolution.

I wouldn’t hold my breath.

Erik

Erik, let me try to explain my point better, and refine it in the process. Since you prefer the term “objective function” to “fitness function”, I’ll use that. But note the distinction between the objective function and the performance measure. I find Fig. 1 of Igel and Toussaint’s paper “Recent Results on NFL Theorems For Optimization” (arxiv.org/pdf/cs.NE/0303032) useful for illustrating the various elements of the NFL optimization scenario.

To apply the NFL theorems to biological evolution I propose that we take as our objective function the reproductive potential of the organism, i.e. how good it is at reproducing. I propose that we take as our performance measure the greatest functional complexity found by the algorithm. I’m using “functional complexity” to mean the sort of complexity which we see in biological organisms and want to explain. I can’t give a good definition, but I think we know it when we see it: a watch has more of it than a spanner; a human has more of it than an amoeba. It is I think what Leslie Orgel was getting at when he coined the term “specified complexity” (but it has nothing to do with Dembski’s specified complexity).

I am assuming that functional complexity can be evaluated as a function of the genome of the organism, i.e. its location in search space. It may be objected that the functionality (and therefore the functional complexity) of an organism depends on its environment. But I am assuming–for now–a constant environment, so I think this objection can be dismissed.

It seems to me that this approach solves your “main problem”:

Erik Wrote:

The main problem with using the NFL theorem as an argument against evolution is that there is no obvious cost function. The evolutionary process may be thought of as local optimization of a (changing) function that measures the reproductive success, but this function is not what interests creationists. Creationists are more interested in functions that measure the impressiveness of biological features.

I’m proposing that the “impressiveness of biological features” (or functional complexity) can be taken as the performance measure rather than as the cost/objective function.

Of course, there are other problems with attempting to apply the NFL theorems to biological evolution. I am only addressing your “main problem”.

Richard Wein Wrote:

I find Fig. 1 of Igel and Toussaint’s paper “Recent Results on NFL Theorems For Optimization” (arxiv.org/pdf/cs.NE/0303032) useful for illustrating the various elements of the NFL optimization scenario.

OK, I recommend that you look at Fig. 1 as you read the following. I’m going to use the same notation.

Richard Wein[/quote Wrote:

To apply the NFL theorems to biological evolution I propose that we take as our objective function the reproductive potential of the organism, i.e. how good it is at reproducing. I propose that we take as our performance measure the greatest functional complexity found by the algorithm. I’m using “functional complexity” to mean the sort of complexity which we see in biological organisms and want to explain.

Your choice of performance measure c is inconsistent with your choice of objective/cost function f. Note that c(Y) is a function of only Y, which means that the performance measure only depends explicitly on the sequence of objective/cost function values discovered by the search algorithm. Of course, the objective/cost function values found by the search algorithm depend on the search space points found and on the algorithm used, so there is an implicit dependence of the performance measure on search space points. But there is no explicit dependence.

You have chosen reproductive potential as your objective/cost function f and “functional complexity” as your performance measure c. The performance measure must be fully determined by the sequence of discovered objective/cost function values, so you should be able to answer the following question: Suppose that biological evolution, during a brief period of time, discovers 10 genomes with the following reproductive potentials: 1.0, 1.1, 0.9, 1.2, 1.4, 1.4, 1.3, 1.4, 1.2, and 1.4 offspring/year, respectively. What is the “functional complexity” of this sequence of objective/cost values?

Given your choice of objective/cost function and performance measure, this is the question you have committed to answer, but it’s probably not the question you expected to commit to answering. I’m guessing that you want your “functional complexity” to depend on the discovered genomes (the x’s in Fig. 1), rather than their reproductive potentials (the f(x) values in Fig. 1, with your choice of f), but then you can’t use it as a performance measure, since the performance measure must depend only implicitly on the discovered genomes.

The inconsistency is easily fixed (what you really want is to use “functional complexity” as your objective function and the maximum discovered “functional complexity” as your performance measure). I want to make sure that the relation between the objective/cost function and the performance measure is clear before addressing the rest of your argument, though.

Erik Wrote:

I find this exercise unremarkable and unnecessary. I think it loses all its appeal when we are careful to write “information about X” instead of reifying the concept by calling it merely “information”. We don’t need to compute entropies in order to arrive at the obvious qualitative conclusion that knowing about selection of an allele gives us information about the fixation of the same allele, nor can I think of a sensible interpretation of the quantitative result that the information about the fixation is log(N)/N, nor is this the kind of process that creationists dispute.

Well, Kimura went further in his paper calculating how much “information” would have accumulated since the Cambrian(?), and how much “information” was contained in our genome. He then argued that there is ten times as much capicity in our genome as there could be information in it, based on selection over the past millions of years. It was a little bit odd, but it was an interesting read.

Erik wrote: (the performance measure must be a function of only the sequence of the values of the objective function that the search algorithm encounters). I don’t think there is such a limitation insofar as we are discussing the NFL theorems as such (such a limitaion may, though, possibly arise in some specific applications of the NFL theorems). In many optimization scenarios, performance measure can be, for example, simply the number of iterations the search needs to reach a certain pre-selected value of the fitness function (obvioulsy, this particular choice of the performance measure is not very well suited for biological evolution, but Erik’s statement seemed to be meant as a general idea for optimization searches). Perhaps also some other possible performance measures can be imagined not meeting Erik’s limitation - to state it, a rigosous proof would be needed with exceptions determined. Just \$0.02.

Erik Wrote:

Your choice of performance measure c is inconsistent with your choice of objective/cost function f. Note that c(Y) is a function of only Y, which means that the performance measure only depends explicitly on the sequence of objective/cost function values discovered by the search algorithm.

That’s a good point, which completely demolishes my argument. What more can I say? ;-)

Mark Perakh Wrote:

Erik wrote: (the performance measure must be a function of only the sequence of the values of the objective function that the search algorithm encounters). I don’t think there is such a limitation insofar as we are discussing the NFL theorems as such (such a limitaion may, though, possibly arise in some specific applications of the NFL theorems). In many optimization scenarios, performance measure can be, for example, simply the number of iterations the search needs to reach a certain pre-selected value of the fitness function (obvioulsy, this particular choice of the performance measure is not very well suited for biological evolution, but Erik’s statement seemed to be meant as a general idea for optimization searches). Perhaps also some other possible performance measures can be imagined not meeting Erik’s limitation - to state it, a rigosous proof would be needed with exceptions determined.

I did indeed mean the statement to be taken as a general statement. If Fig. 1 of Igel & Toussaint’s paper fails to communicate this point, then let me explain Wolpert & Macready’s notation and provide a quote that says exactly the same thing.

Wolpert & Macready denote the sequence of the m first visited search space points by dmx and the sequence of cost values corresponding to these points by dmy. For instance, if we’re searching the set of all bit strings of length 5 for the minimum bit sum, then we could have, after five evaluated search space points,

d5x = (10101, 11101, 01101, 01001, 00001), d5y = (3, 4, 3, 2, 1).

The performance measure must be a function of only the cost function values in d3y. On p. 69 of their 1997 article, Wolpert & Macready write:

“Under the oracle-based model of computation any measure of the performance of an algorithm after m iterations is a function of the sample dmy. Such performance measures will be indicated by Phi(dmy). As an example, if we are trying to find a minimum of f, then a reasonable measure of the performance of [the search algorithm] might be the value of the lowest Y value in dmy: Phi(dmy) = mini {dmy(i): i=1…m}. Note that measures of performance based on factors other than dmy (e.g., wall clock time) are outside the scope of our results.”

Wolpert & Macready (1997), IEEE Trans. Evol. Comp., 1:67-82

An example of a valid performance measure is the number of iterations that occured before finding a cost value better than a given fixed threshold, because this can be determined by looking only at the sequence dmy of cost values. Just count the values in dmy until you find a value better than the threshold or get to the end of the sequence!

Reed A. Cartwright Wrote:

Well, Kimura went further in his paper calculating how much “information” would have accumulated since the Cambrian(?), and how much “information” was contained in our genome.

Dear All,

Sorry not to be responding. I am busy at a conference until Wednesday. People filled the room and stood in the doorway for my “No More Lunch” talk. I praised Wolpert and Macready’s work, as always.

I met Robert Axelrod yesterday. By chance I had recently developed a minimalistic evolutionary algorithm, with much the flavor of Axelrod’s models, to illustrate information gain in a process of random mutation and natural selection. I now believe it has scientific value.

Erik Wrote:

Igel & Toussaint’s actual theorem: the NFL result holds for a uniform probability distribution on a set of cost functions if and only if the set of functions is closed under permutation.

You’re right. I had not read the present version of the paper, having reviewed an earlier version submitted to a conference. There is now some careful wording that is technically correct but practically misleading.

Igel & Toussaint Wrote:

The No Free Lunch (NFL) theorems–roughly speaking–state that all search algorithms have the save average performance over all possible objective functions…

Yes, but Wolpert and Macready (1997) included my sufficient condition generalizing NFL to uncountably many non-uniform distributions. They did not need the generality in the paper, so they did not prove a theorem. But in their adversarial argument, Igel and Toussaint should have addressed a strong version of NFL, not a weak one. The present paper hinges upon making the original NFL theorems the NFL theorems.

Igel & Toussaint Wrote:

… we have shown that the statement “I’m only interested in a subset F of all possible functions, so the NFL theorems do not apply” is true with probability close to one (if F is chosen uniformly…)

More careful wording. I think the assumption that we draw classes uniformly from the power set of functions is unreasonable. The c.u.p. sets have much lower Kolmogorov complexity than the typical class, and thus their probabilities under the universal distribution are much greater than that of the typical class. In other words, simple classes should be considered more likely to arise than complex classes.

Richard Wein wrote: “That’s a good point, which completely demolishes my argument. What more can I say?” Bravo, Richard! You have set an example for all of us. What more can I say? Mark

Richard Wein Wrote:

I’ve tried to work through your paper, but I just can’t get it. Is there any way you can state your result in simplified terms? I’m used to thinking of NFL theorems in the form “NFL results hold for function distributions meeting such-and-such conditions”. Can your theorem be expressed in this way?

Hi, Richard. I am in fact trying to change the way everybody thinks about NFL. In a broader analysis, strict NFL is simply not as important as it once seemed.

I represent test functions and search results identically, avoiding an unnecessary distinction between function distributions and result distributions. Thus a search algorithm may be seen as an operator on the set of all such distributions. A search algorithm essentially shuffles the probabilities in the function distribution to give the result distribution. The shuffle is constrained not to change the identity of or distance to the nearest block uniform distribution (i.e., one in which the probability of each function equals the probability of each of its permutations).

A block uniform distribution is mapped to itself by all search operators. That is, we have strict NFL when every search algorithm has a result distribution identical to the function distribution. Now if the function distribution is not block uniform, result distributions will diverge from block uniformity precisely as much as the function distribution does. This places a bound on how much result distributions for different search algorithms can differ from one another. If the function distribution is almost block uniform, then there will be little difference in the result distributions of different search algorithms.

What about quality of search results? We can obtain a distribution of quality scores from the distribution of search results. When their result distributions are identical, search algorithms have identically distributed quality scores by all quality measures. When the result distributions are not equal, I assume that similar result distributions imply similar quality distributions.

I have not proved this, but I am fairly sure the volume of the set of block uniform distributions is zero. Thus Igel and Toussaint painted too rosy a picture of strict NFL by counting c.u.p. sets. But I have shown that we can define neighborhoods of approximate conservation (“approximate NFL” is a linguistic monstrosity) about points of strict conservation (NFL). The upshot is that strict conservation seems to be more a theoretical construct than anything. There is no reason yet to conclude that approximate conservation is not an important practical consideration.

Erik Wrote:

It is not necessary that every point is visited once. What is (formally) necessary is that no point is visited twice or more*, but this leaves the possibility that some points are not visited at all. Note, e.g., that Theorem 1 in Wolpert & Macready’s 1997 paper holds for arbitrary m (smaller than the number of search space points), i.e. for samples of arbitrary size. If the theorem required every point to be sampled it would be very trivial and uninteresting!!

The algorithms are indeed algorithms (they halt) for exhaustive exploration of the search space. By definition, they do not get stuck after m evaluations. See the last paragraph on p. 68 of the TEC article (with symbols I am not going to fool with). I have seen a lot of energy wasted due to the failure to realize that the search algorithm is implicitly a decision tree. I have reasoned about complete sequences of observed values since 1996. It is generally no problem to take prefixes when you want them. Furthermore, note that you can always turn a string prefix into a histogram, but you cannot convert a histogram into a unique string. Histograms served their purpose for Wolpert and Macready, but they also kept people from seeing other important aspects of NFL.

I’m not sure what you mean by “fitness function” (see my reply to Richard Wein immediately below). I wrote that there no obvious cost/objective function, but there is an obvious quantity subject to (local) optimization: Reproductive success.

I often speak in terms of reproductive success, and said “fitness” with others here in mind. I see individuals as having evolved to maximize their reproductive success, but I resist suggestions that biological evolution has any objective. No teleology for me.

Mark Wrote:

Besides the fact that Wolpert authored a seminal paper, I happen also to like that paper. I believe it is a very good one… provide, if possible, an explanation of your theorem in terms as simple as possible (although you certainly owe us nothing).

I evidently have come off as very harsh. Wolpert and Macready’s paper was voted the best in vol. 1 of the IEEE Transactions on Evolutionary Computation, no matter that it destroyed the rationale for many people’s research. I agree that it was the best paper of the year. So if I see further than others have, it is that I stand on the shoulders of… quirkish giants.

Something you may appreciate is that it was the editor-in-chief David Fogel who pushed for inclusion of that paper in the first issue of the Transactions. He gave “No More Lunch” a very close reading, and is acknowledged in the paper.

Keep asking me questions until I get the message across. I hope to publish a simple explanation of my results, and you can help me get there.

Cheers, Tom

Tom, a challenge: first, to note the obvious: if you can only get the message across to Richard, Mark and Eric you are reaching a rather limited audience. You can do better, and the aim of the challenge is to help.

It is of interest to know how the NFL theorem(s) apply, or not, to empirical biological situations. And, sorry to bring this up, it can be difficult to be empirical and uncountable at the same time. So as step 0 of the challenge (this may be the most wrenching part for you :)) I ask you to come down from the continuum, down even from countable, down, down to a number like three or two. Three would be fine although we’ll have to add 1 later on.

The challenge is to work a concrete example.

Suppose we have a population of land snails with three phenotypes, let’s say three color morphs: light pink, brown, and striped. Let’s say the fitnesses of the morphs are 1.1, 1.0 and 0.9 respectively.

The objective is to give the details of a scenario that is both life like and NFL like, that is, the conditions of NFL are satisfied. If this can be done, fine. If not, why not? To make the answer clear, give two scenarios, one life like and the other NFL like.

You have to fill in the blanks from here on: We need a set of functions. What are they? The set must be closed under permutations. What permutations? Do the permutations involve associating each of the morphs with each of the fitnesses? We need a performance measure. What is it?

To go further you may need to consider a mutation that gives us green snails.

Explain the situation(s) with respect to classic NFL, that is, performance of evolution vs blind search. Explain the situation(s) with respect to your NFL as conservation theorems: what, concretely, is conserved or not in which case?

If for any reason you are not able to work with so simple an example, come as close as possible. And if you are willing to humor me on this, thanks very much!

Pete

- “If you can’t explain it to the cleaning lady, you don’t understand it yourself”.

Pete Dunkelberg, this isn’t the realistic example you are seeking, but it is a concrete (if purely fictional) story:

Let me tell you about the intergalactic pet store Unipet’s latest business concept: Make-a-pet. The customers of Unipet, coming from all over the universe, have a truly diverse set of preferences and it is difficult to keep a stockroom of pets so that they have something for everyone. For instance, the Beheians would never consider buying any pet that isn’t “irreducibly complex”. Other customers would never consider buying a pet that is “irreducibly complex”. Unipet’s strategic surveys and polls show that the number of customer profiles needed to accurately capture all customers is gigantic. Hence the need for Make-a-pet.

The idea behind Make-a-pet is that a pet is tailor-made based on the preferences of each individual customer. Since it is just impossible to learn even a fraction of all languages in the universe well enough to discuss the customer’s rather involved preferences, the communication between the customers and the pet engineers is very crude: The pet engineers will show a pet and the customer will simply rate it by holding up a sign with a digit between 0 and 9 on it. (This customer-rating is the objective function.) Based on all previous ratings by the customer, the pet engineers will construct a new pet that hopefully receives a higher rating by the customer, and so on. (The strategy used by the pet engineers to construct new pets based on the customer’s ratings of previously constructed pets is the search algorithm. The fact that the customer rating is the only communication between the customer and the pet engineers makes the search algorithm a “black-box” algorithm.) It goes without saying that the most expensive part of this process is the production of new pets, and it is desirable to minimize the number of pets that need to be produced before one with satisfactorily high customer-rating is discovered. But with Unipet’s rather advanced technology, the company’s experts reasoned that producing pets still beats having a storeroom with a gigantic zoo of pets.

Make-a-pet was neither a stellar success nor a miserable failure. The most remarkable thing about the Make-a-pet project was that it didn’t seem to matter which strategy the pet engineers used. They tried hiring the best statisticians in the universe to infer the preferences of individual customers based on their ratings. They tried ignoring the previous ratings by the customer altogether and just follow a fixed sequence of pets. They tried making pets at random. They tried having a population of pets that was allowed to evolve by differential reproductive success without outside intervention, and picking pets at random from this naturally evolving population. (This is how biological evolution comes in. Letting pets evolve in an unconstrained fashion is just one search strategy among zillions of others.) No matter which strategy the pet engineers used, it turned out that, when averaged over a large number of customers, the number of pets that needed to be produced before one with a satisfactorily high customer rating was discovered was the same for all strategies. The best statisticians in the universe performed no better than a random number generator. (This is the NFL theorem.)

When Unipet’s statisticians discovered this fact they started thinking about what must be true about the distribution of customer preferences. Theoretically, the preferences of a customer can be represented as a list of pairs between possible pets and ratings between 0 and 9. For instance, such a list could begin with

(giraffe, 0), (cat, 7), (rabbit, 6), (ill-tempered sea-bass, 2), …

By swapping the scores of two or more pets in such a list, a new list can be created. For instance, swapping the scores of the rabbit and the ill-tempered sea-bass gives the list

(giraffe, 0), (cat, 7), (rabbit, 2), (ill-tempered sea-bass, 6), …

The criterion for when the strategy used by the pet engineers makes no difference for the average performance is that: All preference lists that can be transformed into each other by swapping scores in the above manner must be equally probable. Preference lists which cannot be transformed into each other in the above manner may have different probabilities, though. (Tom English refers to such probability distributions as “block-uniform”, with the blocks being sets of preference lists that can be transformed into other by swapping scores.)

If Unipet doesn’t give up on the Make-a-pet project first, the company statisticians may refine the analysis in the future, taking into account that some pets are more expensive to produce than others. When nothing is known about the customer’s preferences it makes sense to try the possibilities that are cheap to produce first. Such factors are presently beyond the scope of the analysis, though.