Common ancestry passes another test. News at 11.


Many readers will be familiar with longtime TalkOrigins regular Doug Theobald – he is the author of “29+ Evidences for Macroevolution: The Scientific Case for Common Descent,” pretty much the most impressive FAQ of all time. Oh, and he’s a professor too, and has published some other stuff.

Today he has published a pretty impressive paper in Nature. It is entitled “A formal test of the theory of universal common ancestry.” Basically, it applies the likelihood-based and Bayesian phylogenetic techniques that have been developed over the last decade or two, adds in some standard model-selection theory, and uses these to assess “universal common ancestry” (UCA). A lot of arguments “for common ancestry”, e.g. biogeography, are really arguments for the common ancestry of groups of modern-day organisms – like mammals – rather than arguments that every living thing we know about shares common ancestry. There have been some powerful arguments for UCA over the years – e.g. the extremely conserved (if not quite identical) genetic code (and as everyone except Paul Nelson knows, “almost identical” and “identical” are virtually the same thing statistically, so his decade of yammering about the non-universality of the genetic code has had no impact on this evidence). However, although the arguments remain powerful and convincing, they weren’t usually quantitative and statistical, and it takes some serious work to construct a statistical assessment of something as deep and universal as common ancestry. This is what Doug has done.

He’s getting a lot of press. Just in Nature there is a News & Views from Mike Steel and David Penny, and a Nature podcast.

I can’t wait to read creationist/ID reaction to this paper. They will likely do what they always do, which is make up something ad hoc on the spot, like, “Oh, God would have done it [i.e. produced the observed sequence patterns] that way when he miraculously created species.” Until they produce a quantifiable model to compare to the common ancestry one via a likelihood ratio test (LRT) or Akaike Information Criterion (AIC), such verbiage is pretty much pointless. Either that, or there will just be confused bickering based on misunderstandings of likelihood, probability, statistics, etc. It should be great sport.

So that you can follow the chaos, here’s a quickie for those who didn’t learn this stuff in kindergarten or in frequentist-dominated intro stats classes:

1. likelihood = the probability of the data, given a model = P(data|model)

2. Two (or more) models* can be compared by taking a single dataset** and calculating the likelihood under each model. The highest likelihood model confers the highest probability on the data, and is considered to be the model that best explains the data. If the difference in likelihoods is big enough, one can say (using various tests) that one model is statistically significantly better than another model.

* Models like, say, different phylogenetic trees and/or different sets of transition probabilities between DNA or amino acid sequences.

** A single dataset like, for example, an alignment of a bunch of gene or protein sequences.

3. posterior probability = probability of the model, given the data = P(model|data)

4. Bayes’ Theorem allows you to take a prior probability of a model (P(model), e.g. your model could be “this coin has a 50% chance of landing heads on a toss” – these are your initial beliefs), add some data (say, coin tosses), calculate the likelihood of that data given the model, and then calculate a posterior probability (your updated beliefs).

5. So probability, likelihood, and posterior probability are related, but they are not the same thing.

6. For much more, including a primer on the differences between frequentist, likelihoodist, and Bayesian schools of thought in statistics (I get these categories from Sober 2008, Evidence and Evolution, so please argue with me about something other than this), please see these lecture notes for an introductory lecture I recently gave on Bayesian phylogenetics: http://ib.berkeley.edu/courses/ib20[…]ndouts.shtml, Tuesday, March 9 (PDF).
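For anyone who would rather see points 1 to 5 as code, here is a minimal sketch using the coin example. The data (70 heads, 30 tails) and the function names are made up for illustration; nothing here comes from the paper. It computes the likelihood under a fair-coin model and under a model whose heads probability is estimated from the data, compares them with a likelihood ratio statistic and AIC, and then applies Bayes' theorem to two fixed models with equal priors.

```python
import math

def log_likelihood(p_heads, heads, tails):
    """log P(data | model) for one fixed sequence of tosses."""
    return heads * math.log(p_heads) + tails * math.log(1.0 - p_heads)

def aic(log_lik, n_free_params):
    """Akaike Information Criterion; lower values are preferred."""
    return 2 * n_free_params - 2 * log_lik

def posterior(priors, log_liks):
    """Bayes' theorem over a finite set of models: P(model | data)."""
    weights = [prior * math.exp(ll) for prior, ll in zip(priors, log_liks)]
    total = sum(weights)
    return [w / total for w in weights]

heads, tails = 70, 30  # hypothetical data, chosen only for illustration

# Model 1: fair coin, no free parameters.
ll_fair = log_likelihood(0.5, heads, tails)
# Model 2: heads probability estimated from the data (one free parameter).
p_hat = heads / (heads + tails)
ll_biased = log_likelihood(p_hat, heads, tails)

# Likelihood ratio statistic; the models are nested (p = 0.5 is a special case),
# so it can be compared to a chi-square with 1 degree of freedom (3.84 at 5%).
lrt_stat = 2 * (ll_biased - ll_fair)

aic_fair = aic(ll_fair, 0)
aic_biased = aic(ll_biased, 1)

# Posterior over two fixed models (p = 0.5 vs p = 0.7) with equal priors.
post = posterior([0.5, 0.5], [ll_fair, log_likelihood(0.7, heads, tails)])

print("log-likelihoods:", ll_fair, ll_biased)
print("LRT statistic:", lrt_stat)
print("AIC:", aic_fair, aic_biased)
print("posterior (fair, biased):", post)
```

Note how the likelihoods are properties of the data under each model, while the posterior only appears once priors are supplied. That is the distinction in point 5.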

Congrats to Doug! A lot of work went into this paper, and I think it will be a classic. Apart from debunking creationists, it also takes down a few other misconceptions that are pretty silly but have for some reason become widespread even among scientists, i.e. (1) the idea that lateral gene transfer contradicts UCA; and relatedly, (2) the idea that UCA means that all life descends from one literal single individual organism, rather than from an ancestral population. The latter idea is particularly strange: did anyone ever think that the “common ancestor” of e.g. humans and chimps was a single individual? I think not; it was always an ancestral population. So why should the common ancestor of all life have been a single individual organism, especially since we have known of bacterial conjugation for 50+ years? I suspect that many people have been misled by the fact that “ancestor” is singular, rather than plural, and then mistakenly extrapolated this to mean a single individual organism.

So, enjoy, and please post links/comments on creationist reactions. Here’s the first, from Todd Wood: Testing universal common ancestry?.

(Full disclosure: I am not entirely neutral, as both I and fellow PTer John Wilkins got to see the paper during its development, and give comments etc.)

PS: Oh yeah. I almost forgot. This quote is for those who think the results are trivial:

“It will be determined to what extent the phylogenetic tree, as derived from molecular data in complete independence from the results of organismal biology, coincides with the phylogenetic tree constructed on the basis of organismal biology. If the two phylogenetic trees are mostly in agreement with respect to the topology of branching, the best available single proof of the reality of macro-evolution would be furnished. Indeed, only the theory of evolution, combined with the realization that events at any supramolecular level are consistent with molecular events, could reasonably account for such a congruence between lines of evidence obtained independently, namely amino acid sequences of homologous polypeptide chains on the one hand, and the finds of organismal taxonomy and paleontology on the other hand. Besides offering an intellectual satisfaction to some, the advertising of such evidence would of course amount to beating a dead horse. Some beating of dead horses may be ethical, when here and there they display unexpected twitches that look like life.”

Emile Zuckerkandl and Linus Pauling, discussing the possibility of the twin nested hierarchy before the first molecular phylogenies had been made. (1965) “Evolutionary Divergence and Convergence in Proteins.” in Evolving Genes and Proteins, p. 101. (source)

163 Comments

Most excellent. Thanks for the write-up, Nick.

The paper itself (like the commentary) is paywalled, so all I have to go on here is Todd Charles Wood’s commentary. And I think he makes a couple of valid points. Here’s my version, which may not be Wood’s:

1. Theobald is really, apparently, testing whether the sequences are significantly similar, i.e. more so than can be explained by chance resemblance. If they are, the single-tree model explains the data best. If they aren’t, a multiple-tree model is at least as good. But he began by choosing proteins that were similar enough for homology to be clear. Is that a bias? Well, I’m pretty sure that level of similarity can’t be explained by chance even if we correct for the sequences having been chosen from a universe of many more sequences. But it’s something to think about. The creation model assumed here – randomized sequences – is falsified if we find any proteins at all with detectable homology.

2. It’s not explicitly claimed to be a test of separate creation, but everyone is going to take it that way. Given that, what’s the proper model for separate creation? We don’t know. It could be that a rational creator would have originally made “kinds” using conserved (in the creation model, that equals “under strong purifying selection”) proteins that were initially identical. Why not? (Then again, why? That’s a problem with creation models.) And that’s the same as having a common ancestor on a star tree. This is actually a test of significant similarity, and a creator could have made proteins with any degree of similarity from complete to none. The only creation model falsified here is one in which each “conserved” protein in each kind is randomly chosen from among all possible sequences that would perform the same function. And even then we would have to know how many such sequences there are, and know their range of similarity; this model seems to assume that sequences are effectively unconstrained.

Dr. Theobald is rather busy at the moment with the end of the semester and media inquiries. He has promised me that he will write something up about this research for us later this week.

So THIS explains the Tea Party!

This is actually a test of significant similarity, and a creator could have made proteins with any degree of similarity from complete to none. The only creation model falsified here is one in which each “conserved” protein in each kind is randomly chosen from among all possible sequences that would perform the same function. And even then we would have to know how many such sequences there are, and know their range of similarity; this model seems to assume that sequences are effectively unconstrained.

First of all, creationists regularly assert, basically always ignorantly, that there is a lack of phylogenetic signal in molecular data (tree conflicts and all that), and take this as evidence of their position. So those creationists, and there are lots of them, have been hoist by their own petard.

But regarding any “creation model” interpretations of “separate ancestry” – one pretty good argument Doug uses is that in the absence of any knowledge about parameters God “would have” used, you instead just estimate these straight from the data. So the frequency of each amino acid is taken from the dataset under consideration (not “random”), and the only question is whether or not separate ancestry can produce the sequences with the same likelihood as common ancestry – and by a long shot, they can’t.

I suppose you could say “what if God wanted there to be the same correlation structure as imposed by phylogenetic connection.” This would have the same likelihood as the evolutionary model. But this is on exactly the same level as saying that the Earth is actually young, but God made it look old. Good luck with that…
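To make the "estimate the parameters straight from the data" idea concrete, here is a deliberately tiny toy, not the method used in the paper. The two aligned fragments are invented, and the substitution model is an assumed equal-input model: with probability `1 - d` a site is inherited unchanged, and with probability `d` it is redrawn from the empirical residue frequencies. Setting `d = 1` recovers the separate-ancestry model exactly, so the two models are nested and differ by one free parameter.

```python
import math
from collections import Counter

def empirical_freqs(seq_a, seq_b):
    """Residue frequencies taken straight from the data under consideration."""
    counts = Counter(seq_a + seq_b)
    total = sum(counts.values())
    return {res: n / total for res, n in counts.items()}

def loglik_separate(seq_a, seq_b, freqs):
    """Separate ancestry: every residue is an independent draw from freqs."""
    return sum(math.log(freqs[a]) + math.log(freqs[b])
               for a, b in zip(seq_a, seq_b))

def loglik_common(seq_a, seq_b, freqs, d):
    """Common ancestry (toy equal-input model): a site is kept with
    probability 1 - d, or redrawn from freqs with probability d."""
    total = 0.0
    for a, b in zip(seq_a, seq_b):
        same = 1.0 if a == b else 0.0
        total += math.log(freqs[a] * ((1.0 - d) * same + d * freqs[b]))
    return total

# Hypothetical aligned fragments, invented for this example.
seq_a = "MKVLAAGIVGLSTQ"
seq_b = "MKVLSAGIVGMSTQ"

freqs = empirical_freqs(seq_a, seq_b)
ll_separate = loglik_separate(seq_a, seq_b, freqs)

# Crude maximum-likelihood estimate of d by grid search over (0, 1].
d_grid = [i / 100 for i in range(1, 101)]
best_d = max(d_grid, key=lambda d: loglik_common(seq_a, seq_b, freqs, d))
ll_common = loglik_common(seq_a, seq_b, freqs, best_d)

# AIC = 2k - 2 logL; the common-ancestry model pays for one extra parameter (d).
aic_separate = -2 * ll_separate
aic_common = 2 * 1 - 2 * ll_common

print("log-likelihood, separate:", ll_separate)
print("log-likelihood, common:  ", ll_common, "at d =", best_d)
print("AIC separate vs common:  ", aic_separate, aic_common)
```

The sequences never change; only the model does. The comparison asks which model confers the higher probability on the fixed data, after penalizing the extra parameter.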

I see a problem here. A randomized protein, even with preserved amino acid frequencies, is a major assumption. If we suppose that god wanted the protein to function, and to have the same function (more or less) in the different organisms, that’s not a good model. Presumably a small proportion of the randomized sequences would have the appropriate function. What constraints are imposed by functional necessity? We don’t know, and would have to take that into account in any model of separate creation. I strongly suspect that constraints aren’t enough to account for the observed data, but the present study doesn’t show that.

It seems to me there are two more or less reasonable (scare quotes may be appropriate there) creation models: one in which god chooses random sequences from the set of all possible functional sequences, and one in which he chooses identical sequences. The first case emphasizes his infinite creativity, while the second emphasizes his (I guess) efficiency. Either seems equally likely considering his claimed attributes. Again, I haven’t seen the paper, but as described by Wood neither model is tested. A randomized model would inflate the likelihood difference compared to that first, functionally constrained model. And the second model would produce (after some considerable evolution, though potentially within separate kinds, as in a lawn) a correlation structure just like real common descent: instead of an ancestor there’s a single identical part common to a variety of models. Like I said, that would be a star tree, so if Theobald also tests and rejects a star tree, he has dealt with that model. Does he?

The latter idea is particularly strange: did anyone ever think that the “common ancestor” of e.g. humans and chimps was a single individual? I think not; it was always an ancestral population.

I am not a scientist, but this has always puzzled me when asserted regarding abiogenesis as well. I mean authoritatively asserted that there was a single abiogenesis event from which all life has evolved. But understanding how evolution works I can’t see how this can be asserted. If there is no definition of “life” that applies universally, then how can there be a single abiogenesis “cell” that was just on our side of the threshold of what we call life and that everything went on from there?

Mike Haubrich, don’t get hung up on the ‘definition’ red herring. Whole civilizations preceded the definition concept. This has nothing to do with what Nick said; it is immaterial whether some would say “X is life” or “X is not quite life” at some early stage. Don’t think of the first life as a cell as we know them either.

I’m glad Nick made this ‘starter’ post on the paper. This will help everyone next week when Doug gets deeper into it. Stay tuned!

Agreed, but it isn’t the only important paper in this week’s Nature:

Pete Dunkelberg said:

I’m glad Nick made this ‘starter’ post on the paper. This will help everyone next week when Doug gets deeper into it. Stay tuned!

I find quite compelling in its own right the paper from Derek Briggs and his colleagues, announcing the discovery of a Burgess Shale Fauna from the lower Ordovician of Morocco as noted here:

http://www.nature.com/nature/journa[…]ure09038.pdf

Had heard him give a terse summary of this at a private talk he gave in New York City a few weeks ago. I almost fell out of my seat when he said they found an Ordovician Anomalocaris.

This has some interesting implications not only with respect to long-term morphological stasis, but also with respect to long-term stasis of an ecology as represented by the Burgess Shale type fauna.

did anyone ever think that the “common ancestor” of e.g. humans and chimps was a single individual?

I think that would in fact be the general belief even now. That’s where the notion of a “missing link” came from: that one individual who bridges the entire current gap between current species. It’s that misguided creationist belief that evolution means you have the birth of the “first” man who then has to wait for the birth of the “first” woman so that they can start the human race. Or something like that.

Even recently, wasn’t there a study looking for and finding a probable genetic “Eve” and a genetic “Adam” for the human race? If I recall, they were separated by several tens of thousands of years, which seemed kind of odd.

I consider myself to be fairly well educated (for someone without a PhD), but up until a few years ago (think Dover) if you had asked I might have said there was a single individual common ancestor. Once you all have pointed out why this isn’t the case, it becomes “obvious” you need an evolving population, but that doesn’t appear to be the “common” understanding.

I don’t think Mike was disagreeing with me, just applying the same logic further back. And he’s right, it’s probably populations all the way back to chemicals. Especially if you take the direction of recent OOL work which suggests self-replicating RNA-like molecules might be more likely to emerge from a collection of short sequences than from a single sequence…

Scott – interesting points. The molecular Adam/Eve stuff was based specifically on mitochondrial DNA (passed only through the mother) and the Y chromosome (passed only through the father; most of it doesn’t recombine with the X). Each non-recombining marker does indeed trace back to a single individual – this is called “coalescence”. But the single individuals lived tens of thousands of years apart from each other, and each was living in a population at the time. And the thousands of recombining regions in the rest of the genome (the autosomes) each have their own histories, with few/none of them the same as the mtDNA or Y-DNA.

I see a problem here. A randomized protein, even with preserved amino acid frequencies, is a major assumption. If we suppose that god wanted the protein to function, and to have the same function (more or less) in the different organisms, that’s not a good model. Presumably a small proportion of the randomized sequences would have the appropriate function.

This is a slightly strange way to phrase it. The sequences are the data, they don’t change in this study. The question is which model confers the highest probability on the fixed, observed data. So I guess what you are saying is that there might be a model in which God, or any process independently generating the sequences, is operating under functional constraints, and this might increase the likelihood of hitting similar sequences twice, over a model in which the similarity is generated by chance.

I agree in a sense, but it’s got to be a small effect except in cases where sequence similarity is already low and/or the sequences under consideration are short. We have lots and lots of evidence that wholly dissimilar sequences and even structures can perform the same molecular functions. See e.g.:

http://www.ncbi.nlm.nih.gov/Complet[…]Enzymes.html

M.Y. Galperin, D.R. Walker and E.V. Koonin (1998). Analogous enzymes: Independent inventions in enzyme evolution. Genome Research 8: 779-790.

M.V. Omelchenko, M.Y. Galperin, Y.I. Wolf and E.V. Koonin (2010). Non-homologous isofunctional enzymes: A systematic analysis of alternative solutions in enzyme evolution. Biology Direct 5:31.

That said, Theobald’s main research is in fact exactly on detecting remote homologs, distinguishing convergence from homology in these kinds of remote cases, etc. So I’m sure he’ll have more to say…

Scott said:

did anyone ever think that the “common ancestor” of e.g. humans and chimps was a single individual?

I think that would in fact be the general belief even now. That’s where the notion of a “missing link” came from: that one individual who bridges the entire current gap between current species.

That isn’t where the idea of “the missing link” came from. In Darwin’s day a criticism of him was that there was no fossil connecting humans with apes (or as we would now say, other apes). This missing link was filled in 1891 when Dubois found what we now call Homo erectus, and also in 1924 when Dart found Australopithecus.

There was never any assumption that this was a single individual, what was missing was a species.

The phrase “missing link” is well known, but people don’t remember what it was, and, alas, are unsure whether it is still missing – and that’s why the disgraceful publicity around the fossil “Ida” (Darwinius masillae) was able to blatantly misuse the phrase “The Link”.

Creationists have, for quite a while, been saying that similarity between living things is an indication of a common designer. As if the pattern of similarities and differences reflected in taxonomy could be reduced to “all living things are similar”.

As a practical matter, I think that if one could convince creationists of common ancestry just within the order Primates over a few tens of millions of years, that would be the end of creationism. There would be a few holdouts for separate creation/design of larger taxa, or about supernatural intervention hundreds of millions or billions of years ago, but there would be no general public interest in that.

TomS said:

As a practical matter, I think that if one could convince creationists of common ancestry just within the order Primates over a few tens of millions of years, that would be the end of creationism.

Good point. They don’t care about the common ancestry of all life, or about whether there are separate kinds. What they care about is that humans were separately created. If you’re trying to refute creationism, human evolution is the topic to concentrate on. For one thing, closely related sequences are much easier to deal with than distantly related ones; it’s a slam dunk. Of course that assumes the creationist in question is amenable to evidence and reason, which is a big assumption.

So perhaps creationism wasn’t the main target of Doug’s paper?

I have a question on the Zuckerkandl and Pauling quote, and I’m not sure whether it bears on the paper or not (I don’t have a subscription to Nature, so I’m going on what’s written here). I was under the impression that modern ‘organismal biology’ phylogenetic trees incorporated genetic data - i.e. are sometimes corrected based on the genetic data - so the two tree types are no longer independent but rather dependent factors. This would greatly increase the probability of the two matching regardless of whether the model is correct. Is this a relevant issue? If so, how was it addressed?

Actually the term “missing link” is older than its most famous usages in hominid paleobiology, but you’re absolutely correct demonstrating how it has been abused and misused by many people, not just journalists, other members of the lay public, and creationists, but even reputable scientists, as the recent incident with Darwinius clearly demonstrates:

Joe Felsenstein said:

Scott said:

did anyone ever think that the “common ancestor” of e.g. humans and chimps was a single individual?

I think that would in fact be the general belief even now. That’s where the notion of a “missing link” came from: that one individual who bridges the entire current gap between current species.

That isn’t where the idea of “the missing link” came from. In Darwin’s day a criticism of him was that there was no fossil connecting humans with apes (or as we would now say, other apes). This missing link was filled in 1891 when Dubois found what we now call Homo erectus, and also in 1924 when Dart found Australopithecus.

There was never any assumption that this was a single individual, what was missing was a species.

The phrase “missing link” is well known, but people don’t remember what it was, and, alas, are unsure whether it is still missing – and that’s why the disgraceful publicity around the fossil “Ida” (Darwinius masillae) was able to blatantly misuse the phrase “The Link”.

Scott said:

did anyone ever think that the “common ancestor” of e.g. humans and chimps was a single individual?

I think that would in fact be the general belief even now. That’s where the notion of a “missing link” came from: that one individual who bridges the entire current gap between current species.

I think the big issue is that fossils usually come in units of one individual, so when an important transition is found it’s easy to point to the head on the table and say “We’ve found the link. There he is.” rather than the far more accurate “Here’s an individual from the transitional population”.

TomS wrote:

“Creationists have, for quite a while, been saying that similarity between living things is an indication of a common designer. As if the pattern of similarities and differences reflected in taxonomy could be reduced to “all living things are similar”.”

Exactly. That is why it is so important to point out that not only are things “similar” but they are similar in a very specific way. That is, they are similar in exactly the way predicted by descent with modification. Many of these similarities do not make any sense whatsoever from a design perspective. The precise types of similarities and the precise pattern of similarities are what is important, not some nebulous “god could have done it that way so common design” similarity.

I’ve been convinced for some time of a single common ancestor for all living things. This raises some interesting speculation. If the conditions were right for the origin of the common ancestor, perhaps the conditions were right for the origin of other potential common ancestors. Was it a matter that they were outcompeted by the common ancestor? Or were conditions for the origin of the common ancestor so rare and unusual that no competitors originated? This has some connection to speculation about origin of life elsewhere in the universe. If it is difficult and unusual, there may not be much life off Earth. If it is common and easy, there may be living things all around.

So perhaps creationism wasn’t the main target of Doug’s paper?

I agree it wasn’t the main target, I may have given that impression. It is a subsidiary target which is conveniently hit by this, I’d say. The main targets are e.g. LGT-means-no-common-ancestry positions. And the general notion that common ancestry is an assumption rather than a testable theory.

“I’ve been convinced for some time of a single common ancestor for all living things. This raises some interesting speculation. If the conditions were right for the origin of the common ancestor, perhaps the conditions were right for the origin of other potential common ancestors. Was it a matter that they were outcompeted by the common ancestor? Or were conditions for the origin of the common ancestor so rare and unusual that no competitors originated?”

Like I said, it could be populations all the way down. It is important to realize that even without competition/selection, in any replicating population, one replicator will eventually take over the entire population. Google “coalescence”. Throw in selection and you get even more of this. This kind of selective sweep probably happened umpteen times in the gradual origin of replicating sequences and then cells and then the LUCA. The only significance of the LUCA is that it represents just the *last* time this happened on a global scale. (but different genes, etc., might have done it differently, producing different trees, but nevertheless there is good evidence that a bunch of genes did this, as Theobald shows)
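The claim that one replicator eventually takes over even without selection is easy to check with a toy simulation. The sketch below assumes a neutral Wright-Fisher-style model (fixed population size, non-overlapping generations, each offspring picks a parent uniformly at random), which is my choice for illustration and not something taken from the paper. Every individual starts as its own founder lineage, and the loop runs until only one founder has any descendants left.

```python
import random

def generations_until_one_founder(pop_size, rng):
    """Neutral drift: resample parents each generation until every
    individual descends from the same founder."""
    lineage = list(range(pop_size))  # individual i starts as founder i
    generations = 0
    while len(set(lineage)) > 1:
        # Each offspring copies the founder label of a random parent.
        lineage = [rng.choice(lineage) for _ in range(pop_size)]
        generations += 1
    return lineage[0], generations

rng = random.Random(42)  # fixed seed so the run is repeatable
winner, gens = generations_until_one_founder(50, rng)
print(f"founder {winner} took over after {gens} generations")
```

No founder is favored here, yet every run ends with a single surviving lineage. Which founder wins, and how long it takes, changes with the seed.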

stevaroni said: I think the big issue is that fossils usually come in units of one individual, so when an important transition is found it’s easy to point to the head on the table and say “We’ve found the link. There he is.” rather than the far more accurate “Here’s an individual from the transitional population”.

My two cents on the whole missing link verbiage is that calling a single fossil a missing link was not inaccurate when we didn’t know much, but now it’s just confusing. Lucy was a missing link because it was the first scientific discovery of an ancient hominid. More hominids help us understand how hominids evolved but that they evolved has already been established.

It’s like this: I hypothesize a bronze age civilization lived in the valley of Stevaroni, for which I have no evidence. Then I find a tomb clearly indicating bronze age settlement. That could be fairly called the missing Stevaronian link. But then I find tons more evidence. Is tomb #254 the missing link? Well, no. It helps me understand how Stevaronians lived, but I’d already confirmed that they lived. No additional evidence is going to serve as the missing link to prove that they lived; it’s already been done.

I’d suggest that the first couple hominid discoveries like Lucy could fairly be called (no longer) missing links. They confirmed that hominids have an evolutionary history the same as every other animal. Everything else, as the saying goes, is just commentary.

John Harshman said: It’s not explicitly claimed to be a test of separate creation, but everyone is going to take it that way. Given that, what’s the proper model for separate creation?

Just a few comments before I plunge back into finals-land – this is adapted from an email I sent to Todd Wood, who was the first to blog on my paper:

It would be quite difficult, I think, to engage and publish an analysis such as this without keeping in mind what creationists would think. I am, admittedly, more aware than many regarding the myriad creationist viewpoints. Nevertheless, creationists are not my “primary target”.

First, no person is the target of this analysis – rather, the “targets” are hypotheses. It is easy to sit back, after the fact, and rationalize the results, and claim that they are expected. However, when I set out to do this analysis, I honestly did not know what the data would say. There are good reasons, for example, to think that a phylogenetic test such as this could give a very different result than a simple pairwise BLAST-type E-value test (see Section 4 of the Supp Mat, esp. 4.2 and 4.3). Hence there is no “target” even – I’m simply exploring hypotheses. I would have been delighted to find evidence for multiple ancestries.

Second, my major concern is in fact more-or-less what John Wilkins has suggested: that is, what is the influence of HGT and symbiotic fusion events on the reasoning leading to the conclusion of common ancestry? Carl Woese, Mike Syvanen, and Craig Venter, among others, have all suggested in one form or another that common descent may be problematic, due to rampant HGT among early life and/or microbes. There has been confusion about exactly what they mean, since most of these comments regarding common ancestry are pretty cursory, lacking detailed argument and definitions.

One of the main things I am trying to do is to show that (A) common ancestry, (B) the origin of life (e.g., how many origins), (C) HGT and symbiotic fusions (tree of life vs web and/or ring) and (D) the “root” of the tree/web are all separate, and mostly independent, questions. Disproving, say, the “tree of life” hypothesis does not necessarily disprove common ancestry (though the arguments and evidence for one may have relevance for the other). I personally find these questions many orders of magnitude more interesting (as does the scientific community at large) than whether, say, the different orders of animals each have an independent origin 6000 years ago – an idea which, given modern evidence from all of science, not just biology, is patently absurd.

So, on a related note, I am not explicitly testing Woese’s “genetic annealing” hypothesis, whatever that may entail. Rather, I’m testing his assertion that, because of HGT etc., “The time has come for Biology to go beyond the Doctrine of Common Descent.”

Third, if my main target really was creationists, then disproving the independent ancestry of humans would be enough, right? The rest of my analysis would be inconsequential. I don’t think any creationist cares one lick whether the Archaea and Bacteria share ancestry or not. That said, if you look at the last row of Tables 1 and 2, I do consider the hypothesis that humans have an independent ancestry from the rest of life. From my analysis, that hypothesis is roughly 10^6,100 times less probable than universal common ancestry. So you can consider that particular part as a nod towards testing at least one version of a special creation hypothesis. Of course that’s not the only way to interpret those models.

Cheers,

Douglas

Jim Thomerson said:

I’ve been convinced for some time of a single common ancestor for all living things. This raises some interesting speculation. If the conditions were right for the origin of the common ancestor, perhaps the conditions were right for the origin of other potential common ancestors. Was it a matter that they were outcompeted by the common ancestor? Or were conditions for the origin of the common ancestor so rare and unusual that no competitors originated? This has some connection to speculation about origin of life elsewhere in the universe. If it is difficult and unusual, there may not be much life off Earth. If it is common and easy, there may be living things all around.

Since an early replicator or proto-life form would probably be relatively simple compared to what we see today, there is good reason to think that it was born in an energy cascade where it got shuttled into a somewhat more benign environment in which it could stabilize.

That would not preclude the building of a population or the repeated construction of various workable systems, some of which began to replicate once in a suitable environment.

But clearly the formation and stabilization processes could not occur where the system formed was immediately broken up within the same energy (temperature) ranges. The formation process has to cascade down in energy range in order for a system to remain stable and for any processes of synchronization and coordination within the system to take hold.

Scott said:

did anyone ever think that the “common ancestor” of e.g. humans and chimps was a single individual?

I think that would in fact be the general belief even now. That’s where the notion of a “missing link” came from: that one individual who bridges the entire current gap between current species. It’s that misguided creationist belief that evolution means you have the birth of the “first” man who then has to wait for the birth of the “first” woman so that they can start the human race. Or something like that.

Even recently, wasn’t there a study looking for and finding a probable genetic “Eve” and a genetic “Adam” for the human race? If I recall, they were separated by several tens of thousands of years, which seemed kind of odd.

I consider myself to be fairly well educated (for someone without a PhD), but up until a few years ago (think Dover) if you had asked I might have said there was a single individual common ancestor. Once you all have pointed out why this isn’t the case, it becomes “obvious” you need an evolving population, but that doesn’t appear to be the “common” understanding.

I once heard the common ancestor/eve position explained in the following way, and I see no objection to it:

Take all humans living on earth at this instant - about 6.7 billion individuals - and call this population G0. Now take the population of all the mothers of the individuals in G0 and call it G-1. Note that some (many) individuals will belong to both G0 and G-1, and G-1 will contain some individuals not in G0, i. e., now-dead mothers of some of the G0 individuals, but that has nothing to do with this argument. G-1 is all females (mothers) and is smaller than G0 since G0 will have many sets of siblings with common mothers. Continuing, G-2 is the set of mothers of all the individuals in G-1 and will be smaller than G-1 because of the sets of siblings (actually sisters since we are now just considering females) in G-1. We continue this process to G-N, whose size will depend on the average number of sisters in each preceding population.

Given that family sizes were larger in the past it seems reasonable to assume that the average number of sisters was greater than two, but for the sake of argument let’s assume that it is two, i. e., G-N = 1/2 G-(N-1). Each group of mothers (grand-grand-…-grandmothers of the initial population) is half the size of the preceding group, their daughters. Going back 10 generations (about 200 years) we find that all 6.7 billion humans alive at this instant had a total of roughly 6.7 million grand(^10)mothers, where grand(^10) is my clumsy way of writing 10 consecutive ‘grand’s. That does not mean that there were only 6.7 million women on earth at that time or just 6.7 million women who bore daughters. There most certainly were many times that number, but all but 6.7 million of them had no daughters, or had daughters who had no daughters, and so on. That is, only 6.7 million of the females in the population containing G-10 have descendants that are alive at this instant. All the rest represent lineages that have died out.

If we carry this process back far enough we get a single female who is ancestral (the grand(^Q)mother, or ancestral Eve, of us all). She will have lived in a large population but all the other females of that population have no currently living descendants (ain’t life a bitch!).

Note that this argument can apply to any grouping you want to devise, i. e., all currently alive Texans plus all squirrels that lived in England in the 19th century plus Alexander the Great’s favorite horse, Bucephalus.

Is this argument valid?
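For anyone who wants to poke at the argument numerically, here is a minimal sketch of the tracing-back process. It assumes a constant number of females per generation and that each individual's mother is drawn uniformly at random from the previous generation (a Wright-Fisher style idealization, not a model of real human demography). The function name `maternal_lineage_counts` and the population size are made up for illustration.

```python
import random

def maternal_lineage_counts(pop_size, seed=0, max_generations=100_000):
    """Trace maternal lineages backwards in time.

    counts[k] is the number of distinct females, k generations back,
    who are maternal-line ancestors of everyone alive in generation 0.
    Assumes a constant population of `pop_size` females per generation,
    with each lineage picking its mother uniformly at random.
    """
    rng = random.Random(seed)
    counts = [pop_size]
    while counts[-1] > 1 and len(counts) <= max_generations:
        # Each surviving lineage picks one mother; lineages that pick
        # the same mother merge (the "sisters" in the argument above).
        mothers = {rng.randrange(pop_size) for _ in range(counts[-1])}
        counts.append(len(mothers))
    return counts

counts = maternal_lineage_counts(pop_size=500, seed=1)
print(f"generations back to a single matriarch: {len(counts) - 1}")
print(f"lineage counts over the first few generations: {counts[:4]}")
```

The count of ancestral mothers can only shrink or stay the same as you go back, which is the core of the argument: under these assumptions the lineages must eventually merge into one female, even though she lived among many contemporaries.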

sorry eric, but Lucy wasn’t the first scientific discovery of an ancient hominid. For you to assert that, then you would be ignoring the pioneering work of Raymond Dart (who found the first australopithecine fossil in South Africa), the Leakeys and quite a few others:

eric said: My two cents on the whole missing link verbiage is that calling a single fossil a missing link was not inaccurate when we didn’t know much, but now it’s just confusing. Lucy was a missing link because it was the first scientific discovery of an ancient hominid. More hominids help us understand how hominids evolved but that they evolved has already been established.

I think I talked with this man on the Dawkins forums. Two points. First, biogeography is a friend to YEC. The migrations from the ark filling the earth all work fine. In fact otherwise it seems to be a chaos of migration. It’s creationist doctrine that all life comes from a common blueprint. Even kinds are just a twist on the blueprint. Everything has eyes, ears, legs, head. It seems clear that there is a thinking being behind such organization, and it seems that if evolution was true diversity and great happenstance would make creatures so wildly different looking. The sameness of creatures inside and out and like sameness with all biology suggests simple plans that lead to logical diversity. The sameness and fewness of creatures shows a poverty of evolution but fits limited kinds coming from simple, relative, basic plans from a general blueprint. Perhaps models on what the biology should look like from billions of years and millions etc of selections should be made. The world today and in fossil seems rather bare and simple from what otherwise evolution would predict.

Leave it. You couldn’t ask for better.

Apparently not.

John Harshman said:

If this thread isn’t dead yet (though it was coughin’ up blood last night) I would like to return to the topic with a few questions.

It’s pretty clear how a scrambled sequence order can produce a data set that prefers separate trees to a single one, since that would eliminate an effectively infinitely long branch. But I don’t understand how the unscrambled data set prefers a single tree to separate ones. How can a part of the tree, and two trees are just a single tree minus one branch, have a lower likelihood than the whole tree? I can see how a model selection criterion like AIC might prefer the single tree, given the extra free parameters in the separate trees, but I don’t see how the raw likelihood scores can come out better for the single tree.

Second question: I don’t understand how a single sequence (H, for example) can have a likelihood score.

Is anyone still reading?

Proteins highly conserved across the three domains of life were chosen for the study. My understanding is that when the tree is broken into discrete units the correlations that previously existed between separated units were lost. This caused the UCA model to fit the data better than the multi-ancestry models.

0112358 said:

Proteins highly conserved across the three domains of life were chosen for the study. My understanding is that when the tree is broken into discrete units the correlations that previously existed between separated units were lost. This caused the UCA model to fit the data better than the multi-ancestry models.

Thanks, but I still don’t understand why this should be so. You don’t seem to have addressed my questions. The two separate trees should be merely subtrees of the combined tree, and it seems to me that a subtree of the full tree, even a disjunct one, should actually have a higher likelihood than the combined tree, just because in the former case there’s a branch you don’t have to include transformations for, and all the other branches should be the same. Or so it seems to me.

John Harshman said:

Thanks, but I still don’t understand why this should be so. You don’t seem to have addressed my questions. The two separate trees should be merely subtrees of the combined tree, and it seems to me that a subtree of the full tree, even a disjunct one, should actually have a higher likelihood than the combined tree, just because in the former case there’s a branch you don’t have to include transformations for, and all the other branches should be the same. Or so it seems to me.

For example, select proteins A, B, C and D because they are highly conserved. When A, B, C and D are in one tree you have correlations between AB, AC, AD, BC, BD and CD. If you divide into two trees AB and CD you only have correlations between AB and CD. The other correlations are lost.

You are merely repeating your previous statement. And I don’t see how that affects the likelihood calculation for a tree. The likelihood is the product of a great many probabilities of state transformations between nodes. Separate trees have all the same nodes as a single tree, minus two. The probabilities for the shared internodes should be the same in both cases. A combined tree would divide two internodes and add another. Adding an internode should reduce the likelihood, since its transition probabilities are additional multipliers.

Can you explain this?

One potential explanation is that the two broken internodes (that is, the two branches by which the separate trees are attached) have significantly higher transition probabilities when broken than when unbroken. Is that it, perhaps?
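A toy calculation may help with both questions. In the standard likelihood setup, a tree's likelihood includes the equilibrium frequency of the state at the root, so a sequence sitting alone on its own "tree" still gets a likelihood: the product of the equilibrium frequencies of its residues. Breaking a branch therefore does not simply drop a multiplier; it swaps a transition probability for an equilibrium frequency. The sketch below assumes DNA and the Jukes-Cantor (JC69) model with just two sequences, not the protein models used in the paper, and the sequences, branch length, and function names are invented for illustration.

```python
import math

def jc69_prob(same, t):
    """JC69 probability of ending in a given state after branch length t
    (expected substitutions per site); `same` says whether the end state
    equals the start state."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if same else 0.25 - 0.25 * e

def loglik_joined(seq_a, seq_b, t):
    """Log-likelihood of two aligned sequences connected by one branch:
    per site, equilibrium frequency (1/4) times the transition probability."""
    return sum(math.log(0.25 * jc69_prob(a == b, t))
               for a, b in zip(seq_a, seq_b))

def loglik_alone(seq):
    """Log-likelihood of a single unconnected sequence: only the
    equilibrium frequencies contribute, 1/4 per site under JC69."""
    return len(seq) * math.log(0.25)

seq_a = "ACGTACGTACGTACGTACGT"
seq_b = "ACGTACGTACGAACGTACGT"  # differs from seq_a at one site out of 20

joined = loglik_joined(seq_a, seq_b, t=0.05)
separate = loglik_alone(seq_a) + loglik_alone(seq_b)
print(f"one tree:  {joined:.2f}")
print(f"two trees: {separate:.2f}")
```

For similar sequences, the probability of "staying the same" along a short branch is far above 1/4, so the connected tree gives the higher raw likelihood. As the branch length goes to infinity the transition probabilities approach 1/4 and the two scores converge, which matches the intuition that separate trees behave like a single tree with an infinitely long branch.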

John Harshman said:

One potential explanation is that the two broken internodes (that is, the two branches by which the separate trees are attached) have significantly higher transition probabilities when broken than when unbroken. Is that it, perhaps?

Out of fear of the blind leading the blind I’ll end here!

I think what Cornelius Hunter fails to understand is that, for the cases where Doug Theobald is evaluating sets of models which are disjoint and complete, his numbers are not just relative likeliness but absolute likeliness. The argument that they are only relative requires the existence of unaccounted-for alternatives.

For example, when testing “humans share ancestry with the rest of life” versus “humans have separate ancestry from the rest of life”, there is no other possibility. The absolute likelihoods of the two models must sum to 1. Now, it is certainly possible that some other model could be better - for example, it could theoretically have been the case that the model “humans and other primates have separate ancestry from the rest of the tree of life” was more likely than either of the above - but in that case the complementary model would be “humans and other primates share ancestry with the rest of life”.

The key thing here is that adding constraints to a model can only lower its likelihood. So, once you have established that “humans have separate ancestry from the rest of life” has likelihood no greater than 10^-3000 with no additional constraints, you have also shown that ALL models which include that as one constraint (and have other constraints as well) have likelihoods which are no greater. Thus, NO creationist model which includes the idea that humans were created separately can be any better than that.

Howard A. Landman said:

I think what Cornelius Hunter fails to understand is that, for the cases where Doug Theobald is evaluating sets of models which are disjoint and complete, his numbers are not just relative likeliness but absolute likeliness. The argument that they are only relative requires the existence of unaccounted-for alternatives.

For example, when testing “humans share ancestry with the rest of life” versus “humans have separate ancestry from the rest of life”, there is no other possibility. The absolute likelihoods of the two models must sum to 1.

First, who is Cornelius Hunter? Whoa, he’s all the way back on page 2. Try to provide some kind of context here.

Second, I don’t think your main claim is correct at all. Let’s remember that Theobald isn’t evaluating the likelihoods of the trees at all. He’s evaluating the likelihood of the data, given the model (which includes the tree). These likelihoods would only sum to 1, for a particular model (including a particular tree), if integrated over all possible sequences. And that would tell you nothing at all, certainly nothing interesting.

John, I’d like to hear Doug’s view on that. It seems to me that Bayes’ Rule is sufficient to transform what was said in the paper into something very close to what I said.

Specifically, consider two complementary hypotheses like (A) “Humans had no common ancestor with any of the 11 other species considered in the paper” and (B) “Humans had a common ancestor with at least one of the 11 other species considered in the paper”, and let D stand for the data. We obviously must have P(A) + P(B) = 1 and P(A|D) + P(B|D) = 1, and also by Bayes’ Rule P(A|D) = P(D|A)P(A)/P(D) and P(B|D) = P(D|B)P(B)/P(D).

What Doug gives us is essentially P(D|A) and P(D|B). And yes, those don’t add up to 1, nor did I ever say they did. But if you make the maximum entropy assumption that the a priori probabilities of A and B (ignoring the data) are equal, then we have P(A) = P(B) = 0.5. Given that, then P(A|D)/P(B|D) = P(D|A)/P(D|B), and hence P(A|D) = P(D|A)/(P(D|A) + P(D|B)) and P(B|D) = P(D|B)/(P(D|A) + P(D|B)) by simple normalization.

That was my claim: that you can calculate the absolute probabilities of the hypotheses corresponding to the models from what’s given in the paper with only a few, reasonable, assumptions. You still see something wrong there?
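The normalization step described above can be sketched in a few lines. This assumes, as the comment does, that P(D|A) and P(D|B) are actually available for the two complementary hypotheses; the log-likelihood values used here are invented purely for illustration, and `posterior_from_loglik` is a hypothetical helper name. Working with logs avoids underflow when the likelihoods are astronomically small.

```python
import math

def posterior_from_loglik(loglik_a, loglik_b, prior_a=0.5):
    """Return (P(A|D), P(B|D)) from natural-log likelihoods ln P(D|A) and
    ln P(D|B), for two hypotheses that are exhaustive and exclusive."""
    log_a = loglik_a + math.log(prior_a)
    log_b = loglik_b + math.log(1.0 - prior_a)
    # Subtract the larger term before exponentiating so exp() cannot
    # underflow for both terms at once.
    m = max(log_a, log_b)
    za, zb = math.exp(log_a - m), math.exp(log_b - m)
    return za / (za + zb), zb / (za + zb)

# Made-up log-likelihoods, not values from the paper.
p_a, p_b = posterior_from_loglik(loglik_a=-105.0, loglik_b=-100.0)
print(f"P(A|D) = {p_a:.4f}, P(B|D) = {p_b:.4f}")
```

With equal priors the posterior ratio P(A|D)/P(B|D) equals the likelihood ratio P(D|A)/P(D|B), which is the identity the comment relies on.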

Howard A. Landman said:

Specifically, consider two complementary hypotheses like (A) “Humans had no common ancestor with any of the 11 other species considered in the paper” and (B) “Humans had a common ancestor with at least one of the 11 other species considered in the paper”, and let D stand for the data. We obviously must have P(A) + P(B) = 1 and P(A|D) + P(B|D) = 1, and also by Bayes’ Rule P(A|D) = P(D|A)P(A)/P(D) and P(B|D) = P(D|B)P(B)/P(D).

What Doug gives us is essentially P(D|A) and P(D|B).

No, he doesn’t. You have mistaken A and B. Theobald’s likelihoods are each for the data given one particular tree and one particular model of protein evolution. To get what you want, you would have to sum probabilities over all trees that fit your requirements and over all reasonable models. I’m not sure even that would be a valid summation; I suspect the whole thing wouldn’t sum to 1.

And yes, those don’t add up to 1, nor did I ever say they did. But if you make the maximum entropy assumption that the a priori probabilities of A and B (ignoring the data) are equal, then we have P(A) = P(B) = 0.5.

That seems a silly prior, if you ask me.

That was my claim: that you can calculate the absolute probabilities of the hypotheses corresponding to the models from what’s given in the paper with only a few, reasonable, assumptions. You still see something wrong there?

Yes. I’m not a statistics expert, just an old country systematist, but I know what a maximum likelihood analysis is telling you. We choose the tree that maximizes the likelihood of the data. It happens to make the data much, much more likely than any other tree. (Though I still don’t understand how that can happen, since I don’t understand how the human sequence, all by itself, can have a likelihood at all, and I don’t understand how a partial tree can be less likely than a complete tree.) Only by picking indefensible numbers for prior probabilities can you claim to have determined the posterior probabilities. What you might be able to say is that common ancestry has a much higher probability over a very wide range of priors. But I prefer to leave it at what the analysis says: the data have a much better likelihood under one model than under the other.

I agree that the paleontological and other data should give common ancestry a very much better than 50% a priori likelihood. I was trying to be (perhaps unfairly) fair to the creationists. When you have a 3000 orders of magnitude advantage, you can afford to be generous.

It is true that the a priori probabilities P(A) and P(B) directly affect the final probabilities. But it doesn’t seem to matter much whether we start believing that common ancestry is 99.99999999% likely or 0.00000001% likely; the end result after applying 10^3000 from the protein evidence is that we either think UCA is 10^3010 times more likely, or that it’s 10^2990 times more likely. Either one is completely overwhelming odds.
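The arithmetic behind that claim is just "posterior odds = prior odds times likelihood ratio", done in log10 so the numbers stay manageable. A minimal sketch, taking the 10^3000 likelihood ratio from the comment at face value (the function name is made up for illustration):

```python
import math

def log10_posterior_odds(prior_uca, log10_likelihood_ratio):
    """log10 of the posterior odds in favor of UCA, given a prior
    probability for UCA and log10 of the likelihood ratio."""
    prior_odds = prior_uca / (1.0 - prior_uca)
    return math.log10(prior_odds) + log10_likelihood_ratio

# Priors of 99.99999999% and 0.00000001%, as in the comment above.
for prior in (0.9999999999, 0.0000000001):
    odds = log10_posterior_odds(prior, 3000)
    print(f"prior {prior:.10f} -> posterior odds of about 10^{odds:.1f}")
```

The two priors differ by about twenty orders of magnitude in odds, which shifts the exponent from roughly 3010 to roughly 2990 and nothing more.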

I see your point about a single tree not covering all possibilities. However, the two trees Doug compares are (usually) identical except for a single disconnection in one. Thus it seems to me that the relative likelihoods mostly give information about that particular connection or disconnection, and not much about the rest of the tree (which should apply roughly equally to both models, and hence cancel out). I would expect, for example, that if you rearranged a few other branches of the tree the same way in both models, without anything jumping from one subtree to the other, then the relative likelihood would be nearly unchanged. This is easily testable, and if true would mean that the “sum over all trees that fit your requirements” would just give roughly the same answer we already got from the first tree, regardless of what relative weighting we gave those trees in the sum.

The presence or absence of a single edge on the graph corresponds pretty directly to the kind of complementary hypotheses I proposed. So, while I acknowledge that there are some gaps in my chain of reasoning that are not (yet) completely rigorous, it still seems unlikely to me that any of these would throw off the conclusion by very much.

I think you are straying quite far from your original claim here. Would you agree? In fact you have gone from “absolute probability” to “relative probability”, though you think the relative probabilities could be used to estimate absolute probabilities. That may be true, but each new assumption (and you have at least two rounds of them so far in response to my objections) adds another layer of uncertainty. You also fail to consider other alternative trees that divide the taxa into two or more groups.

Also, where are you getting this figure of 10^2990? According to the supplementary information, the log likelihood of ABE is -126,299 while the log likelihood of ABE(-H) + H is -140,339, which would mean that the difference in log likelihoods is nearly 14,000, or a factor of 10^6000. I will note that the log likelihood of ABE(-H) is -121,200, while H is -19,139. So just about the whole difference here lies with H. How does anyone come up with a likelihood for H? That’s one thing I can’t figure out.
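The numbers quoted above are easy to check. This sketch assumes the reported log likelihoods are natural logs (the usual convention in phylogenetics software, but an assumption here), and the variable names are just labels for the values quoted in the comment.

```python
import math

# Log likelihoods as quoted from the supplementary information.
loglik_abe = -126_299          # single tree, all taxa including H
loglik_abe_minus_h = -121_200  # tree without H
loglik_h = -19_139             # H on its own

# Independent trees multiply likelihoods, so log likelihoods add.
loglik_separate = loglik_abe_minus_h + loglik_h
delta = loglik_abe - loglik_separate

# Convert a natural-log difference to a power of ten.
log10_ratio = delta / math.log(10)

print(f"ABE(-H) + H = {loglik_separate}")
print(f"difference in log likelihood = {delta}")
print(f"likelihood ratio is about 10^{log10_ratio:.0f}")  # about 10^6097
```

So the two components do sum to the quoted -140,339, and a difference of 14,040 natural-log units corresponds to a factor of roughly 10^6100 rather than 10^3000.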

About this Entry

This page contains a single entry by Nick Matzke published on May 12, 2010 5:23 PM.
