An argument is ORFaned


Paul Nelson has a “new” argument against common descent. It revolves around the discovery of ORFans, “orphan Open Reading Frames”, i.e. stretches of DNA that appear to code for a protein (an Open Reading Frame, ORF), but where we currently have no idea what the protein is or does, or what other proteins it is related to (hence ORFan). A powerpoint presentation from one of Dr. Nelson’s talks that mentions ORFans is here. ORFans also loom large in Dr. Nelson’s rather forceful commentary on a post by Sahotra Sarkar describing a debate between them.
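For readers who have not met the term before, here is a minimal sketch of what "appears to code for a protein" means computationally. It is an illustration only, not the method used by any genome annotation pipeline: it assumes the standard start codon (ATG) and stop codons (TAA, TAG, TGA), looks at the forward strand only, and the function name `find_orfs` and its `min_codons` parameter are made up for this example.

```python
# Minimal sketch: scan one strand of a DNA string for open reading frames.
# Assumes the standard genetic code's start (ATG) and stop (TAA, TAG, TGA) codons.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return (start, end) index pairs of ORFs on the forward strand.

    An ORF here runs from an ATG to the first in-frame stop codon;
    min_codons counts the codons before the stop, including the ATG.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate reading frame
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # close the frame and keep scanning
    return sorted(orfs)


print(find_orfs("CCATGAAATTTTAGGG"))
```

An ORF found this way becomes an "ORFan" only after a similarity search against the sequence databases turns up no detectable relative, which is a statement about the search as much as about the gene.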

Are ORFans a significant problem for evolution? No, not in the least. The ORFan story, while still not completely understood, represents a good example of how science works, and why it’s a good idea to actually understand evolutionary biology before you criticise it (and why it’s a good idea to not stop reading in 2003).

Paul Nelson thinks that ORFans are a problem for common descent as they represent “discontinuities” in common descent.

Orphan genes – open reading frames with no detectable similarity to any other known sequence – constitute a surprisingly high percentage of the genomes of fully-sequenced organisms.

According to the Theory of Common Descent, all proteins are derived from other proteins, and ultimately from the minimal set present in the LUCA [Last Universal Common Ancestor], by descent-with-modification relationships (e.g., gene duplication).

There are two claims here: 1) ORFans have no similarity to other sequences and 2) Common descent assumes that all (or a very high proportion) of current proteins originated with the LUCA.

Claim 1 is deeply misleading and claim 2 is wrong. We fully expect a reasonable proportion of new genes to be generated de novo during evolution. We even have examples of proteins that are so generated. The most famous of these is the nylonase gene, which allows bacteria to metabolise the artificial polymer nylon. This was produced by a mutation in a piece of non-coding “junk DNA” which generated a new protein-coding sequence (Okada et al., 1983). The sperm-specific dynein intermediate chain gene was generated by a fusion mutation between two genes (so strictly speaking it falls under the gene duplication rubric), but the coding region of the new Sdic gene is generated from the non-coding intronic regions, so protein homology studies would have a hard time identifying it (Nurminsky et al., 1998). Formation of new genes poses no problem for evolutionary biology or common descent.

Let’s look at claim 1 in more detail. Nelson gives the impression that these are all single genes with no relation to any other genes. In fact many ORFans come in families, and many genes with no apparent relatives at the time of discovery have had relatives found later. In my own signal transduction field, a coding sequence originally thought to be an ORFan was finally identified as being opioid-receptor like, and its ligand found (called, ironically, Orphanin), and it is now a drug target for analgesics. There are many instances where ORFans have been found to be related to extant genes. When H. influenzae was first sequenced, 64% of its ORFs were ORFans; now only 5.2% are.

There are of course things called singleton ORFans: unique genes that do not currently seem to be related to any existing genes. In prokaryotes, something like 14% of all bacterial genes are currently singleton ORFans, but this may be expected to decrease as we sequence more genomes (as with H. influenzae). Also we may be missing some related proteins, as ORFans may have diverged so much during evolution that we can’t currently identify their nearest relatives. Improved detection algorithms, against a background of improved gene databases, will reduce the number of ORFans.

I’ll remind you again that as well as these singleton ORFans, there are ORFans that are limited to closely related organisms, and ORFans that are found in families of organisms (just as if they were related by *gasp* common descent).

Let’s look at Nelson’s treatment of this in a little more detail. He quotes Siew N, Fischer D (Twenty thousand ORFan microbial protein families for the biologist? Structure. 2003 Jan;11(1):7-9), a three-page mini-review.

“The Total Number of ORFans in Microbial Fully Sequenced Genomes Continues to Grow (Fig. 1, Siew and Fischer 2003, p. 8)”

Unfortunately, he ignores or overlooks the full paper from this group (Siew N, Fischer D. Analysis of singleton ORFans in fully sequenced microbial genomes. Proteins. 2003 Nov 1;53(2):241-51.)

From Siew and Fischer, Proteins: “We have shown that the number of ORFans is currently growing, whereas their fraction among ORFs is slowly diminishing.”

That is, as you sequence more genomes, you find more relatives. This is what you would expect on the basis of evolutionary theory, and does not represent a problem for common descent (remember H. influenzae going from 64% of its ORFs being ORFans to only 5.2% now as more genomes were added; this is the universal pattern for all organisms).
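The distinction between a growing count and a shrinking fraction is easy to lose, so here is a toy illustration. Every number in it is invented purely to show the arithmetic; none comes from Siew and Fischer or any other paper, and the names `orfan_fraction` and `hypothetical_snapshots` are made up for this sketch.

```python
# Toy illustration only: every number below is invented, not taken from
# Siew and Fischer or any other paper.
def orfan_fraction(orfans, total_orfs):
    """Fraction of ORFs in the database that are currently ORFans."""
    return orfans / total_orfs


# (genomes sequenced, total ORFs, ORFs with no detectable relative)
hypothetical_snapshots = [
    (10, 20_000, 6_000),
    (50, 100_000, 18_000),
    (150, 300_000, 30_000),
]

for genomes, total, orfans in hypothetical_snapshots:
    print(f"{genomes:>4} genomes: {orfans:>6} ORFans, "
          f"{orfan_fraction(orfans, total):.0%} of all ORFs")
```

In this made-up series the raw number of ORFans rises at every step while their share of all ORFs falls, which is the same shape as the pattern Siew and Fischer describe.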

Again he quotes the Siew and Fischer minireview:

“If proteins in different organisms have descended from common ancestral proteins by duplication and adaptive variation, why is it that so many today show no similarity to each other?”

But they also provide answers to their own questions. They say: “There are two noteworthy observations about our ORFan database. The first is that over half of the ORFans are shorter than 150 residues. Possible explanations for this bias could be that some of the shorter ORFans may not correspond to expressed proteins [27], or that their abundance is a result of a limitation of computational sequence comparison; it is harder for current tools to detect sequence similarity for short sequences.”

In the larger paper, they also say:

“It is probable that some of the short ORFs are the result of random distributions of nucleotides, or of sequencing errors that lead to frame shifts and to wrong stop codons”

and again:

“Another possible reason for the abundance of short ORFans could be technical: It may be more difficult for sequence comparison programs such as BLAST22 to find significant matches for shorter sequences (see the work of Mackiewiez et al.26 for yet another possible explanation).” And there is evidence that a significant fraction of ORFans represent unrecognized divergent proteins (see below). They also say “ORFans may correspond to highly divergent sequences that actually belong to known families (but are beyond recognition capabilities of current tools),2 or to sequences that correspond to new, unique, single-member families.”
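A back-of-the-envelope calculation shows why short ORFs are the ones most likely to turn up by chance. This is a sketch under a deliberately crude assumption that is mine, not the authors': the four nucleotides are independent and equally likely, so 3 of the 64 possible codons are stops. Real genomes have biased base composition, so the numbers are only indicative. The name `prob_no_stop` is made up for this example.

```python
# Sketch under a deliberately crude assumption: the four nucleotides are
# independent and equally likely, so 3 of the 64 codons are stops.
P_NOT_STOP = 61 / 64


def prob_no_stop(n_codons):
    """Chance that n_codons consecutive random codons contain no stop codon."""
    return P_NOT_STOP ** n_codons


for n in (50, 100, 150, 300):
    print(n, f"{prob_no_stop(n):.2e}")
```

Under this model the chance of an uninterrupted run falls off geometrically with length, so in a genome of millions of bases some short spurious ORFs are to be expected while long ones become very unlikely.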

Nelson also cites Siew N, and Fischer D, (Unravelling the ORFan puzzle, Comparative and Functional Genomics 2003, 4, 432 – 441.) in his blog entry as evidence of “A world class puzzle” (Siew and Fischer actually say it “entails interesting evolutionary puzzles”). But he ignores their analysis.

The authors say this:

“We propose the following model to explain the origin and abundance of ORFans and PCOs, which is somewhat consistent to the models discussed above. Many ORFans may have been generated as the result of a number of possible evolutionary events, which may include horizontal transfer, rapid evolution and gene-loss. ORFans (and other ORFs) without selection pressure have been deleted throughout microbial deletion mechanisms, and thus, microbial genomes are kept at ‘reasonable sizes’ [43]. ORFans that have retained or acquired an important function are kept, thus creating new sequence families with a seed of a single ORFan.”

Again, while we have no definitive answer to ORFans, they represent no threat to common descent, and we have several entirely reasonable explanations (that are proposed in the very publications Dr. Nelson cites).

Let’s summarise the main explanations: 1) Some ORFans may be artefacts. 2) Some ORFans may have relatives, but we haven’t sampled enough genomes yet. 3) Some ORFans may have relatives, but our tools aren’t good enough to detect these relatives yet. 4) Some ORFans may be de novo generated proteins.

Now that was the state of play in 2003, and these explanations were not pulled out of thin air: there was supporting evidence for them even then. Unsurprisingly, the field has moved on a bit and the explanations have been tested. Dr. Nelson doesn’t mention them in the powerpoint slides. Let’s look at the explanations and some recent evidence.

1) Some ORFans may be artefacts: As noted above, many ORFans are very short, 100-150 codons long. It is likely that many of these represent database or annotation errors. Also, in any genome, one would expect some random ORFs to be formed. Fukuchi S and Nishikawa K. (Estimation of the number of authentic orphan genes in bacterial genomes. DNA Res. 2004 Aug 31;11(4):219-31, 311-313.) closely examined sequences and estimated that about half of all short ORFans are sequencing or other errors.

2) Some ORFans may have relatives, but we haven’t sampled enough genomes yet. While we have something like 150 complete bacterial genomes sequenced, there are many, many more bacteria that are not yet sequenced, and these will have genomes quite divergent from the human pathogens that form the majority of current sequences. This will be especially important as horizontal transfer from a distantly related bacterium that has not been sequenced will look like an ORFan (until that distantly related bacterium is sequenced). A recent paper shows that many E. coli ORFans are the result of horizontal gene transfer from bacteriophages (Daubin and Ochman, 2004; bacteriophages are viruses, which is why they don’t turn up in bacterial database comparisons).

3) Some ORFans may have relatives, but our tools aren’t good enough to detect these relatives yet. Siew and Fischer, not content to rest on their laurels having posed an interesting puzzle, have tried to solve one aspect of it. Using improved fold recognition software, and a larger database of fold family structures, they have found that in Bacillus sp., some related ORFans are members of the alpha/beta hydrolase superfamily, and most likely derive from the haloperoxidases (Siew et al., 2005).

So evolutionary biologists have proposed a puzzle, suggested solutions to that puzzle, tested these solutions and largely confirmed them. Testing is by no means over yet, but all the evidence so far confirms that ORFans pose no threat to evolutionary biology. Indeed, if a large proportion of non-artefactual orphans are due to horizontal transfer from bacteriophages, as recent experiments suggest (Daubin and Ochman, 2004), then they may prove to be a valuable tool in understanding the phylogeny of bacteria, in the same way that families of LINEs, SINEs and pseudogenes have been. Far from being a threat to common descent, the patterns seen of the nested hierarchies of singleton, lineage specific and family specific ORFans are those you would expect from common descent. Some (very small number of) ORFans are also going to be de novo generated proteins. Biology is quite happy with the generation of new genes; it’s a process we have seen, and we don’t demand that all proteins come from the LUCA.

In summary: Dr. Nelson has relied on some short review papers from 2003 to claim that ORFan genes are a threat to common descent. In fact, the data from these review papers, let alone the other research papers from this time, are fully compatible with common descent. To claim otherwise is disingenuous in the extreme. Papers published since these 2003 reviews have confirmed the major explanations for the origin of these ORFans, and supported the common descent model. Dr. Nelson would do well to examine recent literature, rather than selectively rely on old reviews. Even a cursory glance at the literature of the past two years would show that evolutionary explanations suffice.

References:

Daubin V, Ochman H. Bacterial genomes as new gene homes: the genealogy of ORFans in E. coli. Genome Res. 2004 Jun;14(6):1036-42.

Fukuchi S, Nishikawa K. Estimation of the number of authentic orphan genes in bacterial genomes. DNA Res. 2004 Aug 31;11(4):219-31, 311-313.

Nurminsky DI, Nurminskaya MV, De Aguiar D, Hartl DL. Selective sweep of a newly evolved sperm-specific gene in Drosophila. Nature. 1998 Dec 10;396(6711):572-5.

Okada H, Negoro S, Kimura H, Nakamura S. Evolutionary adaptation of plasmid-encoded enzymes for degrading nylon oligomers. Nature. 1983 Nov 10-16;306(5939):203-6.

Siew N, Fischer D. Twenty thousand ORFan microbial protein families for the biologist? Structure. 2003 Jan;11(1):7-9.

Siew N, Fischer D. Analysis of singleton ORFans in fully sequenced microbial genomes. Proteins. 2003 Nov 1;53(2):241-51.

Siew N, Azaria Y, Fischer D. The ORFanage: an ORFan database. Nucleic Acids Res. 2004 Jan 1;32(Database issue):D281-3.

Siew N, Fischer D. Structural biology sheds light on the puzzle of genomic ORFans. J Mol Biol. 2004 Sep 10;342(2):369-73.

Siew N, Saini HK, Fischer D. A putative novel alpha/beta hydrolase ORFan family in Bacillus. FEBS Lett. 2005 Jun 6;579(14):3175-82.

2 TrackBacks

Ian Musgrave has a good summary of genes appearing from non-coding DNA (ORFans) on the Panda's Thumb. I have written about ORFans here and here (dude's gotta link to himself sometimes). Ian's post is targeted at some claims made by...

ORFans! from Pharyngula on April 26, 2006 7:00 PM

Paul Nelson has been twittering about ORFans for some time now—he seems to precede his talks by threatening to make us evolutionists tremble in our boots by bringing them up, but he never seems to follow through. Ian Musgrave got...

74 Comments

Anyhow, shouldn’t something as “obviously designed” as the flagellum be the result of ORFan genes? Or don’t IDists care about making real predictions, correlating evidence in comprehensible ways?

OK, those sorts of questions aren’t even jokes anymore. “Intelligent design” is a set of disparate criticisms of real science, designed only to produce incoherence that they will claim is the result of “the designer”. That is to say, the best result that these guys can imagine is a biology which proves not to make any sense. Unfortunately for them, the vast majority of biology makes sense even now, and we have only cause (trends) to suppose that more will in the future.

ID arose at a most unfortunate time for its proponents, the time when the old legitimate evidence became ever more correlated with the new DNA evidence. Far from being an alternative science, ID appears to merely react against the gains in knowledge that are occurring. So Nelson ignores the publication of human chromosomes with their inevitable evidence regarding our evolution via mutations, chromosomal rearrangements, and selection, while seeking for incoherence that he can ascribe to his God.

In the doing, his God becomes increasingly senseless and chaotic. IDists tell us that DNA has so many correspondences throughout living organisms because they have the “same designer”, yet Nelson is claiming non-similarity as the actual evidence for this “designer”. Thus they cannot keep their stories straight, nor show that any general characteristic of life is entailed by positing a designer of the sort that is known to produce actual designs.

And their “research” remains a mere mining of the literature coming out of real science, to emphasize whatever remains unknown. Apologetics has moved a long way from finding God in the workings of consistent and regular patterns within the “natural world”.

Glen D http://tinyurl.com/b8ykm

Drosophila provide a nice system for studying gene gain and loss due to excellent taxonomic sampling for genome sequencing projects in the D. melanogaster species group. For instance, this paper reports novel genes in a couple of species that do not have homologs in closely related species. And check out this article (it’s just a news report, and it’s not on Drosophila) which reports some other discoveries of de novo genes.

We’d expect most ORFans would encode small proteins – the longer the gene, the lower the probability it evolves from scratch – if they are not Intelligently Designed. Guess what? ORFans tend to have short protein coding sequences. For the example of Acps (Drosophila sperm accessory gland proteins), they are rapidly evolving genes with either low selective constraint or under strong positive selection. And they’re tiny little proteins.


Perhaps someday the people making these arguments (ORFans, bacterial flagellum, etc.) will have it drilled into their heads by their fundamentalist supporters that what it all boils down to is, “God left fingerprints that our ancestors could never have known about, and we’re only now becoming clever enough to find them. Our ancestors had to rely on faith to know God, but we don’t need faith anymore since we have scientific proof that God exists.”

This is known, IINM, as an act of hubris.

If God was created by man out of a desire for order as reflected in the seasons, the observations of the regular movements of the planets, a need to add heavenly authority to Kings for social regulation, collect taxes, make war and keep priests in a style of comfort to which they would not normally be accustomed if they had to work like everyone else then Nelson’s projection is that of a deranged, disordered, useless incoherent misanthrope. An against mankind ad hominem attack on knowledge and truth.

A desperate clutching at tiny details while ignoring the overwhelming evidence of existing reality to push deluded misinformation for what purpose? Oh that’s right, the more ridiculous the claim the more he makes.

Create a class of people who have been programmed to reject facts and tell them that the more they are told they are wrong the more THAT PROVES they are right. When you have a captive audience just print the biggest load of rubbish you can and sit back and collect the money.

None of my links worked above. Here they are:

NCBI Microbial Genomes page:

http://www.ncbi.nlm.nih.gov/genomes/lproks.cgi

Glass et al. global transposon mutagenesis experiment in Mycoplasma (28 percent genes of unknown function among essential set):

http://www.pubmedcentral.gov/articl[…]did=16407165

Wilson et al. survey of trends in the discovery of ORFans (roughly linear increase in number, with no sign of leveling off:)

http://www.ncbi.nlm.nih.gov/entrez/[…]ubmed_docsum

The OrphanMine is a good place to go exploring:

http://www.genomics.ceh.ac.uk/orpha[…]han_home.php

Paul Nelson Wrote:

Ask yourself if Mycoplasma is likely to be the only prokaryote with 28 percent ORFans among its essential hardware.

Did he not read the post? Every time I read something like this from someone like Nelson I’m reminded of the quote from Saul Bellow: A great deal of intelligence may be invested in ignorance if the need for illusion is deep.

A suggestion to consider for those ORFans that remain after the above hypotheses are explored. (Actually, it somewhat overlaps the above.)

Viruses, especially RNA viruses, evolve so fast it can make your head spin (relatively and figuratively speaking, of course). As a result, there is no shortage of “ORFans” in viral genomes, and I suspect there’s no shortage of viruses we’re not even aware of. These guys could transduce their extremely exotic genes to their hosts in rare lateral gene transfer events similar, for instance, to those thought to account for intronless pseudogenes (integration into the genome of a reverse-transcribed RNA).

Gee, it’s too bad that by the time Paul’s, uh, magnum opus is finished, ID will already be dead, buried and long forgotten. Like YEC already is. (shrug)

But hey, Paul, there are a few questions I have for you that you seem not to have answered yet.

They’re at: http://www.geocities.com/lflank/nelson.html

Hey Paul Nelson, great to see you. I tried to tell these guys that ORFans had too much Ontogenetic Depth to have evolved, but they wouldn’t listen to me. LOL.

Paul Nelson Wrote:

You misrepresented my slide, however. I said that “all proteins are derived from other proteins, and ultimately from the minimal set present in the LUCA [Last Universal Common Ancestor], by descent-with-modification relationships (e.g., gene duplication).” That is not equivalent to saying that all proteins originated in LUCA.

Your statement is still wrong. Why can’t new proteins arise? We know that random polypeptides exhibit enzymatic activity, so there is no reason to believe that new proteins couldn’t have arisen from random sequences at times in evolution other than at the time of the LUCA. Gene duplication isn’t the sole possible source of novelty.

I’d also propose that some sequences could have had so many substitutions that even if they descended from a gene in the LUCA, they would not be recognizable as such.

I’d also propose that some sequences could have had so many substitutions that even if they descended from a gene in the LUCA, they would not be recognizable as such.

You mean like the zebrafish protein mentioned in a recent Science paper? Function was conserved, but it appears that the DNA sequence was not (I am not clear on the details, but that was the gist of it).

It’s sort of that way with evolution–things change, and we still don’t know many of the particulars of how and why.

Glen D http://tinyurl.com/b8ykm

Paul Nelson Wrote:

I said that “all proteins are derived from other proteins, and ultimately from the minimal set present in the LUCA [Last Universal Common Ancestor], by descent-with-modification relationships (e.g., gene duplication).” That is not equivalent to saying that all proteins originated in LUCA.

Yes, it is. As you wrote the segment above, genes that are produced by gene fusion, duplication or domain swapping still have to originate from proteins originally derived from the LUCA (and this is the sense I use it in when I discuss the Sdic gene, which, although its protein coding region is a revamped intron, still ultimately derives from proteins earlier in the lineage).

Shall we discuss your misrepresentation of the Siew and Fischer papers (and the Wilson paper)? From Siew and Fischer and Wilson you point out the increasing number of ORFans, which is only to be expected as we sequence more genomes, but ignore the exponentially decreasing percentage of ORFans (see Figure 1 of Wilson et al). It seems the more sequences we add the more relatives we find. Curious that, eh? Just what we would expect from evolutionary biology. Looking at a dataset where proposed artefactual sequences have been removed, something like 2% of sequences are ORFans (and with current trends around 1% of sequences will be ORFans when we double the number of genomes sequenced).

Hmmm, 2% ORFans, and this estimate doesn’t include finding things like the horizontally transferred bacteriophage genes, or the new algorithms that will improve homology detection. But let’s ignore that and say that after extensive sequencing of genomes 1% are ORFans, and that all represent new de novo genes (again, remember that ORFans come in hierarchically organised families as well as singletons, so this will be a gross overestimate even then). A problem for common descent? I think not.

Paul Nelson Wrote:

Ask yourself if Mycoplasma is likely to be the only prokaryote with 28 percent ORFans among its essential hardware.

Ah, no. Read the paper again. Mycoplasma currently has 28% of its proteins of unknown function, not ORFans (it is often the case that we can clearly see a phylogeny for gene sequences, but not know what they actually do). Some of these will actually be ORFans. However, there has been much recent progress on identifying phylogeny in Mycoplasma ORFans, so, as with all other organisms, I expect the number of ORFans to go down significantly as we understand the genomes better and produce better homology tools.

None of my links worked above. Here they are:

NCBI Microbial Genomes page:

http://www.ncbi.nlm.nih.gov/genomes/lproks.cgi

Glass et al. global transposon mutagenesis experiment in Mycoplasma (28 percent sequences of unknown function among the essential cellular hardware):

http://www.pubmedcentral.gov/articl[…]did=16407165

Wilson et al. survey of trends in discovery of ORFans (roughly linear increase in number, with no sign of leveling off:)

http://www.ncbi.nlm.nih.gov/entrez/[…]ubmed_docsum

The OrphanMine is a good place to go exploring:

http://www.genomics.ceh.ac.uk/orpha[…]han_home.php

To Jim: the Glass et al. paper is freely available online. Note that the “unknown function” fraction in their experiment comprises essential sequences, an observation Ian doesn’t treat in his post.

To PZ: of course, ex hypothesi, new proteins evolve throughout evolutionary time, commencing (under the theory of common descent) with the set present in LUCA. The problem is that on pretty much any theory of protein evolution, the origin of novel sequences will nevertheless leave an historical signal (homology as detected by sequence similarity). This is why ORFans are so puzzling.

None of my links worked above.

Mine does.

Are you gonna answer my questions?

Sorry, Paul, but you have offered no evidence for your thesis whatsoever. Your argument is subjective at best, more likely specious.

As usual.

Paul Nelson Wrote:

To Jim: the Glass et al. paper is freely available online. Note that the “unknown function” fraction in their experiment comprises essential sequences, an observation Ian doesn’t treat in his post.

Because it is irrelevant. “Unknown function” is not synonymous with “ORFan” (although many of these will be true ORFans). The Orphanin receptor was of unknown function for quite a while, even though it was clearly related to G-protein coupled receptors. Of the true Mycoplasma ORFans, for some sequences phylogeny will be difficult to recover due to divergence or horizontal transfer from an as yet unsequenced organism (and as I said above, there is a lot of progress in recovering ORFan phylogenies); some will be lineage specific de novo proteins. Neither of these is a problem for evolution.

Paul Nelson Wrote:

To PZ: of course, ex hypothesi, new proteins evolve throughout evolutionary time, commencing (under the theory of common descent) with the set present in LUCA. The problem is that on pretty much any theory of protein evolution, the origin of novel sequences will nevertheless leave an historical signal (homology as detected by sequence similarity). This is why ORFans are so puzzling.

And yet, we do find hierarchical families of ORFans with historical traces, just as if they evolved. De novo gene generation from “junk DNA” will be almost impossible to recover phylogenetically. Even where we have gene fusion, in cases like Sdic, where intronic sequences become coding sequences, recovering the phylogeny of these proteins is very challenging. And of course, the recently discovered horizontal transfer of bacteriophage genes produced an initially puzzling signal, until we looked carefully at bacteriophages.

It’s funny to listen to someone talk about the results of gene finders being evidence against common descent when the gene finding algorithms have statistical assumptions about common descent BUILT INTO THEM! These algorithms wouldn’t work right if it weren’t for common descent.

Silly, silly man.

I’d like to ask Paul Nelson some questions. You seem to have a conscience. Why even put up an argument about ORFans? You’ve acknowledged that there is no scientific theory of ID, and that there never was one. So, you must understand that all you are doing is blowing smoke. Why even try? Shouldn’t you be spending your time trying to make ID into something that is scientifically defensible? If you can’t defend ID why blow smoke?

When did you know that the Discovery Institute had dropped ID for teach the controversy? West seems to have claimed that the Discovery Institute had a change of direction back in 1999. 1999 just happens to be when Meyer posted the junk about the legality of teaching the controversy. The Discovery Institute used to claim that ID was their business, but what is the business, now? I recall some of the Discovery Institute people bad mouthing Mike Gene when he came out and claimed that he didn’t think that ID should be taught before Ohio broke. Did Mike just let the cat out of the bag too early? When were you informed of this change of direction and why didn’t any of the fellows mention it to the general public? Why didn’t anyone seem to know until Ohio? It certainly took the Ohio board by surprise. Before Ohio no one at ARN seemed to have a clue about the change except Mike.

Really, instead of blowing smoke, why not answer the questions?

Why not make ID scientifically defensible? Ron, Paul Nelson is a Young Earth Creationist. He can’t make a silk purse out of a sow’s ear. He can just try to badmouth real silk purses.

What am I missing here? Ian’s post said

From Siew and Fischer, Proteins: “We have shown that the number of ORFans is currently growing, whereas their fraction among ORFs is slowly diminishing.” (Bolding added)

Down in the comments, Nelson says

Wilson et al. survey of trends in discovery of ORFans (roughly linear increase in number, with no sign of leveling off:) (Bolding added)

So I went and read Wilson, et al., cited by Nelson as showing a linear increase in the number of ORFans, but the data actually show the proportion of ORFans decreasing pretty much as Ian described it, with the asymptote of the curve for sequences longer than 150 getting real close to zero.

I thought Nelson was supposed to be the honest creationist in the DI crowd.

RBH

RBH Wrote:

So I went and read Wilson, et al., cited by Nelson as showing a linear increase in the number of ORFans, but the data actually show the proportion of ORFans decreasing pretty much as Ian described it, with the asymptote of the curve for sequences longer than 150 getting real close to zero.

I thought Nelson was supposed to be the honest creationist in the DI crowd.

Well, if he kindly gives you the citation so you can go and find out that it doesn’t support his claims, that’s almost as good as just not making false claims in the first place! Right?

Maybe we can call it “para-honesty” or something.

Really, instead of blowing smoke, why not answer the questions?

Indeed. I’ve asked Paul repeatedly why if, as he says, there simply IS NO SCIENTIFIC THEORY OF ID, then why does the ID movement call itself … well … the ID movement?

I never got any intelligible answer from him. Except for a (presumably) sarcastic retort that maybe it should be called the “The Fundamentally Religious and Scientifically Misbegotten Objections to Evolution Movement (FRASMOTEM for short)”.

That, of course, would have the advantage of at least being an HONEST name, unlike “Intelligent Design Theory”. (shrug)

Various comments –

Yes, ORFans and sequences of unknown function are not equivalent: all ORFans are sequences of unknown function, but not the converse. It is likely however that much of that 28% will remain as ORFans (within the genus Mycoplasma), even with increased sampling. [I see Ian has commented on this.] The fascinating question to ask is this: what if other prokaryotes have their own high fractions of essential ORFans, which are taxon-specific?

Wilson et al. partition their data (at a 150 amino acid cut-off) because of worries about the possibly artefactual status of shorter ORFans. Even with that partition, however, the trend line is not coming down. Remove the data partition, and the slope of the line would change dramatically. Other authors have suggested that microbial genomic space extends indefinitely; when individual strains within group B Streptococcus were sequenced, each strain possessed a significant fraction of unique sequences:

http://www.ncbi.nlm.nih.gov/entrez/[…]ubmed_docsum

What one makes of the question of increasing number vs overall percentage will gain or lose significance with respect to functional roles for ORFans. A linearly increasing number of species-unique sequences that are also essential will be difficult to reconcile with universal common descent (but see below for the flexibility of that theory in the face of observational challenge).
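The distinction between an increasing number and an overall percentage can be made concrete with a toy calculation. The sketch below is only an illustration under stated assumptions: the genome size (1000 genes) and the per-genome ORFan counts are hypothetical placeholders, not values taken from Wilson et al. or any other cited paper. It compares a scenario in which every newly sequenced genome contributes the same number of ORFans with one in which each new genome contributes fewer.

```python
def orfan_series(per_genome_orfans, genes_per_genome=1000):
    """Cumulative ORFan count and ORFan fraction after each genome is added.

    per_genome_orfans: new ORFans contributed by each successive genome
    genes_per_genome:  hypothetical, identical gene count for every genome
    """
    count = 0.0
    series = []
    for n, new in enumerate(per_genome_orfans, start=1):
        count += new
        series.append((n, count, count / (n * genes_per_genome)))
    return series


N = 50  # hypothetical number of sequenced genomes

# Scenario A: every genome adds the same number of ORFans (linear count).
constant = orfan_series([100.0] * N)

# Scenario B: the n-th genome adds 100/n ORFans (count still rises, but slowly).
diminishing = orfan_series([100.0 / n for n in range(1, N + 1)])

for label, series in (("constant", constant), ("diminishing", diminishing)):
    n, count, frac = series[-1]
    print(f"{label:12s} genomes={n} ORFans={count:.0f} fraction={frac:.3f}")
```

Under these assumptions, a strictly linear count goes with a flat percentage, while a falling percentage goes with a count that keeps rising but ever more slowly. Which pattern the real data follow is an empirical question that this sketch cannot settle.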

Fraser and Peterson, after the first global transposon mutagenesis experiment in Mycoplasma, speculated that the high percentage of “unknown function” sequences may reflect aboriginal polyphyly for many cellular functions. “A third, and extremely interesting, possibility,” they write, “is that many gene functions have evolved independently more than once since the beginning of cellular life on the planet.” That’s casting doubt on common descent, at least for those particular functions. http://www.pubmedcentral.gov/articl[…]did=11182883

Ian, it’s hard to know what to say about your claim that ORFans pose no challenge to common descent. As you would have seen from the context of my Helsinki lecture, testing the monophyly of life (Darwin’s single tree) is impossible when anomalies are always dumped into the “problems to be solved later” bin. PT readers who pursue the ORFans puzzle on their own (and they should!) will note that adjectives such as “baffling,” “mysterious,” and “surprising” recur in both primary research and review articles on the topic. Russell Doolittle explains why:

From its beginning, the whole-genome enterprise depended heavily on the premise that most genes would be readily identified by computer analysis alone. The basis for this hope was that most, if not all, extant genes are descendants from a smaller ancestral population that has been expanded by gene duplication. As such, identifications would be made by comparison with known genes and gene products whose functions had been determined experimentally. (emphasis added)

Doolittle goes on to note the surprising discovery of high numbers of ORFans, and asks, “After more than a century of study by biochemists and microbial geneticists, how could there be so many unrecognizable genes?…it remains mysterious how these ORFans have become so different from their alleged nearest relatives.”

http://www.ncbi.nlm.nih.gov/entrez/[…]ubmed_docsum

This reaction of “surprise” arises because of the background assumptions of (a) universal common descent [from LUCA], and (b) evolutionary theories about the origin of novel genes and proteins. I argue that (a) is probably wrong, and (b) not nearly as well-understood as many think.

Thank you, Paul, for not answering my question.

I’ll take that as an “I don’t know what I’m talking about.” answer.

Clearly, you spout typical clueless creationist rubbish.

I am sorry for your dementia.

This reaction of “surprise” arises because of the background assumptions of (a) universal common descent [from LUCA], and (b) evolutionary theories about the origin of novel genes and proteins. I argue that (a) is probably wrong, and (b) not nearly as well-understood as many think.

Here’s a bold prediction: any and all progress in understanding the ORFan phenomenon will continue to be made by scientists who are not Young Earth Creationists.

Here’s a stray thought wanting the attention of your electrons:

Would we expect to see more ORFans in bacteria that have higher frequencies of HGT (Horizontal Gene Transfer, not Hercules Grytpype-Thynne)? My reasoning is that with low HGT, it’s more likely that an ORF will be derived from a similar gene in the same species, i.e. a gene that has already been sequenced. OTOH, with high HGT, it’s more likely that the ORF has been picked up from some un-sequenced or unknown bacterium.

Bob

P.S. I’m disappointed at the lack of ORFul puns here.
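Bob’s reasoning can be written down as a toy expectation. This is a minimal sketch, not a model from any of the cited papers: the function name and all three parameters are hypothetical, and it ignores artefactual ORFs and rapid sequence divergence. It assumes an ORF is either inherited vertically (and finds a database match with probability `own_coverage`) or acquired by HGT (and finds a match with probability `donor_coverage`).

```python
def expected_orfan_fraction(hgt_rate, own_coverage, donor_coverage):
    """Expected fraction of ORFs with no database match in a toy two-source model.

    hgt_rate:       fraction of ORFs acquired by horizontal transfer (0..1)
    own_coverage:   chance a vertically inherited ORF has a sequenced relative
    donor_coverage: chance the donor lineage of a transferred ORF is sequenced
    All three are hypothetical inputs, not measured quantities.
    """
    for value in (hgt_rate, own_coverage, donor_coverage):
        if not 0.0 <= value <= 1.0:
            raise ValueError("all parameters must lie between 0 and 1")
    vertical_unmatched = (1.0 - hgt_rate) * (1.0 - own_coverage)
    transferred_unmatched = hgt_rate * (1.0 - donor_coverage)
    return vertical_unmatched + transferred_unmatched


# Well-sampled host lineage (0.9) but poorly sampled donors (0.3).
for hgt in (0.0, 0.25, 0.5, 0.75, 1.0):
    frac = expected_orfan_fraction(hgt, own_coverage=0.9, donor_coverage=0.3)
    print(f"HGT rate {hgt:.2f}: expected ORFan fraction {frac:.2f}")
```

In this sketch the ORFan fraction rises with the HGT rate only while donors are less well sampled than the host’s own relatives. If the donors were the better-sampled group, the trend would reverse.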

Nelson Wrote:

Doolittle goes on to note the surprising discovery of high numbers of ORFans, and asks, “After more than a century of study by biochemists and microbial geneticists, how could there be so many unrecognizable genes?…it remains mysterious how these ORFans have become so different from their alleged nearest relatives.”

http://www.ncbi.nlm.nih.gov/entrez/[…]ubmed_docsum

This reaction of “surprise” arises because of the background assumptions of (a) universal common descent [from LUCA], and (b) evolutionary theories about the origin of novel genes and proteins. I argue that (a) is probably wrong, and (b) not nearly as well-understood as many think.

Most available evidence shows that a) is likely right, while b) may be correct. After all, science keeps finding new, exciting facts about the evolution of life. What has ID to offer other than some doubt and second-guessing of what Doolittle really meant? For some, these surprises mean new research and hypotheses; for others… well, they write books…

Doolittle Wrote:

In this brief review, I attempt to highlight some of the most impressive advances that whole-genome studies have contributed to our views of evolution.

Some see evidence that our ignorance somehow points to something, and yet time after time, when the veil of ignorance is lifted, science continues its happy way of exploring life around us. When it comes to ORFans, it seems that many of the objections have already been rebutted, so what is left? A disbelief in common descent against all the evidence? So are ORFans a threat to common descent?

Paul Nelson Wrote:

all ORFans are sequences of unknown function, but not the converse. It is likely however that much of that 28% will remain as ORFans (within the genus Mycoplasma), even with increased sampling.

Why do you think this is likely, when a) we don’t even know what proportion of the unknowns are ORFans and b) all the evidence (see main post and post above and the cited papers) indicates that the number of ORFans in any given organism will fall dramatically, despite your attempt to obfuscate the issue?

Paul Nelson Wrote:

“A third, and extremely interesting, possibility,” they write, “is that many gene functions have evolved independently more than once since the beginning of cellular life on the planet.” That’s casting doubt on common descent, at least for those particular functions.

So what? We’ve known for a long time that different lineages have independently evolved a variety of enzyme functions. The fact that bacteria have evolved serine proteases independently of eukaryotes in no way affects common descent. It is exciting if gene functions can independently arise easily, because it is consistent with what we know from de novo generation of protein function, and is (yet another) death blow for the ID conception that functional proteins are rare in sequence space. Newly arising enzymes are no more problematic than horizontal gene transfer. We can follow common descent in spite of horizontal gene transfer (and if the Daubin and Ochman work can be generalised to other organisms, most non-artefact ORFans are examples of horizontal gene transfer).

Paul Nelson Wrote:

Ian, it’s hard to know what to say about your claim that ORFans pose no challenge to common descent.

Because they don’t. You can still form phylogenies even if all the ORFans are new de novo genes; look what we do with LINEs and SINEs. We can form phylogenies even with horizontal gene transfer, so new genes pose no problem. And as I’ve said above, if the Daubin and Ochman work can be generalised to other organisms, most non-artefact ORFans are examples of horizontal gene transfer.

Paul Nelson Wrote:

As you would have seen from the context of my Helsinki lecture, testing the monophyly of life (Darwin’s single tree) is impossible when anomalies are always dumped into the “problems to be solved later” bin.

Except that these “problems” are a) irrelevant to the monophyly of life and b) being actively worked on as we speak. In the two and a bit years since the Siew and Fischer review, research has shown that roughly half of all ORFans are artefacts. Of the remainder, in E. coli the majority are examples of horizontal transfer (and can be used to trace lineages; note the title of Daubin and Ochman, “Bacterial genomes as new gene homes: the genealogy of ORFans in E. coli.”). Siew and Fischer themselves have constructed a major public database and found one lineage of ORFans descending from dehalogenases, and others are solving phylogenies as we type here. All this evidence leaves common descent intact. It may take more effort to effectively trace phylogenies, but we can still do it.

I would like to point out that in that time, all ID has managed to do is, well, nothing. With the public databases available there could surely have been some “design theoretic” analysis. Instead all we have is misleading oversimplifications of an interesting aspect of biology, and no research.

Paul,

If ORFans are supposed to refute common descent, have you debated Behe on it?

popper's ghost Wrote:

The monophyly of life has been repeatedly tested, and passed.

Really? What tests do you have in mind?

Really? What tests do you have in mind?

Every observation that is consistent with monophyly that might not have been is a passed test – philosophy of science 101. As I noted, “Each such test increases our confidence in the theory — it does not, of course, prove it”. Multiple origins are of course possible, but by Ockham’s Razor, evidence for them that is not consistent with monophyly is needed in order to favor such a view. Of course, there are those who will claim that “the possibility of polyphyletic origins became downright blasphemous”, but such folks should check Marvin Minsky’s crackpot index, particularly item 29:

40 points for claiming that the “scientific establishment” is engaged in a “conspiracy” to prevent your work from gaining its well-deserved fame, or suchlike.

Resolving the connections between Lamarck and Darwin concerning polyphyletic origins will require carefull study by some biology historian.

And yet you stated that “It seems that Darwin’s thinking was likely influenced by Lamarkian thought on multiple origins of life” without benefit of a “carefull (sic) study”. As Lynn noted, it doesn’t follow from Darwin’s “into a few forms or into one” statement that he was “likely influenced by Lamarkian thought on multiple origins of life”.

popper's ghost Wrote:

Every observation that is consistent with monophyly that might not have been is a passed test — philosophy of science 101. As I noted, “Each such test increases our confidence in the theory — it does not, of course, prove it”.

Yes, I understand all of that. What I would like to know is what tests you are referring to; that is, those tests that support monophyly as opposed to polyphyly for life’s origins.

popper's ghost Wrote:

And yet you stated that “It seems that Darwin’s thinking was likely influenced by Lamarkian thought on multiple origins of life” without benefit of a “carefull (sic) study”. As Lynn noted, it doesn’t follow from Darwin’s “into a few forms or into one” statement that he was “likely influenced by Lamarkian thought on multiple origins of life”.

Please don’t go pedantic on me. I probably know much more about this topic than you, so you should just listen and maybe learn something. I threw out that possible connection between Lamarck and Darwin because it is a reasonable suggestion, but one that as far as I know has never been seriously explored. There is little doubt that anybody who thought about evolution as an explanation for species diversity prior to 1860 had to think in polyphyletic terms. Darwin couldn’t possibly have completely insulated himself from that tradition; it simply makes no sense. After all, Lamarck was the most prominent advocate for evolutionary change before Darwin.

Please don’t go pedantic on me. I probably know much more about this topic than you, so you should just listen and maybe learn something.

LOL! pedantic = ostentatious = pretentious. Your comment certainly qualifies.

I’m outta here.

The way I understand it, a Lamarckian model would have not several, roughly simultaneous, ancient origin events, but continuous origin events, as new lineages start the ‘climb up the ladder.’

Anyway, that’s how I have generally conceptualized the fundamental difference between Darwinian and Linnean thought: branching bush with single stem vs. multiple, parallel ‘ladders.’

Ian Musgrave Wrote:

The comments thread is drifting now, and Dr. Nelson seems to have left the room.

I apologize. How dare I ask him why he and Behe haven’t debated common descent, as any real scientists would eagerly do if they held such radically different positions.

For “Linnean” in my prev. comment, please read “Lamarckian.”

those typing fingers have a mind of their own sometimes.

popper's ghost Wrote:

LOL! pedantic = ostentatious = pretentious. Your comment certainly qualifies.

I’m outta here.

I asked you above to answer a very specific question based upon a claim you made. Perhaps you are unable to support your claim, so let me point towards an article that may help you think more clearly about this issue.

http://www.pandasthumb.org/archives[…]_univer.html

Paul Nelson quotes Russell Doolittle; Russell Doolittle says, “After more than a century of study by biochemists and microbial geneticists how could there be so many unrecognizable genes?” Excuse me, but what century is he talking about? It’s been only about a century since biologists recognized inheritance was discrete and things called genes even existed. It’s been just over 50 years since Watson and Crick discovered the structure of DNA, and it took several years after that for their model to be accepted. The genetic code wasn’t fully cracked until around 1967. Rapid methods for working out complete genomes didn’t exist until the turn of the present millennium. Of course, if Doolittle had said something like, “It’s been, oh, around a week and a half since there have been any major advances in biology. How could there still be unanswered questions about evolution?” he wouldn’t have sounded nearly as impressive, and Nelson wouldn’t have bothered to quote him. But at least he would have been honest.

Martha B Wrote:

Paul Nelson quotes Russell Doolittle; Russell Doolittle says, “After more than a century of study by biochemists and microbial geneticists how could there be so many unrecognizable genes?” It’s been only about a century since biologists recognized inheritance was discrete and things called genes even existed.

I think that’s the event he’s starting his “century” from. Because he’s talking about gene classification via product as well as sequence–

“As such, identifications would be made by comparison with known genes and gene products whose functions had been determined experimentally.”

–he can count the pre-DNA research era as contributing toward this end.

Earlier Doolittle actually says much of what you said, in terms of the recency of this research:

“Historically, the primary goal of the study of molecular evolution is to reconstruct past events in a way that explains the present living world. Ultimately, if the evidence has not been overly blurred by time, all trails should lead back to a common ancestral cell type. Over the years, macromolecular sequence information has been applied effectively towards this end, even in the face of major complications resulting from vastly unequal rates of change along different lineages, horizontal transfers of genes and gene clusters, and numerous other distractions. That these efforts have succeeded as well as they have must be regarded as a major triumph.”

“Although the enterprise has been ongoing for half a century, it’s only during the past decade that whole-genome sequences have been available; the question needs to be asked how this resource has affected the quest. In a word, immensely. Not only are organism connections at all levels being better established, but the full extent of the proliferation of gene families and the protein structures that underlie cellular divergences is also being greatly extended. In this brief review, I attempt to highlight some of the most impressive advances that whole-genome studies have contributed to our views of evolution.”

Martha B wrote:

Paul Nelson quotes Russell Doolittle; Russell Doolittle says, “After more than a century of study by biochemists and microbial geneticists how could there be so many unrecognizable genes?” It’s been only about a century since biologists recognized inheritance was discrete and things called genes even existed.

Good comments Anton and Martha. But another aspect should be brought up to answer Nelson’s attempted disparagement via an actually good question:

However poor Nelson’s chronology and reasoning may be, there is in fact a very good reason why it is taking a considerable amount of time to find out about genes. They evolved. They, in their contexts, are not the simple little designed things that IDiots expect them to be, rather they have developed through contingency, without the straightforwardness of “design”, and their regulation, expression, competing effects, and feedback are all very complicated.

Of course we have been able to read genes themselves for only 30 years or so, getting back to Nelson’s considerable lack of knowledge. And indeed, we have found so many linkages between genes that only a dolt or a reactionary could deny the responsible mechanism, evolution.

One reason why many genes are unknown (to say “unrecognizable” is tendentious and an attempt to bias reactions to the unknown) is that they have not yet been found, and/or have not yet been studied. That is to say, researchers are working out relationships and functions of many of the familiar genes, not tackling many of the genes that have been found, but not yet characterized fully.

Nelson seems to think that everything that is found is being studied, so that any genes whose relationships remain unknown for a few years are intractable, due to the chaotic “designer” who he fails utterly to describe, characterize, and to make predictions based upon the constraints of said “designer” (notably, because he has no actual “designer”, only the word). The fact of the matter is that many genes remain obscure simply because there are so many being found today.

In other words, once again the cause behind the foolish IDist question is the rank illiteracy and ignorance of the IDists. This theme never runs out.

Glen D http://tinyurl.com/b8ykm

Anton makes a very good point. Doolittle also says, apropos of Dr. Nelson’s claims

Doolittle Wrote:

On the bright side, the fraction seems now to be diminishing as searching regimens improve [8 and 9]. For example, the use of a fold recognition algorithm is making connections that were previously missed when only sequences were being considered [10* and 11].

I.e., Doolittle is highlighting the actual research that is resolving these questions, which Dr. Nelson has ignored. Note that this section on ORFans was very brief, and the majority of the other articles Dr. Nelson quotes are brief reviews. He never seems to engage with the substantial papers tackling these questions (my bottle of Australian Red seems safe too).

About this Entry

This page contains a single entry by Ian Musgrave published on April 26, 2006 3:28 PM.
