Inordinately Fond of Viruses: ORFans and Intelligent Design


orfano.png

J.B.S. Haldane, when asked “What has the study of biology taught you about the Creator, Dr. Haldane?”, replied

“I’m not sure, but He seems to be inordinately fond of beetles.”

Discovery Institute Fellow Dr. Paul Nelson is inordinately fond of ORFans, genes unique to one species that appear to have no relatives in other species. He feels that these unique genes represent a significant challenge to evolutionary biology. However, he has not noticed that the distribution of ORFans implies that the designer is more enamoured of viruses than humans.

A very tiny mystery:
What are ORFans, and why should we care about them? ORFans take their names from Open Reading Frames (ORF’s). ORF’s are stretches of DNA that apparently code for proteins. ORFans are ORF’s that appear to occur only in one species. Note that I say “appear to”. The computer programs that are used to identify genes during whole genome assembly can falsely identify segments of DNA as ORF’s, and this can be a significant issue in some genomes. Also, our computer programs for identifying related genes can miss genes that have undergone rapid evolution.

An example is the rotating image below left. ORFan_3D_conservation_score_White_BG-2.gif This is a 3D model of the protein Xc5848 from Xanthomonas campestris (it is also the static molecule above). Originally designated as an ORFan, it was identified as part of a large class of proteins by sophisticated structure analysis. The model is coloured by amino acid conservation, with red being the highest conservation and blue being poorly conserved. The model is mostly red (i.e. it’s part of a highly conserved protein family, not an ORFan at all).

ORFans come in two classes, short (often less than 100 amino acids long), which are unlikely to represent real genes as they are usually much shorter than most real genes, and long (usually over 150 amino acids long), which are likely to be real genes. There are far more short ORFans than long ORFans.

Paul Nelson thinks that ORFans represent a major blow to evolutionary theory. To him they break attempts to determine phylogenies and throw doubt on the idea that all organisms descended from a common ancestor. I’ve dealt with that aspect before (see also here), and I won’t go into detail here, but I would like to simply re-iterate a few points. We have a number of explanations, based on evidence, for the existence of ORFans.

1) Some represent artifacts
2) Some represent rapidly evolving genes whose origin is obscured by the pace of evolution
3) Some represent genes horizontally transferred genes from organisms that have not been sequenced yet.
4) Some represent genuine, de novo genes.

Now, as I said, we have evidence for all of these explanations, and ORFans will represent a combination of all factors. For example, it has been estimated that about half of all short ORFans represent artifacts, but some do represent genuine protein coding genes. In the Firmicutes (the family of bacteria that includes the well known gut bacteria Escherichia coli), a large percentage of the genuine short ORFans represent bacteriophage genes (although the confirmed proportion of viral genes in prokaryotes generally is somewhat smaller), and as we add more genomes we discover relatives for things we previously thought were unique.

A tiny mystery gets tinier:
In 2003, a fair percentage of the ORF’s found in fully sequenced prokaryotic genomes were ORFans. However, even back in 2003, it was apparent that as we sequenced more genomes we found more relatives for ORFans and fewer new ORFans.

ORFan_figure.jpg

Figure 1c of Siew & Fischer 2003, Proteins, 53:241-251, showing that the percent of the genome that is ORFans is decreasing, while the number of ORFans is flattening out.

A relentless fall of ORFans:
We have a lot more data now, and the extent of the fall in ORFans can be found by looking at the ORFan mine, a database of ORFans. As we add more genomes, we identify more relatives of things we thought were unique, and identify and purge more artifacts.

Consider the Escherichia coli genome. In 2003 the total ORFans (things likely to be artifacts) in the E. coli genome constituted 5.5% of the genome, and long ORFans (things likely to be genes) represented 2.4% of the genome. By 2008, total ORFans and long ORFans represented 0.4% and 0.1% of the genome respectively. Consider also the Helicobacter pylori genome, going from 17% and 9% total and long ORFans in 2003 to 2.3% and 0.6% total and long ORFans in 2008.
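To put those drops on a common scale, here is a rough sketch that turns the quoted percentages into fold reductions. The numbers are just the ones in the paragraph above; the dictionary layout and the function name `fold_reduction` are my own illustrative choices, not anything from the ORFan database.

```python
# Percent of each genome classed as ORFans, as quoted in the text above.
# Each tuple is (2003 value, 2008 value).
ORFAN_PERCENT = {
    "E. coli": {"total": (5.5, 0.4), "long": (2.4, 0.1)},
    "H. pylori": {"total": (17.0, 2.3), "long": (9.0, 0.6)},
}


def fold_reduction(before: float, after: float) -> float:
    """Return how many times smaller `after` is than `before`."""
    if after <= 0:
        raise ValueError("after must be positive")
    return before / after


for organism, classes in ORFAN_PERCENT.items():
    for orfan_class, (pct_2003, pct_2008) in classes.items():
        fold = fold_reduction(pct_2003, pct_2008)
        print(f"{organism} {orfan_class} ORFans: {pct_2003}% -> {pct_2008}% "
              f"({fold:.1f}-fold reduction)")
```

On these figures the long ORFans, the ones most likely to be real genes, shrink proportionally more than the total in both organisms.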

If you look at all 60 of the genomes reported by Siew and Fischer in 2003, the total ORFans averaged 14%; by 2008 this was down to 6%. If you look at the genomes added after those 60 (i.e. all the latecomers, not those that are already characterised), their ORFan percent is 7%. In 2003, the last 10 organisms to be added to the database had an average of 12% ORFans when first sequenced; in 2008, the last 10 organisms had 6% ORFans when first sequenced.

Even those figures may overestimate the number of ORFans. Of the 19 ORFans in the E. coli database, 10 are annotated to viral or conserved proteins. Of the ones I’ve investigated, there is significant sequence similarity to other proteins (e.g. the alleged ORFan NC_000913orf2361 is annotated to be a CPZ-55 prophage, and forms a high significance phylogeny with other proteins and even has a PFAM domain in it!)

ORFan_tree.jpg

Some ORFans are not. The supposed ORFan NC_000913orf2361 is related to a whole range of conserved proteins.

So as we sequence new genomes, we are finding fewer and fewer ORFans. This is entirely consistent with the position that ORFans represent rapidly evolving proteins, horizontally transferred proteins and annotation artifacts rather than unique proteins inserted by an unknown designer by unknown mechanisms. Paul Nelson likes to emphasize the number of ORFans, as this is increasing. However, the pattern of increase is very instructive. Below I’ve plotted the total number of ORF’s and ORFans with increasing number of full genomes sequenced, and the fold increase of ORF’s and ORFans with respect to the numbers of ORF’s and ORFans when only 15 genomes were sequenced (why do ID advocates never do this type of thing?).

You can clearly see that the rate of increase in ORFan number is dramatically slowing. When we reached 60 sequenced genomes, this resulted in a 4.5-fold increase of ORF’s over the numbers present at 15 genomes, but just over a doubling of ORFans. By the time we got to 330 genomes, ORF’s had increased 25-fold from the numbers at 15 genomes, but ORFans had increased less than eight-fold. This is entirely consistent with the fact that as we add genomes, we find more relatives of these genes.

Total vs Fold ORF's.png

ORFan numbers increase as we sequence more genomes, but ORF’s (real genes with known relatives) increase much, much faster. This is consistent with the majority of ORFans representing under sampling of phylogenies. (Data taken from Siew and Fischer, 2003 and the ORFan mine.)
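The arithmetic behind that comparison can be sketched in a few lines. This is only a back-of-the-envelope illustration: the text gives "just over a doubling" and "less than eight fold" rather than exact values, so the 2.0 and 8.0 below are rough stand-ins (an assumption on my part), and the function name is hypothetical.

```python
# Fold increases relative to the 15-genome baseline, as quoted in the text.
# "just over a doubling" and "less than eight fold" are bounds, not exact
# values; 2.0 and 8.0 are used here as rough stand-ins (an assumption).
FOLD_INCREASE = {
    60: {"orfs": 4.5, "orfans": 2.0},
    330: {"orfs": 25.0, "orfans": 8.0},
}


def relative_orfan_ratio(orf_fold: float, orfan_fold: float) -> float:
    """Return the ORFan-to-ORF ratio relative to its baseline value.

    If ORFans grow by `orfan_fold` while ORFs grow by `orf_fold`, the
    ratio of ORFans to ORFs is multiplied by orfan_fold / orf_fold.
    """
    if orf_fold <= 0:
        raise ValueError("orf_fold must be positive")
    return orfan_fold / orf_fold


for genomes, folds in sorted(FOLD_INCREASE.items()):
    ratio = relative_orfan_ratio(folds["orfs"], folds["orfans"])
    print(f"{genomes} genomes: ORFan-to-ORF ratio is about "
          f"{ratio:.2f}x its value at 15 genomes")
```

With these stand-in values the ratio of ORFans to ORF’s keeps falling as genomes are added, which is the point of the plot: the raw count goes up, but the proportion goes down.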

Enter the virus:
Paul Nelson is now particularly taken with a paper from Fischer’s group, which showed that around 38% of the ORF’s in complete virus genomes are ORFans. This figure seems to impress Paul. However, the same issues that applied to prokaryotic genomes apply to viral genomes.

viral_orphans.jpg

As shown in figure 4 of Yin and Fischer (above), as the number of viral genomes sequenced increases, the percentage of ORFans drops as relatives are found (just like prokaryotic ORFans). The phage groups with the most “ORFans” are those that have the fewest sequences (just like prokaryotes, which suggests that sampling of genomes is the main issue).

Furthermore, 18% of alleged “ORFans” turn out to be horizontally transferred prokaryotic genes (just as a fair proportion of prokaryotic “ORFans” turn out to be horizontally transferred bacteriophage genes). Looking at the authors’ conclusions we find them saying:

Because the current sampling of phages (and of bacterial genomes in general), is limited and biased towards particular groups, the percentage of ORFans in different phage groups varies significantly. This low sampling may be a factor contributing to the abundance of phage ORFans, but is not likely to be the only one. That is, even after many more genomes are sequenced, we expect to find a significant number of ORFans and near-ORFans, awaiting interpretation. There are also other possibilities to account for the ORFans’ origin, like rapid divergence after horizontal transfer (from hosts or from other viruses, from existent genomes or yet extinct genomes) or duplication.

Rapid divergence obscuring ancestry in rapidly evolving viruses is by no means unusual, and more careful sequence comparison will undoubtedly turn up more relatives (just as happened with prokaryotes).

Summary: So, the solutions to the ORFan “puzzle”, as outlined by Yin and Fischer (poor sampling, horizontal transfer, rapid evolution) follow the same lines as my previous Panda’s Thumb posts (I also included annotation errors, known to produce a proportion of alleged prokaryotic “ORFans”. These annotation errors are likely to be substantial in small genomes as well).

It is instructive to compare the number of ORFans in various genomes (as they currently stand). The Human genome has 0% ORFans [see note], Prokaryotes an average of around 7% and viruses around 30%. Now, it may be that ORFans represent artifacts, poor sampling and rapidly evolving genes (which would explain why rapidly evolving, under sampled and exceedingly diverse groups like viruses have more ORFans than prokaryotes or Humans).

Or the Designer really has an inordinate fondness for viruses.


Note: Paul Nelson objects to the paper that eliminated the last of the ORFans from the human genome (Clamp et al., 2007), as he claims that they did this on purely evolutionary reasoning. He is wrong; they also looked at whether these sequences were significantly different to random sequences, and whether they were expressed as protein. They weren’t and they aren’t. This is good evidence that they are artifacts.

Larry Moran has a good discussion of ORFans at the Sandwalk.


1 TrackBack

ORFan genes and intelligent design from Playing Chess with Pigeons on May 14, 2008 9:01 PM

“When you said “ORFan”, did you mean “ORFan” – a gene unique to one species that appear to have no relatives in other species, or “OFTen”, frequently?” In a previous post about an Expelled Q & A event held at Biola Universi...

38 Comments

Nice.

Why does every creo who manages to read a paper conclude that it is something that is unexplainable by modern evolutionary theory? And why do they never seem to notice that the real authors never have this problem? If the papers ever proved what is claimed the authors would have pointed it out and become famous. Wonder why they didn’t bother?

On another thread some yahoo was trying to claim that genes that are “closely related” in different organisms somehow disprove common descent. Now some guy is trying to claim that if genes are not similar to other genes it disproves common descent. Make up your minds guys. Learn what the modern theory of evolution really predicts and stop making up fake scenarios that evolution supposedly cannot explain. And stop ignoring basic genetic mechanisms that are well understood by real scientists; that just makes you look ignorant.

David Stanton said: Why does every creo who manages to read a paper conclude that it is something that is unexplainable by modern evolutionary theory?

Poor (critical) reading skills + distaste for empirical research + fanatical devotion to a literal reading of Genesis + evangelical-fundamentalist group-think + underlying political agenda + realization that massive amount of money can be made selling books, etc. to people who think likewise.

hje for the win!

hje FTW indeed. Don’t forget paranoid delusion (about Big Science, the Global Darwinist Conspiracy™, etc.).

You could set the denialism to a music video. Oh wait, someone has.

“I’m not sure, but He seems to be inordinately fond of beetles.”

Or the Designer really has an inordinate fondness for viruses.

Nonsense! Hasn’t it already been established that it’s flagella of which the Designer is enamored? ;)

After all, if the designer wasn’t fond of flagella, He wouldn’t have whipped up so many of them!

Henry

One minor point: the Firmicutes are Gram positive. E. coli is Gram negative and in the phylum Proteobacteria.

“I’m not sure, but He seems to be inordinately fond of beetles.”

Is that why they’re bigger than Jesus?

Add: an exaggerated confidence in their ability to discern the truth of a complex scientific argument merely by thinking about it a lot.

In other words, truthiness.

Has Paul Nelson ever made a respectable argument? Ever? I can understand why his ‘ontogenetic depth’ monograph is indefinitely postponed. I bet it sucks out loud.

Zaius said:

Is that why they’re bigger than Jesus?

Brilliant! ROFLMFAO

When the subject is “junk DNA,” IDers always react with: “we don’t really know that it’s ‘junk’ because they keep finding functions all the time.”

When the subject is “ORFans” can we expect them to react consistently with: “we don’t really know that they’re ORFans, because they keep finding relatives all the time.”?

If they have a double standard, it would be especially curious since ID itself has never ruled out common descent and some IDers have conceded it outright.

Thanks, Ian, for a very thorough (and thoroughly referenced) summary of the state of our knowledge of prokaryotic and phage ORFans.

I do have a nitpick - your use of apostrophes in the plural of ORF (“ORF’s”) is superfluous, and implies a contraction or a possessive, neither of which seems to be the case from the context. “ORFs” does the job of indicating the plural quite nicely.

hje Wrote:

Poor (critical) reading skills + distaste for empirical research + fanatical devotion to a literal reading of Genesis + evangelical-fundamentalist group-think + underlying political agenda + realization that massive amount of money can be made selling books, etc. to people who think likewise.

I modified (changes in bold) it to describe people like Michael Behe, if not Paul Nelson:

Reading skills that specialize in cherry picking and quote mining + distaste for empirical research that is known to falsify one’s claims + fanatical devotion to maintaining the public’s literal reading of Genesis despite admission that it is “silly” + evangelical-fundamentalist group-think + underlying political agenda + realization that massive amount of money can be made selling books, etc. to get people to think like you want, not like you do.

The designer loves prokaryotes. Obviously, metazoans were only designed as condominiums for the little beasties. When the designer comes back and finds one upstart species using antibiotics and doing other terrible things like bathing and chlorinating the water there will be hell to pay.

Paul Nelson objects to the paper that eliminated the last of the ORFans from the human genome (Clamp et al., 2007), as he claims that they did this on purely evolutionary reasoning.

This is beyond stupid on Nelson’s part. His argument is that “evolutionary reasoning” can’t explain ORFans, even in theory, so he considers the use of “evolutionary reasoning” in order to explain “evolutionary reasoning” a problem????

Egads. The stupid, it burns. The picture I have in mind is that Far Side cartoon, with the fat little nerd boy trying to push open the entrance door of the “School for the Gifted”, and just not succeeding. The door is clearly labeled “PULL”.

He is wrong; they also looked at whether these sequences were significantly different to random sequences, and whether they were expressed as protein. They weren’t and they aren’t. This is good evidence that they are artifacts.

But beyond just misreading Clamp et al in his own meretricious quote-mining way, Nelson can’t even put together a coherent logical argument. PULL, Nelson, PULL!

Would somebody page Berlinski? I feel an irresistible urge to call him Pull Nelson from now on. PULL, Nelson, PULL!

3) Some represent genes horizontally transferred genes from organisms that have not been sequenced yet.

Whoops, it looks like (ahem) a gene duplication.

Darn it! I am still trying to get through my biology book so I can understand this stuff. Aauurrgghh!

@ James - another great video :-)

Every issue in the life sciences represents “a significant challenge to evolutionary biology”. It is evolutionary biology’s exquisite ability to address these challenges, explain the physical evidence, and provide the historic underpinnings of all these issues that has led to the almost universal acceptance (amongst those who actually study biology and understand the evidence) of Darwinian evolutionary theory. The historic progress you demonstrate concerning ORFans is an excellent example of this continuing process.

AnswersinGenitals,

Well said. Love the handle.

Nigel D said: your use of apostrophes […]

Evolution leads 2 spelling-naziism! U watchin, Ben Stien? BTW, not to get off-topic (I expect *crickets*) where’s that explanation of ontogenetic depth, Paul?

P.S.: Ian, the two links in “I’ve dealt with that aspect before (see also here)” don’t work for me, they require login.

slang said:

Nigel D said: your use of apostrophes […]

where’s that explanation of ontogenetic depth, Paul?

Oh, man do I hope he publishes that someday. It’s going to be humiliating. But it’s been delayed for years and years now, so we shouldn’t be optimistic that we’ll get to laugh at it.

I would go ahead and start a thread for it at After the Bar Closes, but that thread would just languish, as Paul flies around the world, talking his way into free meals and trips, dispatching horrible old arguments against evolution, and never delivering the goods.

substitute “deploying” for “dispatching” in the above.

Well, steve, I’m not aiming for humiliation, although I’ll admit that it’s a little bit of a cheap shot. Paul Nelson (or someone using his name) always seemed nice and polite when he used to post here, and it’s not like I have anything to show in science myself. Nonetheless, the promise of publication still stands, AFAIK.

It would just be nice, for once, to see one of them saying something like “alright guys, I really thought I was on to something significant here, but I’ve read the response, thought about it some more, and now see that I was wrong”. Oh well, I can dream, can’t I? :)

slang Wrote:

It would just be nice, for once, to see one of them saying something like “alright guys, I really thought I was on to something significant here, but I’ve read the response, thought about it some more, and now see that I was wrong”. Oh well, I can dream, can’t I? :)

One reason I’m so passionate about anti-evolution activism, is that I “expelled” myself in 1981 (coincidentally 2-3 months before I first heard the word “creationism”).

I had an interesting hypothesis in my research (chemical reaction mechanisms, not about biology), but the data were not supporting it. I wanted it to be true in the worst way, but could not continue deceiving myself. But I’m far from unique; all scientists do that one time or another.

For the next ~16 years I thought “creationists” were just stuck in the “denial” phase, but still “working on it.” But then I found that that’s rarely the case. Mostly they don’t even try to support alternate explanations, and increasingly they don’t even say what those explanations would be, other than “some designer did something at some time.” The only data they work with is that obtained by mainstream science, and they only “analyze” it to see how they can take it out of context to promote unreasonable doubt.

Many may be still deceiving themselves, but I think that their main goal is to deceive others.

Maybe they’re waiting for the technology to arrive to enable a level-headed and detached person to make a decision, rather than rushing in where angels fear to tread. That’s excluding the tiny minority that has a sectarian, slavish view of Scripture interpretation, which minority only has credibility now because people who will not learn from history and will not employ systematic, disciplined methodology, keep rushing in.

slang said: Paul Nelson (or someone using his name) always seemed nice and polite when he used to post here, and it’s not like I have anything to show in science myself.

Nelson’s disingenuous nice-guy routine is part of what makes him more odious than even Dembski, imo. We know that Dembski’s a petulant, arrogant, dishonest ass, and we don’t hesitate to blast him when blasting is called for. Nelson, on the other hand, takes the dishonesty to a deeper level with his “Hey, let’s have a beer and talk about it” baloney. A polite and engaging dishonest scumbag is worse, if it’s possible, than an overtly odious dishonest scumbag.

It took me several reads, but PBH, in the 6:06 comment, seems to be saying:

“Pay no attention to those who claim that scripture overrules evidence, but rather pay attention to people like Behe who admit that (1) doing so is silly, but also (2) don’t rush in to any explanation that could actually be testable and useful, when you could always take the loophole that ‘we’re still waiting for technology to catch up to our bag of intuitions,’ until which time (which is never) the official answer is ‘don’t ask, don’t tell’.”

BTW, the “sectarian, slavish view of Scripture interpretation” may be a tiny minority among scientists, but it is about half of the general public. And anti-evolution activists miss few opportunities to exploit that difference.

Paul, it seems to me as if your point regarding ORFans hinges on some supposed impossibility of new proteins arising via mechanisms commonly at work in the cell. (If this is not so, then perhaps you could stop by and correct me.) This essay shows one way in which you are wrong. Scientists working in other systems are catching up with earlier generations of plant scientists in this regard.

As I see things, the matters of the reliability of the estimates of ORFan numbers notwithstanding, your argument is based on a fundamentally erroneous assumption.

(I know, I know, you need some sort of subscription to see the second item. The title of the cited article - “De novo origination of a new protein-coding gene in Saccharomyces cerevisiae”, by Cai et al., says enough for now. The article is slated to be in print in the May 1 issue of Genetics.)

Ron Okimoto said:

The designer loves prokaryotes. Obviously, metazoans were only designed as condominiums for the little beasties. When the designer comes back and finds one up start species using antibiotics and doing other terrible things like bathing and chlorinating the water there will be hell to pay.

Condos? More like luxury resorts replete with amenities and room service.

Well!! thank, the so called creator (OR DESIGNER) that I am not an ORfan if you will, and that I have a mother and father and one hundred and seven living relatives, but you could not tell by looking at em. Maybe they are the orf’s or ORfans, and the orf’s are taking vitamins, and the orfans are doing drugs, Adios orfans. The good book says from dust you were made and unto dust you will return, so why worry about every thing in between?, which brings me to this thought, if there is a designer (GOD) why would he cause all this confusion?, when his main purpose is to make us believe he exists and love him. Why would he not leave more simple foot prints for you to find?, for he surely would know that man would try and prove it. So dedicate yourself and your research to the betterment of life and forget about design.

respectfully yours

Mr H.E. Thoma

Ontogenetic depth… the very name causes giggles. I have been waiting for years now for this monumental monograph so I can laugh hard and long. Paul, please do not deny all of us here a good laugh. Publish the ontogenetic depth monograph ASAP.… :-)

slang said:

Evolution leads 2 spelling-naziism! U watchin, Ben Stien? …

I think you’ll find he spells it “Stein”…

Oh, hang on. Was I just being a spelling-nazi?

Ian, What you missed in the literature (and what I did not miss because I am in the medical field) is that there are no real orphan genes. Orphan genes can be often found in RNA viruses and ERVs. Why are they in ERVs? Because an ERV is not an integrated virus. Because ERVs are not remnants of ancient viruses but rather they are (derailed) VIGEs (variation inducing genetic elements). VIGEs were designed to rapidly generate variation. The simplest virus is a three element VIGE plus an oncogene. It wasn’t an oncovirus that integrated in the genome. Of course not. How could an oncovirus evolve without the genome it requires for replication? Darwinians have largely ignored the RNA virus paradox (i.e. all RNA viruses have a common ancestor around 40-50 thousand years ago) because it cannot be solved within the required time frame. Due to their paradigm, Darwinians have failed to recognize VIGEs as important genetic elements to induce variation from the inside. Junk DNA is the other fatal flaw of the darwinian paradigm. Variation inducing genetic elements, my dear, that is what makes up the junk.

Funny, I thought “junk” DNA was a surprise to those who found it, rather than a prediction of the then current theory?

Henry

peter borger, biologist, PhD said:

there are no real orphan genes.

Great. Now tell Nelson that, including your evolutionary hypotheses for ERVs evolving from the genome.

peter borger, biologist, PhD said:

the RNA virus paradox (i.e. all RNA viruses have a common ancestor around 40-50 thousand years ago)

Oops, forget your hypothesis then. If they are evolving from the genome all the time, and you reject LGT, your hypothesis should AFAIU fail to predict the above.

Now, I’m not a biologist but I see one test the hypothesis of ERV’s as retroviruses pass - they arrange in a phylogeny together.

peter borger, biologist, PhD said:

Darwinians have largely ignored the RNA virus paradox (i.e. all RNA viruses have a common ancestor around 40-50 thousand years ago) because it cannot be solved within the required time frame.

Again, not my area, and I wouldn’t want to cherrypick in my ignorance - but this paper gives a great many possible pathways that all meet the time frame. Incidentally, on the topic of cherry-picking, it problematizes the phylogeny above:

it is striking that comparisons of viruses from different families reveal extreme sequence divergence, such that they are often no more similar than random sequences would be (47). Indeed, the “tree” of all RNA viruses is highly distinctive in that it is composed of relatively close tips, representing members of each viral family, connected by internal branches of generally unquantifiable length.

But I will take the review article for granted, there is a valid phylogeny for the subset of retroviruses, however divergent.

Interesting topic indeed, and I thank you for pointing out some current questions that a layman ordinarily wouldn’t know about.

Torbjörn Larsson, OM said:

a great many possible pathways that all meet the time frame

I should say; or moves the time frame.

About this Entry

This page contains a single entry by Ian Musgrave published on May 15, 2008 8:52 AM.
