# 99.9% Wrong

by Joe Felsenstein,
http://evolution.gs.washington.edu/felsenstein.html

Over at Uncommon Descent, “niwrad” is back with more calculations showing that conventional figures for comparing sequences of genomes are all wrong. Last time “niwrad” showed that human and chimp genomes match only about 62% of the time. The usual figure given is 98.77%. Niwrad did this by taking 30-base chunks of one genome, finding the best match in the other genome, and then asking what fraction of the time there was a perfect match of all 30 bases. That’s where the 62% figure comes from. I immediately pointed out here at PT that this was expected and did not represent some insightful new way of calculating these figures.

Now Niwrad has turned to comparing two human genomes. The figure for 30-base perfect matches is about 96%. The conventional figure is about 99.9%. Let’s see what is expected. If a single base position has a 0.999 probability of matching, two bases have a 0.999 × 0.999 probability, three bases a 0.999 × 0.999 × 0.999 probability. 30 bases then have a probability that is 0.999 raised to the 30th power. Which turns out to be (ta-da!) 0.97. Not a bad fit.
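The arithmetic is easy to check. Here is a minimal sketch, assuming (as the paragraph above does) that every base matches independently with the same probability; the function name `chunk_match_probability` is made up for illustration.

```python
def chunk_match_probability(per_base_match: float, chunk_length: int) -> float:
    """Probability that every base in a chunk matches, assuming independence."""
    return per_base_match ** chunk_length


# Human/human: 99.9% per-base identity, 30-base chunks.
human_human_30 = chunk_match_probability(0.999, 30)
print(f"{human_human_30:.2f}")  # about 0.97
```

Nothing about the genomes changes between the 99.9% figure and the 97% figure; only the size of the unit being compared does.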

Niwrad proudly notes that in the previous discussion

it seemed to me that the general feeling at the end was that my statistical method for performing genome-wide comparisons might have some merit, after all.

(Niwrad must have missed the discussion over here).

It does have merit: It’s a way of taking a close match and making it sound much less close – without changing anything. I have a suggestion: why not try 100-base chunks? That way human/chimp match will drop to only about 29%, while human/human will drop to 90%. Or how about 1000-base chunks? (human/chimp would be only about 0.00042 of a percent, and human/human would be down to about 37%). Where will this all end?
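For anyone who wants to reproduce those chunk-size figures, the following is a rough sketch under the same independence assumption. The two per-base values are the conventional figures quoted in the post; the names `PER_BASE_MATCH` and `match_table` are invented for this example.

```python
# Conventional per-base match figures quoted in the post.
PER_BASE_MATCH = {"human/chimp": 0.9877, "human/human": 0.999}


def expected_exact_match(per_base_match: float, chunk_length: int) -> float:
    """Expected fraction of chunks that match perfectly, assuming independent bases."""
    return per_base_match ** chunk_length


def match_table(chunk_lengths=(1, 30, 100, 1000)):
    """Map each comparison to {chunk length: expected exact-match fraction}."""
    return {
        label: {n: expected_exact_match(p, n) for n in chunk_lengths}
        for label, p in PER_BASE_MATCH.items()
    }


table = match_table()
for label, row in table.items():
    for n, fraction in row.items():
        print(f"{label:12s} chunk={n:5d}  expected match = {100 * fraction:.5f}%")
```

The same two per-base numbers generate every row, which is the point: the chunk length is a knob, not a measurement.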

I wonder if he can turn his vast statistical insights to disproving the religion of anthropogenic climate change, now that he’s knocked the legs out of Evolutionism. Then the Two Pillars of Liberalism will be demolished, like Samson in the temple of the Philistines!

Why is it that all creationists are good at is word games and number games? Does this guy really suppose that anyone will be fooled into thinking that humans and chimps are not really related?

DS said:

Why is it that all creationists are good at is word games and number games? Does this guy really suppose that anyone will be fooled into thinking that humans and chimps are not really related?

I am unsure what the point of niwrad’s latest post is – unless it is to persuade readers that humans are unrelated to humans.

Why not extend the methodology to comparing the entire strand of DNA from each species (so long as we are bent on choosing a methodology which makes no sense whatsoever, to engineer the result we desire)? If there is a single discrepancy between any two bases, the result is a 0% match.

So, as Joe says, no two humans are related to each other, not identical twins or even perfect clones–even they would have some mutations due to methylation, copy number variation, etc., which would cause a mismatch.

This is your brain on ID.

Could this be an excellent algorithm to accurately predict the magnitude of the ever shrinking creationist brain, maybe?

It should surprise nobody that Niwrad gets everything backwards. Nonetheless those who are sure that the existence of monkeys falsifies evolution will likely find his arguments convincing.

Over at Uncommon Descent in Niwrad’s thread, a commenter who is supportive of evolutionary biology, “DrREC”, has suggested correcting the percentage mismatches by dividing them by 0.24. This is arbitrary. The proper correction is to start from the fraction of match and take its 30th root. With a match of 96% for 30-base pieces, this yields a match of 0.99864 (a mismatch of 0.00136).
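The 30th-root correction can be sketched in a couple of lines. This assumes independent, equally likely mismatches at every base, and the function name `per_base_match_from_chunks` is hypothetical.

```python
def per_base_match_from_chunks(chunk_match: float, chunk_length: int = 30) -> float:
    """Invert p**n: recover the per-base match rate from the exact-match rate of n-base chunks."""
    return chunk_match ** (1.0 / chunk_length)


# Human/human: 96% of 30-base chunks match exactly.
p = per_base_match_from_chunks(0.96)
print(f"per-base match {p:.5f}, mismatch {1 - p:.5f}")
```

Any other chunk-level figure can be passed to the same function to put it back on the single-base scale.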

For the human/chimp match of 0.6173 (as given by Niwrad) we get 0.9805, or a mismatch probability of 0.0195 at the single-base level. That is somewhat higher than expected. For the median match (over chromosomes) of about 0.67, the corresponding single base match is 0.9867, for a mismatch of 0.0133.

It’s cute watching IDers trying to do science, isn’t it? Reminds me of watching three year olds creating artwork with their fingers and a few globs of paint in bright primary colors.

Is that what they do with their “microevolution”?

Of course not. This is part of their whole idiotic pretense that somehow the same type of evidence that indicates that “microevolution” occurred does not indicate that “macroevolution” happened.

Stark illogic is necessary for ID/creationism.

Glen Davidson

Joe, I get 0.6173^(1/30) = 0.9840 so a probability mismatch of 1.60%, slightly but not amazingly higher than official stats. I notice his number of trials remained constant despite differing chromosome length - and he had a very low match rate on the Y chromosome, which in any case should surely only count for half? However I weighted the results properly and it made only a slight difference, changing to 0.6260 (mismatch drops to 1.55%).

Otherwise his methodology of using a 30-base stretch is actually quite good, if it is then subjected to the correct probability treatment, since the chance of a random match is vanishingly low - as niwrad confirms directly. The interesting thing which he confirms, all unwitting, is the directness of the match between the chromosomes.

Joffan said:

Joe, I get 0.6173^(1/30) = 0.9840 so a probability mismatch of 1.60%, slightly but not amazingly higher than official stats. I notice his number of trials remained constant despite differing chromosome length - and he had a very low match rate on the Y chromosome, which in any case should surely only count for half? However I weighted the results properly and it made only a slight difference, changing to 0.6260 (mismatch drops to 1.55%).

Otherwise his methodology of using a 30-base stretch is actually quite good, if it is then subjected to the correct probability treatment, since the chance of a random match is vanishingly low - as niwrad confirms directly. The interesting thing which he confirms, all unwitting, is the directness of the match between the chromosomes.

And not only that, but with more comparisons he could actually reconstruct the nested hierarchy of genetic similarities between all living organisms. Now I wonder how he would explain that result?

I guess in this case it would be like reinventing the wheel as an oval.

Where will this all end?

The logical end necessarily has to be when we consider 3.2 billion base-pair chunks of the human genome. Then, there will be absolutely no chunks in common with the genome from any other organism, proving once and for all that we are each God’s special creation.

Joffan said:

Joe, I get 0.6173^(1/30) = 0.9840 so a probability mismatch of 1.60%, slightly but not amazingly higher than official stats. I notice his number of trials remained constant despite differing chromosome length - and he had a very low match rate on the Y chromosome, which in any case should surely only count for half? However I weighted the results properly and it made only a slight difference, changing to 0.6260 (mismatch drops to 1.55%). …

Thanks, I get the same numbers as you, but somehow when I type them into the comment box they change, and then I subtract in my head using those, and …

The decay in match rate vs. segment length seems to provide a lot of information about the structure of the genome. Could you describe what is going on in the genome that accounts for this?

Some questions that pop into my mind: What is the segment length to describe a single protein? How many mismatches within a protein give a functionally similar molecule? Is the decay dominated by segments in junk DNA? Is there a critical segment length where the mismatches tell you about different structural aspects of the genome?

One should remember that Uncommondescent.com started as the blog of someone who holds master’s degrees in statistics and mathematics and a PhD in mathematics. He is still posting there but obviously doesn’t give a shit about other IDiots exposing the vacuity of ID-creationism.

As DS points out, Niwrad’s recalibration is completely irrelevant to the hypothesis of common descent. If he compares humans to chimps, he will find greater similarity than humans to dogs, which will be greater than humans to fish, which will be greater than humans to algae, and so on.

If he uses existing genome databases and applies his personal algorithm, he will still end up reconstructing all the taxonomic hierarchies from first principles and they will look almost the same as the standard evolutionary view. It’s like claiming Lyon isn’t in France because the Paris-Lyon distance is not the same number in miles and kilometres.
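The miles-versus-kilometres point can be made concrete: raising a similarity to the 30th power changes the numbers but not their order. In this small sketch only the first two values come from the post; the remaining three are arbitrary placeholders, not measurements for any species, and the helper names are invented.

```python
def chunk_similarity(per_base_match: float, chunk_length: int = 30) -> float:
    """Rescale a per-base similarity to an exact-match rate for chunks (independence assumed)."""
    return per_base_match ** chunk_length


def ranking(values):
    """Indices of the values, ordered from most similar to least similar."""
    return sorted(range(len(values)), key=lambda i: values[i], reverse=True)


# 0.999 and 0.9877 are quoted in the post; 0.90, 0.80, 0.70 are placeholders only.
per_base = [0.999, 0.9877, 0.90, 0.80, 0.70]
chunked = [chunk_similarity(p) for p in per_base]

print(ranking(per_base) == ranking(chunked))  # the ordering survives the rescaling
```

Because the rescaling is monotonic, any hierarchy built from the relative similarities comes out the same either way.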

jeff said:

Some questions that pop into my mind: What is the segment length to describe a single protein?

It varies a great deal. Remember that there are numerous introns of various sizes so a protein that has ~100 amino acids only needs a minimum of ~300 base pairs, but with introns, it might be ~1000 base pairs or more.

How many mismatches within a protein give a functionally similar molecule?

Look at the sequence homology between a “highly conserved” protein like cytochrome c between human and yeast, human and wheat, human and dogfish etc. There are respectively 11, 10 and 6 differences within the first 22 amino acids (see here). There can be much greater divergences with other proteins.

Is the decay dominated by segments in junk DNA?

Presumably so, according to theory, with “junk DNA” evolving at a much faster rate.

topquark said:

It’s cute watching IDers trying to do science, isn’t it? Reminds me of watching three year olds creating artwork with their fingers and a few globs of paint in bright primary colors.

I think that you are underrating the esthetics of three year olds.

The manipulation of numbers by an evolution denier reminds me of predicting the Rapture.

jeff said:

How many mismatches within a protein give a functionally similar molecule?

As has been said, it really depends on the protein. Some highly conserved proteins only accumulate a relatively small number of substitutions at the amino acid level (3rd codon position mutations at the nucleotide level don’t usually lead to a change in the amino acid). Some get quite divergent. Compare human sequences to some of the rapidly evolving unicellular eukaryotes like Microsporidia or say Trichomonas vaginalis and you can see sequence similarities at the amino acid level at only 50% or lower and yet the proteins are still functional and can complement their homologs in yeast just fine.

It really all depends on the protein, what it does, how many interactions with other proteins it has, etc.

sparc said:

One should remember that Uncommondescent.com started as the blog of someone who holds master’s degrees in statistics and mathematics and a PhD in mathematics. He is still posting there but obviously doesn’t give a shit about other IDiots exposing the vacuity of ID-creationism.

His degrees in mathematics never stopped him from playing the exact same stupid word and number games when it comes to probabilities and information theory.

jeff said: How many mismatches within a protein give a functionally similar molecule?

All but two amino acids have multiple sequences associated with them. For example, the sequences GCT, GCC, GCA, and GCG all produce alanine. Niwrad’s method counts these as differences.

Joe Felsenstein: I have a suggestion: why not try 100-base chunks?

If you really want to see how stupid his method is, try 3-base chunks. Hey look, at position X you use GCT to produce alanine and I use GCG. We are 33% different!!
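The codon example can be spelled out as a toy comparison. This is only a sketch: `ALANINE_CODONS` lists the four alanine codons named above, and the helper names are made up for illustration.

```python
# The four alanine codons named in the comment above.
ALANINE_CODONS = {"GCT", "GCC", "GCA", "GCG"}


def base_identity(seq_a: str, seq_b: str) -> float:
    """Fraction of aligned positions carrying the same base."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return sum(a == b for a, b in zip(seq_a, seq_b)) / len(seq_a)


def exact_chunk_match(seq_a: str, seq_b: str) -> bool:
    """Chunk-style test: the chunk counts only if every base is identical."""
    return seq_a == seq_b


codon_a, codon_b = "GCT", "GCG"
same_amino_acid = codon_a in ALANINE_CODONS and codon_b in ALANINE_CODONS

print(base_identity(codon_a, codon_b))      # 2 of the 3 bases agree
print(exact_chunk_match(codon_a, codon_b))  # False: the chunk is scored as a miss
print(same_amino_acid)                      # True: both codons specify alanine
```

One silent third-position difference is a one-in-three base mismatch and a complete miss for the chunk, while the protein is unchanged.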

jeff said:

The decay in match rate vs. segment length seems to provide a lot of information about the structure of the genome. Could you describe what is going on in the genome that accounts for this?

I have been puzzling about that. There are more direct ways of assessing the distribution of differences throughout the genome, but one issue is simply whether differences are clustered. There are a number of reasons to expect them to be clustered:

(1) Variation in degree of conservation of sequences by natural selection. Some sequences have function, some don’t, some have more essential function than others.

(2) Variation of mutation rates along the genome.

(3) Variation of coalescent depth. For closely related sequences such as Human compared to Human, individual loci vary in how closely related they are owing to the random variation of recency of common ancestry due to genetic drift. The genealogical trees of different loci within a species vary. These trees, which are not phylogenies as they are of individual gene copies within one species, are called “coalescents”.

So on a priori grounds we expect some clustering of differences. But here’s the catch: given the numbers found by Niwrad, there appears to be, not clustering, but overdispersion. Clustering, by concentrating differences, would make the probability of sharing of a 30 base-pair segment greater than we would expect from the degree of single-base similarity. The calculations I have been using here are all based on independence from one base to another along the 30 bases. If that assumption is violated by clustering, we have a greater chance of seeing an identical 30-base chunk.
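To see why clustering pushes the exact-match frequency up rather than down, here is a small analytic sketch. It assumes a purely hypothetical genome with two equally common region classes whose average mismatch rate equals the 1.23% implied by the conventional human/chimp figure, and it assumes each 30-base chunk lies entirely within one class. The rates and function names are illustrative, not estimates.

```python
def independent_prediction(mean_mismatch: float, chunk_length: int = 30) -> float:
    """Exact-match fraction if every base mismatches independently at the mean rate."""
    return (1 - mean_mismatch) ** chunk_length


def clustered_prediction(mismatch_rates, weights, chunk_length: int = 30) -> float:
    """Exact-match fraction when each chunk lies wholly inside one region class."""
    return sum(w * (1 - q) ** chunk_length for q, w in zip(mismatch_rates, weights))


# Hypothetical classes: a conserved half and a fast-evolving half (placeholder rates).
rates = [0.0023, 0.0223]
weights = [0.5, 0.5]
mean_rate = sum(q * w for q, w in zip(rates, weights))  # 0.0123 overall

uniform = independent_prediction(mean_rate)
clustered = clustered_prediction(rates, weights)
print(f"independent: {uniform:.3f}  clustered: {clustered:.3f}")
```

With the same overall mismatch rate, the clustered case yields more perfect chunks, so an observed rate below the independence prediction cannot be explained by clustering alone.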

But in the calculations, there seems to be a slightly smaller chance of seeing an unchanged 30-base chunk. So that would hint at differences being, at least slightly, overdispersed (the opposite of clustered). Why? One possible bias in the calculation is that some 30-base pieces do not have counterparts that can be found in the other genome, and those get dropped from the calculation. Wouldn’t that bias the frequency of matches upwards? I would guess so. However what is seen is that it is lower than expected.

All this needs investigation by more direct calculations than the frequency of 30-base exact matches. The one thing the simple calculations that we have done here show is that a huge difference between the frequency of 30-base matches and 1-base matches is expected, and is not a refutation of the 1-base figure. However the fact that these simple calculations are a bit off in the wrong direction is intriguing.

Why not 3.2 billion bp chunks? Then the similarity of human/chimp drops to 0%. (Come to think of it, it drops to 0 long before 3.2 Gb.)
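How soon "long before" arrives can be estimated with logarithms. This rough sketch assumes independent bases and the conventional 0.9877 human/chimp figure; the one-in-a-billion cutoff is an arbitrary choice for illustration, and `min_chunk_length` is a made-up name.

```python
import math


def min_chunk_length(per_base_match: float, threshold: float) -> int:
    """Smallest chunk length whose expected exact-match probability falls below threshold."""
    return math.floor(math.log(threshold) / math.log(per_base_match)) + 1


P_HUMAN_CHIMP = 0.9877  # conventional per-base figure quoted in the post
CUTOFF = 1e-9           # arbitrary cutoff: under one matching chunk per billion tried

n = min_chunk_length(P_HUMAN_CHIMP, CUTOFF)
print(n)  # a few thousand bases at most, nowhere near 3.2 billion

# For a whole-genome "chunk" the probability is too small for a float,
# so work with its base-10 logarithm instead.
log10_whole_genome = 3.2e9 * math.log10(P_HUMAN_CHIMP)
print(f"10^({log10_whole_genome:.3g})")
```

Under these assumptions the expected match is effectively zero at chunk lengths that are a tiny fraction of a single chromosome.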

This new mathematical field of Chunkology (which is so new that it is not even in my spelling checker) can lead to some very interesting questions and conclusions. E. g., what is the minimum chunk size needed to show that there is zero similarity between the four gospels, thus demonstrating that the gospels are totally different stories about totally different persons and events?

The first time JF went around on @niwrad’s “chunkology” I was puzzled about what was going on until I thought it out for a bit: “Oh. That’s really silly.”

What is appalling about crackpots is not that they will make blindingly silly arguments, but that when the obvious flaws of the arguments are pointed out to them, they blindly go on with the silly arguments anyway.

One of the advantages of being a crackpot is that people will find it hard to believe that anyone could be so obstinately stupid and be far more patient than is sensible. Thanks to sad experience I am far less patient than I used to be, and my sufferance of fools correspondingly limited.

It does have merit: It’s a way of taking a close match and making it sound much less close – without changing anything. I have a suggestion: why not try 100-base chunks? That way human/chimp match will drop to only about 29%, while human/human will drop to 90%. Or how about 1000-base chunks? (human/chimp would be only about 0.00042 of a percent, and human/human would be down to about 37%). Where will this all end?

This is precisely the depth and subtilty of Niwrad’s argument all the evolutionists have failed to miss. The greater the chunks of DNA used, the greater the ratio of percentage difference between the human/human and human/chimp comparisons. The broader the scope of the examination, the more different humans and chimps become.

The evolutionists have observed H-O in the IR spectra of both water and pee and concluded they are the same thing. However, intelligent design theorists observe the entire spectrum and conclude they are different. This analogy is perfect.

OK, DD, you’ve shown your hand … I am wondering how many of the more excitable Pandas won’t notice the cute little double-negative in “failed to miss” and think you’re serious.

darwinism.dogBarf() said: The greater the chunks of DNA used, the greater the ratio of percentage difference between the human/human and human/chimp comparisons. The broader the scope of the examination, the more different humans and chimps become.

But you forgot the first part: the broader the scope, the more different humans and humans become too. Niwrad’s method could be used to show you are not closely related to your own mother.

In sane people, this result would be considered a clue that niwrad’s methodology is flawed. But creationists look at the human/human and human/chimp results, see one they like, and pretend the other doesn’t exist. Ahh, confirmation bias. How do creationists love thee? Let me count the ways.

There’s one.

mrg said:

What is appalling about crackpots is not that they will make blindingly silly arguments, but that when the obvious flaws of the arguments are pointed out to them, they blindly go on with the silly arguments anyway.

I think of that every time I hear the argument that Human Involvement in Experiments = Evidence for ID, as if intelligence was a magic serum or contagious disease that would cause natural processes to change. It’s so obviously goofy it’s amazing that anyone would propose it once, much less after the glaring flaws are pointed out.

Wheels said: 4) Checking their own work rigorously would require some knowledge of how and why scientists do that. Statistical tests, expanding the techniques to different scenarios (in this case comparing more than just humans and chimps), and all that stuff? It doesn’t come naturally, that’s something you get into the habit of doing with a science background and training.

Yeah, along with this is the crackpot notion of: “All this picky stuff the scientists do is just a smokescreen for their ignorance. I can see the forest for the trees – I can see the obvious truth that the scientists have blinded themselves from seeing with all their twaddle.”

And it should be noted that crackpots are not all that shy about trying to shove their views onto the scientific community – John Baez’s well-known “crackpot checklist” was obviously derived from experience with nutjobs thinking they’ve refuted Einstein and so on.

Staggeringly, *I* get missives from such nutjobs, and I’m a nobody – I tinker with physics on my website, but I have no particular qualifications in the field and have nothing resembling a reputation in it.

One of the other aspects to the matter is the influence of the internet on crackpots. Once upon a time they could mail letters to each other and form small circles of association; but with the internet there’s been a democratization of worldwide communications, the lowliest crank now has global reach. Crackpots from all over the planet can band together, collaborate in pumping up the volume on misinformation, and create their own system for its dissemination, even with its own journals. ISCID anyone?

Well anyone who doesn’t even know what a professional does should realize that they are an amateur. An amateur shouldn’t presume to know better than the professionals.

Conspiracy theories are worthless in science. No matter whether you trust the system or not, if you are not willing to participate within the system you really can’t complain about not being taken seriously. All of the great advancements in science were accomplished by people who got the evidence to back up their claims in spite of severe resistance. Claiming that everyone is against you is just a cop out.

Of course creationists may actually believe any or all of these excuses. But once again, at some level, at least some of them must realize that these are just rationalizations.

DS said: An amateur shouldn’t presume to know better than the professionals.

I once cited the old saying: “A fool can ask more questions than a wise person can answer.”

The rejoinder was: “But isn’t the reverse also true?”

That took me by surprise and I didn’t think of the reply until later: No. A fool has an infinite store of answers, easily pulled out of his ass as needed.

mrg said:

DS said: An amateur shouldn’t presume to know better than the professionals.

I once cited the old saying: “A fool can ask more questions than a wise person can answer.”

The rejoinder was: “But isn’t the reverse also true?”

That took me by surprise and I didn’t think of the reply until later: No. A fool has an infinite store of answers, easily pulled out of his ass as needed.

Which only shows that you sir are no fool.

The opposite is true as well, for every crackpot there is at least one blog tearing them apart. I think that in the long run it will advantage us more than them.

mrg said:

One of the other aspects to the matter is the influence of the internet on crackpots. Once upon a time they could mail letters to each other and form small circles of association; but with the internet there’s been a democratization of worldwide communications, the lowliest crank now has global reach. Crackpots from all over the planet can band together, collaborate in pumping up the volume on misinformation, and create their own system for its dissemination, even with its own journals. ISCID anyone?

The opposite is true as well, for every crackpot there is at least one blog tearing them apart.

But what if they use duct tape on that cracked pot?

MichaelJ said: The opposite is true as well, for every crackpot there is at least one blog tearing them apart. I think that in the long run it will advantage us more than them.

Yeah, thankfully you’re right there. I don’t worry too much about 911 Troothers because they can’t put out a movie like LOOSE CHANGE or the like and not have a website pick it to pieces.

Thank Bob the Troothers don’t have any big-name advocates – the most prominent being a retired theology professor and a Hollywood actor noted for his long arrest rap sheet. When Ollie Stone did WORLD TRADE CENTER, there were worries he was going to sell the Troother story to America, but he was surprisingly restrained.

Karen S. said:

Robert Byers is trolling over at BioLogos. It’s hysterically funny! Just look at the Stephen Hawking thread. You’ll have to go change into dry pants after reading it.

Ok, I went to look at the thread at BioLogos, and the stupidity and the pride in stupidity shocked me, even after years of following Panda’s Thumb and Pharyngula, Bad Astronomy, etc. My jaw dropped. I am still staggered. I must thank you for calling my attention to it, but at the same time, part of me regrets having read what I read.

[I wrote more, but…’nuff said. Wow. Still gobsmacked.]

It’s almost impossible to believe that this guy “niwrad” is still pushing this.

These people have no capacity for self awareness whatsoever.

We’ve been through this before.

If you look at correctly matched parts of the human genome and chimpanzee genome, there is a greater than 98% chance that, at a given individual locus, the base pair will be the same. Let’s call that probability “p”.

If you want to know the probability that, examining two loci, both will be a match, it is p^2.

If you want to know the probability that, examining “n” loci, all will be matches, it is p^n.

This is probably taught in high school and always taught in basic college statistics, but it is also well understood by many people with little formal education who enjoy games that involve cards or dice. It is very, very basic.

I hate to use insulting language, as it can distract, but for the sake of third party readers, I think it is critical to point out that if English words such as buffoon, moron, imbecile, idiot, jackass, arrogant, delusional, disturbed, pathetic, dull, stupid, egocentric, narcissistic, pitiful, etc, are to be used, they should be used to describe “niwrad”.

His argument boils down to “p^n = p^n, therefore magic instead of evolution”.

Has anyone taken this bozo’s side and tried to explain what can possibly be gained by this type of analysis?

There’s an even more remarkable statement over at Uncommon Descent right now. Gil Dodgen is, as always, drawing dramatic conclusions that Darwinism has collapsed and that scientists refuse to recognize it (he’s very good at drawing that conclusion – evidence is another matter).

Anyway, he opens with a statement that, for once, evolutionary biologists can agree with:

At UD we have many brilliant ID apologists, and they continue to mount what I perceive as increasingly indefensible assaults on the creative powers of the Darwinian mechanism of random errors filtered by natural selection.

I really can’t think of anything to add to that.