98.77% Wrong

by Joe Felsenstein,
http://evolution.gs.washington.edu/felsenstein.html

Over at Uncommon Descent (in this thread) “niwrad” presents a calculation, lengthily explained, showing that the assertion that human and chimp genomes differ by 1% in their base sequence is wrong.

What “niwrad” does is extraordinary. Choosing random places in one genome (doing this separately for each chromosome), “niwrad” takes 30-base chunks and then looks over into the other genome to see whether or not there is a perfect match of all 30 bases. In autosomes this turns out to occur between 41.60% and 69.06% of the time (it varies from chromosome to chromosome). The median is about 65%.

So the difference is really 35%, not 1%, right? Not so fast. If two sequences differ by 1.23% (the actual figure from the chimp genome paper), a one-base chunk will match 98.77% of the time. Treating the bases as independent, a two-base chunk will perfectly match (0.9877 x 0.9877) of the time. And so on. A 30-base chunk will perfectly match a fraction of the time equal to the 30th power of 0.9877. That’s 0.6898 of the time.
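For anyone who wants to check the arithmetic, here is a minimal sketch in Python. It assumes, as the calculation above does, that each base matches independently with the same probability; the function name `perfect_match_probability` is just my own label for illustration.

```python
def perfect_match_probability(per_base_difference: float, chunk_length: int) -> float:
    """Chance that all bases in a chunk match, assuming independent sites."""
    per_base_match = 1.0 - per_base_difference
    return per_base_match ** chunk_length


# 1.23% per-base difference, for chunks of 1, 2 and 30 bases
for chunk_length in (1, 2, 30):
    p = perfect_match_probability(0.0123, chunk_length)
    print(f"{chunk_length:2d}-base chunk: {p:.4f}")
```

The last line printed is the 0.6898 figure quoted above. Real genomes will not satisfy the independence assumption exactly, so this is a rough expectation rather than a precise prediction.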

So the 65% figure is pretty close to what is expected from a difference of 1.23% at the single-base level. However, the penny hasn’t dropped yet over there (as of this writing, anyway). One commenter (“CharlesJ”) has asked whether there isn’t about a 1 in 4 chance of a 30-base mismatch if the difference is really 1%. That’s correct, and “niwrad” has (somewhat incorrectly) replied that it’s actually 1 in 3. That is a bit off, but one way or the other the whole article goes up in smoke. “niwrad” has not figured that out yet.
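The 1-in-4 and 1-in-3 figures can be compared with the same kind of back-of-the-envelope calculation. This is only a sketch under the independence assumption, and `chunk_mismatch_probability` is a name I made up for it.

```python
def chunk_mismatch_probability(per_base_difference: float, chunk_length: int = 30) -> float:
    """Chance that at least one base in a chunk differs, assuming independent sites."""
    return 1.0 - (1.0 - per_base_difference) ** chunk_length


# Compare a 1% per-base difference with the 1.23% figure
for difference in (0.01, 0.0123):
    p = chunk_mismatch_probability(difference)
    print(f"{difference:.2%} per base: mismatch chance {p:.3f}, about 1 in {1 / p:.1f}")
```

Under these assumptions the mismatch chance comes out at roughly 0.26 for a 1% difference and roughly 0.31 for 1.23%. Either way, a 30-base chunk fails to match far more often than a single base does.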

Of course, what creationists never do when they get upset about the 1% figure and claim it is Much Higher Than That is to compare that figure with the percentage difference with the orang genome or the rhesus macaque genome (gorilla isn’t available yet). Those are of course higher yet, no matter how you calculate the figure, leaving the chimp as our closest relative.