# 98.77% Wrong

by Joe Felsenstein,
http://evolution.gs.washington.edu/felsenstein.html

Over at Uncommon Descent (in this thread) “niwrad” presents a calculation, lengthily explained, showing that the assertion that human and chimp genomes differ by 1% in their base sequence is wrong.

What “niwrad” does is extraordinary. Choosing random places in one genome (doing this separately for each chromosome), “niwrad” takes 30-base chunks, and then looks over into the other genome to see whether or not there is a perfect match of all 30 bases. This turns out to occur between 41.60% of the time and 69.06% of the time in autosomes (it varies from chromosome to chromosome). The median is about 65%.

So the difference is really 35%, not 1%, right? Not so fast. If two sequences differ by 1.23% (the actual figure from the chimp genome paper), a one-base chunk will match 98.77% of the time. A two-base chunk will perfectly match (0.9877 x 0.9877) of the time. And so on. A 30-base chunk will match a fraction of the time which is the 30th power of 0.9877. That’s 0.6898 of the time.
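The arithmetic is easy to check. Here is a minimal sketch, assuming each base matches independently with probability 0.9877; the function name `block_match_probability` is made up for illustration and is not from niwrad's post or the chimp genome paper.

```python
def block_match_probability(per_base_identity: float, block_length: int) -> float:
    """Chance that every base in a block matches, assuming independent sites."""
    return per_base_identity ** block_length


# Perfect-match probability for 1-, 2- and 30-base chunks at 98.77% identity.
for length in (1, 2, 30):
    probability = block_match_probability(0.9877, length)
    print(f"{length:>2}-base chunk: {probability:.4f}")
```

The 30-base line should reproduce the 0.6898 figure quoted above.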

So the 65% figure is pretty close to what is expected from a difference of 1.23% at the single-base level. However the penny hasn’t dropped yet over there (as of this writing, anyway). One commenter (“CharlesJ”) has asked whether there isn’t about a 1 in 4 chance of a 30-base mismatch if the difference is really 1%. That’s correct, and “niwrad” has (somewhat incorrectly) replied that it’s actually 1 in 3. This is a bit wrong but one way or the other the whole article goes up in smoke. “niwrad” has not figured that out yet.
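For reference, the two mismatch figures in that exchange can be worked out under the same independence assumption. The chance that a 30-base chunk contains at least one mismatch is one minus the chance that all 30 bases match:

```latex
\[
1 - 0.99^{30} \approx 0.26 \quad (\text{about 1 in 4, for a 1\% difference}),
\qquad
1 - 0.9877^{30} \approx 0.31 \quad (\text{about 1 in 3, for a 1.23\% difference}).
\]
```

Either way, roughly 70% of 30-base chunks are expected to match perfectly, which is about what “niwrad” found.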

Of course what creationists never do when they get upset about the 1% figure and claim it is Much Higher Than That is to compare that figure with the percentage difference with the orang genome or the rhesus macaque genome (gorilla isn’t available yet). Those are of course higher yet, no matter how you calculate the figure, leaving the chimp as our closest relative.

I thought they had the Fig Newton of informational type stuff on their side. Can’t he perform the irreducibly complex calculations that are required?

How do they explain the one-to-one correspondence between chimp and human chromosomes and bands? Do they have a twisted calculation for that? How do they explain all of the other genetic data such as SINE insertions and mitochondrial DNA? Let me guess…

The comparison I performed was completely different from those usually performed by geneticists, because was purely statistical in nature.

Bwahaha. As opposed to the geneticists who perform analyses based both in statistics and genetics. (Note: using Monte Carlo doesn’t make a method magically statistical, purely or otherwise.) I don’t think niwrad has any understanding of how the statistics that geneticists use actually work. He cites the results of the chimp genome paper without ever bothering to understand what units they are in. As Joe has pointed out, 98+% similarity is a statement about per-aligned-base similarity. Estimating a 30-mer dictionary distance is not going to magically change the results of per-aligned-base similarity.

Take some of my own research. Using about 12,000 orthologous nucleotides in humans and chimps, I estimated evolutionary divergence using statistically sophisticated expectation maximization and hidden Markov model techniques. As you see in Figure 3, humans and chimps are about 1.25% divergent (Look, error bars!). To put it another way, in humans and chimps 79 out of 80 orthologous nucleotides have not changed since their common ancestor. Mice and rats are 16.8% divergent, meaning that 5 out of 6 have not changed.

Despite all their fascination with human-chimp divergence, ID creationists never get around to explaining how two species of vermin are 13 times more divergent than humans and chimps.

What percentage of the total genome of both chimps, and humans have been compared against each other?

IBelieveInGod said:

What percentage of the total genome of both chimps, and humans have been compared against each other?

According to the UCSC Genome browser, it’s at least 98%.

I’ll also point out that one does not need to look at all 3 billion or so bases to estimate a divergence on the scale of 1.25%. Even with 12,000 bases (which is low given modern data), the potential error of my estimate was 0.2%. Thus from my data the net divergence between the species is nearly certain to be between 1%-1.5%.
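A rough way to see why 12,000 bases are enough is the ordinary standard error of a proportion. This is only a back-of-the-envelope sketch, not the expectation-maximization and hidden Markov model analysis described above: it assumes independent sites and a normal approximation, and the function name is hypothetical.

```python
import math


def divergence_interval(divergence: float, n_sites: int, z: float = 1.96):
    """Approximate 95% interval for a proportion estimated from n_sites bases."""
    standard_error = math.sqrt(divergence * (1.0 - divergence) / n_sites)
    return divergence - z * standard_error, divergence + z * standard_error


# 1.25% divergence estimated from 12,000 aligned bases (figures from the comment above).
low, high = divergence_interval(0.0125, 12_000)
print(f"approximate 95% interval: {low:.2%} to {high:.2%}")
```

Under these simplifying assumptions the interval comes out at about plus or minus 0.2 percentage points, in line with the error quoted above.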

However, if the two genomes were really 95% similar or more, as is commonly claimed, also a 30BPM statistical test should produce 95% results, and it does not.

Epic Fail!

What’s new about an epic fail?

Any argument put froward by ID or YEC is an epic fail. All are

IBelieveInGod said:

What percentage of the total genome of both chimps, and humans have been compared against each other?

At first I thought you were trollingly trying to change the subject away from the fact that Niwrad’s entire argument rests on a simple math error.

But then I thought, ah hah, you’re making a subtle reference to the fact that Niwrad made an equally simple and stupid error by using randomly-selected 30-base chunks of each chromosome. That is an extremely small percent of each genome to compare against each other.

Bravo IBIG for highlighting yet another problem with this design argument.

And hey, guess what?

If instead of comparing only 30 nucleotides, you compare all (approx.) 3,000,000,000 nucleotides in our genomes, the identity is zero!

By Jove, this mathemajigger has disproved evolution! Praise be to the pink unicorn!

IBelieveInGod said:

What percentage of the total genome of both chimps, and humans have been compared against each other?

What percentage of my questions have you answered? Here are a few more for you:

How do you explain the one-to-one correspondence between chimp and human chromosomes and bands? Do you have an explanation for that? How do you explain all of the other genetic data such as SINE insertions and mitochondrial DNA? Let me guess…

I thought this retard had been banned from all threads except the bathroom wall. He should definitely be segregated from decent society. He has spewed three hundred and fifty pages of filth all over the bathroom wall. Don’t let him do it here.

The error is shockingly crude and childish. The person behind “niwrad” is a dolt.

I will break it down to an even simpler analogy.

Imagine two equal length sequences of symbols.

One consists solely of “A”s. It looks like this: “AAAAAAAA…”

The other is 99% A’s, but 1% B’s. The exact location of the individual B’s is not predictable.

A strand of it might look like this “ABAAAAAAAAAAAAAAAAAAA…”

Any dunce can see that if we examine long enough segments, the sequences will be 99% identical. That is, 99% of the time, the symbol at position “n” in the first strand will be identical to the symbol at position “n” of the second strand.

Most people can also see, however, that the probability of a sequence of length “m” chosen from one being identical to the same position, same length sequence from the other is (0.99)^m.

Let’s imagine a truly asinine person who wants to argue against “the strands look similar theory” for ideological reasons.

He could randomly sample segments of length “m” from either strand and see if they had the exact same sequence as the same position, same length segment from the other strand.

The larger an arbitrarily chosen “m” becomes, of course, the lower the probability that the entire sampled sequence will be identical between the two.

This is exactly what niwrad has done, using m = 30.

To put it another way, his argument is exactly the same as arguing that two equal-length series of coin flips cannot be 50% identical on average, because any given sequence of thirty coin flips has a far less than 50% chance of being identical to the corresponding thirty flips in the other series.
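The two-strand analogy can be tried out with a short simulation. This is a sketch under stated assumptions: one strand is all A’s, the other has B’s scattered independently at a 1% rate, and 10,000 blocks of length 30 are sampled. The function name, the seed, and the sample size are illustrative choices, not niwrad’s actual procedure.

```python
import random


def simulate_block_test(per_base_identity=0.99, block_length=30,
                        n_blocks=10_000, seed=1):
    """Return (fraction of perfectly matching blocks, fraction of matching bases)."""
    rng = random.Random(seed)
    perfect_blocks = 0
    matching_bases = 0
    for _ in range(n_blocks):
        # Count the positions in this block where the two strands agree.
        matches = sum(rng.random() < per_base_identity
                      for _ in range(block_length))
        matching_bases += matches
        if matches == block_length:
            perfect_blocks += 1
    return perfect_blocks / n_blocks, matching_bases / (n_blocks * block_length)


block_fraction, base_fraction = simulate_block_test()
print(f"blocks matching perfectly: {block_fraction:.3f}")
print(f"bases matching:            {base_fraction:.3f}")
```

The per-base figure should stay near 0.99 while the all-or-nothing block figure falls to roughly (0.99)^30, about 0.74, even though the strands themselves have not changed.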

I am shocked – shocked! – to discover that “nirwad” has made what he believes to be a major innovation in how we compare genomes to quantify difference, has applied it to actual data, but yet failed to submit this breakthrough to a peer-reviewed journal for publication.

Another opportunity to build the scientific infrastructure for ID squandered through an abysmally bad research and publication strategy. It’s almost like they don’t really want knowledgeable review of their work …

And of course by “nirwad” I meant “niwrad”. My deepest apologies for misspelling the pseudonym.

With all the sincerity I can muster,

SWT

eric said: (in part of a response to trolling by IBelieveInGod):

… Niwrad’s entire argument rests on a simple math error.

I wouldn’t call it a math error so much as comparing apples to apple sauce.

But then I thought, ah hah, you’re making a subtle reference to the fact that Niwrad made an equally simple and stupid error by using randomly-selected 30-base chunks of each chromosome. That is an extremely small percent of each genome to compare against each other.

Taking a 30-base chunk (niwrad took a large number of them) and seeing whether each has a match in the other genome isn’t itself bad – it will mostly find matches at the corresponding location. And ten thousand of those, sampled, is a pretty good sample.

What the problem is, is that there are 30 bases and a mismatch of one base is enough to make the whole thing count as a 100% mismatch. Horribly biased. If niwrad had instead counted the fraction of the 30 that matched, and averaged that, the result would have been closer to 1.23%.

Our peerless leader Reed Cartwright has pointed out to me that there is also a major response to niwrad’s silliness at Todd Wood’s blog, Todd being a creationist but an honest biologist.

Todd Wood, the world’s only honest creationist*.

Therefore, also, the world’s least psychologically tormented, but also loneliest, creationist.

*I count only people who actually had access to an education, but choose to deny scientific reality, as creationists. Historical figures from pre-scientific times, or people who have been involuntarily education deprived, don’t count.

I have said before that The Fundamental Misconception of the ID/creationists goes right back to Henry Morris’ pitting the “myth of evolution against the science of thermodynamics.”

Here is niwrad on the thermodynamic argument. It is not surprising that he gets this wrong also.

It is that fundamental misconception that drives all “statistical calculations” by the ID/creationists. They know without a shadow of a doubt that “everything descends into chaos without a guiding intelligence or program.” It’s because of entropy and the second law, despite the fact that they have learned to say publicly that they don’t believe evolution violates the second law (they have even learned to go out of their way to do some cheap calculations that show it doesn’t). Nevertheless, their thinking reveals the fundamental misconception is still there.

Therefore all their “statistical arguments” begin by selection, using a uniform sampling distribution, from an essentially infinite set of possibilities. It proves evolutionists wrong 10^150 percent of the time.

Joe Felsenstein said: If niwrad had instead counted the fraction of the 30 that matched, and averaged that, the result would have been closer to 1.23%.

Oops, should have been “If niwrad had instead counted the fraction of the 30 that did not match, and averaged that, the result would have been closer to 1.23%”.

Oh well, you all knew what I meant …

harold said:

Todd Wood, the world’s only honest creationist*.

I like to read Wood’s blog. I judge him as barking up the wrong tree, of course, but dang … the guy is a genuine article.

Wood’s attitude appears to be: “Evo science is on a solid basis as far as the evidence goes, but I believe there’s more to it than that, and if I do the grunt work I’ll be able to demonstrate it and actually convince the science community of it.” Those more inclined to bait our resident creotrolls might ask them what they think of Wood.

Mike Elzinga said:

Here is niwrad on the thermodynamic argument. It is not surprising that he gets this wrong also.

I looked over it quickly, not trying to dig into an argument that was as specious as it was obscurely phrased, but I noticed he was making a linkup to “creationut information theory (CIT)” in it.

MrE, entropy is your hot button. CIT is mine. (Again, pronounced with a VERY soft “c”.)

BTW, on Todd Wood – as he points out at the end of his blog entry, as also noted by JF, all quibbling over the PRECISE percentage difference between human and chimp genomes is irrelevant. No matter how the pie is sliced, chimps still look well more like us genetically than they look like a gorilla.

mrg said:

I looked over it quickly, not trying to dig into an argument that was as specious as it was obscurely phrased, but I noticed he was making a linkup to “creationut information theory (CIT)” in it.

MrE, entropy is your hot button. CIT is mine. (Again, pronounced with a VERY soft “c”.)

The reason this entropy thing is my “hot button” issue is because I was around when Morris and Gish launched their attack on the biology teachers and on evolution. I have samples of the early writings of Morris and Gish in my files. I know exactly what they were trying to do; they even said it.

Indeed they attacked the fossil record and everything else; but that narrative of “pitting the myth of evolution against the science of thermodynamics” was an explicit program and articulated as such. It remains the centerpiece of creationist arguments even today; you can find it on the websites of ICR and AiG even though they don’t like attention drawn to it. They know they have been repeatedly corrected on this, yet they pushed the misconceptions anyway.

And Dembski and Behe picked it up in their approach to dealing with complex assemblies of molecules; so we see the clear track of the narrative right on into ID.

ID/creationist “information theory” is their “solution” to the “evolution vs. the second law narrative.” It is their “scientific” theory of their sectarian god. It is how they make their sectarian dogma scientific; and therefore superior to all other dogma.

They aren’t going to let go of this narrative; it has been too lucrative.

Mike Elzinga said:

ID/creationist “information theory” is their “solution” to the “evolution vs. the second law narrative.”

From their point of view, it’s actually better. The SLOT is well-defined, and any reasonable examination of the SLOT shows that it neither rules out nor confirms evo science.

“Information”, however, is not so well-defined and, appropriately, there are no well-defined physical laws associated with it. Is there “information” in the genome? Sure, but having said that, what do we know that we didn’t before? We could even come up with ad-hoc ways of measuring it – number of coding base pairs, for instance – but that really only allows us to compare genome sizes. It certainly doesn’t support derivation of any fundamental physical laws like this snatched-out-of-the-air “Law of Conservation of Information.”

In the end, however, the SLOT and CIT arguments are the same: “An unmade bed never makes itself.”

mrg said:

From their point of view, it’s actually better. The SLOT is well-defined, and any reasonable examination of the SLOT shows that it neither rules out nor confirms evo science.

Actually, this is not quite correct. Matter cannot condense without the 2nd law. To say that the entropy of an isolated system spontaneously goes to a maximum is also to say that matter interacts. You cannot say that entropy increases and deny that matter interacts; that is an oxymoron.

And we already know a great deal about how matter interacts, even in very complex systems. There are no known barriers to these processes continuing right on up to and including living organisms. No CIT is required (and that’s no CIT).

Why stop at 30-mers? How about comparing 100,000-mers? My genome wouldn’t even be 99% similar to my own parents at that level, which means I can’t possibly be related to them. Clearly I am the result of immaculate conception, so I’m the new messiah. And I say all Christians should believe evolution is true.

Game. Set. Match.

Niwrad says, “Maxwell’s demon is a thought experiment and is fictitious, nevertheless it clearly proves that intelligence can counter entropy in principle.”

Wow.

Of course what creationists never do when they get upset about the 1% figure and claim it is Much Higher Than That is to compare that figure with the percentage difference with the orang genome or the rhesus macaque genome (gorilla isn’t available yet). Those are of course higher yet, no matter how you calculate the figure, leaving the chimp as our closest relative.

Has this been done for the Bonobo? or is it in the works? Anyone know?

The depressing thing is that a number of people are spending time to refute this nitwit but he’ll just let the thread die and then refer back to his “successful counter of the 99% similarity myth” in a few months time. No honor, no shame, not even the slightest cognitive dissonance.

Ben W said:

Niwrad says, “Maxwell’s demon is a thought experiment and is fictitious, nevertheless it clearly proves that intelligence can counter entropy in principle.”

Wow.

I didn’t catch that on my quick glance through the article.

And so we prove that over a century of analysis showing that Maxie’s lil’ demon really IS a fiction is all twaddle. With the cherry on top being the visualization of what NW would say if EVIL-utionists tried to use an argument based on a “fiction”.

“Not gonna call them names, NW? I would bet you would.”

Not only are they misunderstanding this data, but they ignore the fact that no matter how close our genomes are to each other, 50 or 60 or 99%, it still means we’re… related. How closely or how distantly is hardly the point as far as proving whether or not we are related.

GEORGE said:

Not only are they misunderstanding this data, but they ignore the fact that no matter how close our genomes are to each other, 50 or 60 or 99%, it still means we’re… related. How closely or how distantly is hardly the point as far as proving whether or not we are related.

We do all have a common Creator:)

Matt Ackerman said:

And this is why I don’t like the 99% figure. 5% of the DNA between humans and chimps is 100% different, and the remaining 95% is 1% different. It seems likely that most of the phenotypic differences between humans and chimps arise from the 5% of genes which are unique to humans or chimps, and that the 95% of genes that humans and chimps share are less responsible for phenotypic differences.

Nope. You assume that 5% of DNA means 5% of genes, but almost all of that 5% is junk, mostly Alu repeats. Anyway, it’s silly to count each base of that 5% as equivalent to a point mutation, since we can account for all of it by only (!) 5 million mutations, while there are 35 million point mutations.

Just a quick note to point out that the invalidity of the basic approach of comparing blocks of 30 bases, and counting them as different if there is even one base of mismatch, has still not been understood by “niwrad” over at UD. He is a bit troubled by the objections put forth by “CharlesJ”, and says in their post #56:

Forgive me if I don’t understand what you mean in details. Nevertheless I agree with you that the results of the 30BMP test are not directly comparable to those in genomics literature. The 62% 30BPM similarity is not directly comparable with the 99% identity. We need a corrective coefficient. I agree with you also that such corrective coefficient differs depending on we do a 30BPM or 40BPM or 50BPM test …

To understand this I argue according to what I did in #29. Given two supposed genomes that match 99% a 30BPM test gives 70% matches. Since the real test gave 62% my first idea to obtain a 30BPM value comparable to 99% is to apply the simple formula: 99×62/70 = 87.7%. In other words the multiplier coefficient that we must apply to the 62% is 99/70 = 1.41.

Actually you don’t correct for block size that way. If there is an underlying similarity of p at the one-base level, the probability of a match when there are blocks of B bases is p raised to the Bth power. Call that Q. So to get back to p from Q you just raise Q to the 1/B power. Or use logs and get log(p) by dividing log(Q) by B.
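That back-conversion takes only a few lines. A minimal sketch, again assuming independent sites; the function name is made up for illustration, and 0.62 is the 30BPM figure quoted by “niwrad” above.

```python
import math


def per_base_identity(block_match_fraction: float, block_length: int) -> float:
    """Recover p from Q = p**B by dividing log(Q) by B and exponentiating."""
    return math.exp(math.log(block_match_fraction) / block_length)


# Q = 0.62 is niwrad's quoted 30BPM result; Q = 0.9877**30 is a round-trip check.
for q in (0.62, 0.9877 ** 30):
    p = per_base_identity(q, 30)
    print(f"Q = {q:.4f}  ->  p = {p:.4f}  (difference {1 - p:.2%})")
```

Applied to the 62% figure, this gives a per-base identity of roughly 98.4%, nowhere near the 87.7% produced by the linear “corrective coefficient”.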

With the exception of a couple of pro-evolution commenters there, the rest of them still think “niwrad” has proven that the underlying difference between humans and chimps is nothing like 1%. But then “niwrad” said in the post that

Now, I don’t personally believe that humans and chimps share a common ancestry, for a host of reasons that would take me too long to explain in this post.

so “niwrad” is not accustomed to “getting” basic scientific facts.

Michael Roberts said:

What’s new about an epic fail?

Any argument put froward by ID or YEC is an epic fail. All are

You might later have thought that “froward” is a typo. Not so!

fro·ward /ˈfroʊwərd, ˈfroʊərd/ [froh-werd, froh-erd] –adjective: willfully contrary; not easily managed: to be worried about one’s froward, intractable child.

Seems spot-on to me!

dpr

Oh come on. I call POE, again. Darwin spelled backwards! Correction factors so his made up metric can be compared to the way the real scientists do it? Come on, no one can be this stupid. He’s just yanking chains, milking it for all it’s worth. Has to be a POE.

As for humans and chimps not being related, I pointed out the evidence for that days ago. No amount of hand waving or soul searching is going to make that evidence go away. POE or not, this guy is just plain wrong. I do hope that he realizes that anyone dumb enough to fall for this nonsense probably won’t have the sense to understand that they have been duped, even after he fesses up and explains to them that it was all a scam to make them look stupid. That should be worth a laugh, seeing how many of them become POE deniers.

Proudly predicting creationist behavior since 1999.

Earlier, I said -

creationists - I used something called “logarithms” to figure that out quickly - don’t waste your time trying to understand)

Maybe someone thought I was being sarcastic. I wasn’t.

And indeed later, Joe Felsenstein noted this -

Creationist: “To understand this I argue according to what I did in #29. Given two supposed genomes that match 99% a 30BPM test gives 70% matches. Since the real test gave 62% my first idea to obtain a 30BPM value comparable to 99% is to apply the simple formula: 99×62/70 = 87.7%. In other words the multiplier coefficient that we must apply to the 62% is 99/70 = 1.41.”

JF: “Actually you don’t correct for block size that way. If there is an underlying similarity of p at the one-base level, the probability of a match when there are blocks of B bases is p raised to the Bth power. Call that Q. So to get back to p from Q you just raise Q to the 1/B power. Or use logs and get log(p) by dividing log(Q) by B”

In fairness, the creationist quoted here did figure out that niwrad was wrong, but was incompetent to correct him.

A key point here is that niwrad is not failing due to a lack of basic knowledge of biology. It isn’t that he thinks the human and chimpanzee genomes are different because he has been fed false information about one of the genomes. He’s failing at the level of basic logic and basic math. And declaring himself a genius for doing so. And no-one at UD, a site ostensibly run by a PhD in a statistics-related field, is able or willing to correct him. Ignorance of the facts is easily correctable. Psychological problems so severe that they make you deny basic math and logic are not easily correctable.

Joe Felsenstein

so “niwrad” is not accustomed to “getting” basic scientific facts.

Again, Darwin (aka niwraD) is clearly an educated person who thinks creationism is a load of crap, and is simply having one over on the folks at Uncommon Descent.

If you ask me, it is in very poor taste to mock creationists in general for what niwraD is saying, since niwraD doesn’t believe it, and he is trying to look stupid in order to make creationists look stupid.

DS -

Oh come on. I call POE, again. Darwin spelled backwards! Correction factors so his made up metric can be compared to the way the real scientists do it? Come on, no one can be this stupid. He’s just yanking chains, milking it for all it’s worth. Has to be a POE.

“Poe’s Law” refers to the general principle that religious extremists are so whacked out that those who try to parody them can’t be distinguished from the real thing, and vice versa.

This guy could be a parody of sorts, and the discussion so far would still be true (including the fact that anyone remotely familiar with genomic sequencing and/or very basic probability could correct him, and no-one at UD has done so). Bombastic, pompous names are common among narcissistic creationists, so a name that could imply “overturning Darwin” doesn’t give me much information.

My method of detecting possible parodies, which I believe produces better than random results, is as follows -

Just as most racists today deny racism and speak in coded language, most creationists, for whatever reason, choose not to speak openly about the hellfire, brutal executions, obsession with and misinterpretation of the relatively tiny proportion of the Bible that talks about sex, and (in many cases) modern ethnic biases that drive them. They will make not-very-veiled threats (“You’ll find out soon!”) when provoked, but tend to dissemble away from these topics.

Meanwhile, most parodists aren’t interested in mimicking dissembling, weaseling, and so on, as that isn’t much fun. They prefer to parody the “sinners in the hands of an angry God” type stuff of the past.

So when I see someone saying something like “Evolutionists are sodomites who will burn in hell”, especially without provocation, I think that there is a reasonable probability that it is a parody.

When I see a lot of dissembling, squirming, and weaseling, even when challenged, I know I am dealing with a real creationist.

As for this guy, the “great genius who easily overturns science with obviously incorrect math” is a common type of sincere creationist. So who can say for sure?

harold said:

As for this guy, the “great genius who easily overturns science with obviously incorrect math” is a common type of sincere creationist. So who can say for sure?

Absolutely. That was my point, which you have made much more clearly and eloquently than I could ever hope to.

Nope. You assume that 5% of DNA means 5% of genes, but almost all of that 5% is junk, mostly Alu repeats. Anyway, it’s silly to count each base of that 5% as equivalent to a point mutation, since we can account for all of it by only (!) 5 million mutations, while there are 35 million point mutations.

Typically, comparisons of sequence gain and loss ignore repetitive areas of the genome, because areas with highly repetitive sequence are difficult to assemble. There is of course a bias of duplications and deletions to be in intergenic regions, which is also true of SNPs. However, this bias is surprisingly weak.

My numbers were actually coming from the structural divergence between human and chimpanzee genomes in protein coding regions. Approximately 6% of protein coding genes in human are absent in chimpanzees, and approximately 8% of protein coding genes present in chimpanzees are absent in humans (humans have experienced more lineage specific deletions than chimps) (Demuth et al. The Evolution of Mammalian Gene Families. PLoS ONE 1(1): e85. doi:10.1371/journal.pone.0000085)

It is widely agreed that structural divergence between us and our closest relatives is potentially responsible for a large proportion of the phenotypic divergence. Attempting to create figures that describe the total similarity in some abstract way seems to lead the public to the erroneous conclusion that there is insufficient genetic variation to account for phenotypic variation.

Few phenotypic differences have been traced to the molecular level, but several are already known to arise from structural variation, which is only to be expected. After all, HOX regulated genes depend on synteny (i.e. genes being next to each other) to determine patterns of expression, so it only makes sense that changes in synteny can be responsible for phenotypic divergence.

Ultimately, I can see no point to creating a number that can describe the similarity of any two genomes, even if there were a correct way to do so. Certainly it is important to study sequence identity (i.e. # of substitutions), because single nucleotide substitutions can be very reliably inferred from sequence data and patterns of substitutions are a rich source of data. Structural variants (i.e. insertions, deletions, inversions, and transpositions of greater than a few kb.) arise at a low rate in comparison to SNPs, but so what? Asking how similar human and chimp genomes are might be an interesting high school science project, but why should I care?

Interesting questions, such as determining the relative contribution of mutational events to adaptive evolution, will not be answered by these sorts of analysis.

Matt Ackerman said: My numbers were actually coming from the structural divergence between human and chimpanzee genomes in protein coding regions. Approximately 6% of protein coding genes in human are absent in chimpanzees, and approximately 8% of protein coding genes present in chimpanzees are absent in humans (humans have experienced more lineage specific deletions than chimps) (Demuth et al. The Evolution of Mammalian Gene Families. PLoS ONE 1(1): e85. doi:10.1371/journal.pone.0000085)

This is a very liberal definition of “gene”. If a recent duplication has produced 2 copies of a gene in the human lineage, are those different genes, and can chimpanzees be said to lack a gene? Duplication is an important source of material for evolution, but I suggest that most duplications, like most other mutations, are evolutionarily meaningless. How many gene deletions separating humans and chimps are of single-copy genes and how many are of recent duplicates? Demuth, I notice, ascribes most changes in gene family size to neutral evolution.

I do agree that the most interesting questions here are about which differences are functional, and what those functions are. There are uses for distance measures, though.

If this is parody, my hat is off to niwrad – he/she has managed to convince the powers that be at UD to let him/her be a blog contributor, not simply a commentator.

John Harshman said:

This is a very liberal definition of “gene”.

No, I believe that the professional geneticists are using the definition of gene which is generally accepted in the scientific community. You are perfectly welcome to create your own definition, but don’t expect me to use it.

If a recent duplication has produced 2 copies of a gene in the human lineage, are those different genes.

Yes.

and can chimpanzees be said to lack a gene?

Yes.

They can be said to lack the duplicate which is unique to humans.

Duplication is an important source of material for evolution, but I suggest that most duplications, like most other mutations, are evolutionarily meaningless.

I strongly doubt it. I suspect that the majority of gene duplications are mildly deleterious.

How many gene deletions separating humans and chimps are of single-copy genes and how many are of recent duplicates?

50% of the deletions are of single copy genes. 50% are of genes in gene families with more than one copy. Page e85.

I do agree that the most interesting questions here are about which differences are functional, and what those functions are. There are uses for distance measures, though.

I didn’t say distance measures are useless; in fact, I list their uses. However, distance measurements do not measure some abstract ‘genotypic similarity’ because, as far as I am aware, no such goal can exist. When the general public reads the words ‘chimps are 99% similar to humans’ they assume scientists mean some sort of abstract genotypic similarity, which they do not.

I don’t really see anything you said that disagrees with my point.

Demuth et al. said:

“In total, our results support mounting evidence that gene duplication and loss may have played a greater role than nucleotide substitution in the evolution of uniquely human phenotypes, and certainly a greater role than has been widely appreciated.”

Geneticists may say that deletion of a recent duplicate is indeed loss of a gene, but saying it that way would also tend to deceive a layman into thinking that something important had just happened, like dropping your only copy of, say, cytochrome c. I really do think that most duplicated genes are subsequently deleted, either because they’re slightly deleterious or because their loss isn’t selected against. And I would further imagine that most of them are pseudogenized (real word?) before they are lost. In some cases, it may be the original copy that’s deleted, perhaps even in nearly half of those cases. No matter. None of this prevents gene duplication and loss from being important in evolution, and perhaps more important than point mutations, but I would be interested in seeing the evidence that it’s more important.

I do see that about half of all reductions of copy number, during the human lineage, in gene families that are inferred to have been present in the ancestral mammal have resulted in extinction of the family, generally by deletion of a single copy in a family that had been reduced previously to one copy. That doesn’t count gene families that weren’t in that ancestral mammal, and it doesn’t count families that may have had losses but didn’t have a net loss, but I’ll accept it as an estimate. Can we suppose that most of those losses were in moribund families, i.e. those that humans just weren’t using for much?

I read this on Uncommon Descent and the researcher is showing that humans and primates etc. have like templates but the differences could not come from ToE. This is a line of investigation that others probably will pick up on ape/human sameness claims in time. As I told him, biblical creationists should welcome as close a likeness to primate bodies as possible. It’s impossible upon looking at apes to not conclude God simply has one blueprint of life and twists things about. One computer program fits all. So people were simply given the best type of body in the equation of the blueprint: the ape body. Otherwise an entirely different kind of body would have had to be thought up that still included eyes etc. Our body is not relevant to conclusions on our origins. Looking for the differences is a waste of time.

Not being a biologist, I’m confused by the discussion of genes without a discussion of whatever the control areas are called for turning genes on and off.

It seems obvious to me, for instance, that chimp arms are made with very similar genes to ours for all the various proteins– for hair, nails, bones, muscles, etc. The big difference between us is the longer length of chimp arms relative to the torso and hind legs.

Theoretically, a chimp could have exactly the same genes we do, and still look and act like a chimp, not a human, as different genes are turned on and off at different rates.

So, what are the control areas for the genes called, and how much research goes into differentiating the rates at which genes are turned on and off?

hoary puccoon said:

Not being a biologist, I’m confused by the discussion of genes without a discussion of whatever the control areas are called for turning genes on and off.

It seems obvious to me, for instance, that chimp arms are made with very similar genes to ours for all the various proteins– for hair, nails, bones, muscles, etc. The big difference between us is the longer length of chimp arms relative to the torso and hind legs.

Theoretically, a chimp could have exactly the same genes we do, and still look and act like a chimp, not a human, as different genes are turned on and off at different rates.

So, what are the control areas for the genes called, and how much research goes into differentiating the rates at which genes are turned on and off?

Regulatory regions or regulatory DNA sequences control gene action. There are 5 prime untranslated control regions, 3 prime untranslated control regions, intronic control regions, control regions within functional gene sequence, and control regions that can be very distant from the gene that are sometimes called locus control regions.

There are also gene products that control gene regulation that can be on other chromosomes.

Theoretically, a chimp could have exactly the same genes we do, and still look and act like a chimp, not a human, as different genes are turned on and off at different rates.

Probably more true than not.

Chimps and humans differ by 1.23% according to the OP. Two humans can differ by as much as 0.5% according to the human genome project.

The vast majority of the human chimp differences must be neutral drift.

When they sequenced the Neanderthal genome, they had a hard time finding differences between them and us.

One highly speculative calculation implied that the number of biologically significant differences between chimp and human genomes might be around 200 mutations.

niwrad realizes his error and abandons the field, citing the Vizzini defense (“Inconceivable!”)

Consider that this high figure is obtained under the following conditions very favorable to similarity:

(1) the ESM model helps to obtain high value of similarities; (2) the 30BPM test, for definition, is a lavish one because allows a total scrambling of patterns.

If one or both of these conditions is not applied the scenario can only get worse for similarity.

The conditions #2 implies that to speak of “identity” between genomes is nonsense, despite the high value obtained in the test. Besides 1.27% of difference in 3 billions base genomes makes 38 millions point mutations after all.

As a consequence the normalized result of the 30BPM test in no way supports the evolutionist claim of a common ancestor of these genomes. A blind evolution that changes and scrambles 38 millions bases is unthinkable.

I am satisfied of this work and wish to thank you for the collaboration.

FWIW, his overall number, including the sex chromosomes, works out to a similarity of 98.4% or a difference of 1.6%, assuming a random dispersal. Or about 48 million bp difference.

Ron and Raven– Thanks for responding. Do you know if anyone is actively working on the specific changes in control regions between chimps and humans?

Also, does anyone else suspect that niwrad is a sock puppet for Dembski himself? That line, “a blind evolution that changes and scrambles 38 millions bases is unthinkable” sounds exactly like his brand of ‘let’s see how much those rubes will swallow’ cynicism. Obviously, whether scrambling 38 million pairs is “unthinkable” depends on how many pairs there are total and how many generations separate chimps and humans. There must be about, what, at least a million generations separating chimps and humans, and three billion base pairs? That comes out to about 13 base pair changes per billion base pairs per generation.
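The back-of-the-envelope rate in that comment can be checked in a few lines of Python. This is only a sketch using the round figures quoted above (38 million differences, 3 billion base pairs, and a guessed one million generations); the variable names are illustrative, and the generation count in particular is an assumption rather than a measured value.

```python
# Round figures taken from the comment above; all are rough estimates.
differences = 38_000_000        # base differences between the two genomes
genome_size = 3_000_000_000     # base pairs in the genome
generations = 1_000_000         # assumed generations separating the species

# Differences accumulated per generation across the whole genome.
per_generation = differences / generations

# Normalize to "changes per billion base pairs per generation".
per_billion_bp_per_generation = per_generation / (genome_size / 1e9)

print(round(per_billion_bp_per_generation, 1))  # 12.7
```

That is where the "about 13" figure comes from.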

Obviously, whether scrambling 38 million pairs is “unthinkable” depends on how many pairs there are total and how many generations separate chimps and humans.

Scrambling 38 million base pairs is not unthinkable. It is reality. Two humans can differ by up to 15 million base pairs. We know this by DNA sequencing. If having large numbers of base pair differences was deleterious or “unthinkable” whatever that means, we would all be dead and nonexistent.

This is the Fallacy of Argument from Being Stupid.

There must be about, what, at least a million generations separating chimps and humans, and three billion base pairs? That comes out to about 13 base pair changes per billion base pairs per generation.

The number of mutations per human generation is known, again from DNA sequencing. Each human is born with 150 new mutations compared to their parents.

Do you know if anyone is actively working on the specific changes in control regions between chimps and humans?

Probably they are. This work is slow because humans and chimps are not good experimental animals for obvious reasons. More of this sort of work is done in rodents. It can be slow and expensive because some of it involves transgenic mix and match type experiments.

Excuse my ignorance, but what is a POE?

Karen S. said:

Excuse my ignorance, but what is a POE?

It’s a reference to Poe’s Law, which says that attempts to mock fundamentalist behavior end up being indistinguishable from actual fundamentalist behavior, because the real thing is already so delusional and nonsensical.

You will also see the term “Loki troll”, which is basically the same thing. I tend to prefer it because it’s a little more intuitive … it’s more associated with the TALK.ORIGINS forum.

“POE” can be interpreted as “Pretense Of Extremism”, but in reality it was originally expressed by one Nathan Poe. No, it had little or nothing to do with Edgar Allan Poe.

Thanks mrg and Dale Husband for defining POE. I’ve believed for a long time that a certain poster on BioLogos named conrad is one of those. (Now I have a word for his kind.) I simply cannot believe that even a fundie could be so butt-clenchingly stupid.

Not being a biologist, I’m confused by the discussion of genes without a discussion of whatever the control areas are called for turning genes on and off.

Nova had a good program on regulatory genes, etc., and you can watch it online: What Darwin Never Knew

harold said:

creationists - I used something called “logarithms” to figure that out quickly - don’t waste your time trying to understand)

Maybe someone thought I was being sarcastic. I wasn’t.

And indeed later, Joe Felsenstein noted this -

Creationist: “To understand this I argue according to what I did in #29. Given two supposed genomes that match 99% a 30BPM test gives 70% matches. Since the real test gave 62% my first idea to obtain a 30BPM value comparable to 99% is to apply the simple formula: 99×62/70 = 87.7%. In other words the multiplier coefficient that we must apply to the 62% is 99/70 = 1.41.”

JF: “Actually you don’t correct for block size that way. If there is an underlying similarity of p at the one-base level, the probability of a match when there are blocks of B bases is p raised to the Bth power. Call that Q. So to get back to p from Q you just raise Q to the 1/B power. Or use logs and get log(p) by dividing log(Q) by B”
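The block-size correction described in that reply can be sketched in Python. The function names here are made up for illustration; the only inputs are the per-base similarity p, the block length B, and the block match fraction Q from the comment, and the model assumes differences are scattered independently along the sequence.

```python
import math

def block_match(p, block_size):
    """Expected fraction of perfectly matching blocks: Q = p ** B."""
    return p ** block_size

def per_base_similarity(q, block_size):
    """Invert Q = p ** B using logs: log(p) = log(Q) / B."""
    return math.exp(math.log(q) / block_size)

# The 62% figure for 30-base blocks quoted above, converted back
# to a single-base similarity.
p = per_base_similarity(0.62, 30)
print(round(p, 4))  # 0.9842
```

So a 62% match rate for 30-base blocks corresponds to roughly 98.4% similarity at the single-base level, not 87.7%.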

Sorry to have missed this. Yes, it is easy using logarithms as I later noted.

In fairness, the creationist quoted here did figure out that niwrad was wrong, but was incompetent to correct him.

Now the non-creationist commenter “CharlesJ” at UD has got the correct formula, but in a messy form. He says that to obtain the probability of a non-match in B bases, you just sum up all the probabilities of obtaining K mismatches, using the terms for K>0 in a binomial distribution which has B trials with probability p of Heads.

But since the binomial probabilities sum to one, it is easier to compute the probability of a B-base non-match by taking just the term for 0 mismatches (which is the probability of a perfect B-base match) and subtracting it from 1. In effect that is the method we have mentioned here.
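The equivalence of the two routes can be checked numerically. The sketch below is illustrative only: it takes p_match to be the per-base probability of a match (so a mismatch has probability 1 - p_match), assumes sites are independent, and uses function names invented for this example.

```python
from math import comb

def nonmatch_by_sum(p_match, block_size):
    """Sum the binomial terms for K = 1..B mismatches (the long way)."""
    q = 1 - p_match  # per-base mismatch probability
    return sum(
        comb(block_size, k) * q**k * p_match**(block_size - k)
        for k in range(1, block_size + 1)
    )

def nonmatch_by_complement(p_match, block_size):
    """One minus the K = 0 term, i.e. 1 - p ** B (the short way)."""
    return 1 - p_match**block_size

long_way = nonmatch_by_sum(0.9877, 30)
short_way = nonmatch_by_complement(0.9877, 30)
print(long_way, short_way)
```

Both routes give the same number up to floating-point rounding, which is why the single K = 0 term is all that is needed.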