Scurvy, Guinea Pigs, and You


In my last post, Common Design Errors, I posed a problem for biblical creation. I received one response from a creationist, who cited Inai et al. (2003). This paper compared the largest set of homologous exons among humans, guinea pigs, and rats. You see, guinea pigs, like most primates and a few other taxa, lack L-gulono-gamma-lactone oxidase. Two sections were quoted to me.

When the human and guinea pig sequences (647 nucleotides in total) of the regions of exons 4, 7, 9, 10, and 12 were compared, we found 129 and 96 substitutions in humans and guinea pigs, respectively, when compared with the rat sequences (Fig. 2). The same substitutions from rats to both human and guinea pigs occurred at 47 nucleotide positions among the 129 positions where substitutions occurred in the human sequences. A high percentage of the same substitutions in the total substitutions (36%) indicates that there were many hot spots for nucleotide substitution throughout the sequences examined.

p. 316

Assuming an equal chance of substitution throughout the sequences, the probability of the same substitutions in both humans and guinea pigs occurring at the observed number of positions and more was calculated to be 1.84 x 10^-12. This extremely small probability indicates the presence of many mutational hot spots in the sequences.

p. 317

This was quoted to support the following response by the reader:

Obviously if the same substitutions are occurring independently in the primate and guinea pig lineages, they could have arisen independently within the primates as well.

which was followed up by:

If single substitutions can occur independently, how do we know that larger deletions couldn’t as well?

However, the sections quoted from Inai et al. (2003) suffer from a major methodological error: they failed to consider that substitutions could have occurred in the rat lineage after it split from the other two. The researchers actually lumped substitutions that are specific to the rat lineage together with separate substitutions shared by guinea pigs and humans. To illustrate this point, I am going to use sequences homologous to rat exon 10, for which we have the most data from other species.

The following figures list a section of rat exon 10 and the sections from ten species homologous to it. The sequences were retrieved from NCBI and aligned using ClustalX. In the figures, periods represent nucleotides that are the same as the rat nucleotides.

                                                                                                                  1
                     1         2          3         4          5         6          7         8          9         0          1         2          3         4          5         6     
            12345678901234567890 12345678901234567890 12345678901234567890 12345678901234567890 12345678901234567890 12345678901234567890 12345678901234567890 12345678901234567890 12345
Rat         GGAGAAGACCAAGGAGGCCC TACTGGAGCTAAAGGCCATG CTGGAGGCCCACCCCAAAGT GGTAGCCCACTACCCCGTAG AGGTGCGCTTCACCCGAGGC GATGACATTCTGCTGAGCCC CTGCTTCCAGAGGGACAGCT GCTACATGAACATCATTATG TACAG
Guinea Pig  A................... .G........G...AG.... .....A..T........G.. ..C............T..G. G...............G..G ..C.....C........... ..C................. ..............TGC..A .....
Human       AA.........C........ .G........G......G.. ..............TG.G.. ...GT.........TG..G. G...A.........T.GA.G -.......C..A........ ..........T........C .....C.........ACC.. .....

If I performed the same analysis as Inai et al. (2003), I would conclude that there are ten positions where humans and guinea pigs experienced separate substitutions of the same nucleotide, otherwise known as shared, derived traits. These positions are 1, 22, 31, 58, 79, 81, 97, 100, 109, 157. However, most of these are shown to be substitutions in the rat lineage when we look at larger samples of species.
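The count of ten can be checked mechanically. What follows is a minimal Python sketch, not the procedure Inai et al. actually used: it expands the dot notation of the three-species figure and lists the positions where guinea pig and human carry the same nucleotide and that nucleotide differs from rat. The function names `expand` and `shared_substitutions` are hypothetical, chosen only for this illustration.

```python
# Rows copied from the three-species figure above.
# "." means "same as rat"; "-" is an alignment gap.
RAT = ("GGAGAAGACCAAGGAGGCCC TACTGGAGCTAAAGGCCATG CTGGAGGCCCACCCCAAAGT "
       "GGTAGCCCACTACCCCGTAG AGGTGCGCTTCACCCGAGGC GATGACATTCTGCTGAGCCC "
       "CTGCTTCCAGAGGGACAGCT GCTACATGAACATCATTATG TACAG")
GUINEA_PIG = ("A................... .G........G...AG.... .....A..T........G.. "
              "..C............T..G. G...............G..G ..C.....C........... "
              "..C................. ..............TGC..A .....")
HUMAN = ("AA.........C........ .G........G......G.. ..............TG.G.. "
         "...GT.........TG..G. G...A.........T.GA.G -.......C..A........ "
         "..........T........C .....C.........ACC.. .....")


def expand(row, reference):
    """Turn a dot-notation row into a full sequence using the reference."""
    row = row.replace(" ", "")
    reference = reference.replace(" ", "")
    if len(row) != len(reference):
        raise ValueError("row and reference differ in length")
    return "".join(ref if base == "." else base
                   for base, ref in zip(row, reference))


def shared_substitutions(reference, row_a, row_b):
    """Return 1-based positions where both rows carry the same base
    and that base differs from the reference (gaps are ignored)."""
    reference = reference.replace(" ", "")
    seq_a = expand(row_a, reference)
    seq_b = expand(row_b, reference)
    return [pos
            for pos, (ref, a, b) in enumerate(zip(reference, seq_a, seq_b), start=1)
            if a == b and a != ref and a != "-"]


positions = shared_substitutions(RAT, GUINEA_PIG, HUMAN)
print(len(positions), positions)
```

Note that this comparison, like the one in the paper, treats rat as if it were the ancestral state. It cannot tell a change shared by guinea pigs and humans from a change that happened only in the rat lineage.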

                                                                                                                  1
                     1         2          3         4          5         6          7         8          9         0          1         2          3         4          5         6     
            12345678901234567890 12345678901234567890 12345678901234567890 12345678901234567890 12345678901234567890 12345678901234567890 12345678901234567890 12345678901234567890 12345
Rat         GGAGAAGACCAAGGAGGCCC TACTGGAGCTAAAGGCCATG CTGGAGGCCCACCCCAAAGT GGTAGCCCACTACCCCGTAG AGGTGCGCTTCACCCGAGGC GATGACATTCTGCTGAGCCC CTGCTTCCAGAGGGACAGCT GCTACATGAACATCATTATG TACAG
Mouse       .................... .G.................. .................G.. ..................G. ...................T ........C........... G................... .................... .....
Guinea Pig  A................... .G........G...AG.... .....A..T........G.. ..C............T..G. G...............G..G ..C.....C........... ..C................. ..............TGC..A .....
Human       AA.........C........ .G........G......G.. ..............TG.G.. ...GT.........TG..G. G...A.........T.GA.G -.......C..A........ ..........T........C .....C.........ACC.. .....
Chimp       AA.........C........ .G........G......... ...............G.G.. ...GT.........TG..G. G.C.A.........T.GA.G -.......C..A........ ..........C........C .....C.........ACC.. .....
Orangutan   AA.........C........ .G........G......... ..............TG.G.. ...GT..........G..G. G..............AGA.G -.....G.C..A........ ..........CA.......C ....TC.........ACC.. .....
Macaque     AA.........CA.G..... .G......A.G......... ..............TG.G.. ...GT.......A..G..G. G..............A...G -.......CA.A........ ..........CA........ .....C..G......ACC.. .....
Cow         A...........A....... .G........G......... ........GAG......G.. A..G..............G. ....A........T..C..G ..C.....C........... ..........C.A....... ...........G....C... .....
Pig         A................... .C........G......... .................G.. ...G..............G. .............T..G.CG ..C.....C........... .................... ................C... .....
Chicken     T........A.....A..A. .G........G.....TGCC ......AA.A.......GA. ...G...........T..G. ..........TG.T....CG .....G..CTG......... .................... ................C... .....
Tiger Shark TA..C....T.GA.CA..T. .GGA.C....G....ATTG. .....CAA.A.T..T..T.. .CG...A..T.TT..T..C. ....T..G..TGTT..T.CA ..C..T........C..... ...T.A.AGACA........ .......T........C... .....

When we look at this larger data table, only one of the ten positions, 81, stands out as a possible case of a shared derived trait; one position, 97, is inconclusive; and the other eight positions are more than likely shared ancestral sites, meaning the substitution happened in the rat lineage. With this additional phylogenetic information, I have shown that the “hot spots” Inai et al. (2003) found are not well supported. Therefore, the explanation given to me by the creationist who responded does not work.
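One way to make these calls explicit is a simple outgroup vote. The sketch below is only an illustration under stated assumptions, not a formal ancestral-state reconstruction: the choice of cow, pig, chicken, and tiger shark as outgroups, the majority rule, and the name `classify_site` are all assumptions made for this example. The bases were read by hand from the ten candidate columns of the larger figure.

```python
# Bases at the ten candidate positions, read by hand from the larger figure.
# Each string lists positions 1, 22, 31, 58, 79, 81, 97, 100, 109, 157 in order.
POSITIONS = [1, 22, 31, 58, 79, 81, 97, 100, 109, 157]
COLUMNS = {
    "Rat":         "GAAAAAACTT",
    "Guinea Pig":  "AGGGGGGGCC",
    "Human":       "AGGGGGGGCC",
    "Cow":         "AGGGGACGCC",
    "Pig":         "ACGGGAGGCC",
    "Chicken":     "TGGGGAAGCC",
    "Tiger Shark": "TGGTCATATC",
}
# Assumption: these four taxa serve as outgroups for the vote.
OUTGROUPS = ["Cow", "Pig", "Chicken", "Tiger Shark"]


def classify_site(rat_base, shared_base, outgroup_bases):
    """Vote among outgroups on which state is older.

    'ancestral'    -> more outgroups match the human/guinea pig base,
                      so the change is more likely on the rat lineage
    'derived'      -> more outgroups match the rat base, so human and
                      guinea pig may share a derived change
    'inconclusive' -> the vote is tied
    """
    supports_shared = sum(base == shared_base for base in outgroup_bases)
    supports_rat = sum(base == rat_base for base in outgroup_bases)
    if supports_shared > supports_rat:
        return "ancestral"
    if supports_rat > supports_shared:
        return "derived"
    return "inconclusive"


calls = {}
for i, pos in enumerate(POSITIONS):
    shared = COLUMNS["Human"][i]
    # Sanity check: human and guinea pig agree at every candidate site.
    assert shared == COLUMNS["Guinea Pig"][i]
    outgroup_bases = [COLUMNS[name][i] for name in OUTGROUPS]
    calls[pos] = classify_site(COLUMNS["Rat"][i], shared, outgroup_bases)

for pos, call in calls.items():
    print(pos, call)
```

A vote like this ignores branch lengths and the tree itself, so it is only a rough screen. On these columns it agrees with the reading above: position 81 comes out as derived, position 97 as a tie, and the remaining eight as ancestral.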

References

  • Inai Y et al. (2003), “The Whole Structure of the Human Non-Functional L-Gulono-gamma-Lactone Oxidase Gene - the Gene Responsible for Scurvy - and the Evolution of Repetitive Sequences Thereon,” J Nutr Sci Vitaminol 49:315-319.

15 Comments

Wow, that’s a pretty big blunder to make, both for the authors and the reviewers. I wonder how that slipped past the reviewers? I guess the editors of the Journal of Nutritional Science and Vitaminology don’t know any bioinformaticians. Incidentally, according to this site, the journal has an impact factor of 0.701, which is higher than the now-infamous Proceedings of the Biological Society of Washington (0.508). If that doesn’t testify that peer review isn’t perfect, I don’t know what will.

I hope you’re going to talk a little trash to whomever sent you that link.

The weird thing is that most of the data I used came from the same lab (Nishikimi) that did this paper. From looking at other papers coming from the lab, I get the impression that molecular evolution is not its primary research focus. I think this is another example of a biochemist not fully understanding how to study molecular evolution.

Well, there is much to discuss here. First, how did you select that sequence segment? Of course your conclusions are sensitive to this since you are taking a small sample.

First, how did you select that sequence segment?

It was the segment with the most data, i.e. represented by the most taxa. I don’t think sample size is a problem since diverse taxa are represented.

You are looking at a tiny fraction of the data analyzed in the Inai paper. I’m not defending the Inai paper as I have not read it. However, the sequence segment you selected is quite different from other segments. In your segment there is high similarity. This is not characteristic, and more species doesn’t help. Elsewhere one finds a hodge-podge: poor alignments, segments with high human-guinea pig similarity but low similarity to the rat, etc. There are various possible explanations. The substitutions occurred in the rat lineage, insertions and deletions, etc.

This all makes interpreting the data a bit difficult. I suspect things will become clearer with more research, but, as you are probably aware, pseudogenes in general have not proven to be as straightforward as once thought. The Inai paper may be in error, but the idea of mutational hotspots is not controversial.

We need to be careful about claiming what pseudogenes do and do not portend for creation, or any other model for that matter. The creationists you spoke with were apparently unaware of this. It is certainly not a stretch for a creationist to point to mutational hotspots as the cause of the primate GULO pseudogene.

Hotspots are a reality, but we don’t understand them fully yet (to my knowledge anyway). We don’t understand their cause, species dependence, etc. very well. Hence no one has done a credible probability analysis of the primate GULO pseudogene.

Also, of course, we can’t be certain the primate GULO pseudogene is functionless. It seems obvious that it would be, but we need to be careful as other pseudogenes are being found to have indications of function.

Following up on my previous message, note for example the multiple alignments just upstream and downstream of the segment you show in your above post. Just upstream, we have the rat, mouse, cow, pig and chicken all with “CATCCC”, but human and guinea pig with “TGAGTG”. Just downstream we see the same groups with “CCCT” and “TGAC”, respectively. This is not unusual and, for these cases, evolution is left with the explanation that while the ancestral sequence was preserved in the rat, mouse, cow, pig and chicken, the human and guinea pig independently made identical changes. The pattern, and especially its dependence on segment location, is rather striking.

Response from Sacramento,

I would appreciate it if you didn’t use my email as your email.

You are looking at a tiny fraction of the data analyzed in the Inai paper. I’m not defending the Inai paper as I have not read it. However, the sequence segment you selected is quite different from other segments.

You should keep in mind that my post is to explain how Inai et al. used flawed methodology to support one of their conclusions. I was not doing a total reanalysis of their data, because there simply is not enough data to complement every nucleotide they looked at.

Also, of course, we can’t be certain the primate GULO pseudogene is functionless. It seems obvious that it would be, but we need to be careful as other pseudogenes are being found to have indications of function.

GULO is a unary pseudogene. The functions found for other pseudogenes won’t work with pseudo-GULO.

Mr. Responder wrote

We need to be careful about claiming what pseudogenes do and do not portend for creation

Not really. “We” need to be much more careful about claiming what pseudogenes do and do not portend for the existence of the Yeti’s invisible flying saucer.

Or, Mr. Responder, are you claiming to have proof that the Yeti’s saucer does not exist? If so, let’s see your data, big talker.

Yes, I understand you were not doing a comprehensive analysis. You did, however, make a comprehensive conclusion that, interestingly, is supported by the segment you posted but not by the immediately adjacent upstream and downstream sequences. Furthermore, you claim that sample size is not a problem since diverse taxa are represented. This simply is not true. In fact, it is precisely those diverse taxa that reveal the problem. At several points the rat, mouse, cow, pig and chicken all align well while the human and guinea pig must have independently diverged from that consensus, yet they align rather well.

Responding to Great White Wonder, I’m afraid I don’t follow your point.

Responder

Responding to Great White Wonder, I’m afraid I don’t follow your point.

I’ll take that as an admission that you lack the requested data. The Yeti will be relieved. He needs to fly back to the Himalayas tonight after spending Thanksgiving weekend in Oregon with his cousin (you know who!).

Responder Wrote:

You did, however, make a comprehensive conclusion that, interestingly, is supported by the segment you posted but not by the immediately adjacent upstream and downstream sequences.

Care to share your alignment?

Here is an alignment that is a superset of your alignment (there are roughly an additional 90 residues upstream and 40 downstream). Of course you’ll need to format with a monospace font (~ represents inserts).

204149________GCAAGAAGGAGAGCAGCAACCTCAGTCACAAGATCTTC~~ACCTACGAGTGTCGCTTCAA_Rat
38325769______GCAAGAAGGAGAGCAGCAACCTCAGCCACAAGATCTTC~~TCCTACGAGTGTCGCTTCAA_Mous
24637282______GGAAGAAGGAAAACTGCAACCTCAGCCACAAGATCTTC~~ACCTACGAGTGCCGCTTCAA_Pig
5924388_______GGAAGAAGGAAAACTGCAACCTCAGCCATAAGATCTTC~~ACCTACGAGTGCCGCTTCAA_Cow
14994234______CAAAGGCTGAGCAGGTCAAGCGCAGTGATAAGGCTTTC~~AACTTTGACTGTCTCTTCAA_Shrk
46425804______~~~~~~~~~~~~~~~~CAATGTCAGCTACAAGATCTTC~~AACTACGAGTGCCGCTTCAA_Chkn
220300________GGGC~~~~AACCCGG~~AGAGCTGTG~GGGAGGGTGCC~~GGCATCCCTTCCTGCCCTGA_GPig
493656________AGGCTGGGAACCTGTGCAGAGTCTTGAGGGAGGGCACCCAGCGGTCCCTTCCCACCCTGA_Humn
_______________________________*____________**_____*___________*____*____*

204149________G~~~CAGCATGTACAAGACTGGGCCATCCCTAGGGAGAAGACCAAGGAGGCCCTACTGGA
38325769______G~~~CAGCATGTCCAAGACTGGGCCATCCCCAGGGAGAAGACCAAGGAGGCCCTGCTGGA
24637282______G~~~CAGCATGTCCAGGACTGGGCCATCCCCAGAGAGAAGACCAAGGAGGCCCTCCTGGA
5924388_______G~~~CAGCATGTCCAGGACTGGGCCATCCCCAGAGAGAAGACCAAAGAGGCCCTGCTGGA
14994234______G~~~CAACATGTGTCGGACTGGGCTCTTCCTATTAAGCAGACTAGAGCAGCTCTGGAGCA
46425804______G~~~CAGCATGTGCAAGACTGGGCCATCCCCATTGAGAAGACAAAGGAAGCACTGCTGGA
220300________GGTCCAGATGGCATCCCC~TGCCCTGAGTGCAGAGAGAAGACCAAGGAGGCCCTGCTGGA
493656________GTTCTAGATTCTGTCCCCCTGGGCTGAGTGCAGAAAGAAGACCACGGAGGCCCTGCTGGA
______________*____*_____________**__*_______*___**_****_*__*__**_**___*_*

204149________GCTAAAGGCCATGCTGGAGGCCCACCCCAAAGTGGTAGCCCACTACCCCGTAGAGGTGCG
38325769______GCTAAAGGCCATGCTGGAGGCCCACCCCAAGGTGGTAGCCCACTACCCCGTGGAGGTGCG
24637282______GCTGAAGGCCATGCTGGAGGCCCACCCCAAGGTGGTGGCCCACTACCCCGTGGAGGTGCG
5924388_______GCTGAAGGCCATGCTGGAGGCGAGCCCCAAGGTAGTGGCCCACTACCCCGTGGAGGTACG
14994234______GCTGAAGGATTGGCTGGACAACAATCCTAATGTGCGAGCACATTTTCCTGTCGAGGTTCG
46425804______GCTGAAGGCTGCCCTGGAGAACAACCCCAAGATGGTGGCCCACTACCCTGTGGAGGTGCG
220300________GCTGAAGAGCATGCTGGAAGCTCACCCCAAGGTGGCAGCCCACTACCCTGTGGGGGTGCG
493656________GCTGAAGGCCGTGCTGGAGGCCCACCCTGAGGTGGTGTCCCACTACCTGGTGGGGGTACG
______________***_***______*****_______**__*__*_____*_**_*__*__**_*_***_**

204149________CTTCACCCGAGGCGATGACATTCTGCTGAGCCCCTGCTTCCAGAGGGACAGCTGCTACAT
38325769______CTTCACCCGAGGTGATGACATCCTGCTGAGCCCGTGCTTCCAGAGGGACAGCTGCTACAT
24637282______CTTCACTCGGGCGGACGACATCCTGCTGAGCCCCTGCTTCCAGAGGGACAGCTGCTACAT
5924388_______CTTCACTCGCGGGGACGACATCCTGCTGAGCCCCTGCTTCCAGCGAGACAGCTGCTACAT
14994234______GTTTGTTCGTGCAGACGATATTCTGCTCAGCCCCTGTTACAGACAGGACAGCTGCTACAT
46425804______CTTTGCTCGAGCGGATGAGATCTGGCTGAGCCCCTGCTTCCAGAGGGACAGCTGCTACAT
220300________CTTCACCCGGGGGGACGACATCCTGCTGAGCCCCTCCTTCCAGAGGGACAGCTGCTACAT
493656________CTTCACCTGGAGG~ATGACATCCTACTGAGCCCCTGCTTCCAGTGGGACAGCCGCTACCT
_______________**_____*_____*_**_**____**_*****_*__*_*______******_*****_*

204149________GAACATCATTATGTACAGGCCCTATGGAAAGGACGTGCCTCGGCTAGACTACTGGCTGGC
38325769______GAACATCATTATGTACAGGCCCTATGGGAAGGATGTGCCTCGGTTGGATTACTGGCTGGC
24637282______GAACATCATCATGTACAGGCCCTACGGCAAGGACGTGCCTCGGCTGGACTACTGGCTGGC
5924388_______GAACGTCATCATGTACAGGCCCTATGGCAAGGACGTACCGCGGCTGGACTACTGGCTGGC
14994234______TAACATCATCATGTACAGACCCTACGGGAAGGAGGTGCCACGCGAGGGGTACTGGGCAAT
46425804______GAACATCATCATGTACAGGCCCTATGGGAAGAACGTGCCCCGGCTCAACTACTGGCTGAC
220300________GAACATCTGCATATACAGGTGACAGGCTGCTCCATGGGATTTAGGAG~~~~~~~~~~~~~
493656________GAACATCAACCTGTACAGGTGACAGCTCACTGGGAGGTGGAGATGGGCCTGGGAGCCGGC
_______________***_**____*_*****_____*_____

I haven’t had time to look at your larger dataset.

However, certain additional points need to be made about using phylogenetic data to estimate hotspots.

  • The macroevolutionary comparisons that are being done tell us about substitutions, not mutations.
  • Long branch attraction

Another note:

I’m going through the GULO data and producing an alignment of the CDS region. Some of the GenBank files are a little confusing. For instance, the cavy exon sequences include introns. I suspect that the upstream and downstream regions that Responder is referring to are places where the alignment is wrong because introns are being aligned with exons.

About this Entry

This page contains a single entry by Reed A. Cartwright published on September 5, 2004 3:36 AM.
