Junk DNA, Linguistics and the scientific vacuity of Intelligent Design

On Pandasthumb, our dear friend Salvador Cordova (YEC) presents us with the following “argument”:

Pellionisz FractoGene has demonstrated at least one layer of linguistic architecture for the junk DNA. A linguistic structure suggest function even if the structure is not fully understood (like seeing an undecoded communication, the communication has function, but it is not understood). Furthermore, Fracis Collins called it hubris to say any part of the genome is junk.

Salvador may not be familiar with the term ‘non-coding DNA’, which describes the scientific thinking much better than the somewhat unfortunate term “junk DNA”, especially since the latter seems to be used for cherry-picking rhetoric. In this posting I will explore the term junk DNA, address some of the research findings that DNA and junk DNA show “linguistic features”, and show why ID remains fully vacuous, since it cannot predict, let alone explain, “junk DNA”.

Junk DNA and its confusions

According to Wikipedia Junk DNA “is a collective label for the portions of the DNA sequence of a chromosome or a genome for which no function has yet been identified.”

This, however, does not mean that all “junk DNA” has a “function”, although such DNA can serve, as in the case of the anti-freeze example, as a source of novel functions. In fact, science has identified pseudogenes:

These chromosomal regions could be composed of the now-defunct remains of ancient genes, known as pseudogenes, which were once functional copies of genes but have since lost their protein-coding ability (and, presumably, their biological function). After non-functionalization, pseudogenes are free to acquire genetic noise in the form of random mutations.

Evolutionary science can explain the existence of such pseudogenes.

It’s safe to say that “junk DNA” contains many areas where we lack sufficient data or knowledge to understand its origins or function, and yet science is slowly unraveling the details surrounding non-coding DNA. One may thus compare science’s progress in understanding non-coding DNA with that of Intelligent Design, and one quickly comes to realize that ID remains, as usual, scientifically vacuous, since it is based on our ignorance, not our knowledge.

DNA and linguistic features

Researchers have uncovered some interesting statistical features in non-coding DNA that are also found in languages. The best known of these is Zipf’s law, named after the linguist George Kingsley Zipf.

Zipf’s law states that

in a corpus of natural language utterances, the frequency of any word is roughly inversely proportional to its rank in the frequency table. So, the most frequent word will occur approximately twice as often as the second most frequent word, which occurs twice as often as the fourth most frequent word, etc. The term has come to be used to refer to any of a family of related power law probability distributions.
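To make the rank-frequency idea concrete, here is a minimal Python sketch. It is only an illustration, not the method used in any of the papers discussed here, and the function names are made up for this post. It ranks tokens by how often they occur and, for comparison, lists the counts an ideal Zipf law with exponent 1 would predict.

```python
from collections import Counter


def rank_frequency(tokens):
    """Return (rank, token, count) tuples, most frequent token first.

    Ties in count are broken alphabetically so the output is deterministic.
    """
    counts = Counter(tokens)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, tok, n) for rank, (tok, n) in enumerate(ranked, start=1)]


def ideal_zipf_counts(top_count, n_ranks):
    """Counts predicted by an ideal Zipf law with exponent 1: f(r) = f(1) / r."""
    return [top_count / r for r in range(1, n_ranks + 1)]
```

Applied to a text split into words, or to a DNA sequence cut into n-tuples, `rank_frequency` gives the observed curve, and plotting count against rank on log-log axes should give a roughly straight line if the data are Zipf-like.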

In R. N. Mantegna, S. V. Buldyrev, A. L. Goldberger, S. Havlin, C. K. Peng, M. Simons, and H. E. Stanley, Linguistic Features of Noncoding DNA Sequences, Phys. Rev. Lett. 73, 3169 (1994), the authors showed how non-coding DNA abides by Zipf’s law. As I will show, however, their findings come with a lot of warnings.

What ID proponents seem to suggest is that Zipf’s law provides for a specification and given our ignorance of non-coding DNA (which causes it to be complex in ID speak), they conclude that non-coding DNA is complex specified information (CSI) and thus designed.

As I will show, they are correct that non-coding DNA’s linguistic features are ‘designed’, but the designer is a fully natural process.

The first warning to ID activists should have been that non-coding DNA follows a Zipf’s law distribution more closely than coding DNA does. The second warning should have come from the work of researchers showing how many features in the genome follow power-law scaling (Zipf’s law is one member of this family of laws).

In “True reason for Zipf’s law in language”, published in Physica A: Statistical Mechanics and its Applications, Volume 358, Issues 2-4, 15 December 2005, Pages 545-550, researchers Wang Dahui, Li Menghui and Di Zengru describe the ‘true reason’ why Zipf’s law arises in languages.

Analysis of word frequency have historically used data that included English, French, or other language, data typically described by Zipf’s law. Using data on traditional and modern Chinese literatures, we show here that Chinese character frequency stroked Zipf’s law based on literature before Qin dynasty; however, it departed from Zipf’s law based on literature after Qin dynasty. Combined with data about English dictionaries and Chinese dictionaries, we show that the true reason for Zipf’s Law in language is that growth and preferential selection mechanism of word or character in given language.

Growth and preferential selection… But wait a minute, imagine a process of gene duplication and preferential attachment, and one has recovered one of the processes thought to be the cause behind the scale free nature of so many processes in the genome.

The cause for the deviation of Chinese from Zipf’s law is simple

What causes this difference between Chinese and other languages? Let us pay attention to the some features of Chinese characters and English words. Before the Qin dynasty, Chinese characters were in infancy and different in various areas of China. After Emperor Qin Shihuang unified the characters, the Chinese language became mature. It is difficult to create new characters because Chinese characters are pictographs, and the number of Chinese characters has grown very slowly, from 10 000 to 50 000 over last 2000 years. So, the available number of Chinese characters for any author is almost fixed. On the other hand, the words of other language, such as English, new words are introduced constantly and the number of words grows very fast compared with Chinese character. The available words for authors are unlimited.

It is difficult to add new Chinese characters, while adding new words in most other languages is trivial. In other words, when the set of ‘words’ is fixed, the distribution will tend to deviate from Zipf’s law. This helps us understand why Zipf’s law applies better to non-coding DNA than to coding DNA, since the latter is constrained by (strong) selection.
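The ‘growth and preferential selection’ idea can be sketched in a few lines of Python. This is a toy in the spirit of Simon’s classic model, not the model from the Wang et al paper; the function names and the `p_new` parameter are assumptions made for illustration. With probability `p_new` a brand-new ‘word’ enters the text (growth); otherwise a word already in the text is repeated, chosen in proportion to how often it has been used so far (preferential selection).

```python
import random
from collections import Counter


def simulate_text(n_tokens, p_new, seed=0):
    """Toy growth-plus-preferential-selection process (hypothetical sketch).

    Words are labelled by integers. Picking a uniformly random position in
    the text written so far repeats a word with probability proportional
    to its current count.
    """
    rng = random.Random(seed)
    text = [0]       # start with a single word, labelled 0
    next_word = 1    # label for the next brand-new word
    while len(text) < n_tokens:
        if rng.random() < p_new:
            text.append(next_word)         # growth: introduce a new word
            next_word += 1
        else:
            text.append(rng.choice(text))  # preferential selection
    return text


def ranked_counts(text):
    """Word counts sorted from most to least frequent."""
    return sorted(Counter(text).values(), reverse=True)
```

Running `ranked_counts(simulate_text(100000, 0.05))` and plotting the result on log-log axes is one way to look at the shape of the curve. Processes of this kind are known to produce heavy-tailed, power-law-like distributions, but this sketch makes no attempt to reproduce the Chinese character result or any genomic data.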

In 1996, researchers already pointed out the problems with Zipf’s law:

This also showed highly similar Zipf behavior to noncoding DNA and language. Thus, to detect language Zipf analysis should be applied with caution, since it cannot distinguish language from power-law noise

N. E. Israeloff, M. Kagalenko, and K. Chan, Can Zipf Distinguish Language From Noise in Noncoding DNA?, Phys. Rev. Lett. 76, Issue 11, March 1996.

In fact, in No Signs of Hidden Language in Noncoding DNA, Phys. Rev. Lett. 76 (1996), authors Bonhoeffer et al add:

We have thus shown that most of the observations in [1] may be simple consequences of unequal nucleotide frequencies. Our explanation does not exclude the existence of an undeciphered language in noncoding DNA, but it does undercut speculative arguments based on Zipf’s Law or Shannon redundancy [4]. There remains, however, the very interesting question implicit in [1]: Why are there differences in nucleotide frequencies between coding and noncoding DNA?
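The point about unequal nucleotide frequencies can be illustrated with a small calculation. The sketch below is not Bonhoeffer et al’s actual analysis; it simply assumes a sequence in which each base is drawn independently with a fixed probability, and the probabilities used in the usage note are invented for illustration. Under that assumption the expected frequency of an n-tuple is the product of its base probabilities, so any bias in base composition already produces a sloped rank-frequency curve with no ‘language’ involved.

```python
from itertools import product
from math import prod


def expected_ntuple_frequencies(base_probs, n):
    """Expected frequency of every n-tuple in an i.i.d. random sequence.

    base_probs maps each base to its probability (values should sum to 1).
    Returns (n-tuple, frequency) pairs sorted from most to least frequent,
    with ties broken alphabetically.
    """
    freqs = {
        "".join(word): prod(base_probs[b] for b in word)
        for word in product(sorted(base_probs), repeat=n)
    }
    return sorted(freqs.items(), key=lambda kv: (-kv[1], kv[0]))
```

With equal probabilities (0.25 each) every n-tuple gets the same expected frequency and the rank-frequency curve is flat. With a made-up skewed composition such as `{"A": 0.4, "T": 0.3, "C": 0.2, "G": 0.1}`, the same function returns a steadily decreasing curve, which is the kind of effect the authors warn can be mistaken for a linguistic signal.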

The original authors reply in R. N. Mantegna, S. V. Buldyrev, A. L. Goldberger, S. Havlin, C.-K. Peng, M. Simons, and H. E. Stanley, Mantegna et al. Reply:, Phys. Rev. Lett. 76, 1979 (1996), pointing out that until the distributions of nucleotides in coding and noncoding DNA can be established, the Zipf’s law feature in DNA remains unresolved.

What is interesting is that in 1992, Wentian Li published an article titled Random Texts Exhibit Zipf’s-Law-Like Word Frequency Distribution in IEEE Transactions on Information Theory, Vol. 38, No. 6, November 1992:

Abstract: It is shown that the distribution of word frequencies for randomly generated texts is very similar to Zipf’s law observed in natural languages such as English. The facts that the frequency of occurrence of a word is almost an inverse power law function of its rank and the exponent of this inverse power law is very close to 1 are largely due to the transformation from the word’s length to its rank, which stretches an exponential function to a power law function.
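Li’s observation is easy to try out. The following is a rough sketch of a ‘random typing’ experiment, not Li’s own code; the alphabet, the space probability and the function name are arbitrary choices for illustration. Characters are drawn at random, a space ends a word, and the resulting ‘words’ are ranked by frequency.

```python
import random
from collections import Counter


def random_text_word_counts(n_chars, alphabet="ab", p_space=0.2, seed=0):
    """Type n_chars random characters and rank the resulting 'words'.

    Each character is a space with probability p_space, otherwise a
    uniformly random letter from alphabet. Returns (word, count) pairs
    sorted from most to least frequent, ties broken alphabetically.
    """
    rng = random.Random(seed)
    chars = [
        " " if rng.random() < p_space else rng.choice(alphabet)
        for _ in range(n_chars)
    ]
    words = "".join(chars).split()
    return sorted(Counter(words).items(), key=lambda kv: (-kv[1], kv[0]))
```

Plotting the counts from, say, `random_text_word_counts(200000)` against rank on log-log axes is a quick way to check the claim in the abstract for yourself: there is no meaning anywhere in this text, yet the curve can still look Zipf-like.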

Salvador Cordova (YEC) also reports on the work by Pellionisz based on the concept of FractoGene. Salvador makes much of Pellionisz’s unfortunate use of the term design, but as Sal himself points out, such ‘fractal or recursive design’ already takes place in many plants (Fibonacci series). In other words, design once again does not mean ‘intelligent design’ but rather algorithmic design. Since algorithmic design is based on regularities, we notice once again how Intelligent Design is unable to distinguish between actual and apparent complex specified information (CSI). Much is made of the work by Isidore Rigoutsos and others titled Short blocks from the non-coding parts of the human genome have instances within nearly all known genes and relate to biological processes, PNAS, April 25, 2006, vol. 103, no. 17, 6605-6610.

However, as Miklos Csuros et al point out in “Reconsidering the significance of genomic word frequency”:

Determining what constitutes unusually frequent and rare in a genome is a fundamental and ongoing issue in genomics[6]. Sequence motifs may be frequent because they appear in mobile, structural or regulatory elements. It has been suggested that some recurrent sequence motifs indicate hitherto unknown or poorly understood biological phenomena[17]. We propose that the distribution of DNA words in genomic sequences can be primarily characterized by a double Pareto-log normal distribution, which explains lognormal and power-law features found across all known genomes. Such a distribution may be the result of completely random sequence evolution by duplication processes.

[17] references the paper by Rigoutsos et al.

In 2006 in a paper titled “Picking Pyknons out of the Human Genome” published in Cell Volume 125, Issue 5 , 2 June 2006, Pages 836-838, Meynert et al argue that there may be an unknown role for Pyknons, warning however that:

A frustration for computer scientists is that although DNA sequences are easy to analyze, interpreting why a sequence pattern in a genome is nonrandom is much harder to pin down. For example, patterns that appear many times in a genome might not be functionally important. Many dispersed repeats and retrotransposed pseudogenes also generate considerable numbers of related patterns in the genome. The authors point that although nearly all pyknons (99.9%) show some overlap with repeat elements, there are at least 50,000 instances of pyknons that show no overlap with repeat elements as defined by RepeatMasker (Smit et al., 1996). However, most pyknons (90%) are found at least half of the time in repeat regions, meaning that the vast majority of pyknon instances are in classical repeats.

So if these observations can in principle be explained by random sequence evolution by duplication processes, then how do we determine, once again, if there is true intelligent design to be found? Intelligent Design does not give us any answers here. Remember, all it can do is detect ‘design’ where design can include apparent or actual design, without providing ANY tools to differentiate between the two.

While ID is forced to remain hiding in the shadows of our ignorance, science is pushing forward, trying to unravel the ‘mystery’ of Junk DNA.

As a side note, the usefulness of “Junk DNA” dates back to as early as 1978

The usefulness of noncoding DNA for mapping human disease genes has been known for at least 25 years. In 1978, Y. W. Kan and Andres Dozy published a paper in The Lancet in which they used a variation in the flanking DNA of the beta-globin gene in the first successful prenatal genetic diagnosis of sickle cell anemia.

See Pubmed

As is explained by W. Maxwell Cowan et al in a review paper published in Annual Review of Neuroscience Vol. 23: 343-391 (Volume publication date March 2000) titled The Emergence of Modern Neuroscience: Some Implications for Neurology and Psychiatry

A major advance in the study of human genetic disorders occurred in the early 1980s with the development of restriction fragment length polymorphism analysis. Until that time, the genetic markers used to track genes and their mutations in human chromosomes were based solely on variations in coding regions of DNA, expressed ultimately as proteins. The common markers were blood group antigens, certain enzymes, and the antigens of the histocompatibility complex. However, DNA encoding gene products probably accounts for less than 10% of the human genome; more than 90% of the genome contains noncoding sequences, previously referred to as junk DNA. In 1980, Botstein et al (1980) realized that polymorphic sites could be recognized using restriction endonucleases. They pointed out that in the limit, single–base pair changes, which are genome specific and are tied to how closely individuals are related, can be detected by changes in restriction digest patterns. And it is important that these single–base pair changes are diagnostic even when the nucleotide changes occur in noncoding regions. These restriction fragment length polymorphisms (RFLPs) allowed saturation of the human genome with markers in noncoding as well as coding DNA regions, and this broad coverage made it easier to pinpoint the chromosomal loci of inherited diseases. Indeed, even before the report by Botstein and his colleagues, Kan and colleagues (Kan & Dozy 1978, Kan et al 1980) were able to show how RFLP analysis could be used for the prenatal diagnosis of a clinical disorder (in this case, sickle-cell anemia).

53 Comments

As to the origin of the unfortunate term junk DNA

However, Ohno’s paper is perhaps best remembered today for another reason. He titled his article, “So Much ‘Junk’ DNA in Our Genome.” And thus was born what is today one of the most easily grasped and yet pejorative terms in the genetic lexicon – junk DNA.

But is human DNA mostly a repetitious collection of junk sequences that have little or no bearing on the survival of the species? Or, is this so-called junk really providing structural support to our genes and chromosomes, offering the raw material for our continued evolution?

According to most scientists, the answer remains unknown. However, a growing body of evidence has begun to accumulate that suggests that not all of the junk sequences ought to be tossed in the scrap heap just yet.

When Ohno published his famous paper in 1972, he focused his attention mainly on the fossilized genes, called pseudogenes, that are strewn like tombstones throughout our DNA. But as the term “junk DNA” caught on in the 1980s, its meaning was extended to all non-coding sequences, the vast stretches of DNA that are not genes and do not produce proteins.

From Pharmaceutical DNA glossary

“junk DNA”: A general term that encompasses many different types of DNA sequences. These sequences run the gamut from introns, the parts of genes that are edited out during protein synthesis; transposable elements, repeated DNA sequences that, like parasites, duplicate themselves, adding nothing to the genome except more redundant sequence; and pseudo genes, fossils of one-time genes…all of the regulatory elements — promoters and inhibitors - required for gene transcription are spelled out somewhere between the genes. The same is true of other elements deemed junk, such as introns and RNA genes, which clearly hold important clues to understanding alternative splicing … the term junk DNA is frequently used incorrectly. Numerous articles in the medical literature use junk and non-coding DNA interchangeably. [B. Kuska “Bring in Da Noise, Bring in Da Junk” JNCI 90(15): 1125-1127 Aug. 5, 1998]

Dr. Susumu Ohno, writing in the Brookhaven Symposium on Biology in 1972 in the article “So Much ‘Junk’ DNA in Our Genome” is credited with originating the term. But his paper was focused “mainly on the fossilized genes, called pseudo genes, that are strewn like tombstones throughout our DNA. But as the term caught on in the 1980’s, its meaning was extended to all non-coding sequences, the vast stretches of DNA that are not genes and do not produce proteins” (about 95% of the genome) … some [scientists] have begun to scrap the notion that all non-coding DNA is junk … “I don’t think people take the term very seriously anymore” says Eric Green [NHGRI] whose group is mapping chromosome 7. [B. Kuska “Should Scientists Scrap the Notion of Junk DNA?” JNCI 90(14): 1032-1033 July 15 1998]

PvM — Well done!

Attempting to summarize, non-coding DNA has (some form of) power-law distribution since its evolution is not constrained by selection pressures.

Did I get it?

Yep. Of course, non-coding DNA is still ‘designed’ per ID’s own definition since it is specified, complex information and as Febble and others have pointed out, gene duplication and preferential attachment can explain ‘such design’. In other words, ID’s ‘design’ cannot differentiate between actual and apparent design and is thus scientifically vacuous (and uses equivocation to make its jump from design to ‘intelligent designer’)

Why does the text (YEC) follow every mention of Salvador’s name?

Ok now it’s time for a concession even though I appreciate that it may be cherry picked and quote mined.

The terminology “junk DNA” is unfortunate, which is why science has been moving away from the terminology and is using non-coding DNA. Understanding the historical context of the terminology helps one understand that it was originally used to describe pseudogenes by Ohno, and evolved to describe any DNA which did not have any apparent function.

Somehow creationists have taken the term and, perhaps unaware of its scientific meaning and its historical evolution, have argued that junk DNA need not be true junk, mirroring much of the scientific literature with which they seem to be unfamiliar.

That non-coding DNA shows evidence of linguistic features is indeed fascinating, and such findings have spurred significant new research into how to explain these features. ID proponents seem to have jumped onto this bandwagon by arguing that 1) Intelligent Design explains these findings better, and 2) Intelligent Design even predicted these findings.

They are wrong in both cases. Intelligent Design does not explain anything, and certainly not better, since any time science proposes an explanation, ID is unable to compare its ‘explanation’ to said scientific explanation to show that it is a ‘better explanation’. To suggest that ID predicts these findings is an even worse argument, since there is no foundation for such a claim. All ID predicts is that design will be found only in areas which remain unexplained by science. Predicting function or correlations in junk DNA only undermines ID and has no logical foundation in its fundamental position that design is the set theoretic complement of regularity and chance. In other words, an argument from ignorance.

This leads to but one conclusion, namely that ID as a research program or ID as a metaphysical foundation fails to be scientifically relevant and remains scientifically vacuous.

In case of Junk DNA, I have attempted to show that IDers are confused about the concept and meaning of the terms, that they ignore the scientific history of the term, and that their claims that ID predicts or explains these data are unsupportable.

In addition, I have attempted to show how research into Junk DNA has uncovered various functions and correlations, none of which seem to require an appeal to an Intelligent Designer (wink wink).

Antony Wrote:

Why does the text (YEC) follow every mention of Salvador’s name?

It’s his ‘trademark’ :-)

The ID prediction for Junk DNA is that there isn’t any - that is, what looks like junk to us is actually something useful we haven’t figured out yet. This is suspiciously similar to the creationist view that God would not put junk into the genome, but that’s not the point… :p

Personally I don’t know how they can make this prediction. They always tell us, when we point out suboptimal features (easily explained by evolution, not by ID) that ID simply detects design, and does not explain the how or why. But at the same time they make predictions based on analogies: that junk DNA is not junk, that DNA might look like a computer program… It seems that we can’t speculate about the designer, except when we can.

And I note too, that all these ‘predictions’ assume a designer who acts in ways familiar to us… human ways… except when the design is suboptimal, in which case he/she/it/they become some whimsical fairything.

Things get very interesting

To date, attempts to model the origin of language-like features of non-coding DNA have been based on generalized Lévy walks [9]-[12]. In these simulations, oligonucleotide segments of variable length are excised from a given DNA sequence and then inserted either intact or in modified form, and at random, into the rejoined DNA. Repetition of this process, which mimics the movement of transposable elements and the insertion of retroviral sequences within a genome, leads to DNA sequences which exhibit long-range correlations. However, these models are unsatisfactory in that they do not explicitly take into account the changes in oligonucleotide sequence that are associated with the insertion and excision of transposable elements. Transposable elements represent a significant proportion of the moderately repetitive dispersed sequences found in eukaryotic genomes. These sequences, which have lengths in the range 1–10kbp, are characterized by their mobility within the genome. In terms of genome reshaping processes that might account for the language-like features of non-coding DNA, the most significant consequence of the insertion of most transposable elements is the production of direct repeats of nucleotide sequences at the insertion sites. The lengths of target site duplications range from 2 (e.g., the Tc3/mariner element in Caenorhabditis elegans [21]) to 12 (e.g., the IS4 bacterial insertion sequence [22]). In addition, many transposable elements exhibit strong preference for insertion at particular oligonucleotide target sites [21], [23]-[26]. Excision of transposable elements can have several outcomes ranging from precise excision (i.e. leaving behind the intact target site duplication), to strand inversion or base pair deletions at the junction [27]. 
The aim of our simulations was to show that the combination of insertion specificity, target site duplication upon insertion, and excision, provide mechanisms which can produce biased distributions of n-tuples and thus induce long-range correlations in originally uncorrelated DNA sequences.

Another example of regular processes explaining these ‘linguistic features’

G. S. Attard, A. C. Hurworth and J. P. Jack, Language-like features in DNA: transposable element footprints in the genome, Europhys. Lett. 36 (5), pp. 391-396 (1996)

The fact of the matter is that ID advocates cannot actually predict usefulness for junk DNA. Why not? It’s simple - if you say that mutations cause some errors (sometimes to the point of destroying gene function), and you say that God does not always clean out these errors, then there is the possibility that “junk DNA” will exist. Thus, even the YECs can tolerate a small amount of functionless DNA (but it would have to arrive within a few thousand years and would most likely exist in a subgroup of the population rather than the entire population). So, when the IDists claim that ID predicts usefulness for all “junk DNA”, they are clearly thinking about one particular version of ID (one where God keeps the genome squeaky clean).

This fact touches on the much larger fact that ID really doesn’t make any predictions. Yes, they claim to make predictions like “functionality for junk DNA”, but they really can’t make that prediction. And, if some piece of junk DNA really turns out to be 100% certain to be functionless, IDists will first claim that we can’t be 100% certain, and they’ll also claim that ID doesn’t strictly require functionality for all junk DNA. In other words, the ID prediction for junk DNA functionality will be dropped, and they’ll start talking about why the newly discovered functionlessness doesn’t harm ID in the least bit (because now they have a new understanding of ID).

Combined with data about English dictionaries and Chinese dictionaries, we show that the true reason for Zipf’s Law in language is that growth and preferential selection mechanism of word or character in given language.

Can somebody tell me what this means? It looks like something was left out.

I think that junk DNA is an important concept. What is unfortunate is the confusion between junk DNA–DNA that genuinely does nothing and exists in the genome simply as a result of weak negative selection for genome size &/or positive selection by selfish DNA mechanisms–and noncoding DNA–DNA that does not code for protein, but may or may not serve other functions such as containing regulatory sequences.

I’ve often seen ID advocates acclaim discoveries of functional units within noncoding DNA as evidence against natural selection, apparently failing to realize that the knowledge that there are functional elements in noncoding DNA substantially predates the coining of the term “junk DNA.” Nobody has ever seriously suggested that all noncoding DNA was junk.

The hypothesis is that a substantial portion of noncoding DNA in some (but probably not all) species is junk. This is based upon the observation of very large differences in the amount of noncoding DNA between morphologically similar species.

Preferential selection: the more a character or word is used, the more likely it is to be used again

In other words, growth and preferential selection can explain Zipf’s law. In RNA for instance duplication and preferential attachment are sufficient to explain the scale free nature of RNA.

Not surprisingly Zipf is an example of a scale free distribution.

In languages, growth of the vocabulary combined with preferential selection (the more a word is used, the more it will be used again) explains why Zipf’s law applies. In the case of Chinese, it also helps us understand why Zipf’s law failed when the Chinese character set became more constrained.

Junk DNA also historically included regulatory regions which can hardly be called junk in the common sense of the word, but they are not expressed as proteins.

A meeting report in Genome Biology describes:

A growing body of work suggests that genes for noncoding RNAs make up a substantial class of genes in all organisms, with increasing organismal complexity correlated with an increasing complexity of noncoding RNAs. Many of these noncoding RNAs appear to have regulatory functions and these were the subject of this year’s annual Cold Spring Harbor Symposium. Among the most exciting themes of the meeting were the evidence for significant amounts of hitherto undiscovered transcription in genomes and the discovery of novel classes of noncoding RNAs with thousands of members. In this report I review a few of these highlights.

The tenets of the ‘central dogma’ have required revision over the past few decades as biologists have begun to appreciate that RNA performs many functions once thought solely to be the domain of proteins. Apart from its well established roles as messenger, ribosomal component, and transfer RNA, it is now clear that RNA can have a key role in regulating gene expression. Noncoding regulatory RNAs - RNAs that are not translated into protein - include the small nuclear RNAs (snRNAs), the small nucleolar RNAs (snoRNAs), the XIST RNA that mediates mammalian X-chromosome silencing, microRNAs, riboswitches, and the RNA component of the enzyme telomerase. These RNAs direct such diverse processes as gene silencing, transcriptional and translational control, imprinting, and dosage compensation. These discoveries have electrified the biological community as we try to understand the extent of the ‘RNA world’ and how regulatory RNAs work in controlling gene expression. We are fast learning that large portions of the genome that do not code for proteins are in fact transcribed, and that these regions, previously thought to be ‘junk’, may be useful after all (Figure 1).

Frank J Slack, Regulatory RNAs and the demise of ‘junk’ DNA, Genome Biology 2006, 7:328

That RNA plays an ever larger role is excellent support for the RNA world.

PvM — You may care to check what Carl Zimmer has written about non-coding DNA in his blog, The Loom. He may have some additional references you may find of interest.

(This is neat stuff, if we leave out the IDiots…)

Arguing that a specific section of DNA has a linguistic quality and thus must have a function is a very strange argument to make in the light of linguistics. Every introductory linguistics course introduces grammar with what has to be one of the most famous sentences in the humanities:

“Colorless green ideas sleep furiously.” — Noam Chomsky, 1957

It is perfectly correct, based on the linguistic rules of English, yet is completely devoid of meaning.

PvM: Yep. Of course, non-coding DNA is still ‘designed’ per ID’s own definition since it is specified, complex information and as Febble and others have pointed out, gene duplication and preferential attachment can explain ‘such design’. In other words, ID’s ‘design’ cannot differentiate between actual and apparent design and is thus scientifically vacuous (and uses equivocation to make its jump from design to ‘intelligent designer’)

OMG Have they just been on a ‘all DNA is perfect’ kick recently or what? I was just about to rip up this crap: http://www.uncommondescent.com/archives/1947

This is interesting. Just a few days ago, I had mentioned the attempt to apply Zipf analysis on non-coding DNA to Torbjorn Larsson over on Mark C. Chu-Carroll’s blog – this was while Mark and others were trying to get ol’ Slithering Sal to lay out an actual model for his claims about Shannon-Weaver info and mutation (Sal had trotted out his standard citations of Haldane and Pellionisz). Anyway, it was Torbjorn that mentioned that fractals and Zipf analysis have power-law relationships in common, but it was Jonathan Von Post that set off the entire idea by mentioning market analysis (Zipf was applied to that as well).

It’s interesting how this came up for a few people at about the same time, and Sal would do well to look at Von Post’s preliminary sketch of formalizing the “entropy of Natural Selection” (the very issue Sal brings up and can’t manage to deal with mathematically) at Mark C. Chu-Carroll’s here: http://scienceblogs.com/goodmath/20[…]mment-320896

Seems that people have started to figure out that Sal is great at raising ideas, but mostly unable to describe them in sufficient detail let alone defend them. Haldane dilemma, neutral mutations, Turing machines, all are cherry picked by Sal. The latest is channel capacity and Shannon theory.

Of course, the excellent work by Schneider has shown that in the Shannon sense, information can trivially arise in the genome under processes of variation and selection. So if Sal wants to argue that Shannon information is relevant to evolution then he also has to abandon much of ID’s claims about CSI.

Of course ‘linguistic’ is actually a misnomer here, a wrong term along with ‘junk’. What occurs in non-coding DNA is some form of power law or log-normal distribution or both. (Wikipedia explains both quite well.)

So, picking on a well-studied power law, Stefan-Boltzmann for example, just how does this illustrate ‘function’ or ‘design’, much less ‘intelligent design’?
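To make the point concrete, here is a rough sketch (my own helper names, not anyone's published method) that fits a straight line on log-log axes. The same fit that is used in Zipf-style rank-frequency analysis recovers the exponent 4 from the Stefan-Boltzmann law, a relation with no linguistic content at all. The k-mer helper shows how a rank-frequency list would be built from a DNA string; the short sequence is a made-up placeholder, not real data.

```python
import math
from collections import Counter

def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mean_x = sum(lx) / len(lx)
    mean_y = sum(ly) / len(ly)
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    den = sum((a - mean_x) ** 2 for a in lx)
    return num / den

SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant in W m^-2 K^-4

def radiated_flux(temperature_k):
    """Black-body flux j = sigma * T^4."""
    return SIGMA * temperature_k ** 4

def kmer_rank_frequency(sequence, k):
    """Counts of overlapping k-mers, sorted from most to least frequent."""
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    return sorted(counts.values(), reverse=True)

# A physical power law: the fitted exponent is 4.
temps = [300.0, 600.0, 1200.0, 2400.0, 4800.0]
flux_slope = loglog_slope(temps, [radiated_flux(t) for t in temps])
print(round(flux_slope, 6))  # 4.0

# A Zipf-style fit on a placeholder sequence (illustration only).
freqs = kmer_rank_frequency("ACGTTGCAACGTACGTTTGACGTA", 2)
ranks = list(range(1, len(freqs) + 1))
print(loglog_slope(ranks, freqs))
```

A straight line on such a plot says that a power law fits, and nothing more; it does not by itself say anything about function or design.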

Balderdash…

Genetics: Junk DNA as an evolutionary force by Christian Biémont and Cristina Vieira Nature 443, 521-524 (5 October 2006)

Transposable elements were long dismissed as useless, but they are emerging as major players in evolution. Their interactions with the genome and the environment affect how genes are translated into physical traits.

As a molecular biologist with main focus on “junk DNA”, I would like to point out a common misconception:

Junk DNA is NOT EQUAL to non-coding DNA.

In fact, most junk DNA (or better, “repetitive DNA”) consists of so-called transposable elements. These are small genetic units that encode all genes necessary to make copies of themselves. Thus, they are also called “selfish genes”, since their only purpose appears to be to replicate themselves within the genome of the host.

However, the mere fact that they are found in all living organisms indicates that they have an essential function in evolution. We do not know much about this function (which makes ID people jump in and say “see, that is proof for the Designer!”).

What we know is that these “selfish” genes appear to provide a background mutation rate that causes significant genomic rearrangements and leads to new genetic variability. Therefore, transposable elements can be seen as motors of genome evolution. No junk, no design, just plain old evolution.

Maybe this is the right place to ask: is there anyone here who actually understands Pellionisz’s “FractoGene” idea? I’m not illiterate in genetics, but when I read through his latest Cerebellum review (for reasons completely unconnected to this post), the only thing I could think was “What the hell is he talkin’ about…?” (I tried to find out from the man himself as well, but he pretty much declined to explain anything (perhaps because I was highly critical of the whole idea); all I got was a barrage of insults…)

A linguistic structure suggest function even if the structure is not fully understood (like seeing an undecoded communication, the communication has function, but it is not understood).

Sal is such an IDiot. It’s a fundamental of cryptography that any message can be encoded as any sequence. With no knowledge of the linguistic properties of the plaintext, it’s impossible to detect “linguistic structure” in the ciphertext.
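A minimal sketch of the cryptographic point, using a one-time pad (XOR with a key as long as the message). The "ciphertext" and both candidate messages are invented for illustration: for any ciphertext there is a key that turns it into any plaintext of the same length, so the ciphertext on its own carries no detectable linguistic structure.

```python
def xor_bytes(a, b):
    """XOR two equal-length byte strings (one-time-pad encrypt or decrypt)."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))

# Hypothetical intercepted sequence, 14 bytes long.
ciphertext = b"GATTACAGATTACA"

# Two unrelated candidate messages of the same length.
candidates = [b"ATTACK AT DAWN", b"RETREAT AT TEN"]

for plaintext in candidates:
    # The key that maps this plaintext to the ciphertext always exists.
    key = xor_bytes(ciphertext, plaintext)
    # Decrypting the same ciphertext with that key yields the candidate.
    print(xor_bytes(ciphertext, key))
```

Both messages come out of the same ciphertext, depending only on which key is assumed.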

However, the mere fact that they are found in all living organisms indicates that they have an essential function in evolution.

That sounds like fallacious teleology to me. That the reproductive processes are such as to create a habitable environment for junk DNA doesn’t mean that junk DNA has an “essential function”.

I have a slightly different way to handle the

“is a collective label for the portions of the DNA sequence of a chromosome or a genome for which no function has YET been identified.” -> “There is no nonfunctional stuff in DNA” thing.

I use it just as YECs themselves use similar arguments about the fossil record. Simply something like:

“OK, you “predict” that there must be ‘function in junk’ (rhetoric enough for “Them”), so there are GAPS in your theory! And you cannot squirm away, because if I say that evolution “predicts” that there are no gaps in the fossil record (it does not, I know), and that we just have not found all the fossils YET, you laugh at me. So your ID theory MUST be as wrong as my version of evolution is…”

I don’t know how they can make this prediction.

By being intellectually dishonest.

So your ID theory MUST be as wrong as my version of evolution is…

This mistakenly assumes that they apply standards consistently. They’ll just quote mine it as an “admission” that the ToE is wrong.

It seems to me that Salvador and DaveScot are in a contest to see who can post the most content-deficient, ludicrous post on UD.

Thank you PvM and deadman, for explaining the background to Pellionisz’s claims and how power laws are observed in biological systems.

On Pellionisz: He has promoted his website and design terminology on Moran’s blog and on UD. I think I can characterize Moran’s reaction as negative and UD’s as positive, but neither blog analyzed Pellionisz’s claims.

He has some papers on his ideas, but I can’t see that he has been able to verify any decisive predictions. He also has a list of ‘postgene’ sicknesses, which seems to list as many sicknesses as possible where he knows or assumes that individual genome differences show up in symptoms, progress and treatment. In my eyes it seems pretty much as overblown optimism or even woo.

On power-laws: The discussion of power-laws is interesting IMHO. I misinterpreted deadman’s comments on the problems of Zipf analysis on non-coding DNA - of course the analysis could fail just as well because the DNA shows power-law behavior by default as because it shows none.

That biological systems show power-law behavior in some cases isn’t particularly surprising, since complicated systems often follow Pareto rules and have connections at all ranges. (Like networks and glasses do.) But rather simple laws with recursions and other foldings in phase space also seem to give self-similar fractals and show up as power-law behavior. So power-law behavior doesn’t look like a particularly decisive prediction or test of underlying complexity for characterizing mechanisms.

So it is revealing to me that, of course, it is (evolutionary) constraints on such systems (like on coding DNA) which show up as deviations from power-law behavior. Live a little, learn a little.

On entropy: Yes, Sal discussed channel capacity, but in such vague terms that it IMO looked like he wanted a channel description for the genome through generations and mechanisms, and not between environment and genome as it seems he intended.

The latter is Jonathan’s and Schneider’s model, as I understand it, and, I now think, Chris Lee’s analogy to machine learning that I have mentioned elsewhere. These models show how information is picked up by evolutionary processes too.

IMHO Sal’s ‘approach’ would probably make a full evolutionary description of the genome and the behavior of its states very complicated, if possible at all. (Btw, what do you call an approach which tries to avoid exactly “approaching”?) As PvM and Jonathan say, an interesting idea, but not sufficiently described.

Jonathan graciously refers to Sal in any case. Well, perhaps even a blind kook can pick up some seeds between the nutty stuff.

“In my eyes it seems pretty much as overblown optimism or even woo.” In regards to junk DNA and fractals, that is.

“The latter is Jonathan’s and Schneider’s model” - The former is Jonathan’s and Schneider’s model.

Junk DNA also historically included regulatory regions which can hardly be called junk in the common sense of the word, but they are not expressed as proteins. Meeting report in Genome Biology describes

Incorrect. Not even supported by your quote, which states

we are fast learning that large portions of the genome that do not code for proteins are in fact transcribed, and that these regions, previously thought to be ‘junk’, may be useful after all

“Junk” has always referred to DNA that has no function, either protein coding or regulatory. The question has always been which of the noncoding DNA is junk and which is not. The passage you quote points out regulatory functions that had been identified for some regions of noncoding DNA, thereby proving that those regions are not junk at all.

It’s not “junk DNA” - it’s a message from ET!

PvM Wrote:

The terminology “junk DNA” is unfortunate, which is why science has been moving away from the terminology and is using non-coding DNA.

This is not correct. Non-coding DNA is, and always has been, the DNA that doesn’t encode polypeptides. Scientists have always known that many essential sequences are present in non-coding DNA; regulatory sequences, centromeres, telomeres, SARs, origins of replication, ribosome binding sites, polyadenylation sites, transcription termination sites, etc. etc.

None of those have ever been considered junk. Junk DNA is the DNA that does not have a function. Junk DNA is a subset of non-coding DNA and I don’t see any evidence to support your claim that scientists are abandoning the term “junk DNA” in favor of “non-coding DNA.” That doesn’t make sense.

I found your post quite confusing because you are using non-standard terminology.

Something I find interesting here is that DNA that’s “junk” to the individual might pay for its keep in the species by providing more ways of adapting to changes in the future. Sort of like storing stuff for possible recycled use later.

Henry

My crude model for calculating entropy in evolution by natural selection has been cut & pasted from the blog wherein it originated (Good Math, Bad Math), as a first draft paper, currently undergoing preliminary review by coauthor and his postdocs. A blog version of the rewrite of that will be online (possibly hosted by Blake Stacey) in a few days, for further rounds of feedback.

Meanwhile, switching from my scientist hat to my science fiction author hat, Henry J’s comment was addressed with great subtlety, ingenuity, story-telling bravado, and copious glossary and citations in “Darwin’s Radio” By Greg Bear, Del Rey, 1999, Note: Winner of the Nebula Award, Winner of the Endeavor Award, Nominated for the Hugo Award. I have not yet read the sequel, Darwin’s Children.

Yes, the terminology is confusing, in part because many non-professionals have found the topic of sufficient interest to talk about it and muddy the waters, but also unfortunately because scientists themselves often misuse the terms and occasionally make totally over-blown claims for mere publicity’s sake.

When Ohno came up with the term “junk DNA”, he explicitly excluded regulatory elements, and was clearly mostly referring to inactivated pseudogenes, the necessary by-products of gene duplication/diversification mechanisms (and still, IIRC, he made allowances for some pseudogenes to also potentially re-acquire functions).

So, in my opinion “non-coding DNA” should only be used to denote any sequence not directly translated into protein, and therefore it includes a slew of well-known gene-regulatory sequences (promoters, enhancers, locus control regions, effectors and regulators of mRNA splicing, stability and translation, etc) and other functional elements (telomeres, centromeres, insulators, matrix attachment regions, etc). “Genic DNA” should include translatable sequences as well as all of the gene-regulatory elements mentioned above. “Selfish DNA” is more tricky to define, but it basically should be used only to denote elements capable of replicating themselves independently of other genome regions (various forms of transposons and repetitive sequences), or of increasing their genetic transmission frequency (segregation distortion elements).

The most straightforward definition of “junk DNA” is the one given by Larry and others above, that is “DNA that does not have function”, but that of course is somewhat tautological: as soon as a function for a sequence is discovered, it’s not “junk” anymore. That can be still useful from a classification point of view, but as far as explaining the existence of “so much junk in our genome” (to borrow Ohno’s paper’s title), it’s a “junk of the gaps” argument. I think a more useful definition of “junk DNA” would have to be more complex and specific, but I have a hard time coming up with one myself. Overall, I think the term should be used very parsimoniously and with abundant context.

PvM,

our dear friend Salvador Cordova (YEC) presents us with the following

“dear friend”. Aw shucks.

I do appreciate your Christian theism here at PandasThumb, PvM, as it keeps this place a little more congenial for people like me, although your theism and stand for the Christian faith won’t endear you to you know who from Minnesota…

As for YEC, why did you ever turn from your YEC roots? I suspect you must have been brought up with some pathetic YEC arguments and that’s why you left. If you read ID literature and the better YEC literature (Walter Brown) and truly comprehend it in an unbiased way, you will be cured of your current Darwinian maladies. You’re a prodigal son, PvM, and all will be forgiven if you go back from whence you came.

For the record I had nothing to do with your banning from Uncommon Descent. However, I suppose if you return to your YEC roots or even become an ID proponent, you might be made a UD author. We may even slaughter the fatted calf for our prodigal son upon his return. How does that sound?

regards, Salvador

Salvador T. “Wormtongue” Cordova wrote:

For the record I had nothing to do with your banning from Uncommon Descent. However, I suppose if you return to your YEC roots or even become an ID proponent, you might be made a UD author. We may even slaughter the fatted calf for our prodigal son upon his return. How does that sound?

And if he doesn’t come around, you’ll find some way to compare his written words to some alleged surgical mutilation of children; right?

(I notice, Sal, that you’re quick to claim you had nothing to do with PvM’s banning from UD, but not so quick to take a stand on whether said banning was right or wrong. If you think it was wrong, why not say so? And if you think it was right, why are you so hasty to distance yourself from it? It must suck to be so paralyzed by cowardice.)

I suspect you must have been brought up with some pathetic YEC arguments and that’s why you left.

We’re still waiting for your non-pathetic YEC arguments. More to the point, we’re still waiting for ANY actual science to back up such arguments.

Sal Wrote:

As for YEC, why did you ever turn from your YEC roots? I suspect you must have been brought up with some pathetic YEC arguments and that’s why you left.

Are there any other kind of YEC arguments?

If you read ID literature and the better YEC literature (Walter Brown) and truly comprehend it in an unbiased way, you will be cured of your current Darwinian maladies. You’re a prodigal son, PvM, and all will be forgiven if you go back from whence you came.

ID, YEC, it’s all interchangeable, isn’t it? But Darwinian problems provide no evidence for ID or YEC.

Are you familiar with St Augustine’s comments on science and us Christians?

PvM — Please remind us what St. Augustine had to say about science. Thanks.

Yo, here’s the St. Augustine bit:

Usually, even a non-Christian knows something about the earth, the heavens, and the other elements of this world, about the motion and orbit of the stars and even their size and relative positions, about the predictable eclipses of the sun and moon, the cycles of the years and the seasons, about the kinds of animals, shrubs, stones, and so forth, and this knowledge he holds to as being certain from reason and experience. Now, it is a disgraceful and dangerous thing for an infidel to hear a Christian, presumably giving the meaning of Holy Scripture, talking nonsense on these topics; and we should take all means to prevent such an embarrassing situation, in which people show up vast ignorance in a Christian and laugh it to scorn. The shame is not so much that an ignorant individual is derided, but that people outside the household of faith think our sacred writers held such opinions, and, to the great loss of those for whose salvation we toil, the writers of our Scripture are criticized and rejected as unlearned men. If they find a Christian mistaken in a field which they themselves know well and hear him maintaining his foolish opinions about our books, how are they going to believe those books in matters concerning the resurrection of the dead, the hope of eternal life, and the kingdom of heaven, when they think their pages are full of falsehoods on facts which they themselves have learnt from experience and the light of reason? Reckless and incompetent expounders of Holy Scripture bring untold trouble and sorrow on their wiser brethren when they are caught in one of their mischievous false opinions and are taken to task by those who are not bound by the authority of our sacred books. For then, to defend their utterly foolish and obviously untrue statements, they will try to call upon Holy Scripture for proof and even recite from memory many passages which they think support their position, although they understand neither what they say nor the things about which they make assertion. [1 Timothy 1.7]

Raging Bee — Thank you!

The most straightforward definition of “junk DNA” is the one given by Larry and others above, that is “DNA that does not have function”, but that of course is somewhat tautological: as soon as a function for a sequence is discovered, it’s not “junk” anymore. That can be still useful from a classification point of view, but as far as explaining the existence of “so much junk in our genome” (to borrow Ohno’s paper’s title), it’s a “junk of the gaps” argument. I think a more useful definition of “junk DNA” would have to be more complex and specific, but I have a hard time coming up with one myself. Overall, I think the term should be used very parsimoniously and with abundant context.

I think of “junk” more as a hypothesis than as an identifiable category of DNA sequences. What the junk hypothesis points out is that from an evolutionary perspective, we cannot assume that because a sequence is present in DNA it has a physiological function. Indirect evidence, such as huge differences in non-coding DNA between otherwise similar species, suggests that in at least some species, quite a lot of the DNA is junk, but that provides no guidance as to how to recognize it, other than by exclusion, and our knowledge of genetic regulation is not yet comprehensive enough to have any confidence that we can recognize all regulatory elements.

The strongest test that I can imagine for junk is to delete it and see if something happens, but that is a rather weak test, because even if the animal appears phenotypically normal, one cannot exclude the possibility that the deleted sequence has a function that is expressed in some uncommon environment or circumstance.

Of course, there are some sequences, such as pseudogenes and damaged endoviral remnants, that are strong candidates for identification as junk, but even there one must be cautious, because evolutionary theory predicts that in some cases new functions will be found for such “left over” sequences.

I had to wipe the slime off my screen when Sal’s post scrolled across it.

How do I ‘trackback’ this post? I have linked to it on my blog, but apparently that’s not enough.

Comment #157708 Posted by GeoMor on January 26, 2007 2:59 AM

Maybe this is the right place to ask: is there anyone here who actually understands Pellionisz’s “FractoGene” idea?

ANSWERS BY PELLIONISZ (CAPITALIZED, AJP)

I’ve spent a fair amount of time trying to figure it out, too, and I read his “news column” every now and then. First let me assure you that it’s all very vague and enigmatic from what is available on his (many) websites, but here is what I think the basic ideas sort of are…

(THERE IS A FAIR AMOUNT OF INTELLECTUAL PROPERTY NOT DISCLOSED FREELY - BUT “GeoMor”-s ENCAPSULATION IS PRETTY MUCH CORRECT - AJP)

1. the classical models of genetics (protein-coding genes getting up- or down-regulated by transcription factors, etc.) are totally inadequate to explain how the genome specifies complex life. (no one would disagree) http://www.postgenetics.org

(I AM GLAD “NO ONE WOULD DISAGREE”…AJP)

2. the fact that fractal-like structures can be found in many different aspects of species morphology (neuronal dendrites and blood vessels in ourselves, for example) suggests that “programs” for generating these fractal structures are to be found in the genome. (seems reasonable enough) http://www.fractogene.com/

(I AM GLAD “SEEMS REASONABLE ENOUGH” - BUT GIVEN THE LEVEL OF “NOT UNDERSTANDING” WILL HELP SHOW “NON-OBVIOUSNESS” - AJP)

3. the vast number of repetitive elements in the genome, or “self-similar” elements as he suggestively calls them, are actually evidence of those programs. (do programs generating fractal structures themselves need to have a fractal nature?) http://www.fractogene.com/fractogene_concept.htm

(FOR THE PARENTHETICAL QUESTION: ALGORITHMS GENERATING FRACTAL STRUCTURES ENCAPSULATE THE DESIGN OF FRACTALITY - AJP)

4. These programs, called “FractoGems” or “FractoSets” or something, are insinuated to somehow be involved with various severe diseases. http://www.fractogem.com

(THIS IS ROCK SOLID EVIDENCE AT NUCLEOTIDE-BASE LEVELS, FOR SEVERAL LETHAL DISEASES - AJP)

5. the one peer-reviewed article on all of this seems to say that the FractoGene theory correctly predicted the morphology of some (fractal-looking) neuronal dendrites in a fish cerebellum, somehow based on the noncoding DNA in its genome. This is impressive if verifiable, but unfortunately, it is not much clearer than what I just said as to what the prediction was actually based on — one figure seems to suggest it is based on genome size (genome size => morphological complexity), but this is acknowledged as being too naive in the text. http://www.junkdna.com/fractogene/0[…]llionisz.pdf

(READ THE PAPER A BIT MORE CAREFULLY, WHAT’S NAIVE AND WHAT ISN’T - AJP)

6. Prospective investors should contact Pellionisz. (perhaps the most emphasized point)

(THIS POINT IS NOW MOOT. THE LEAD INVESTOR MAY STILL ACCEPT PARTICIPANTS IN A CONSORTIUM, BUT THIS ISN’T A SCIENCE ISSUE BUT SHEER BUSINESS - AJP)
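On the parenthetical question in point 3 (does a program that generates a fractal need to be fractal itself?), a small sketch may help. It uses a standard L-system for the Koch curve; the function name is my own and nothing here is taken from FractoGene. A single eight-character rewriting rule, with no self-similar structure of its own, generates a self-similar curve.

```python
def expand(axiom, rules, iterations):
    """Apply L-system rewriting rules to every symbol, `iterations` times."""
    current = axiom
    for _ in range(iterations):
        current = "".join(rules.get(symbol, symbol) for symbol in current)
    return current

# Koch curve: F = draw forward, + = turn left 60 degrees, - = turn right 60.
KOCH_RULES = {"F": "F+F--F+F"}

curve = expand("F", KOCH_RULES, 3)

# The rule is 8 characters long; after 3 rounds it has produced 64 segments.
print(len(KOCH_RULES["F"]), curve.count("F"))  # 8 64
```

The number of segments grows fourfold with each round while the rule stays the same size, so the repetitiveness of the output says little about what the generating "program" has to look like.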

Again, this is all my own attempt to piece together something kind of reasonable from what Pellionisz makes public. If anyone else has anything to add, I’m interested to hear it. Hell, he could be right…

(“HELL, HE COULD BE RIGHT” - KEEP THIS IN MIND. GOOD JOB GeoMor, CONGRATS - AJP)

pellionisz_atNOSPAM_junkdna.com

The problem is that the fractal, scale free nature of the gene is hardly something that science was not aware of. In fact, scientists have shown how this scale free nature can arise via very simple processes of duplication and preferential attachment for instance.
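A toy sketch of that kind of process, under loudly simplified assumptions: each new node (think of a newly duplicated gene) links to one existing node chosen with probability proportional to how many links that node already has. The function name and parameters are my own, and this is plain preferential attachment rather than a full duplication-divergence model.

```python
import random
from collections import Counter

def grow_network(n_nodes, seed=0):
    """Grow a network one node at a time; each new node attaches to an
    existing node picked with probability proportional to its degree.
    Returns the list of node degrees."""
    rng = random.Random(seed)
    degree = [1, 1]    # start from a single link between nodes 0 and 1
    endpoints = [0, 1]  # every node appears once per link it takes part in
    for new_node in range(2, n_nodes):
        # Picking uniformly from `endpoints` is degree-proportional choice.
        target = rng.choice(endpoints)
        degree.append(1)
        degree[target] += 1
        endpoints.extend([new_node, target])
    return degree

degrees = grow_network(5000)
histogram = Counter(degrees)
for k in sorted(histogram)[:5]:
    print(k, histogram[k])
print("largest degree:", max(degrees))
```

Most nodes end up with a single link while a few hubs collect many, which is the heavy-tailed, scale-free pattern; no designer is involved beyond the two-line growth rule.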

anthony said:

Why does the text (YEC) follow every mention of Salvador’s name?

young earth creationist I assume.

About this Entry

This page contains a single entry by PvM published on January 21, 2007 12:18 PM.
