Music, evolution, and language


This report has just been issued: Tunes create context like language: Maths shows why tonal music is easy listening.

It set me [URL=[…]anguage.html]thinking…[/URL]


Ray Jackendoff has been saying this for more than 20 years, and even has a book’s-worth of data and arguments to back it up.

Figures. I’m rarely on the cutting edge. Thanks for the refs…

The report referred to in this post mentions Zipf’s law. First, it should really be Zipf’s laws, in the plural, as there is more than one. Second, there is so far no commonly accepted explanation of these laws: oddly, they hold even for gibberish! What does distinguish meaningful texts from gibberish is the Letter Series Correlation (LSC) effect discovered by Brendan McKay and myself a few years ago. On my website there are several articles describing that effect (which is qualitatively the same regardless of language, authorship, etc., but quantitatively different between various texts). For the last three years Brendan and I have been planning to publish these results in print, but so far we have not done so, for a number of reasons: one is that we are busy with other projects (Brendan is writing a book on another topic), and the other is that every few months we have a new idea of how to improve the method. I expect it to work for music as well, although we have not tried. Likewise, I expect that the LSC method can be applied to DNA, but again we have not tried.

Zipf’s Law (one of them, anyway) has been applied to DNA and it seems that DNA does not follow it:

Tsonis, AA, JB Elsner, and PA Tsonis. “Is DNA a Language?” J theor Biol 184 (1997): 25–29.


DNA sequences usually involve local construction rules that affect different scales. As such their “dictionary” may not follow Zipf’s law (a power law) which is followed in every natural language. Indeed, analysis of many DNA sequences suggests that no linguistics connections to DNA exist and that even though it has structure DNA is not a language. Computer simulations and a biological approach to this problem further support these results.
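For readers who want to check a "dictionary" against Zipf’s law themselves, here is a minimal sketch in Python. It assumes the classic rank-frequency form of the law (frequency roughly proportional to 1/rank) and estimates the exponent as a least-squares slope on a log-log scale. The function names and the simple tokenizer are illustrative assumptions, not taken from the Tsonis et al. paper.

```python
import math
import re
from collections import Counter


def rank_frequency(text):
    """Return word counts sorted from most to least frequent."""
    # Assumption: a "word" is a run of ASCII letters or apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    return sorted(Counter(words).values(), reverse=True)


def zipf_slope(counts):
    """Least-squares slope of log(frequency) against log(rank).

    A slope near -1 is what the classic Zipf law predicts.
    """
    if len(counts) < 2:
        raise ValueError("need at least two distinct words")
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(c) for c in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```

Calling `zipf_slope(rank_frequency(some_text))` gives a single number to compare across texts. For a DNA sequence one would first have to decide what counts as a "word" (for example, fixed-length n-tuples of bases), and that choice is itself an assumption.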

Mark Perakh wrote

Second, there is so far no commonly accepted explanation of these laws: oddly they hold even for gibberish!

Not for all gibberish, it appears. In the most recent Scientific American there’s an article on the Voynich Manuscript which shows that the distribution of word lengths in the Manuscript does not conform to that of human natural languages. While the piece does not refer specifically to Zipf’s Law, it mentions that the distribution of word lengths in the Manuscript is a binomial, unlike human natural languages, which are described as being asymmetrical and fat-tailed on the long end, which sounds more like a kind of power law to me.
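The contrast between a symmetric, binomial-like distribution of word lengths and an asymmetrical, long-tailed one can be roughly quantified with skewness. The sketch below is only an illustration of that idea, not the analysis used in the Scientific American article. Whether to count every word occurrence or only distinct words is left to the caller and is an assumption either way.

```python
from collections import Counter


def word_length_distribution(words):
    """Map each word length to the fraction of words having that length."""
    if not words:
        raise ValueError("need at least one word")
    counts = Counter(len(w) for w in words)
    total = len(words)
    return {length: n / total for length, n in sorted(counts.items())}


def skewness(dist):
    """Skewness of a discrete distribution given as {value: probability}.

    Zero means symmetric; positive means a longer tail on the high side.
    """
    mean = sum(v * p for v, p in dist.items())
    var = sum((v - mean) ** 2 * p for v, p in dist.items())
    if var == 0:
        raise ValueError("all words have the same length")
    third = sum((v - mean) ** 3 * p for v, p in dist.items())
    return third / var ** 1.5
```

A binomial distribution with success probability near one half has skewness near zero, while a distribution with a long tail of long words has clearly positive skewness.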


RBH: I have not seen the Sci American article you refer to, but I have spent considerable time and effort on the Voynich manuscript (VMs) (although not lately), both corresponding with most of the “voynichists” and applying the LSC method to analyze its structure.

The LSC in VMs behaves practically the same way as in meaningful texts in natural languages.

In fact VMs consists of fragments written in two differing ways (often referred to as “languages A and B”). LSC for both VMs-A and VMs-B looks as it does for meaningful texts (and distinctly different from gibberish), with a clear quantitative distinction between A and B.

As some Voynichists have shown, Zipf’s laws hold for VMs. Zipf’s laws have nothing to do with the distribution of word lengths, only with word frequencies.

The average word lengths for VMs-A and VMs-B are different.

As I have shown, the distribution of letter frequencies in VMs is more non-uniform than it is in most natural languages.

There is a lot more that has been found about VMs, including measurements of its entropies etc, but it still remains undecoded.
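One common way to put a number on how non-uniform a letter-frequency distribution is (and the simplest of the entropies one can measure) is the first-order Shannon entropy. The sketch below is a generic illustration, not the specific measurements made on VMs. It assumes the text is available in a transcription that uses ordinary alphabetic characters.

```python
import math
from collections import Counter


def letter_entropy(text):
    """Shannon entropy, in bits per letter, of the letter frequencies."""
    letters = [ch for ch in text.lower() if ch.isalpha()]
    if not letters:
        raise ValueError("text contains no letters")
    total = len(letters)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(letters).values())


def uniformity(text):
    """Entropy divided by its maximum for the alphabet actually used.

    1.0 means all letters are equally frequent; lower values mean a
    more non-uniform distribution.
    """
    alphabet_size = len({ch for ch in text.lower() if ch.isalpha()})
    if alphabet_size < 2:
        return 0.0
    return letter_entropy(text) / math.log2(alphabet_size)
```

Normalizing by the maximum entropy makes texts with different alphabet sizes roughly comparable, which matters when a transcription alphabet is set against a natural-language one.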

On my website there are two articles about my research on VMs, which also have a number of references to other publications.

Btw, today, in an hour, I expect a call from Canada, from the producers of a video about Voynich who want to interview me about my part in the Voynich saga.

As to the word-length distribution, I have to see the article in Sci American to judge whether this is just the long-known data by Landini and others or something new I have not heard about.

There is an ongoing attempt by Gordon Rugg of England and his student to imitate the VMs structure using a rather ingenious method (they recently published a paper in Cryptologia), but so far their artificial texts display a structure clearly different from VMs insofar as the LSC effect is measured (I am in correspondence with them).

John Wilkins: So, Zipf’s laws do not hold for DNA. Interesting. LSC, though, is a very different animal from Zipf. Unlike Zipf, LSC is not a law; it is a method of studying certain statistical properties of texts. It is based on measuring the variability of character frequencies along the text. It generates characteristic curves which are qualitatively different for meaningful texts (in the 12 languages studied) on the one hand and for gibberish on the other, and qualitatively identical but quantitatively different for various meaningful texts. We have not applied it to DNA, which should be very interesting to do, as presumably the LSC curves for gene-containing parts may differ from those for parts containing no genes, though of course it is impossible to predict anything about it. Mark
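To make "measuring the variability of character frequencies along the text" concrete, here is a rough Python sketch of one way such a measurement could be set up. It is an illustration under stated assumptions, not the published LSC procedure: the text is cut into equal chunks, the squared differences of each character's counts in adjacent chunks are summed, and the result is divided by the same quantity for a randomly shuffled copy of the text. The function names, the shuffled baseline, and the choice of chunk sizes are all assumptions made for the example.

```python
import random
from collections import Counter


def chunk_variability(text, chunk_size):
    """Sum, over all characters, of squared differences between the
    character's counts in each pair of adjacent chunks."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be positive")
    n_chunks = len(text) // chunk_size  # a trailing partial chunk is dropped
    chunks = [Counter(text[i * chunk_size:(i + 1) * chunk_size])
              for i in range(n_chunks)]
    total = 0
    for left, right in zip(chunks, chunks[1:]):
        for ch in left.keys() | right.keys():
            total += (left[ch] - right[ch]) ** 2
    return total


def variability_ratio(text, chunk_size, seed=0):
    """Measured variability divided by that of a shuffled copy of the text."""
    shuffled = list(text)
    random.Random(seed).shuffle(shuffled)
    baseline = chunk_variability("".join(shuffled), chunk_size)
    if baseline == 0:
        raise ValueError("shuffled baseline is zero; text too short or uniform")
    return chunk_variability(text, chunk_size) / baseline


def variability_curve(text, chunk_sizes, seed=0):
    """Ratio for each chunk size: the kind of curve compared across texts."""
    return [(n, variability_ratio(text, n, seed)) for n in chunk_sizes]
```

Plotting the output of `variability_curve` for a range of chunk sizes gives one curve per text. Shuffling preserves the overall character frequencies but destroys their ordering, so any departure of the ratio from 1 reflects structure along the text rather than the frequency distribution itself.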

I have perused the article on VMs in Sci American and found that it is authored by our old friend Gordon Rugg, so it does not contain any information beyond what I have known for a while. Gordon sent me his results long before submitting his article to Cryptologia, and I made a number of comments on it at that time. It was a much better paper than this one in Sci American, as it did not contain the misleading assertions we see in the Sci Amer paper (which he did not consult me about). The assertion which is untrue (and which forms the basis of his asseveration that VMs is gibberish) is that he allegedly succeeded in creating artificial gibberish similar to VMs. In fact, he and his student Laura created hundreds of gibberish texts using an ingenious technique, and these texts superficially looked similar to the Voynich ms. However, they sent me their texts and asked me to conduct an LSC test on them, which I did together with Brendan McKay. All of Gordon’s and Laura’s texts exhibited LSC curves typical of gibberish (as expected), while such curves for the Voynich ms are invariably of the type characteristic of meaningful texts in natural languages. Obviously the gibberish texts created by Gordon and Laura have a structure quite different from VMs. Hence, while VMs may be a hoax, Rugg’s data in no way prove it; on the contrary! In the Cryptologia paper Rugg avoided such an exaggeration of the significance of his data, so it is disappointing that in the Sci Amer paper he yielded to the temptation to misrepresent his results, which in fact have so far fallen short of what he hoped to achieve. Humanly understandable, and more so since a degree for Laura is at stake. Mark

