Fontanari, J. F. and Perlovsky, L. I. (2004) Solvable null model for the distribution of word frequencies. Physical Review E, 70(4):042901.

Abstract

Zipf's law asserts that in all natural languages the frequency of a word is inversely proportional to its rank. The significance, if any, of this result for language remains a mystery. Here we examine a null hypothesis for the distribution of word frequencies, a so-called discourse-triggered word choice model, which is based on the assumption that the more a word is used, the more likely it is to be used again. We argue that this model is equivalent to the neutral infinite-alleles model of population genetics and so the degeneracy of the different words composing a sample of text is given by the celebrated Ewens sampling formula [Theor. Pop. Biol. 3, 87 (1972)], which we show to produce an exponential distribution of word frequencies.
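
The reuse mechanism described in the abstract can be illustrated with a Hoppe urn, a standard sequential construction whose partitions follow the Ewens sampling formula. The sketch below is not taken from the paper: the innovation parameter `theta`, the function names, and the use of integer word ids are all assumptions made for illustration. At step i a new word is coined with probability theta / (theta + i); otherwise an earlier token is repeated, so a word already used k times is chosen with probability k / (theta + i).

```python
import random
from collections import Counter


def simulate_text(n_tokens, theta, seed=None):
    """Hoppe-urn sketch of 'the more a word is used, the more likely
    it is to be used again'. Returns a Counter: word id -> count.

    theta (> 0) is an assumed innovation parameter, not a value
    given in the abstract.
    """
    rng = random.Random(seed)
    tokens = []
    next_word_id = 0
    for i in range(n_tokens):
        # New word with probability theta / (theta + i); this is
        # certain at i == 0, so `tokens` is never empty below.
        if rng.random() < theta / (theta + i):
            tokens.append(next_word_id)
            next_word_id += 1
        else:
            # Repeating a uniformly chosen earlier token reuses each
            # word in proportion to its current count.
            tokens.append(rng.choice(tokens))
    return Counter(tokens)


def frequency_spectrum(word_counts):
    """Map k -> number of distinct words that occur exactly k times."""
    return Counter(word_counts.values())


def expected_distinct_words(n_tokens, theta):
    """Expected vocabulary size: sum of the per-step innovation
    probabilities used in simulate_text."""
    return sum(theta / (theta + i) for i in range(n_tokens))
```

Calling `frequency_spectrum(simulate_text(10000, 5.0, seed=0))` gives the number of distinct words at each occurrence count for one simulated text; averaging over many seeds would allow a rough comparison with the distribution the authors derive.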

BibTeX

@article{fontanari04wordFrequencyPRE,
  author={J. F. Fontanari and L. I. Perlovsky},
  title={Solvable null model for the distribution of word frequencies},
  journal={Physical Review E},
  year={2004},
  month={Oct},
  volume={70},
  number={4},
  pages={042901},
  url={http://www.isrl.uiuc.edu/~amag/langev/paper/fontanari04wordFrequencyPRE.html}
}