ZIPF’S AND HEAPS’ LAWS FOR THE NATURAL AND SOME RELATED RANDOM TEXTS
Abstract
We have generated Chomsky’s randomized texts and Miller’s monkey random texts (RTs) from a source natural text (NT), and have examined their rank–frequency dependences, Pareto distributions, word-frequency probability distributions, and vocabularies as functions of text length. Here a Chomsky’s RT is an NT randomized so that its ‘words’ are the sequences of letters and blanks lying between neighbouring occurrences of some preset letter (e.g., the letter i). We have compared the exponents appearing in the different power laws that describe the word statistics for the NTs and RTs, and have analyzed how well the theoretical relationships among those exponents are fulfilled in practice. We have shown empirically that the exponents α and β of Zipf’s law and of the word-frequency probability distribution for the Chomsky’s RTs satisfy the inequalities α < 1 and β > 1, while their Heaps’ exponent should be close to unity (η ≈ 1). We have also compared our results with those obtained for the monkey texts, and have shown that the vocabulary of the Chomsky’s texts is richer than that of the monkey texts. Heaps’ law holds to an extraordinarily good approximation for the Chomsky’s RTs, as it does for the RTs generated by the intermittent silence process, and unlike sufficiently long NTs, which reveal slightly convex dependences of vocabulary on text length when plotted on a double logarithmic scale.
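The Chomsky-type randomization described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the lowercasing and filtering of the input, the treatment of the fragments before the first and after the last delimiter as words, and the dropping of empty fragments are all assumptions made for illustration.

```python
from collections import Counter


def chomsky_words(text, delimiter="i"):
    """Split a text at every occurrence of `delimiter`.

    Blanks are kept inside the resulting 'words'. Lowercasing and
    keeping only letters and blanks is an assumed normalisation.
    """
    cleaned = "".join(ch for ch in text.lower() if ch.isalpha() or ch == " ")
    # Assumption: empty fragments (two delimiters in a row) are dropped,
    # and the leading and trailing fragments count as words.
    return [w for w in cleaned.split(delimiter) if w]


def rank_frequency(words):
    """Word frequencies in descending order; index 0 corresponds to rank 1."""
    return sorted(Counter(words).values(), reverse=True)


def vocabulary_growth(words):
    """Number of distinct words seen after each token (a Heaps-type curve)."""
    seen = set()
    growth = []
    for w in words:
        seen.add(w)
        growth.append(len(seen))
    return growth


if __name__ == "__main__":
    sample = "this is a tiny illustration with i as the delimiter"
    words = chomsky_words(sample)
    print(words)
    print(rank_frequency(words))
    print(vocabulary_growth(words))
```

On a long source text, the exponents α and η could then be estimated from the slopes of the rank–frequency and vocabulary-growth curves on double logarithmic axes; the fitting procedure is left out of this sketch.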
Key words: random texts, randomized texts, Miller’s monkey texts, Chomsky’s randomization, power laws, Zipf’s law, Pareto distribution, word-frequency probability distribution, Heaps’ law
DOI: http://dx.doi.org/10.30970/eli.9.94