Abstract
We study the variation of word frequencies in Russian literary texts. Our findings indicate that the standard deviation of a word's frequency across texts depends on its average frequency according to a power law with exponent 1/2 < α < 1, which shows that the rarer words have a relatively larger degree of frequency volatility (that is, higher "burstiness"). A latent factor model has been estimated to investigate the structure of the word frequency distribution. The findings suggest that the dependence of a word's frequency volatility on its average frequency can be explained by the asymmetry in the distribution of latent factors.
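To make the power-law statement concrete: if the standard deviation σ of a word's frequency scales with its mean frequency μ as σ ∝ μ^α, then the relative volatility σ/μ scales as μ^(α-1), which grows as μ shrinks whenever α < 1. The sketch below shows one common way such an exponent could be estimated, by a least-squares line fit in log-log space. It is an illustration only and is not taken from the paper: the function names `frequency_mean_std` and `fit_exponent`, the input layout (a texts-by-words count matrix), and the choice of ordinary least squares are all assumptions.

```python
import numpy as np


def frequency_mean_std(counts):
    """Per-word mean and standard deviation of relative frequency.

    counts: array of shape (n_texts, n_words) holding raw word counts
    (hypothetical input layout, assumed for this sketch).
    """
    counts = np.asarray(counts, dtype=float)
    # Relative frequency of each word within each text.
    freqs = counts / counts.sum(axis=1, keepdims=True)
    # Statistics are taken across texts, one value per word.
    return freqs.mean(axis=0), freqs.std(axis=0, ddof=1)


def fit_exponent(mean, std):
    """Fit std = c * mean**alpha by least squares on the logs.

    Returns (alpha, c). Words with zero mean or zero std are skipped
    because their logarithms are undefined.
    """
    mean = np.asarray(mean, dtype=float)
    std = np.asarray(std, dtype=float)
    keep = (mean > 0) & (std > 0)
    # Straight line in log-log space: log(std) = alpha * log(mean) + log(c).
    alpha, log_c = np.polyfit(np.log(mean[keep]), np.log(std[keep]), 1)
    return alpha, np.exp(log_c)


if __name__ == "__main__":
    # Synthetic baseline (not real corpus data): independent Poisson counts,
    # for which the exponent is expected to be close to 1/2.
    rng = np.random.default_rng(0)
    rates = np.logspace(-4, -2, 50)          # assumed per-token word rates
    counts = rng.poisson(100_000 * rates, size=(200, 50))
    alpha, c = fit_exponent(*frequency_mean_std(counts))
    print(f"estimated alpha = {alpha:.3f}")
```

The Poisson baseline is included only as a reference point: purely random sampling gives an exponent near 1/2, so an estimate strictly between 1/2 and 1, as reported in the abstract, indicates volatility beyond what sampling noise alone would produce.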
| Original language | English |
|---|---|
| Pages (from-to) | 328-334 |
| Number of pages | 7 |
| Journal | Physica A: Statistical Mechanics and its Applications |
| Volume | 445 |
| State | Published - Mar 1 2016 |
Keywords
- Burstiness
- Latent Dirichlet allocation
- Word frequency variation
Title
On variation of word frequencies in Russian literary texts