
On variation of word frequencies in Russian literary texts

Research output: Contribution to journal › Article › peer-review

2 Scopus citations

Abstract

We study the variation of word frequencies in Russian literary texts. Our findings indicate that the standard deviation of a word's frequency across texts depends on its average frequency according to a power law with exponent 1/2 < α < 1. Because α < 1, rarer words have a relatively larger degree of frequency volatility (that is, higher "burstiness"). We estimate a latent factor model to investigate the structure of the word frequency distribution. The findings suggest that the dependence of a word's frequency volatility on its average frequency can be explained by the asymmetry in the distribution of latent factors.
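The power-law relation described in the abstract can be illustrated with a log-log regression. The following is a minimal sketch under stated assumptions, not the authors' estimation procedure: the function name `fit_volatility_exponent`, the matrix layout (texts in rows, words in columns), and the synthetic Gaussian data are all hypothetical and chosen only for illustration. It assumes the relation has the form std ≈ c · mean^α and recovers α as the slope of log(std) against log(mean).

```python
import numpy as np

def fit_volatility_exponent(freqs):
    """Fit log(std) = log_c + alpha * log(mean) across words.

    freqs: array of shape (n_texts, n_words) holding the relative
    frequency of each word in each text (hypothetical layout).
    Returns (alpha, log_c).
    """
    freqs = np.asarray(freqs, dtype=float)
    mean = freqs.mean(axis=0)          # average frequency per word
    std = freqs.std(axis=0, ddof=1)    # sample std across texts per word
    keep = (mean > 0) & (std > 0)      # logs are undefined otherwise
    # np.polyfit with degree 1 returns [slope, intercept]
    alpha, log_c = np.polyfit(np.log(mean[keep]), np.log(std[keep]), 1)
    return alpha, log_c

# Synthetic check (illustrative only, not real corpus data):
# build frequencies whose std follows 0.005 * mean**0.75 by construction.
rng = np.random.default_rng(0)
n_texts, n_words = 200, 500
true_alpha = 0.75
mean_freq = np.logspace(-6, -2, n_words)
noise = rng.standard_normal((n_texts, n_words))
freqs = mean_freq + 0.005 * mean_freq**true_alpha * noise

alpha_hat, log_c_hat = fit_volatility_exponent(freqs)
print(f"estimated alpha = {alpha_hat:.3f}")
```

Under this form, the relative volatility std/mean scales as mean^(α−1), which decreases with the mean whenever α < 1. That is the sense in which rarer words are burstier.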

Original language: English
Pages (from-to): 328-334
Number of pages: 7
Journal: Physica A: Statistical Mechanics and its Applications
Volume: 445
State: Published - Mar 1 2016

Keywords

  • Burstiness
  • Latent Dirichlet allocation
  • Word frequency variation
