Probabilistic word selection via topic modeling

  • Yueting Zhuang
  • Haidong Gao
  • Fei Wu
  • Siliang Tang
  • Yin Zhang
  • Zhongfei Zhang

Research output: Contribution to journal › Article › peer-review

8 Scopus citations

Abstract

We propose selective supervised Latent Dirichlet Allocation (ssLDA) to improve the prediction performance of the widely studied supervised probabilistic topic models. We introduce a Bernoulli distribution for each word in a given document that selects the word as either strongly or weakly discriminative with respect to its assigned topic. The Bernoulli distribution is parameterized by the discrimination power of the word for its assigned topic. As a result, a document is represented as a 'bag-of-selective-words', a new perspective that differs from both the probabilistic 'bag-of-topics' of the topic modeling domain and the flat 'bag-of-words' of traditional natural language processing. Because it inherits the general framework of supervised LDA (sLDA), ssLDA can also predict many types of response specified by a generalized linear model (GLM). Focusing in this paper on applying the word selection mechanism to single-label document classification, we derive variational inference to approximate the intractable posterior, together with a maximum-likelihood estimation of the parameters of ssLDA. Experiments on text documents show that ssLDA not only achieves classification performance competitive with state-of-the-art approaches based on either the flat 'bag-of-words' or the probabilistic 'bag-of-topics' representation, but also discovers the discrimination power of words within their topics, in a way that is consistent with our prior knowledge.
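The word-selection idea in the abstract can be illustrated with a toy generative sketch. This is not the paper's model: the function names (`sample_selective_document`, `selective_features`, `predict_label`), the per-(topic, word) table `power` used directly as the Bernoulli parameter, the 2K-dimensional strong/weak topic feature layout, and the softmax weights `eta` are all hypothetical assumptions made for illustration. The variational inference and maximum-likelihood estimation described in the paper are not reproduced here.

```python
import math
import random

# Toy sketch only: a hypothetical illustration of per-word Bernoulli selection,
# not the ssLDA model, inference, or parameter estimation from the paper.

def sample_selective_document(theta, phi, power, n_words, rng):
    """Sample (word, topic, selected) triples for one document.

    theta: topic proportions for the document (K floats summing to 1).
    phi:   per-topic word distributions (K rows of V floats summing to 1).
    power: hypothetical discrimination power power[k][v] in [0, 1], used
           directly as the Bernoulli parameter for word v under topic k.
    """
    doc = []
    for _ in range(n_words):
        z = rng.choices(range(len(theta)), weights=theta)[0]   # topic assignment
        w = rng.choices(range(len(phi[z])), weights=phi[z])[0]  # word drawn from topic z
        s = 1 if rng.random() < power[z][w] else 0              # 1 = strong, 0 = weak
        doc.append((w, z, s))
    return doc


def selective_features(doc, n_topics):
    """Empirical frequency of each (topic, strong/weak) pair; length 2K.

    Index 2*k holds weakly selected words of topic k and index 2*k + 1 the
    strongly selected ones (an assumed layout for this sketch).
    """
    feats = [0.0] * (2 * n_topics)
    for _, z, s in doc:
        feats[2 * z + s] += 1.0
    n = len(doc)
    return [f / n for f in feats] if n else feats


def predict_label(feats, eta):
    """Softmax classifier over classes; eta[c] is a hypothetical weight vector."""
    scores = [sum(e * f for e, f in zip(eta_c, feats)) for eta_c in eta]
    m = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(sc - m) for sc in scores]
    total = sum(exps)
    probs = [x / total for x in exps]
    label = max(range(len(probs)), key=probs.__getitem__)
    return label, probs


# Usage with made-up parameters: two topics, three vocabulary words, two classes.
rng = random.Random(0)
theta = [0.7, 0.3]
phi = [[0.5, 0.4, 0.1], [0.1, 0.2, 0.7]]
power = [[0.9, 0.2, 0.1], [0.1, 0.3, 0.95]]
doc = sample_selective_document(theta, phi, power, n_words=50, rng=rng)
feats = selective_features(doc, n_topics=2)
eta = [[0.0, 3.0, 0.0, 0.0],   # class 0 rewards strongly selected topic-0 words
       [0.0, 0.0, 0.0, 3.0]]   # class 1 rewards strongly selected topic-1 words
label, probs = predict_label(feats, eta)
print(label, [round(p, 3) for p in probs])
```

Keeping the strong and weak occurrences of each topic in separate feature slots is what lets the toy classifier weight them differently; a plain 'bag-of-topics' version of this sketch would merge the two slots of each topic into one.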

Original language: English
Pages (from-to): 1643-1655
Number of pages: 13
Journal: IEEE Transactions on Knowledge and Data Engineering
Volume: 27
Issue number: 6
State: Published - Jun 1 2015

Keywords

  • Classification
  • Latent Dirichlet Allocation
  • Supervised learning
  • Topic modeling
