Abstract
We propose selective supervised Latent Dirichlet Allocation (ssLDA) to improve the prediction performance of the widely studied supervised probabilistic topic models. For each word in a given document, we introduce a Bernoulli distribution that selects the word as either strongly or weakly discriminative with respect to its assigned topic. The Bernoulli distribution is parameterized by the discrimination power of the word for its assigned topic. As a result, the document is represented as a 'bag-of-selective-words', a new perspective that contrasts with the probabilistic 'bag-of-topics' of the topic modeling domain and the flat 'bag-of-words' of the traditional natural language processing domain. Inheriting the general framework of supervised LDA (sLDA), ssLDA can also predict many types of response specified by a generalized linear model (GLM). This paper focuses on using the word selection mechanism for single-label document classification: we use variational inference to approximate the intractable posterior and derive a maximum-likelihood estimate of the parameters in ssLDA. Experiments on textual documents show that ssLDA not only performs competitively with 'state-of-the-art' classification approaches based on both the flat 'bag-of-words' and the probabilistic 'bag-of-topics' representations, but can also discover the discrimination power of the words within each topic, in a way that is compatible with our rational knowledge.
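The per-word selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the parameter `psi` (a topic-by-word matrix of discrimination powers in [0, 1]), and the plain LDA-style sampling of topics and words are all assumptions made for illustration, and the response model and inference procedure are omitted.

```python
import numpy as np

def sample_selective_document(n_words, alpha, beta, psi, rng):
    """Sample one document together with its per-word selection flags.

    alpha: (K,) Dirichlet prior over topics.
    beta:  (K, V) topic-word distributions; each row sums to 1.
    psi:   (K, V) hypothetical discrimination power of word v for topic k.
    """
    theta = rng.dirichlet(alpha)  # topic proportions of this document
    topics = rng.choice(len(alpha), size=n_words, p=theta)
    words = np.array([rng.choice(beta.shape[1], p=beta[k]) for k in topics])
    # Bernoulli selection: 1 = strongly discriminative, 0 = weakly discriminative.
    selected = rng.binomial(1, psi[topics, words])
    return topics, words, selected

def bag_of_selective_words(words, selected, vocab_size):
    """Count only the words flagged as strongly discriminative."""
    return np.bincount(words[selected == 1], minlength=vocab_size)
```

A possible usage, again with made-up sizes: draw `rng = np.random.default_rng(0)`, set `alpha = np.ones(3)`, `beta = np.full((3, 5), 0.2)` and `psi = np.full((3, 5), 0.5)`, call `sample_selective_document(20, alpha, beta, psi, rng)`, and pass the resulting `words` and `selected` to `bag_of_selective_words` to obtain a length-5 count vector that ignores the weakly discriminative words.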
| Original language | English |
|---|---|
| Pages (from-to) | 1643-1655 |
| Number of pages | 13 |
| Journal | IEEE Transactions on Knowledge and Data Engineering |
| Volume | 27 |
| Issue number | 6 |
| State | Published - Jun 1 2015 |
Keywords
- Classification
- Latent Dirichlet Allocation
- Supervised learning
- Topic modeling
Fingerprint
Dive into the research topics of 'Probabilistic word selection via topic modeling'. Together they form a unique fingerprint.