Knowledge discovery from citation networks

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

11 Scopus citations

Abstract

Knowledge discovery from scientific articles has received increasing attention recently, since huge repositories have been made available by the development of the Internet and digital databases. In a corpus of scientific articles such as a digital library, documents are connected by citations, and one document plays two different roles in the corpus: the document itself and a citation of other documents. In the existing topic models, little effort is made to differentiate these two roles. We believe that the topic distributions of these two roles are different and related in a certain way. In this paper we propose a Bernoulli Process Topic (BPT) model, which models the corpus at two levels: the document level and the citation level. In the BPT model, each document has two different representations in the latent topic space, associated with its two roles. Moreover, the multilevel hierarchical structure of the citation network is captured by a generative process involving a Bernoulli process. The distribution parameters of the BPT model are estimated by a variational approximation approach. In addition to conducting experimental evaluations on the document modeling task, we also apply the BPT model to a well-known scientific corpus to discover the latent topics. Comparisons against state-of-the-art methods demonstrate very promising performance.

Original language: English
Title of host publication: ICDM 2009 - The 9th IEEE International Conference on Data Mining
Pages: 800-805
Number of pages: 6
State: Published - 2009
Event: 9th IEEE International Conference on Data Mining, ICDM 2009 - Miami, FL, United States
Duration: Dec 6 2009 to Dec 9 2009

Publication series

Name: Proceedings - IEEE International Conference on Data Mining, ICDM

Conference

Conference: 9th IEEE International Conference on Data Mining, ICDM 2009
Country/Territory: United States
City: Miami, FL
Period: 12/6/09 to 12/9/09

Keywords

  • Latent models
  • Text mining
  • Unsupervised learning
