
Augmenting word embeddings through external knowledge-base for biomedical application

  • Kishlay Jha
  • Guangxu Xun
  • Vishrawas Gopalakrishnan
  • Aidong Zhang

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

9 Scopus citations

Abstract

The technological advancements in the biomedical domain have led to a tremendous growth of unstructured data, primarily as a result of the increased publication of findings. At the same time, a corresponding interest in the Natural Language Processing (NLP) community in developing scalable methodologies to exploit such massive unlabeled corpora for unsupervised language processing has created new opportunities for developing semantically sensitive models. Among them, word embeddings have garnered significant attention due to their capability to capture implicit semantics. However, such data-driven models are largely agnostic of the rich explicit semantic knowledge available in the biomedical domain in the form of vocabularies and ontologies. This is problematic because it leads to poor representations of words with little local context, an effect that is acute in the biomedical domain. In this paper, we propose a novel model (MeSH2Vec) that jointly exploits both contextual information and available explicit semantic knowledge to learn externally augmented word embeddings. Compared with existing approaches, the proposed methodology is more dexterous in its ability to handle relationships between indirectly related concepts. The 13% improvement in correlation with expert judgments, shown in experiments on biomedical concept similarity and relatedness tasks, validates the effectiveness of the proposed approach and demonstrates the importance of incorporating human-curated knowledge in the process of generating word embeddings.

Original language: English
Title of host publication: Proceedings - 2017 IEEE International Conference on Big Data, Big Data 2017
Editors: Jian-Yun Nie, Zoran Obradovic, Toyotaro Suzumura, Rumi Ghosh, Raghunath Nambiar, Chonggang Wang, Hui Zang, Ricardo Baeza-Yates, Xiaohua Hu, Jeremy Kepner, Alfredo Cuzzocrea, Jian Tang, Masashi Toyoda
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1965-1974
Number of pages: 10
ISBN (Electronic): 9781538627143
State: Published - Jul 1 2017
Event: 5th IEEE International Conference on Big Data, Big Data 2017 - Boston, United States
Duration: Dec 11 2017 to Dec 14 2017

Publication series

Name: Proceedings - 2017 IEEE International Conference on Big Data, Big Data 2017
Volume: 2018-January

Conference

Conference: 5th IEEE International Conference on Big Data, Big Data 2017
Country/Territory: United States
City: Boston
Period: 12/11/17 to 12/14/17

Keywords

  • biomedical domain
  • semantic knowledge
  • word embedding
