Skip to main navigation Skip to search Skip to main content

Globally Aware Contextual Embeddings for Named Entity Recognition in Social Media Streams

  • Satadisha Saha Bhowmick
  • , Eduard C. Dragut
  • , Weiyi Meng

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

10 Scopus citations

Abstract

An important task for Information Extraction from Microblogs is Named Entity Recognition (NER) that extracts mentions of real-world entities from microblog messages and meta-information like entity type for better entity characterization. A lot of microblog NER systems have rightly sought to prioritize modeling the non-literary nature of microblog text. These systems are trained on offline static datasets and extract a combination of surface-level features - orthographic, lexical, and semantic - from individual messages for noisy text modeling and entity extraction. But given the constantly evolving nature of microblog streams, detecting all entity mentions from such varying yet limited context in short messages remains a difficult problem to generalize. In this paper, we propose the NER Globalizer pipeline better suited for NER on microblog streams. It characterizes the isolated message processing by existing NER systems as modeling local contextual embeddings, where learned knowledge from the immediate context of a message is used to suggest seed entity candidates. Additionally, it recognizes that messages within a microblog stream are topically related and often repeat mentions of the same entity. This suggests building NER systems that go beyond localized processing. By leveraging occurrence mining, the proposed system therefore follows up traditional NER modeling by extracting additional mentions of seed entity candidates that were previously missed. Candidate mentions are separated into well-defined clusters which are then used to generate a pooled global embedding drawn from the collective context of the candidate within a stream. The global embeddings are utilized to separate false positives from entities whose mentions are produced in the final NER output. Our experiments show that the proposed NER system exhibits superior effectiveness on multiple NER datasets with an average Macro F1 improvement of 47.04% over the best NER baseline while adding only a small computational overhead.

Original languageEnglish
Title of host publicationProceedings - 2023 IEEE 39th International Conference on Data Engineering, ICDE 2023
PublisherIEEE Computer Society
Pages1544-1557
Number of pages14
ISBN (Electronic)9798350322279
DOIs
StatePublished - 2023
Event39th IEEE International Conference on Data Engineering, ICDE 2023 - Anaheim, United States
Duration: Apr 3 2023Apr 7 2023

Publication series

NameProceedings - International Conference on Data Engineering
Volume2023-April

Conference

Conference39th IEEE International Conference on Data Engineering, ICDE 2023
Country/TerritoryUnited States
CityAnaheim
Period04/3/2304/7/23

Keywords

  • Information Extraction
  • Natural Language Processing
  • Social Media

Fingerprint

Dive into the research topics of 'Globally Aware Contextual Embeddings for Named Entity Recognition in Social Media Streams'. Together they form a unique fingerprint.

Cite this