
SpeedRead: A fast named entity recognition pipeline

Research output: Contribution to conference › Paper › peer-review

10 Scopus citations

Abstract

Online content analysis employs algorithmic methods to identify entities in unstructured text. Both machine learning and knowledge-base approaches lie at the foundation of contemporary named entity extraction systems. However, progress in deploying these approaches at web scale has been hampered by the computational cost of NLP over massive text corpora. We present SpeedRead (SR), a named entity recognition pipeline that runs at least 10 times faster than the Stanford NLP pipeline. This pipeline consists of a high-performance Penn Treebank-compliant tokenizer, a close to state-of-the-art part-of-speech (POS) tagger, and a knowledge-based named entity recognizer.
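
The three stages named in the abstract (tokenizer, POS tagger, knowledge-based recognizer) can be pictured with a minimal Python sketch. This is not the SpeedRead implementation: the function names, the tiny `POS_LEXICON`, and the `GAZETTEER` are hypothetical stand-ins chosen for illustration, the tokenizer does not follow the full Penn Treebank conventions, and the recognizer is a plain longest-match dictionary lookup.

```python
import re

# Hypothetical toy resources, assumed for illustration only.
POS_LEXICON = {"the": "DT", "visited": "VBD", "in": "IN", ".": "."}
GAZETTEER = {("Barack", "Obama"): "PERSON", ("Mumbai",): "LOCATION"}
MAX_ENTITY_LEN = max(len(key) for key in GAZETTEER)


def tokenize(text):
    # Simplified tokenizer: separates word runs from single punctuation marks.
    # It does not implement the full Penn Treebank tokenization rules.
    return re.findall(r"\w+|[^\w\s]", text)


def pos_tag(tokens):
    # Lexicon lookup with a crude fallback: capitalized tokens become NNP,
    # everything else NN. A real tagger would use a trained model.
    tagged = []
    for tok in tokens:
        tag = POS_LEXICON.get(tok.lower())
        if tag is None:
            tag = "NNP" if tok[0].isupper() else "NN"
        tagged.append((tok, tag))
    return tagged


def recognize_entities(tagged):
    # Knowledge-based step: greedy longest match against the gazetteer.
    # This toy version reads only the tokens and ignores the POS tags.
    tokens = [tok for tok, _ in tagged]
    entities = []
    i = 0
    while i < len(tokens):
        matched = False
        for n in range(min(MAX_ENTITY_LEN, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in GAZETTEER:
                entities.append((" ".join(key), GAZETTEER[key]))
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return entities


def pipeline(text):
    # Chain the three stages in the order the abstract lists them.
    return recognize_entities(pos_tag(tokenize(text)))
```

Calling `pipeline("Barack Obama visited Mumbai.")` on this toy data yields the two gazetteer entries found in the sentence. Keeping each stage as a separate function mirrors the pipeline structure described above, so any one stage could be swapped for a faster or more accurate component.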

Original language: English
Pages: 51-66
Number of pages: 16
State: Published - 2012
Event: 24th International Conference on Computational Linguistics, COLING 2012 - Mumbai, India
Duration: Dec 8 2012 to Dec 15 2012

Conference

Conference: 24th International Conference on Computational Linguistics, COLING 2012
Country/Territory: India
City: Mumbai
Period: 12/8/12 to 12/15/12

Keywords

  • NLP pipelines
  • Named entity recognition
  • Part of speech
  • Tokenization
