Abstract
Online content analysis employs algorithmic methods to identify entities in unstructured text. Both machine learning and knowledge-base approaches lie at the foundation of contemporary named entity extraction systems. However, progress in deploying these approaches at web scale has been hampered by the computational cost of NLP over massive text corpora. We present SpeedRead (SR), a named entity recognition pipeline that runs at least 10 times faster than the Stanford NLP pipeline. The pipeline consists of a high-performance Penn Treebank-compliant tokenizer, a near state-of-the-art part-of-speech (POS) tagger, and a knowledge-based named entity recognizer.
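To make the three-stage architecture concrete, here is a minimal Python sketch of how a tokenizer, a POS tagger, and a knowledge-based recognizer can be chained. It does not reproduce SpeedRead's implementation. Every stage is a simplified, hypothetical stand-in:

- The regex tokenizer only splits off a few Penn Treebank-style pieces (the clitics `n't` and `'s`, plus punctuation).
- The tagger is a dictionary lookup with capitalization fallbacks rather than a trained model.
- The recognizer does longest-match lookup in a small gazetteer, starting only at proper-noun (`NNP`) tokens.

The names `tokenize`, `pos_tag`, `recognize`, `pipeline`, `LEXICON`, and `GAZETTEER`, and all dictionary entries, are illustrative assumptions.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical, simplified tokenizer: splits "n't" and "'s" clitics and
# punctuation in the Penn Treebank style (e.g. "didn't" -> "did", "n't").
# It is NOT a full PTB-compliant tokenizer (no quote conversion, etc.).
_TOKEN_RE = re.compile(r"\w+(?=n't)|n't|'s|\w+(?:-\w+)*|[^\w\s]")

def tokenize(text: str) -> List[str]:
    return _TOKEN_RE.findall(text)

# Toy lexicon standing in for a trained POS model (hypothetical entries).
LEXICON: Dict[str, str] = {
    "the": "DT", "did": "VBD", "n't": "RB", "visit": "VB",
    "'s": "POS", "museums": "NNS", "in": "IN",
}

def pos_tag(tokens: List[str]) -> List[Tuple[str, str]]:
    tagged = []
    for tok in tokens:
        if tok.lower() in LEXICON:
            tag = LEXICON[tok.lower()]
        elif not tok[0].isalnum():
            tag = "."    # simplified: every punctuation mark gets "."
        elif tok.isdigit():
            tag = "CD"   # cardinal number
        elif tok[0].isupper():
            tag = "NNP"  # unknown capitalized word -> proper noun
        else:
            tag = "NN"   # unknown lowercase word -> common noun
        tagged.append((tok, tag))
    return tagged

# Hypothetical gazetteer (the "knowledge base"): token tuples -> entity type.
GAZETTEER: Dict[Tuple[str, ...], str] = {
    ("Obama",): "PER",
    ("Barack", "Obama"): "PER",
    ("New", "York"): "LOC",
    ("New", "York", "Times"): "ORG",
}
MAX_LEN = max(len(key) for key in GAZETTEER)

def recognize(tagged: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    tokens = [tok for tok, _ in tagged]
    entities = []
    i = 0
    while i < len(tokens):
        # Only start a candidate entity at a proper-noun token.
        if tagged[i][1] != "NNP":
            i += 1
            continue
        # Greedy longest match: try the longest span first.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            span = tuple(tokens[i:i + n])
            if span in GAZETTEER:
                entities.append((" ".join(span), GAZETTEER[span]))
                i += n
                break
        else:
            i += 1  # no gazetteer entry starts here
    return entities

def pipeline(text: str) -> List[Tuple[str, str]]:
    return recognize(pos_tag(tokenize(text)))
```

For example, `pipeline("Obama didn't visit New York's museums.")` yields `[("Obama", "PER"), ("New York", "LOC")]`. Preferring the longest span is what lets "New York Times" resolve to an organization instead of a location followed by a stray token.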
| Original language | English |
|---|---|
| Pages | 51-66 |
| Number of pages | 16 |
| State | Published - 2012 |
| Event | 24th International Conference on Computational Linguistics, COLING 2012 - Mumbai, India Duration: Dec 8 2012 → Dec 15 2012 |
Conference
| Conference | 24th International Conference on Computational Linguistics, COLING 2012 |
|---|---|
| Country/Territory | India |
| City | Mumbai |
| Period | 12/8/12 → 12/15/12 |
Keywords
- NLP pipelines
- Named entity recognition
- Part of speech
- Tokenization