
Handwritten document retrieval strategies

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

8 Scopus citations

Abstract

With the continuous growth of the World Wide Web, there is an urgent need for an efficient information retrieval system that can search and retrieve handwritten documents in response to user queries. However, unconstrained handwriting recognition remains a challenging task with inadequate performance (around 30% accuracy), which is a major hurdle to providing a robust search experience in the domain of handwritten documents. In this paper, we describe our recent research on information retrieval from the noisy text output by imperfect recognizers applied to handwritten document images. We describe three techniques, each exploring a different approach to the noisy text retrieval task. The first method uses a novel bootstrapping mechanism to refine the OCR'ed text and uses the cleaned text for retrieval. The second method uses the uncorrected (raw) OCR'ed text but modifies the standard vector space model to handle noisy text. The third method employs robust image features to index the documents instead of using the noisy OCR'ed text. We describe these approaches in detail and present their performance using standard IR evaluation metrics.
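To make the second idea concrete, here is a minimal sketch of vector space retrieval over raw recognizer output. The abstract does not say how the paper modifies the vector space model, so this sketch swaps in a commonly used technique, indexing character trigrams instead of whole words, purely as an illustration. The function names (`char_ngrams`, `build_index`, `cosine`, `search`) and the sample documents are hypothetical.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Split text into overlapping character n-grams, word by word.

    Assumption: n-grams stand in for whole words so that one misrecognized
    character only damages a few index terms, not the entire word.
    """
    grams = []
    for word in text.lower().split():
        padded = f"#{word}#"  # mark word boundaries
        if len(padded) < n:
            grams.append(padded)
        else:
            grams.extend(padded[i:i + n] for i in range(len(padded) - n + 1))
    return grams


def build_index(docs, n=3):
    """Return (list of tf-idf vectors, idf table) for the given documents."""
    tfs = [Counter(char_ngrams(d, n)) for d in docs]
    df = Counter()
    for tf in tfs:
        df.update(tf.keys())
    num_docs = len(docs)
    # Smoothed idf so that every term keeps a positive weight.
    idf = {g: math.log((1 + num_docs) / (1 + c)) + 1.0 for g, c in df.items()}
    vectors = [{g: f * idf[g] for g, f in tf.items()} for tf in tfs]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(g, 0.0) for g, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, vectors, idf, n=3):
    """Rank document indices by cosine similarity to the query."""
    q_tf = Counter(char_ngrams(query, n))
    q_vec = {g: f * idf[g] for g, f in q_tf.items() if g in idf}
    scored = [(cosine(q_vec, v), i) for i, v in enumerate(vectors)]
    return [i for _, i in sorted(scored, key=lambda s: (-s[0], s[1]))]


# Hypothetical recognizer output: "patient" and "given" are misrecognized.
docs = [
    "the patlent was glven medicine",
    "weather report for monday",
    "train schedule and tickets",
]
vectors, idf = build_index(docs)
ranking = search("patient medicine", vectors, idf)
print(ranking)
```

In this toy example the misrecognized word "patlent" still shares several trigrams with the query word "patient", so the first document ranks highest even though an exact word match would miss it.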

Original language: English
Title of host publication: AND 2009 - Proceedings of the 3rd Workshop on Analytics for Noisy Unstructured Text Data
Pages: 3-7
Number of pages: 5
State: Published - 2009
Event: 3rd Workshop on Analytics for Noisy Unstructured Text Data, AND 2009 - Barcelona, Spain
Duration: Jul 23 2009 to Jul 24 2009

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 3rd Workshop on Analytics for Noisy Unstructured Text Data, AND 2009
Country/Territory: Spain
City: Barcelona
Period: 07/23/09 to 07/24/09

Keywords

  • Handwriting analysis
  • Information retrieval
  • Keyword spotting
  • OCR correction
