
Text identification in noisy document images using Markov random field

  • University of Maryland, College Park

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

21 Scopus citations

Abstract

In this paper we address the problem of identifying text in noisy documents. We segment handwriting and distinguish it from machine-printed text because 1) handwriting in a document often indicates corrections, additions, or other supplemental information that should be treated differently from the main body content, and 2) the segmentation and recognition techniques for machine-printed text and handwriting differ significantly. Our novelty is that we treat noise as a separate class and model it using selected features. Trained Fisher classifiers separate machine-printed text and handwriting from noise. We then exploit context to refine the classification: a Markov Random Field (MRF) based approach models the geometrical structure of the printed text, handwriting, and noise in order to correct misclassifications. Experimental results show that our approach is promising and robust, and that it can significantly improve page segmentation results on noisy documents.
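
The context-refinement step described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a first-stage classifier has already produced per-block class probabilities, and it swaps in a simple Potts smoothness term optimized with iterated conditional modes (ICM) in place of whatever clique potentials and inference the authors used. The names `icm_refine`, `unary`, `neighbors`, and `beta`, and the example probabilities, are all hypothetical.

```python
import numpy as np

LABELS = ("printed", "handwriting", "noise")


def icm_refine(unary, neighbors, beta=1.0, max_iters=10):
    """Refine per-block labels with a Potts-model MRF using ICM.

    unary:     (n_blocks, n_labels) costs, e.g. negative log classifier
               probabilities (lower means more likely). Assumed input.
    neighbors: neighbors[i] lists the indices of blocks adjacent to block i.
    beta:      penalty for each neighbor that carries a different label.
    """
    unary = np.asarray(unary, dtype=float)
    n_blocks, _ = unary.shape
    labels = unary.argmin(axis=1)  # start from the classifier's own decision
    for _ in range(max_iters):
        changed = False
        for i in range(n_blocks):
            cost = unary[i].copy()
            for j in neighbors[i]:
                # Potts term: every label except the neighbor's pays beta.
                cost += beta
                cost[labels[j]] -= beta
            best = int(cost.argmin())
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:  # converged to a local minimum of the energy
            break
    return labels


# Five blocks along one text line; the middle one is weakly scored as noise.
probs = np.array([
    [0.80, 0.10, 0.10],
    [0.70, 0.20, 0.10],
    [0.35, 0.20, 0.45],
    [0.75, 0.15, 0.10],
    [0.80, 0.10, 0.10],
])
unary = -np.log(probs)
neighbors = [[1], [0, 2], [1, 3], [2, 4], [3]]

refined = icm_refine(unary, neighbors, beta=1.0)
print([LABELS[k] for k in refined])
```

In this toy input the middle block starts out labeled as noise, and the agreement of its two printed-text neighbors outweighs its weak evidence, so it is relabeled as printed text. Setting `beta=0` disables the context term and returns the classifier's original decisions.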

Original language: English
Title of host publication: Proceedings - 7th International Conference on Document Analysis and Recognition, ICDAR 2003
Publisher: IEEE Computer Society
Pages: 599-603
Number of pages: 5
ISBN (Electronic): 0769519601
State: Published - 2003
Event: 7th International Conference on Document Analysis and Recognition, ICDAR 2003 - Edinburgh, United Kingdom
Duration: Aug 3 2003 to Aug 6 2003

Publication series

Name: Proceedings of the International Conference on Document Analysis and Recognition, ICDAR
Volume: 2003-January

Conference

Conference: 7th International Conference on Document Analysis and Recognition, ICDAR 2003
Country/Territory: United Kingdom
City: Edinburgh
Period: 08/3/03 to 08/6/03
