
Markov random field based text identification from annotated machine printed documents

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

29 Scopus citations

Abstract

In this paper, we describe an approach to segment handwritten text, machine-printed text, and noise from annotated machine-printed documents. Three categories of word-level features are extracted. We use a modified K-Means clustering algorithm for classification, followed by a relabeling procedure using a Markov Random Field (MRF) based on a concept of neighboring patches and Belief Propagation (BP) rules. Experimental results on an imbalanced data set show that our approach achieves an overall recall of 96.33%.
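
The relabeling idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes each patch already has per-label scores from an initial classifier, and it uses a generic Potts-style pairwise potential with sum-product belief propagation. The function name `relabel_with_bp`, the `same_label_weight` parameter, the label order, and the toy scores are all hypothetical.

```python
def relabel_with_bp(unary, edges, same_label_weight=2.0, iterations=10):
    """Relabel patches with sum-product belief propagation on a pairwise MRF.

    unary: per-patch lists of positive label scores from an initial classifier.
    edges: (i, j) pairs of neighboring patches.
    The Potts-style potential below is an assumption, not the paper's model.
    """
    n, k = len(unary), len(unary[0])
    neighbors = {i: [] for i in range(n)}
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)

    # messages[(i, j)][l] is what patch i tells patch j about label l.
    messages = {}
    for i, j in edges:
        messages[(i, j)] = [1.0] * k
        messages[(j, i)] = [1.0] * k

    def pairwise(a, b):
        # Neighboring patches are rewarded for sharing a label.
        return same_label_weight if a == b else 1.0

    for _ in range(iterations):
        updated = {}
        for (i, j) in messages:
            msg = []
            for label_j in range(k):
                total = 0.0
                for label_i in range(k):
                    term = unary[i][label_i] * pairwise(label_i, label_j)
                    for h in neighbors[i]:
                        if h != j:
                            term *= messages[(h, i)][label_i]
                    total += term
                msg.append(total)
            norm = sum(msg)
            updated[(i, j)] = [m / norm for m in msg]
        messages = updated

    labels = []
    for i in range(n):
        belief = list(unary[i])
        for h in neighbors[i]:
            belief = [b * m for b, m in zip(belief, messages[(h, i)])]
        labels.append(belief.index(max(belief)))
    return labels


# Toy example: five patches in a row; labels 0 = machine printed,
# 1 = handwritten, 2 = noise. Patch 2 is weakly scored as handwritten.
scores = [
    [0.8, 0.1, 0.1],
    [0.8, 0.1, 0.1],
    [0.4, 0.45, 0.15],
    [0.8, 0.1, 0.1],
    [0.8, 0.1, 0.1],
]
chain = [(0, 1), (1, 2), (2, 3), (3, 4)]
before = [s.index(max(s)) for s in scores]   # [0, 0, 1, 0, 0]
after = relabel_with_bp(scores, chain)       # [0, 0, 0, 0, 0]
```

In this toy chain the weakly scored middle patch is pulled toward the label of its confident neighbors, which is the kind of correction a neighborhood-based relabeling step is meant to provide.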

Original language: English
Title of host publication: ICDAR2009 - 10th International Conference on Document Analysis and Recognition
Pages: 431-435
Number of pages: 5
State: Published - 2009
Event: ICDAR2009 - 10th International Conference on Document Analysis and Recognition - Barcelona, Spain
Duration: Jul 26 2009 to Jul 29 2009

Publication series

Name: Proceedings of the International Conference on Document Analysis and Recognition, ICDAR

Conference

Conference: ICDAR2009 - 10th International Conference on Document Analysis and Recognition
Country/Territory: Spain
City: Barcelona
Period: 07/26/09 to 07/29/09
