TY - GEN
T1 - Text identification in noisy document images using Markov random field
AU - Zheng, Yefeng
AU - Li, Huiping
AU - Doermann, David
N1 - Publisher Copyright: © 2003 IEEE.
PY - 2003
Y1 - 2003
N2 - In this paper we address the problem of identifying text in noisy documents. We segment handwriting and distinguish it from machine-printed text because 1) handwriting in a document often indicates corrections, additions or other supplemental information that should be treated differently from the main or body content, and 2) the segmentation and recognition techniques for machine-printed text and handwriting are significantly different. The novelty of our approach is that we treat noise as a separate class and model it based on selected features. Trained Fisher classifiers are used to distinguish machine-printed text and handwriting from noise. We further exploit context to refine the classification. A Markov Random Field (MRF) based approach is used to model the geometrical structure of the printed text, handwriting and noise in order to correct misclassifications. Experimental results show that our approach is promising and robust, and can significantly improve page segmentation results in noisy documents.
UR - https://www.scopus.com/pages/publications/84945976683
U2 - 10.1109/ICDAR.2003.1227734
DO - 10.1109/ICDAR.2003.1227734
M3 - Conference contribution
T3 - Proceedings of the International Conference on Document Analysis and Recognition, ICDAR
SP - 599
EP - 603
BT - Proceedings - 7th International Conference on Document Analysis and Recognition, ICDAR 2003
PB - IEEE Computer Society
T2 - 7th International Conference on Document Analysis and Recognition, ICDAR 2003
Y2 - 3 August 2003 through 6 August 2003
ER -