Abstract
A boosted tree classifier is proposed to segment machine-printed, handwritten, and overlapping text in documents with handwritten annotations. Each node of the tree-structured classifier is a binary weak learner. A standard decision tree (DT) considers only a subset of the training data at each node and is therefore susceptible to over-fitting. We instead boost the tree by training every node on all of the available training data, weighted differently at each node. The proposed method is evaluated on a set of machine-printed documents that were annotated by multiple writers in an office/collaborative environment. The experimental results show that the proposed algorithm outperforms other methods on an imbalanced data set.
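The abstract contrasts the two trees but does not give the weighting rule. Below is a minimal Python sketch of the general idea, built on several assumptions. The binary weak learners are weighted decision stumps chosen by weighted Gini impurity. Samples routed to the other branch are down-weighted, not discarded, by a hypothetical `OFF_PATH_FACTOR`. All function names are illustrative. This is not the authors' boosting rule.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

# Hypothetical damping factor (assumption, not from the paper): samples that do
# NOT reach a child keep this fraction of their weight instead of being dropped.
OFF_PATH_FACTOR = 0.1


@dataclass
class Node:
    feature: int = -1               # index of the feature tested by this node
    threshold: float = 0.0          # x[feature] <= threshold -> left branch
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    label: Optional[int] = None     # class label, set only on leaves


def weighted_gini(labels, weights):
    total = sum(weights)
    if total == 0:
        return 0.0
    mass = defaultdict(float)
    for y, w in zip(labels, weights):
        mass[y] += w
    return 1.0 - sum((m / total) ** 2 for m in mass.values())


def weighted_majority(labels, weights):
    mass = defaultdict(float)
    for y, w in zip(labels, weights):
        mass[y] += w
    return max(mass, key=mass.get)


def fit_stump(X, y, w):
    """Binary weak learner: the (feature, threshold) pair that minimizes the
    weighted Gini impurity of the split, computed over ALL training samples."""
    best_cost, best_j, best_t = float("inf"), -1, 0.0
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [i for i, row in enumerate(X) if row[j] <= t]
            right = [i for i, row in enumerate(X) if row[j] > t]
            if not left or not right:
                continue
            cost = 0.0
            for side in (left, right):
                ws = [w[i] for i in side]
                cost += sum(ws) * weighted_gini([y[i] for i in side], ws)
            if cost < best_cost:
                best_cost, best_j, best_t = cost, j, t
    return best_j, best_t


def build(X, y, w, on_path, depth, max_depth):
    # Stop when the samples that actually reach this node are pure, or at max depth.
    if depth >= max_depth or len({y[i] for i in on_path}) <= 1:
        return Node(label=weighted_majority(y, w))
    j, t = fit_stump(X, y, w)
    if j < 0:  # no feature can split the data
        return Node(label=weighted_majority(y, w))
    goes_left = [row[j] <= t for row in X]
    # Both children still see every sample; only the weights differ.
    w_left = [wi if gl else wi * OFF_PATH_FACTOR for wi, gl in zip(w, goes_left)]
    w_right = [wi * OFF_PATH_FACTOR if gl else wi for wi, gl in zip(w, goes_left)]
    on_left = {i for i in on_path if goes_left[i]}
    return Node(feature=j, threshold=t,
                left=build(X, y, w_left, on_left, depth + 1, max_depth),
                right=build(X, y, w_right, on_path - on_left, depth + 1, max_depth))


def fit(X, y, max_depth=3):
    return build(X, y, [1.0] * len(X), set(range(len(X))), 0, max_depth)


def predict(node, x):
    while node.label is None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.label


# Toy usage: 0 = machine-printed, 1 = handwritten, 2 = overlapping (illustrative only).
X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
y = [0, 0, 1, 1, 2, 2]
tree = fit(X, y, max_depth=2)
print([predict(tree, x) for x in X])  # [0, 0, 1, 1, 2, 2]
```

Setting `OFF_PATH_FACTOR` to 0 makes the sketch behave like a standard decision tree, where each node effectively sees only its own subset. A positive value keeps every sample in play at every node, which is the property the abstract credits with reducing over-fitting.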
| Field | Value |
|---|---|
| Original language | English |
| Pages (from-to) | 943-950 |
| Number of pages | 8 |
| Journal | Pattern Recognition Letters |
| Volume | 33 |
| Issue number | 7 |
| State | Published: May 1, 2012 |
Keywords
- Classification
- Decision tree
- Document analysis
- Text separation
Fingerprint
Dive into the research topics of 'Using a boosted tree classifier for text segmentation in hand-annotated documents'. Together they form a unique fingerprint.