
Using a boosted tree classifier for text segmentation in hand-annotated documents

Research output: Contribution to journal › Article › peer-review

16 Scopus citations

Abstract

A boosted tree classifier is proposed to segment machine-printed, handwritten and overlapping text in documents that carry handwritten annotations. Each node of the tree-structured classifier is a binary weak learner. A standard decision tree (DT) trains each node only on the subset of training data that reaches it, which makes it susceptible to over-fitting. In contrast, the proposed method boosts the tree by training every node on all available training data, weighted differently at each node. The method is evaluated on a set of machine-printed documents annotated by multiple writers in an office/collaborative environment. The experimental results show that the proposed algorithm outperforms other methods on an imbalanced data set.
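The abstract does not specify the weak learner or the weighting rule, so the following is only a minimal sketch of the contrast it describes, under stated assumptions. Each node is a decision stump (a single-feature threshold chosen by weighted Gini impurity), and instead of handing each child only its own subset, a node passes every sample on to both children, multiplying the weights of samples routed to the other child by a hypothetical factor `DECAY`. This reweighting rule is not AdaBoost and is not taken from the paper. The names `fit_stump`, `build`, `predict`, `DECAY`, `MAX_DEPTH` and `MIN_IMPURITY`, the class-label mapping and the toy data are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Hypothetical sketch: every node sees ALL training samples; only their weights change.
DECAY = 0.1          # assumed weight multiplier for samples routed to the *other* child
MAX_DEPTH = 3        # assumed stopping depth
MIN_IMPURITY = 0.05  # assumed: stop when the weighted minority mass falls below 5%


def fit_stump(X, y, w, n_classes):
    """Binary weak learner: the (feature, threshold) split with the lowest weighted Gini impurity."""
    best_score, best_f, best_t = np.inf, 0, 0.0
    for f in range(X.shape[1]):
        for t in np.unique(X[:, f]):
            goes_left = X[:, f] <= t
            score = 0.0
            for side in (goes_left, ~goes_left):
                mass = w[side].sum()
                if mass == 0:
                    continue
                p = np.bincount(y[side], weights=w[side], minlength=n_classes) / mass
                score += mass * (1.0 - np.sum(p ** 2))
            if score < best_score:
                best_score, best_f, best_t = score, f, t
    return best_f, best_t


def build(X, y, w, n_classes, depth=0):
    """Grow the tree; every recursive call receives the full X and y."""
    w = w / w.sum()  # renormalise so weights in deep nodes do not underflow
    class_mass = np.bincount(y, weights=w, minlength=n_classes)
    if depth == MAX_DEPTH or 1.0 - class_mass.max() < MIN_IMPURITY:
        return ("leaf", int(np.argmax(class_mass)))
    f, t = fit_stump(X, y, w, n_classes)
    goes_left = X[:, f] <= t
    # Unlike a standard DT, no sample is discarded: samples routed to the
    # other side are down-weighted by DECAY instead of being removed.
    left = build(X, y, w * np.where(goes_left, 1.0, DECAY), n_classes, depth + 1)
    right = build(X, y, w * np.where(goes_left, DECAY, 1.0), n_classes, depth + 1)
    return ("node", f, t, left, right)


def predict(tree, x):
    """Route one feature vector down the stumps to a leaf label."""
    node = tree
    while node[0] == "node":
        _, f, t, left, right = node
        node = left if x[f] <= t else right
    return node[1]


if __name__ == "__main__":
    # Toy 1-D data; labels are a hypothetical mapping:
    # 0 = machine-printed, 1 = handwritten, 2 = overlapping.
    X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
    y = np.array([0, 0, 1, 1, 2, 2])
    tree = build(X, y, np.ones(len(y)), n_classes=3)
    print([predict(tree, x) for x in X])
```

As `DECAY` approaches 0, each child effectively sees only the samples routed to it, which approximates the standard DT behaviour the abstract contrasts against; a positive value keeps every training sample visible, with reduced weight, at every node.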

Original language: English
Pages (from-to): 943-950
Number of pages: 8
Journal: Pattern Recognition Letters
Volume: 33
Issue number: 7
State: Published - May 1 2012

Keywords

  • Classification
  • Decision tree
  • Document analysis
  • Text separation
