TY - GEN
T1 - From Scribbles to Text
T2 - 19th International Conference on Document Analysis and Recognition, ICDAR 2025
AU - Rangasrinivasan, Sahana
AU - Sumi, Sumi Suresh
AU - Setlur, Srirangaraj
AU - Jayaraman, Bharat
AU - Govindaraju, Venu
N1 - Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Switzerland AG 2026.
PY - 2026
Y1 - 2026
AB - Handwritten Text Recognition (HTR) remains a challenging task, particularly for child handwriting, which often exhibits irregular letter formation, letter crowding, mirrored letters, and phonological spelling errors. The presence of these characteristics is rare in adult handwriting, which forms the primary training data for most existing HTR systems and multimodal large language models (MLLMs). Consequently, current models often autocorrect or misinterpret these features, limiting their effectiveness in contexts where accurate handwritten text recognition is crucial. This is particularly problematic for identifying specific learning disabilities (SLDs) like dyslexia and dysgraphia, where features such as letter reversals, inversions, and spelling mistakes are key diagnostic indicators. To address this gap, we introduce Extended-TrOCR (E-TrOCR), an adaptation of the transformer-based optical character recognition (TrOCR) model specifically designed for child handwriting. E-TrOCR uses a two-stage training process, starting with the IAM dataset for general handwriting recognition, followed by fine-tuning on a dedicated child handwriting dataset. The model employs character-level tokenization to prevent autocorrection and introduces a novel 220-alphabet to represent letter reversals and inversions. Trained on over 1,800 text lines from elementary school students, E-TrOCR significantly outperforms state-of-the-art HTR models, underscoring the necessity of dedicated solutions for child handwriting recognition.
KW - Child Handwriting
KW - Handwritten Text Recognition
KW - Multimodal Large Language Models
KW - Transformers
UR - https://www.scopus.com/pages/publications/105016534383
U2 - 10.1007/978-3-032-04614-7_7
DO - 10.1007/978-3-032-04614-7_7
M3 - Conference contribution
SN - 9783032046130
T3 - Lecture Notes in Computer Science
SP - 115
EP - 131
BT - Document Analysis and Recognition - ICDAR 2025 - 19th International Conference, Proceedings
A2 - Yin, Xu-Cheng
A2 - Karatzas, Dimosthenis
A2 - Lopresti, Daniel
PB - Springer Science and Business Media Deutschland GmbH
Y2 - 16 September 2025 through 21 September 2025
ER -