Creation of data resources and design of an evaluation test bed for Devanagari script recognition

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

15 Scopus citations

Abstract

The Indian subcontinent has a large number of languages, dialects, and scripts, with Devanagari being the primary and most widely used script. To date, much of the research on Devanagari optical character recognition (OCR) has been restricted to a handful of groups. As a result, techniques have not been widely disseminated or independently evaluated, and automated evaluation tools are not available because there is no standard representation of ground-truth and result data. A key reason for the absence of sustained research efforts in off-line Devanagari OCR appears to be the paucity of data resources: ground-truthed data for words and characters, on-line dictionaries, corpora of text documents, and reliable, standardized statistical analyses and evaluation tools are all currently lacking. Creating such data resources will therefore provide a much-needed fillip to researchers working on Devanagari OCR. This paper describes a National Science Foundation-sponsored project under the International Digital Libraries program to create data resources that will facilitate the development of Devanagari OCR technology and provide a standardized test bed and evaluation tools for Devanagari script recognition.

Original language: English
Title of host publication: Proceedings - 13th International Workshop on Research Issues in Data Engineering
Subtitle of host publication: Multi-Lingual Information Management, RIDE-MLIM 2003
Editors: Rajeev Sangal, Bruce Croft
Publisher: IEEE Computer Society
Pages: 55-61
Number of pages: 7
ISBN (Electronic): 0780378687, 9780780378681
State: Published - 2003
Event: 13th International Workshop on Research Issues in Data Engineering: Multi-Lingual Information Management, RIDE-MLIM 2003 - Hyderabad, India
Duration: Mar 10 2003 to Mar 11 2003

Publication series

Name: Proceedings of the IEEE International Workshop on Research Issues in Data Engineering
Volume: 2003-January

Conference

Conference: 13th International Workshop on Research Issues in Data Engineering: Multi-Lingual Information Management, RIDE-MLIM 2003
Country/Territory: India
City: Hyderabad
Period: 03/10/03 to 03/11/03

Keywords

  • Character recognition
  • Dictionaries
  • Natural languages
  • Optical character recognition software
  • Shape
  • Software libraries
  • Statistical analysis
  • Testing
  • Text analysis
  • Text recognition
