Automatic Indic script identification from handwritten documents: page, block, line and word-level approach

SM Obaidullah, KC Santosh, C Halder, N Das… - International Journal of …, 2019 - Springer
International Journal of Machine Learning and Cybernetics, 2019, Springer
Abstract
Script identification has been a well-studied problem in the literature over the last decade, and several methods for automatic script identification have been reported. All of these methods treat a document at a single level (page, block, line or word), but none provides an experimental or empirical justification for choosing that level. To address this, we have carried out a multi-level script identification experiment, i.e., the same document is considered at four different levels, namely page, block, line and word. Two types of features, script dependent and script independent, are computed at each level to categorize the scripts. The experiment is conducted on a newly created handwritten multi-script, multi-level dataset in which, on average, 5 blocks, 7.5 lines and 15 words are generated from a single page (440 pages, 2200 blocks, 3300 lines and 6600 words in total). Finally, we address two major issues: (1) finding an optimal level of work, i.e., page, block, line or word level, and (2) providing a qualitative measure of the feature set at each level of work considered.
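The per-page averages quoted in the abstract can be checked against the reported totals. The short sketch below only redoes that arithmetic; the variable names are illustrative and are not taken from the paper.

```python
# Totals reported in the abstract (pages, and units extracted at each level).
pages = 440
totals = {"block": 2200, "line": 3300, "word": 6600}

# Average number of units generated from a single page at each level.
per_page = {level: count / pages for level, count in totals.items()}

print(per_page)  # {'block': 5.0, 'line': 7.5, 'word': 15.0}
```

The results agree with the stated averages of 5 blocks, 7.5 lines and 15 words per page.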