Segmentation and recognition of touching characters in machine printed Telugu documents using average character widths and central moments features

Publisher : International Journal of Applied Engineering Research

Campus : Mysuru

School : School of Arts and Sciences

Department : Computer Science

Year : 2015

Abstract : Accurate recognition of machine-printed Telugu documents is a principal requirement for advancing Telugu optical character recognition (OCR) systems. Reliable recognition of characters in South Indian languages such as Telugu is possible only when segmentation is performed systematically in the segmentation stage of OCR. This paper proposes a recursive segmentation and recognition approach, based on average character widths, for segmenting touching characters in machine-printed Telugu documents such as newspapers and textbooks. The algorithm relies on a database of trained features of the various character components in the Telugu script; this database is built from the central moment features of the segmented components. The algorithm produced encouraging segmentation results and achieved an overall recognition rate of around 93-97% on most of the documents tested. © Research India Publications.
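
The abstract names two ingredients, central moment features and an average-character-width criterion, without giving their details. The Python sketch below is only an illustration of those two ideas and should not be read as the paper's implementation. It assumes a component is a binary image stored as rows of 0/1 pixels, and the function names and the `tolerance` factor of 1.5 are hypothetical choices made for this example.

```python
def central_moment(image, p, q):
    """Central moment mu_pq of a binary image given as rows of 0/1 pixels."""
    m00 = sum(sum(row) for row in image)  # number of foreground pixels
    if m00 == 0:
        return 0.0
    # Centroid of the foreground pixels (x = column index, y = row index).
    x_bar = sum(x * v for row in image for x, v in enumerate(row)) / m00
    y_bar = sum(y * v for y, row in enumerate(image) for v in row) / m00
    # Sum of (x - x_bar)^p * (y - y_bar)^q over the foreground pixels.
    return sum(
        ((x - x_bar) ** p) * ((y - y_bar) ** q) * v
        for y, row in enumerate(image)
        for x, v in enumerate(row)
    )


def looks_like_touching(component_width, average_width, tolerance=1.5):
    """Flag a component much wider than the average character width.

    The tolerance factor is an assumption for illustration, not a value
    taken from the paper.
    """
    return component_width > tolerance * average_width
```

Because central moments are measured relative to the centroid, they do not change when a component is shifted within its bounding image, which is one reason they are a common choice for shape features. A component flagged by the width check would be a candidate for further splitting before its features are matched against the trained database.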

