Publication Type : Conference Paper
Publisher : Springer Nature Singapore
Source : Lecture Notes in Networks and Systems
Url : https://doi.org/10.1007/978-981-33-4305-4_14
Campus : Amritapuri
School : School of Computing
Department : Computer Science and Applications
Year : 2021
Abstract : This paper describes an optical character recognition technique for converting scanned images of Sanskrit text written in Devanagari into digital documents. The segmentation mechanism, adapted from existing literature, identifies and separates the upper and lower modifiers of a character. It also recognizes fused Devanagari letters. The segmented characters are fed to a convolutional neural network classifier trained on a dataset of about 1.2 lakh (120,000) images belonging to 85 classes for the core part of a character. Each character from the segmentation phase is classified and mapped to its Unicode representation. The Unicode values of the characters are concatenated to reconstruct the word. By keeping track of the spaces between words and lines, the document can be reconstructed in an editable format.
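The final stage described in the abstract (mapping each predicted class to Unicode, joining the code points into words, and restoring spaces and line breaks) can be illustrated with a minimal sketch. It is not the authors' implementation: the label names, the LABEL_TO_UNICODE table, and the functions rebuild_word and rebuild_document are hypothetical, and the handful of entries shown stands in for the 85 core classes, which the abstract does not list. The sketch also assumes the segmenter reports each character as a core label plus a list of modifier labels.

```python
# Hypothetical label-to-Unicode table (illustrative subset, not the paper's classes).
LABEL_TO_UNICODE = {
    "ka": "\u0915",        # core consonant KA
    "ma": "\u092E",        # core consonant MA
    "ra": "\u0930",        # core consonant RA
    "aa_matra": "\u093E",  # vowel sign AA
    "u_matra": "\u0941",   # vowel sign U (a lower modifier)
    "e_matra": "\u0947",   # vowel sign E (an upper modifier)
    "anusvara": "\u0902",  # anusvara (an upper modifier)
}


def rebuild_word(char_labels):
    """Join the code points of one word.

    char_labels is a list of (core_label, [modifier_labels]) pairs,
    assumed to arrive in reading order from the segmentation phase.
    """
    parts = []
    for core, modifiers in char_labels:
        parts.append(LABEL_TO_UNICODE[core])
        # Modifiers follow their core consonant in Unicode logical order.
        parts.extend(LABEL_TO_UNICODE[m] for m in modifiers)
    return "".join(parts)


def rebuild_document(lines):
    """Rebuild text from a list of lines, each a list of words."""
    return "\n".join(
        " ".join(rebuild_word(word) for word in line) for line in lines
    )


# Usage: two words on the first line, one on the second.
rama = [("ra", ["aa_matra"]), ("ma", [])]
ke = [("ka", ["e_matra"])]
mu = [("ma", ["u_matra"])]
text = rebuild_document([[rama, ke], [mu]])
print(text)
```

Storing modifiers after their core consonant keeps the output in Unicode logical order, so a text renderer places the upper and lower marks itself.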
Cite this Research Publication : Vamsi Krishna Kikkuri, Pavan Vemuri, Srikar Talagani, Yashwanth Thota, Jayashree Nair, An Optical Character Recognition Technique for Devanagari Script Using Convolutional Neural Network and Unicode Encoding, Lecture Notes in Networks and Systems, Springer Nature Singapore, 2021, https://doi.org/10.1007/978-981-33-4305-4_14