Publication Type : Conference Paper
Publisher : IEEE
Source : 2025 12th International Conference on Computing for Sustainable Global Development (INDIACom)
Url : https://doi.org/10.23919/indiacom66777.2025.11115749
Campus : Coimbatore
School : School of Computing
Department : Computer Science and Engineering
Year : 2025
Abstract :
The digitization of printed texts in various languages depends heavily on optical character recognition (OCR) technology. This research explores the difficulties of, and developments in, OCR for the Kashmiri language, which is mostly written in the Nastaliq script, a script distinguished by its elaborate ligatures, cursive style, and complex diacritical marks. Because the Urdu and Kashmiri scripts share many features, the study aims to improve OCR accuracy for Kashmiri text through transfer learning from Urdu OCR models. Convolutional Neural Network (CNN), Support Vector Machine (SVM), and k-Nearest Neighbours (KNN) models are used in the study. Two datasets are tested: one containing basic Kashmiri words and the other comprising individual Kashmiri letters. According to the results, the CNN-based Model 3 achieves higher character-level recognition accuracy than the other models. However, owing to the intrinsic complexity of the script, sentence-level recognition remains quite difficult. The results indicate that, although transfer learning from Urdu OCR models may lead to gains, script-specific details such as ligature detection and diacritical mark placement require customized preprocessing and model fine-tuning. To close the current performance gap, this study emphasizes the critical need for extensive, annotated Kashmiri text databases and specialized OCR algorithms. By improving OCR methods for under-represented languages, the results support the preservation and accessibility of Kashmiri literature.
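The abstract lists k-nearest neighbours (KNN) among the baselines compared for character-level recognition. As a rough, non-authoritative illustration of how such a baseline labels a single glyph, the Python sketch below assumes each letter image has already been binarized and flattened into an equal-length feature vector. The function name knn_predict, the 3x3 toy bitmaps, and the labels "alif" and "be" are hypothetical and do not come from the paper; the authors' actual features, preprocessing, and choice of k are not described in this record.

```python
from collections import Counter
from math import dist


def knn_predict(train, query, k=3):
    """Label `query` by majority vote among its k nearest training samples.

    Hypothetical helper for illustration only. `train` is a list of
    (feature_vector, label) pairs; each vector is assumed to be a flattened,
    binarized glyph image of the same size as `query`.
    """
    if not 1 <= k <= len(train):
        raise ValueError("k must be between 1 and the number of training samples")
    # Rank training samples by Euclidean distance to the query vector.
    neighbours = sorted(train, key=lambda item: dist(item[0], query))[:k]
    # Counter keeps first-seen order for equal counts, so ties favour the nearer label.
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical 3x3 toy glyphs (row-major), not real Kashmiri letter data.
train = [
    ((0, 1, 0, 0, 1, 0, 0, 1, 0), "alif"),  # vertical stroke
    ((0, 1, 0, 0, 1, 0, 0, 1, 1), "alif"),
    ((0, 0, 0, 0, 0, 0, 1, 1, 1), "be"),    # horizontal stroke
    ((0, 0, 0, 1, 0, 0, 1, 1, 1), "be"),
]
query = (0, 1, 0, 0, 1, 0, 1, 1, 0)

print(knn_predict(train, query, k=3))  # alif
```

Because KNN simply stores the labelled samples and compares each query against all of them, it needs no training step, which makes it a convenient baseline. On its own, however, it has no mechanism for modelling ligatures or diacritic placement, and its lookup cost grows with the size of the dataset.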
Cite this Research Publication : Zarak Jahan, Padmavathi S., Deciphering Kashmiri Script: Challenges and Advances in Character Recognition, 2025 12th International Conference on Computing for Sustainable Global Development (INDIACom), IEEE, 2025, https://doi.org/10.23919/indiacom66777.2025.11115749