An Amrita paper titled “A Level Set Methodology for Sanskrit Document Binarization and Character Segmentation” won the best paper award at the Second International Conference on Control, Communication and Computer Technology.
The conference was organized by the Interscience Research Network in Bengaluru in November 2011. The paper was authored by second-year M.Tech. students Poornima S.V., Premjith B. and Vidya M., under the guidance of their professor, Dr. Soman K.P., Head of the Center for Computational Engineering and Networking (CEN) in Coimbatore.
An expanded version of this paper will be published by Wiley InterScience.
Another Amrita CEN paper, titled “A Level Set Methodology for Segmentation of Fluorescent Microscopy DNA Images,” won the best paper award, this time at a national-level conference.
This conference on Information Processing and Computing was organized at PSG-Tech Coimbatore, also in November 2011. The paper was authored by Ms. Poornima S.V. and Dr. Soman K.P.
Both award-winning papers elaborate on work undertaken at the Center that seeks to convert scanned images of ancient Sanskrit texts into documents that can be edited and searched.
“Many scholars and enthusiastic learners across the world will be delighted if they are able to access ancient Sanskrit scripts that can be edited or searched just like our usual text documents,” explained Dr. Soman, referring to the work. “As of now, they can just browse scanned images.”
Using optical character recognition (OCR), one can produce editable text documents from document images. The challenge lies in getting OCR to recognize Sanskrit words correctly. Sanskrit is written in the Devanagari script, and most letters have a horizontal upper line known as the Sirorekha.
“As it is, the Devanagari script has a complex geometry; the Sirorekha further complicates character segmentation,” explained Dr. Soman.
Image binarization and segmentation are important preprocessing steps in OCR. Errors in these steps propagate to the later stages, resulting in poor-quality output documents.
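To make the idea of binarization concrete, here is a minimal sketch in Python. It uses Otsu's global threshold, a common textbook baseline, and not the level set method described in the papers. The function names, the tiny sample array, and the assumption of dark text on a light background are illustrative only.

```python
import numpy as np


def otsu_threshold(gray):
    """Return the gray level that best separates two pixel classes (Otsu)."""
    pixels = np.asarray(gray, dtype=np.uint8).ravel()
    hist = np.bincount(pixels, minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    levels = np.arange(256)

    w0 = np.cumsum(prob)                 # weight of the class with level <= t
    w1 = 1.0 - w0                        # weight of the class with level > t
    cum_mean = np.cumsum(prob * levels)
    total_mean = cum_mean[-1]

    # Between-class variance for every candidate threshold t.
    with np.errstate(divide="ignore", invalid="ignore"):
        between = (total_mean * w0 - cum_mean) ** 2 / (w0 * w1)
    between = np.nan_to_num(between, nan=0.0, posinf=0.0, neginf=0.0)
    return int(np.argmax(between))


def binarize(gray, threshold=None):
    """Map pixels to 0 (text) or 255 (background), assuming dark text."""
    gray = np.asarray(gray, dtype=np.uint8)
    if threshold is None:
        threshold = otsu_threshold(gray)
    return np.where(gray <= threshold, 0, 255).astype(np.uint8)


# Hypothetical 2x4 grayscale patch: three dark "ink" pixels, five light ones.
page = np.array([[30, 40, 200, 210],
                 [35, 220, 215, 205]], dtype=np.uint8)
binary_page = binarize(page)
```

A single global threshold like this tends to struggle on aged or unevenly lit pages, which is one reason more flexible approaches such as active contours are of interest for old documents.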
The level set method proposed in the award-winning papers performs both segmentation and binarization of the document image. It uses the flexibility of active contours and converges quickly, which helps produce a fast and clean segmentation of the image.
“The efficiency of the proposed method was evaluated using standard performance metrics such as the F-measure, MSE and PSNR. The algorithm was also tested on benchmark images including printed and hand-written Sanskrit documents. Some of the input images were several decades old,” the students shared.
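The three metrics named above have standard definitions, sketched below in Python for a binarized image compared against a ground-truth image. This is an illustrative sketch, not the students' evaluation code: the function names, the 8-bit peak value of 255, the convention that text pixels are 0, and the sample arrays are all assumptions.

```python
import numpy as np


def mse(reference, result):
    """Mean squared error between two images of the same shape."""
    ref = np.asarray(reference, dtype=np.float64)
    res = np.asarray(result, dtype=np.float64)
    return float(np.mean((ref - res) ** 2))


def psnr(reference, result, peak=255.0):
    """Peak signal-to-noise ratio in decibels (infinite for identical images)."""
    err = mse(reference, result)
    if err == 0:
        return float("inf")
    return float(10.0 * np.log10(peak ** 2 / err))


def f_measure(reference, result, text_value=0):
    """Harmonic mean of precision and recall over text pixels."""
    ref_text = np.asarray(reference) == text_value
    res_text = np.asarray(result) == text_value
    tp = np.count_nonzero(ref_text & res_text)    # text found correctly
    fp = np.count_nonzero(~ref_text & res_text)   # background marked as text
    fn = np.count_nonzero(ref_text & ~res_text)   # text that was missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical 4x4 ground truth with a 2x2 block of text pixels (value 0).
ground_truth = np.full((4, 4), 255, dtype=np.uint8)
ground_truth[:2, :2] = 0

# Hypothetical output: one text pixel missed, one background pixel mislabeled.
output = ground_truth.copy()
output[0, 0] = 255
output[2, 2] = 0

scores = {
    "f_measure": f_measure(ground_truth, output),
    "mse": mse(ground_truth, output),
    "psnr": psnr(ground_truth, output),
}
```

A higher F-measure and PSNR, and a lower MSE, indicate a binarized result that is closer to the ground truth.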
The students found that the proposed method provided excellent results. “There were many advantages. Image segmentation did not require sharp contrast and the active contours used exactly captured the curls and twists in the cells,” they added.
November 26, 2011
Center for Computational Engineering and Networking, Coimbatore