Robust text extraction for automated processing of multi-lingual personal identity documents

Publication Type : Journal Article

Publisher : International Journal of Engineering and Technology

Source : International Journal of Engineering and Technology, Volume 8, Issue 2, Pages 1086-1094, 2016.

Campus : Mysuru

School : School of Arts and Sciences

Department : Computer Science

Year : 2016

Abstract : Text extraction is the task of separating textual content from a non-textual background, such as an image. It plays an important role in recovering valuable information from images. Variation in text size, font, orientation, alignment, and contrast makes the task challenging. Existing text extraction methods focus on certain regions of interest and address characteristics such as noise, blur, distortion, and font variation, all of which make text extraction difficult. This paper proposes a technique to extract textual characters from scanned images of personal identity documents. Current procedures keep track of user records manually, which is inefficient and demands considerable time and human resources. The proposed methodology digitizes personal identity documents and eliminates a large portion of the manual work involved in existing data entry and verification procedures. The proposed method has been tested extensively on large datasets of varying sizes and image qualities. The results indicate high accuracy in extracting important textual features from the document images.
