Deskewing Method for Unconstrained Handwritten Kannada Language Leading to Text line Segmentation and its Skew Estimation

Authors

  • Shakunthala B S, Dr. C S Pillai

Abstract

Text-line segmentation of handwritten Kannada documents plays an important role in a Human
Character Recognition System. Segmentation is the process of extracting text lines, words and characters.
Recognition accuracy depends on the segmentation phase, since incorrect segmentation leads to false recognition.
Handwritten document images are used as the dataset for this approach. In the proposed system, handwritten
documents are segmented into text lines, words and characters, and skew is estimated and corrected using the
Enhanced Skew Detection and Correction for Words (ESDCW) algorithm. The algorithm considers the height and
width of each entire handwritten word: when a word has no skew, its height should be at a minimum and its
width at a maximum. Once the skew has been corrected with an approximate skew angle, the same process is
repeated, considering only the busy zone, to perform precise skew correction. Preprocessing consists of
(i) filtering, (ii) grayscale conversion and (iii) binarization. The proposed system comprises preprocessing,
dilation and labeling of the connected components of the input image, deskewing of the words belonging to each
text line, and insertion of the words into a new image. The identified words are then extracted and stored in
the new image. During extraction, unwanted information is removed using the bounding box technique, and
overlapping of words is avoided while storing them in the new image file. Tests carried out on fully
unconstrained handwritten Kannada documents yielded an average segmentation rate of 96.38%.
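The three preprocessing steps (filtering, grayscale conversion, binarization) can be illustrated with a small NumPy sketch. The abstract does not name the filter or the binarization rule, so this sketch assumes a 3×3 mean filter, ITU-R BT.601 luma weights for the grayscale conversion and Otsu's global threshold, with dark ink on light paper. The function names are hypothetical and this is not the authors' implementation.

```python
import numpy as np


def mean_filter(channel: np.ndarray, k: int = 3) -> np.ndarray:
    """k x k box (mean) filter with edge replication; k is assumed odd."""
    pad = k // 2
    padded = np.pad(channel.astype(np.float64), pad, mode="edge")
    h, w = channel.shape
    out = np.zeros((h, w), dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)


def to_grayscale(rgb: np.ndarray) -> np.ndarray:
    """Weighted sum of R, G, B using ITU-R BT.601 luma weights (an assumed choice)."""
    return 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]


def otsu_threshold(gray: np.ndarray) -> float:
    """Otsu's method over 256 bins: the threshold that maximises between-class variance."""
    hist, edges = np.histogram(gray, bins=256, range=(0.0, 256.0))
    centers = (edges[:-1] + edges[1:]) / 2.0
    total = float(hist.sum())
    sum_all = float((hist * centers).sum())
    w0, sum0 = 0.0, 0.0
    best_t, best_var = float(edges[1]), -1.0
    for i in range(256):
        w0 += hist[i]
        sum0 += hist[i] * centers[i]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, so the split is undefined
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        between = w0 * w1 * (m0 - m1) ** 2
        if between > best_var:
            best_var, best_t = between, float(edges[i + 1])
    return best_t


def preprocess(rgb: np.ndarray) -> np.ndarray:
    """Filtering -> grayscale -> binarization; returns True where (dark) ink is."""
    filtered = np.stack([mean_filter(rgb[..., c]) for c in range(3)], axis=-1)
    gray = to_grayscale(filtered)
    return gray < otsu_threshold(gray)
```

Filtering each colour channel before the grayscale conversion follows the order listed in the abstract; swapping the first two steps would give almost the same result for a linear filter such as this one.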
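Dilation, labeling of connected components and bounding-box extraction of words can be sketched for a single binarized text line as follows. The structuring-element size (`radius_x`, `radius_y`), 8-connectivity, the left-to-right ordering and the fixed `gap` between pasted words are illustrative assumptions, and `extract_words` and the other names are hypothetical. Each crop keeps only the ink of its own component, which is one simple way to drop unwanted pixels that fall inside a word's bounding box, and the gap keeps words from overlapping in the new image.

```python
from collections import deque

import numpy as np


def dilate(mask, radius_x=3, radius_y=1):
    """Binary dilation with a (2*radius_y+1) x (2*radius_x+1) rectangle."""
    h, w = mask.shape
    padded = np.pad(mask.astype(bool), ((radius_y, radius_y), (radius_x, radius_x)))
    out = np.zeros((h, w), dtype=bool)
    for dy in range(2 * radius_y + 1):
        for dx in range(2 * radius_x + 1):
            out |= padded[dy:dy + h, dx:dx + w]
    return out


def label_components(mask):
    """8-connected component labeling by breadth-first search; returns (labels, count)."""
    h, w = mask.shape
    labels = np.zeros((h, w), dtype=np.int32)
    count = 0
    for y in range(h):
        for x in range(w):
            if not mask[y, x] or labels[y, x]:
                continue
            count += 1
            labels[y, x] = count
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                for ny in range(max(cy - 1, 0), min(cy + 2, h)):
                    for nx in range(max(cx - 1, 0), min(cx + 2, w)):
                        if mask[ny, nx] and not labels[ny, nx]:
                            labels[ny, nx] = count
                            queue.append((ny, nx))
    return labels, count


def extract_words(line, radius_x=3, radius_y=1, gap=2):
    """Split a binarized text line (True = ink) into word crops and a new image.

    Dilation merges the characters of a word into one blob; each blob's bounding
    box is cropped from the original ink, keeping only that blob's pixels.
    """
    ink = np.asarray(line, dtype=bool)
    labels, count = label_components(dilate(ink, radius_x, radius_y))
    boxes = []
    for k in range(1, count + 1):
        ys, xs = np.nonzero(labels == k)
        boxes.append((xs.min(), ys.min(), ys.max() + 1, xs.max() + 1, k))
    boxes.sort()  # order words left to right by their left edge
    crops = [ink[t:b, l:r] & (labels[t:b, l:r] == k) for l, t, b, r, k in boxes]
    if not crops:
        return [], np.zeros((0, 0), dtype=bool)
    height = max(c.shape[0] for c in crops)
    width = sum(c.shape[1] for c in crops) + gap * (len(crops) - 1)
    canvas = np.zeros((height, width), dtype=bool)
    x = 0
    for c in crops:  # paste side by side, separated by `gap` empty columns
        canvas[:c.shape[0], x:x + c.shape[1]] = c
        x += c.shape[1] + gap
    return crops, canvas
```

The horizontal radius controls how far apart two characters may be and still be merged into one word, so in practice it would be tuned to the typical inter-character spacing of the handwriting.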
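The height/width criterion behind ESDCW can be illustrated by a brute-force search: rotate a word's ink pixels through candidate angles and keep the angle whose correction gives the smallest bounding-box height, breaking ties by the largest width. This sketch assumes a coarse pass over ±15° in 1° steps followed by a fine pass of ±1° in 0.1° steps around the coarse estimate. These ranges and all function names are assumptions rather than the authors' ESDCW implementation, and the busy-zone restriction used for the precise pass is not modeled here.

```python
import numpy as np


def ink_points(word):
    """(x, y) coordinates of ink pixels, with y flipped so that it points up."""
    rows, cols = np.nonzero(word)
    return np.column_stack([cols, -rows]).astype(np.float64)


def rotate(points, degrees):
    """Rotate (x, y) points counterclockwise by `degrees` about the origin."""
    t = np.deg2rad(degrees)
    c, s = np.cos(t), np.sin(t)
    x, y = points[:, 0], points[:, 1]
    return np.column_stack([c * x - s * y, s * x + c * y])


def best_angle(points, candidates):
    """Candidate skew whose correction yields the minimum bounding-box height,
    ties broken by the maximum width (height rounded to suppress float noise)."""
    best, best_key = None, None
    for a in candidates:
        corrected = rotate(points, -a)
        width, height = corrected.max(axis=0) - corrected.min(axis=0)
        key = (round(float(height), 6), -float(width))
        if best_key is None or key < best_key:
            best, best_key = float(a), key
    return best


def estimate_skew_points(points, max_angle=15.0, coarse_step=1.0, fine_step=0.1):
    """Coarse search over [-max_angle, max_angle], then a fine search of
    +/- coarse_step around the coarse estimate. Degrees, counterclockwise positive."""
    n_coarse = int(round(2 * max_angle / coarse_step)) + 1
    coarse = best_angle(points, np.linspace(-max_angle, max_angle, n_coarse))
    n_fine = int(round(2 * coarse_step / fine_step)) + 1
    return best_angle(points, np.linspace(coarse - coarse_step, coarse + coarse_step, n_fine))


def estimate_skew(word, **kwargs):
    """Skew angle of a binarized word image (True = ink)."""
    return estimate_skew_points(ink_points(word), **kwargs)
```

For instance, `rotate(ink_points(word), -estimate_skew(word))` gives the deskewed coordinates of a word's ink pixels, which could then be rasterized into the new image. The exhaustive search costs one rotation of all ink pixels per candidate angle, which is affordable for word-sized images.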

Published

2020-03-25

Section

Articles