Journal of research in engineering and its applications

Vol. 1, Issue. 2, (2018), pp, 98-110

Keyword Spotting for BLSTM-Based Unconstrained Handwritten Using CTC Token Parsing Algorithm
Gowthul prasad1, Suraj sharma2
1Research Scholar, Department of Electronics and communication Engineering, College of Engineering,
Email: prasad12gow@gmail.com
2Assistant professor, Department of Electronics and communication engineering, Kerala Technological
Email: susharma25@yahoo.in

Images are becoming increasingly popular and widely available over the internet, and scanned or captured
documents are used in paperless offices and digital libraries. Keyword spotting is a well-known technique in
document image retrieval systems: recognition-free information retrieval is based on keyword spotting, which
searches an image for the keywords most closely related to the user's request using only image features. This
paper addresses the vanishing gradient problem in handwriting recognition using parser methods together with
character matching, in order to produce efficient and reliable output.

Keywords: Vanishing gradient, Character matching, handwriting recognition, keyword spotting.


Until now, most traditional libraries and organizations have relied on huge amounts of hard-copy or printed
resources that are costly and bulky; libraries and organizations are therefore moving to digitized
formats [1]. Paper documents can be converted into digital form using digitization equipment such as scanners,
digital cameras and mobile devices (smartphones, iPhones, iPods). The most common format for these historical
printed documents is text, in which the characters of the documents are represented by machine-readable
codes (e.g. ASCII codes) [2].

Extracting information from document images is a challenging problem compared with extracting it from digital
text [3], and information retrieval from document images has become a growing field. The aim of document image
analysis is the recognition and extraction of text in document images. Information retrieval, however, is
concerned with content-based document browsing, indexing and searching in a huge database of document
images. Text retrieval from document images has made significant progress and addresses related information
processing problems such as topic clustering and information filtering. Information retrieval (IR) from document
images is developed using two techniques [4].

Information retrieval covers the intellectual aspects of document description and its specification for
search, and many methods and techniques have been proposed to carry out the retrieval operation. An efficient
and effective information retrieval method is necessary to fetch the relevant documents from a huge store of
documents in response to a user query and to rank the related documents in order. This in turn requires an
effective method for matching documents. The digital image processing (DIP) field has made efforts to build
efficient document interpretation machines in order to move towards the paperless office. With the widespread
use of computers, a large volume of information is digitized and available in the form of document images
without adequate index information. Retrieval of information is therefore much harder for image data than for
text data [9].

Making handwritten texts available for searching and browsing is of tremendous value. For example,
one might be interested in finding all occurrences of the word “complain” in the letters sent to a company [14]. As
another example, libraries all over the world store huge numbers of handwritten books that are of crucial
importance for preserving the world’s cultural heritage. Making these books available for searching and browsing
would greatly help researchers and the public alike. Certain efforts have already been put into word spotting for
historical data [15], [16]. Another related application is the segmentation of images of historical documents into
meaningful regions, which can be improved with keyword spotting.

Word spotting [2] has emerged as a promising method for recognition-free retrieval. Here, word images
are represented using some features and compared with the help of an appropriate distance metric. Due to the
appearance-based nature of the matching, word spotting has the advantage that it does not require prior learning.
Such word matching schemes have been widely used in document image retrieval, for example for accessing
historic handwritten manuscripts and for searching documents in a collection of printed documents [5]. In
traditional word spotting, word images are often represented as a sequence of feature vectors and compared using
Dynamic Time Warping (DTW). Word spotting with DTW works well, but it takes approximately one
second to compare two word images [2]. This makes it practically infeasible for large databases containing
millions of word images.
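
The DTW comparison described above can be sketched as follows. This is a minimal illustration rather than the implementation used in [2]: the function name `dtw_distance`, the Euclidean local cost and the one-dimensional toy features are all assumptions made for the example.

```python
from math import inf

def dtw_distance(a, b):
    """Dynamic time warping cost between two sequences of feature vectors."""
    def local_cost(u, v):
        # Euclidean distance between two feature vectors (assumed local metric)
        return sum((p - q) ** 2 for p, q in zip(u, v)) ** 0.5

    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = local_cost(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch b
                                 cost[i][j - 1],      # stretch a
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]

# Hypothetical one-dimensional column features for three word images
query = [(0.0,), (1.0,), (2.0,), (1.0,)]
stretched = [(0.0,), (1.0,), (1.0,), (2.0,), (2.0,), (1.0,)]
other = [(3.0,), (3.0,), (0.0,), (0.0,)]

print(dtw_distance(query, stretched))  # same shape written wider
print(dtw_distance(query, other))      # different shape
```

The nested loops visit every pair of columns, so the cost of one comparison grows with the product of the two sequence lengths, which is consistent with the scalability concern raised above.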


Sayantan Sarkar [6] proposed a word spotting technique that applies a Modified Character Shape Code to
handwritten English document images. It differs from other word spotting techniques in that it implements
two levels of selection for matching word segments to the search query: the first is based on word size and the
second on the character shape code of the query.

B. Gatos et al. [7] used zone-based projection profiles with a segmentation-free approach for word
spotting, and experimented with 50 documents in Greek script. Zone-based features and features based on the word
projection profile are extracted from the keyword as well as from the words in the documents. Word matching
is performed using the Manhattan distance, matching the synthetic word against all other segmented words. The
experimental evaluation uses precision vs. recall curves, and a user feedback mechanism is used to improve the
result of the matching process.

Linlin Li et al. [8] proposed a fast keyword spotting method for English script. Pixel-based features are
extracted via stroke features and the ascender and descender features of the word, and touching characters in the
documents are handled with a segmentation-free approach. The method is based on word shape coding, and its
strength lies in document filtering. The authors used the minimum edit distance as the similarity measure and
reported precision and recall values of 96.22% and 90.08%, respectively. They also compared their method with
OCR, measuring the performance as 98.33% to 96.08, and concluded that their method is robust to different fonts
and character sizes.

Biradar et al. [9] note that keyword spotting is one of the best ways of indexing and retrieving document
images without optical character recognition. Many researchers have studied word spotting, but they have
concentrated on single-script documents; word spotting in multi-script documents leaves much room for future
work. Many methods have been proposed, but the different methods need to be evaluated on a single benchmark
data set to advance research in word spotting. The drawback is the lack of a script-independent model for word
spotting in Indian multi-script documents.

Lladós and Sánchez propose in [10] a keyword spotting method based on the shape context descriptor.
Words are represented by a signature formulated in terms of the shape context descriptor and are encoded as
bit-vector code words. A voting strategy is presented to retrieve the zones of a given document
that contain a certain keyword. Kuo and Agazzi use in [11] another classical technique from the speech processing
field: a Hidden Markov Model (HMM) is used to spot words in poorly printed documents. In this case, a learning
step is needed to train the HMM. In addition, each word the user wants to query has to be learned beforehand.

Moment invariant (MI) features have also been used to recognize characters. MI features are well
known to be invariant under rotation, translation, scaling and reflection. V. Karthikeyan [12] proposed a system
for recognizing Tamil characters. In that paper, the character image is skeletonized using Hilditch’s algorithm
and features are extracted based on the concept of image moments, which are weighted averages of the pixel
intensities. Four features are extracted from each character, the equations for which are derived from Hu’s
moment invariants [13].

LSTMs have also recently been demonstrated as a mechanism for learning to represent parse structure.
Vinyals et al. [17] proposed a phrase structure parser based on LSTMs which operated by first reading the entire
input sentence in so as to obtain a vector representation of it, and then generating bracketing structures sequentially
conditioned on this representation. Although superficially similar to our model, their approach has a number of
disadvantages. First, they relied on a large amount of semi-supervised training data that was generated by parsing a
large unannotated corpus with an off-the-shelf parser.

Second, while they recognized that a stack-like shift-reduce parser control provided useful information,
they only made the top word of the stack visible during training and decoding. Third, although it is an impressive
feat of learning that an entire parse tree can be represented by a vector, it seems that this formulation makes the
problem unnecessarily difficult.


1. The vanishing gradient problem, which the BLSTM is intended to solve, makes training slow and
inaccurate. This problem describes the limits of backpropagation learning.
2. Progress in word recognition is still difficult to assess because no common dataset is available for
evaluation, and the available data does not include annotations for handwritten words.
3. The HMM method is no longer workable because of the lack of training data for each model and because of
time and possibly memory constraints.
4. Handwriting recognition has been researched extensively, but open issues remain because of the
accuracy level; the problem would be solved only if the accuracy were 100%.


The proposed method avoids the vanishing gradient problem by using a bidirectional long short-term
memory (BLSTM) neural network. The problem occurs because the hidden states in the feedback loop are updated
repeatedly, using the values of the previous hidden states and the gradients, for as long as the memory persists.

The repeated multiplication and differentiation make the gradients vanish over time. LSTM
networks have demonstrated the power of deep learning with many nonlinear layers in connected handwriting
recognition, without any prior knowledge of the languages to be learned. A bidirectional architecture is
employed to access the future input as well as the past input. Because the vanishing gradient problem makes
training slow and inaccurate, the following measures are also used to address it:

(i) Proper initialization of the weight matrix,
(ii) Regularization of the output, i.e. L1/L2 or dropout,
(iii) Use of ReLU activations, as the derivative is either 0 or 1.
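
The bidirectional processing described above can be sketched with a single-unit LSTM run once over the sequence and once over its reverse. This is only an illustration under stated assumptions: the scalar cell, the function names `lstm_pass` and `blstm_pass`, and the weight values are hypothetical and are not the network configuration used in this paper.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_pass(xs, w):
    """Run a single-unit LSTM over xs and return the hidden state at each step."""
    h, c, hs = 0.0, 0.0, []
    for x in xs:
        i = sigmoid(w["wi"] * x + w["ui"] * h + w["bi"])    # input gate
        f = sigmoid(w["wf"] * x + w["uf"] * h + w["bf"])    # forget gate
        o = sigmoid(w["wo"] * x + w["uo"] * h + w["bo"])    # output gate
        g = math.tanh(w["wg"] * x + w["ug"] * h + w["bg"])  # candidate cell value
        c = f * c + i * g   # additive cell update, gated rather than squashed
        h = o * math.tanh(c)
        hs.append(h)
    return hs

def blstm_pass(xs, w_fwd, w_bwd):
    """Pair each step's forward (past) and backward (future) hidden states."""
    forward = lstm_pass(xs, w_fwd)
    backward = lstm_pass(xs[::-1], w_bwd)[::-1]
    return list(zip(forward, backward))

# Hypothetical weights, chosen only for illustration
W = {k: 0.5 for k in ("wi", "ui", "bi", "wf", "uf", "bf",
                      "wo", "uo", "bo", "wg", "ug", "bg")}
features = [0.2, -0.4, 0.9, 0.1]
states = blstm_pass(features, W, W)
print(states[0])  # (state from past context, state from future context)
```

At the first time step the forward state has seen only the first input, while the backward state already summarizes the rest of the sequence; concatenating the two gives every position access to both past and future context.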

[Block diagram: Input handwritten image → Preprocessing using adaptive filtering → Feature extraction → BLSTM neural network (ReLU to avoid the vanishing gradient problem) → CTC parser → Character matching → Output result]

Figure 1. Proposed system BLSTM NN

Preprocessing is executed on the scanned input image so that the resulting image is more useful for the
subsequent steps. The noise present in the image is eliminated by an adaptive Wiener filter:

$$I_{filt}(x,y) = \mu + \frac{(\sigma^2 - v^2)\,\bigl(I_{orig}(x,y) - \mu\bigr)}{\sigma^2}$$

where $I_{filt}(x,y)$ is the filtered image, $I_{orig}(x,y)$ is the original image, $\mu$ and $\sigma^2$ are the local mean and variance, and $v^2$ is the estimate of the noise variance.
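
A minimal, dependency-free sketch of this filter is given below. It assumes a square local window truncated at the image border, estimates $v^2$ as the average local variance when no value is supplied, and clamps the gain to the range [0, 1]; the function name `adaptive_wiener` and these choices are assumptions for illustration, not details taken from this paper.

```python
def adaptive_wiener(image, radius=1, noise_var=None):
    """Apply I_filt = mu + (sigma^2 - v^2) / sigma^2 * (I_orig - mu) per pixel."""
    rows, cols = len(image), len(image[0])
    means = [[0.0] * cols for _ in range(rows)]
    variances = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Local window, truncated at the image border
            window = [image[i][j]
                      for i in range(max(0, r - radius), min(rows, r + radius + 1))
                      for j in range(max(0, c - radius), min(cols, c + radius + 1))]
            mu = sum(window) / len(window)
            means[r][c] = mu
            variances[r][c] = sum((p - mu) ** 2 for p in window) / len(window)
    if noise_var is None:
        # Assumption: estimate v^2 as the average of the local variances
        noise_var = sum(map(sum, variances)) / (rows * cols)
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            mu, var = means[r][c], variances[r][c]
            # Guard (assumption): clamp the gain to [0, 1] and avoid division by zero
            gain = max(var - noise_var, 0.0) / max(var, noise_var, 1e-12)
            out[r][c] = mu + gain * (image[r][c] - mu)
    return out

# Hypothetical 3x3 patch with one noisy pixel and a large assumed noise variance
spike = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
print(adaptive_wiener(spike, noise_var=100.0)[1][1])
```

Where the local variance is below the noise estimate the pixel is replaced by the local mean, and where the local variance is much larger (edges, strokes) the pixel is left almost unchanged, which is why the filter smooths background noise while preserving handwriting strokes.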

Feature extraction:
Feature extraction uses the gray-level co-occurrence matrix (GLCM). For an input image, the number of rows
and columns of the GLCM equals the number of grey levels in the image. The grey co-matrix function creates the
GLCM by calculating how often a pixel with grey-level value x occurs horizontally adjacent to a pixel with value y.
Each element (x, y) of the GLCM specifies the number of times that a pixel with value x occurred horizontally
adjacent to a pixel with value y. In the formulas below, I(x, y) denotes the GLCM entry for the grey-level pair (x, y).
$$\text{Contrast} = \sum_{x,y} |x - y|^2 \, I(x,y)$$

$$\text{Correlation} = \sum_{x,y} \frac{(x - \mu_x)(y - \mu_y)\, I(x,y)}{\sigma_x \sigma_y}$$

$$\text{Energy} = \sum_{x,y} I(x,y)^2$$

$$\text{Homogeneity} = \sum_{x,y} \frac{I(x,y)}{1 + |x - y|}$$
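
The four GLCM features can be computed as in the following sketch. It assumes that the co-occurrence counts are normalized to sum to 1 before the features are evaluated and that grey levels are integers starting at 0; the function name `glcm_features` and the small test image are illustrative only.

```python
def glcm_features(image, levels):
    """Horizontal-neighbour GLCM (normalized) and its four texture features."""
    glcm = [[0.0] * levels for _ in range(levels)]
    for row in image:
        for x, y in zip(row, row[1:]):   # pixel x with its right-hand neighbour y
            glcm[x][y] += 1.0
    total = sum(map(sum, glcm))
    # Assumption: normalize counts so that the entries sum to 1
    cells = [(x, y, glcm[x][y] / total) for x in range(levels) for y in range(levels)]
    mu_x = sum(x * v for x, _, v in cells)
    mu_y = sum(y * v for _, y, v in cells)
    sd_x = sum((x - mu_x) ** 2 * v for x, _, v in cells) ** 0.5
    sd_y = sum((y - mu_y) ** 2 * v for _, y, v in cells) ** 0.5
    covariance = sum((x - mu_x) * (y - mu_y) * v for x, y, v in cells)
    return {
        "contrast": sum(abs(x - y) ** 2 * v for x, y, v in cells),
        # Correlation is undefined for a constant image, so report NaN there
        "correlation": covariance / (sd_x * sd_y) if sd_x * sd_y > 0 else float("nan"),
        "energy": sum(v ** 2 for _, _, v in cells),
        "homogeneity": sum(v / (1 + abs(x - y)) for x, y, v in cells),
    }

# Hypothetical 4x4 image with four grey levels
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 2, 2, 2],
         [2, 2, 3, 3]]
features = glcm_features(image, levels=4)
print(features)
```

For this image there are 12 horizontal pixel pairs, so for instance the contrast works out to (2·1 + 1·4 + 1·1)/12 = 7/12.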

Vanishing Gradient problems:

Vanishing occurs when the sigmoidal function saturates at its tails (outputs near 0 or 1). This is
problematic because the derivative is then always near 0. During backpropagation through time (BPTT), these
near-zero derivatives are multiplied with the error repeatedly. Repeatedly multiplying by a number smaller than 1
drives the product towards 0 and weakens the error signal. Let S be the sigmoidal activation function:

$$S(t) = \frac{1}{1 + e^{-t}}$$

$$S'(t) = S(t)\,(1 - S(t))$$
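
The effect of the repeated multiplication can be checked numerically. The sketch below is only an illustration: the helper `backpropagated_signal` is a hypothetical name and it ignores the weight terms that a real BPTT step would also multiply in.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def sigmoid_grad(t):
    s = sigmoid(t)
    return s * (1.0 - s)   # S'(t) = S(t)(1 - S(t))

def backpropagated_signal(pre_activations, error=1.0):
    """Multiply an error by one sigmoid derivative per time step."""
    for t in pre_activations:
        error *= sigmoid_grad(t)
    return error

print(sigmoid_grad(0.0))                    # largest possible derivative
print(backpropagated_signal([0.0] * 10))    # best case after 10 steps
print(backpropagated_signal([4.0] * 10))    # saturated units after 10 steps
```

The derivative never exceeds 0.25 (its value at t = 0), so even in the best case ten time steps scale the error by 0.25^10, which is below one millionth; saturated units shrink it far faster.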

ReLU Units:

To solve this issue, we can use rectified linear units (ReLU), which do not suffer from this tail saturation as
much. When the input is smaller than zero, the function outputs zero; otherwise it mimics the identity
function. The ReLU function is very fast to compute:

$$f(x) = \begin{cases} 0, & x < 0 \\ x, & x \ge 0 \end{cases}$$

Another way of writing ReLU is

$$f(x) = \max(0, x)$$

The derivative is 0 if x < 0 and 1 if x >= 0, so the error signal does not weaken as it backpropagates
through the network. There is still a problem in the negative region (x < 0), where the derivative is zero. This
can nullify the error signal, so it is best to add a leaky factor to the ReLU unit, giving the negative region a
small nonzero slope. This parameter can be fixed, or it can be randomized during training and fixed afterwards.
There is also maxout, but it has twice as many weights as a regular ReLU unit.
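
The difference between the plain and the leaky unit can be sketched as follows. The leak factor `alpha = 0.01` and the helper `backpropagated_signal` are assumptions made for this example; the paper does not state the value it uses.

```python
def relu(x):
    return max(0.0, x)

def relu_grad(x):
    return 1.0 if x >= 0 else 0.0       # convention from the text: 1 for x >= 0

def leaky_relu(x, alpha=0.01):
    return x if x >= 0 else alpha * x   # alpha is a hypothetical leak factor

def leaky_relu_grad(x, alpha=0.01):
    return 1.0 if x >= 0 else alpha

def backpropagated_signal(pre_activations, grad, error=1.0):
    """Multiply an error by one activation derivative per layer or time step."""
    for x in pre_activations:
        error *= grad(x)
    return error

active = [1.0] * 10            # every unit in its positive region
one_dead = [1.0, -1.0, 1.0]    # one unit in its negative region
print(backpropagated_signal(active, relu_grad))
print(backpropagated_signal(one_dead, relu_grad))
print(backpropagated_signal(one_dead, leaky_relu_grad))
```

With all units active the error passes through unchanged; a single unit in the negative region zeroes it for plain ReLU, whereas the leaky variant only scales it by the leak factor.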

CTC Token Parsing Algorithm

A CTC network processes the input sequence to compute posterior character probabilities at every step.
CTC token passing then converts the handwritten manuscript images into readable output. In the proposed work,
the handwritten manuscript image is converted into text format, which allows easy understanding and immediate
recognition. The parser determines all the characters in the formatted image and passes them to the recognizer
after a bit-by-bit analysis.
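
How per-step posterior probabilities turn into a character string can be illustrated with best-path (greedy) CTC decoding. Note that this is a simpler, standard technique substituted here for illustration; it is not the token passing procedure of this paper, and the function name, alphabet and probabilities are hypothetical.

```python
def ctc_best_path(posteriors, alphabet, blank=0):
    """Greedy CTC decoding: argmax per frame, merge repeats, drop blanks."""
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in posteriors]
    decoded, previous = [], None
    for label in best:
        # Emit a character only when the label changes and is not the blank
        if label != previous and label != blank:
            decoded.append(alphabet[label])
        previous = label
    return "".join(decoded)

# Hypothetical alphabet (index 0 is the CTC blank) and frame posteriors
alphabet = ["-", "c", "a", "t"]
posteriors = [
    [0.10, 0.80, 0.05, 0.05],  # c
    [0.10, 0.70, 0.10, 0.10],  # c (repeat, merged)
    [0.90, 0.04, 0.03, 0.03],  # blank
    [0.10, 0.10, 0.70, 0.10],  # a
    [0.20, 0.10, 0.60, 0.10],  # a (repeat, merged)
    [0.60, 0.10, 0.10, 0.20],  # blank
    [0.10, 0.10, 0.10, 0.70],  # t
]
print(ctc_best_path(posteriors, alphabet))
```

The blank label is what allows genuine double letters to survive: two identical labels separated by a blank are emitted twice, while adjacent repeats are merged into one character.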

The recognizer converts the image into text and provides the text output together with the image display
result. The expected output is the recognized text in a file. While converting handwritten characters,
identification of each character is essential. Each character class is declared with a specification such as
([a]-[z]) for lower-case letters and ([A]-[Z]) for upper-case letters.

Recognition of numbers and special characters is also included, with ([0]-[9]) for digits and the operators
(*, +) for special characters. These operators spot a character between 0 and ∞ times or between 1 and ∞
times, respectively. The spacing between characters differs by pixels, so the pixels are identified using
character lines made with ((#[a]* [z]* #)) and the pixel format is equalized using “(Le[a]-[z]*), ((#le[a]-[z]*#)), or
both (#[A]-[Z]o[a]-[z]#)”. These are the initial steps in identifying and translating the characters from
the handwritten pages.

This is a step-by-step conversion: each character is processed by the above method and stored, and the
conversion then continues with the subsequent steps. Every symbol is identified as a character or a
letter and recognized accordingly with this algorithm. Finally, all the recognized characters are stored and
relocated to the database.


CTC Parser()
    Lower case elements (#[a]-[z]#)                          // identify the lower case letters
    Upper case elements (#[A]-[Z]#) or digits (#[0]-[9]#)    // identify the upper case letters
    Operators using * or +
        (spotting a character between 0 and ∞ times, or spotting a character between 1 and ∞ times)
        #[0]-[9]+#
    Lower case elements ((#[a]-[z]*)#)
    Beginning line (#[a]-[z]*#)
    Identify the pixel ((#[a]* [z]*#))
    Equalize to pixel format (Le[a]-[z]*), ((#le[a]-[z]*#)), or both (#[A]-[Z]o[a]-[z]#)
    Convert -> text format
        (#[0-9]*#) or word beginning by one upper case element (#[A-Z][a-z]*#)
        word beginning by one lower case element
    Relocate data from Pm to database
    Convert -> number format
        Relocate number from (#[0-9]*)
        Relocate word from (#[A-Z]*[a-z])
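
The character classes in the listing above read like regular expressions, so one way to make the idea concrete is a small tokenizer built on Python's `re` module. This is a loose interpretation rather than the authors' parser: the token class names, the functions `parse_tokens` and `to_records`, and the sample string are all assumptions.

```python
import re

# Assumed token classes, loosely following the patterns in the listing above
TOKEN_PATTERN = re.compile(r"""
    (?P<number>[0-9]+)             # one or more digits
  | (?P<capitalized>[A-Z][a-z]*)   # word beginning with one upper-case letter
  | (?P<lower>[a-z]+)              # word made of lower-case letters
  | (?P<operator>[*+])             # the two operators named in the listing
""", re.VERBOSE)

def parse_tokens(text):
    """Split recognized text into (class, value) pairs, skipping other characters."""
    return [(m.lastgroup, m.group()) for m in TOKEN_PATTERN.finditer(text)]

def to_records(text):
    """Separate numbers from words, mirroring the two 'Convert' steps."""
    tokens = parse_tokens(text)
    return {
        "numbers": [int(v) for k, v in tokens if k == "number"],
        "words": [v for k, v in tokens if k in ("capitalized", "lower")],
    }

sample = "Invoice 42 sent + paid"
print(parse_tokens(sample))
print(to_records(sample))
```

Here `+` inside a pattern means "one or more" and `*` means "zero or more", which corresponds to the "1 to ∞ times" and "0 to ∞ times" spotting described in the text.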

Character Matching Process

This technique differs from the others in that no features are actually extracted. Instead, the
image of the input character is matched directly against a set of characters representing each possible class. The
technique is simple and easy to implement in hardware, and it also matches against the database that is queried by
the user.


1: buffer ← empty
2: node ← root node
3: while characters on input do
4: ch ← next character from input
5: normalize whitespace or skip multiple
6: if buffer contains character other than letter or digit then
7: unread characters from buffer back on input until the first occurrence of the character that is not letter or digit
8: end if
9: buffer ← empty
10: node ← root node
11: end if
12: end while
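
The listing above walks a tree of nodes from a root while buffering input characters, which suggests a trie of keywords. The sketch below shows one way such a matcher could look; it is an assumption-based illustration, not a transcription of the listing, and the names `build_trie` and `spot_keywords` as well as the `"$"` end-of-word marker are hypothetical.

```python
def build_trie(keywords):
    """Build a nested-dict trie; the key "$" marks the end of a keyword."""
    root = {}
    for word in keywords:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node["$"] = word   # "$" is never a letter or digit, so it cannot clash
    return root

def spot_keywords(text, keywords):
    """Return (position, keyword) for every whole-word keyword occurrence."""
    root = build_trie(k.lower() for k in keywords)
    hits, buffer, node, start = [], [], root, 0
    for pos, ch in enumerate(text.lower() + " "):   # trailing space flushes the last word
        if ch.isalnum():
            if not buffer:
                start = pos
            buffer.append(ch)
            node = node.get(ch) if node is not None else None
        else:
            # A character that is not a letter or digit ends the current word
            if buffer and node is not None and "$" in node:
                hits.append((start, node["$"]))
            buffer, node = [], root   # reset, as in "buffer <- empty; node <- root node"
    return hits

recognized = "Please complain; do not explain. Complain!"
print(spot_keywords(recognized, ["complain"]))
```

Because the trie is reset at every non-alphanumeric character, each input character is examined once, and longer words that merely contain a keyword (such as "complaint") are not reported.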


The performance of the proposed system is analyzed using the following parameters: character match
recognition, accuracy, rejection rate, time calculation and word error rate. For the existing techniques, accuracy
and time (in seconds) can vary considerably. The proposed parser technique differs markedly from them in its
accuracy and time.

Figure 2 Input image

Figure 2 shows the input image given to the system, which consists of a preprocessing stage, a feature
extraction stage and finally the neural network. The input image is passed to preprocessing, which uses an
adaptive filtering technique to minimize the error in the input image; the result of preprocessing is an image from
which the error has been eliminated.

Figure 3 Output Of Preprocessing (adaptive filter)

Figure 4 shows the output of the feature extraction stage. The input to feature extraction is the
output of preprocessing, and the result of feature extraction is given to the neural network.

Figure 4 Output Of Feature Extraction

The output image of feature extraction is given to the BLSTM neural network, which predicts the image
using the past and future context of its elements. The network processes a finite sequence of words and also
predicts the probability of the label sequence.

Figure 5 (a) and (b) Image prediction (panel titles: resized image, grayscale image)

In the final output the noise error is eliminated, and the two outputs are (a) the image and (b) the text file.

Figure 6 Final Output image (a) image (b) Text file output (panel title: doubled image)

Character Matches Recognition (CMR)

This field holds the sample input image; character recognition is also available for the sample subject.
The handwritten character is matched exactly with the corresponding one. This kind of character recognition is
quite easy to identify with the technique involved. The exact character can be recognized, and the status
determines the relative classes for character matching. This gives the rate at which the exact character is
distinguished.

$$\text{CMR} = \frac{\text{recognized word matches}}{\text{word matches in database}}$$

Accuracy is calculated as a percentage: the correctly recognized image divided by the whole document image
in the system.

$$\text{Accuracy (\%)} = \frac{\text{correctly recognized image}}{\text{whole document image in system}} \times 100$$

Figure 7 Comparison of accuracy


The rejection rate is calculated from the words that do not match the user query, relative to the whole
document image in the system.

$$\text{Rejection rate (\%)} = \frac{\text{mismatched word}}{\text{whole document image}} \times 100$$

Figure 8 Analysis Of Rejection Rate

The word error rate is calculated by comparing the predicted words against the actual words in the sequence of words.

Figure 9 Comparison of error rate
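
The ratio-based measures above can be computed as in the sketch below. The paper does not spell out its word error rate formula, so the code assumes the standard definition (word-level edit distance divided by the number of reference words); the function names and the counts and sentences used in the example are hypothetical.

```python
def percentage(part, whole):
    """Ratio expressed in percent, as used for accuracy and rejection rate."""
    return 100.0 * part / whole

def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (here: lists of words)."""
    previous = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        current = [i]
        for j, h in enumerate(hyp, start=1):
            current.append(min(previous[j] + 1,               # deletion
                               current[j - 1] + 1,            # insertion
                               previous[j - 1] + (r != h)))   # substitution or match
        previous = current
    return previous[-1]

def word_error_rate(actual, predicted):
    """Assumed standard definition: word-level edits divided by reference length."""
    ref, hyp = actual.split(), predicted.split()
    return percentage(edit_distance(ref, hyp), len(ref))

# Hypothetical counts and transcriptions, for illustration only
print(percentage(19, 20))   # e.g. 19 of 20 items recognized correctly
print(word_error_rate("the quick brown fox", "the quack brown"))
```

In the second call one word is substituted and one is missing, so two edits against four reference words give a word error rate of 50%.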

Factors              Existing system    Proposed system
character matches          81                 92
accuracy                   83                 95
word error rate            17                  9
rejection rate             19                 11
time calculation            7                  9

Table 1: Comparison table with existing and proposed system

Figure 10 Comparison Of Time Between Existing And Proposed Approach


Many researchers have studied word spotting, but they have concentrated on single-script documents;
word spotting in multi-script documents leaves much room for future work. Many methods have been proposed,
but the different methods need to be evaluated on a single benchmark data set to advance research in word
spotting. This paper addressed the vanishing gradient problem in handwriting recognition using parser methods
together with character matching, in order to produce efficient and reliable output in the form of a recognized
text file.


[1] Yue Lu and Chew Lim Tan, “Word Searching in Document Images Using Word Portion Matching,” DAS 2002,
LNCS 2423, Springer-Verlag Berlin Heidelberg, pp. 319–328, 2002.

[2] Toni M. Rath and R. Manmatha, “Word Image Matching Using Dynamic Time Warping,” IJDAR, pp. 139–152, 2007.

[3] Thomas Konidaris, “A segmentation-free word spotting method for historical printed documents,”
Springer, 17 April 2015.

[4] Vijayarani S. and Sakila A., “A Survey On Word Spotting Techniques For Document Image Retrieval,”
International Journal of Engineering Applied Sciences and Technology, Vol. 2, Issue 2, ISSN No. 2455-2143,
pp. 6–10, 2016.

[5] A. Balasubramanian, M. Meshesha, and C. V. Jawahar, “Retrieval from document image collections,” in
DAS, 2006, pp. 1–12.

[6] Sayantan Sarkar, “Word Spotting in Cursive Handwritten Documents using Modified Character Shape
Codes,” Advances in Computing and Information Technology, Springer, pp. 269–278, 2013.

[7] B. Gatos, T. Konidaris, K. Ntzios, I. Pratikakis and S. J. Perantonis, “A Segmentation-free Approach for
Keyword Search in Historical Typewritten Documents,” Proceedings of the Eighth International Conference on
Document Analysis and Recognition (ICDAR’05), IEEE, 2005.

[8] Linlin Li, Shijian Lu and Chew Lim Tan, “A Fast Keyword-Spotting Technique,” Ninth International
Conference on Document Analysis and Recognition (ICDAR 2007), IEEE, 2007.

[9] Biradar, Ashok Huded, et al., “Word Spotting in Offline Handwritten Documents Recent Progress and Future
Challenges,” International Journal of Advanced Research in Computer Science and Software Engineering 6(3),
March 2016, pp. 344–349.

[10] Thenkalvi Boomilingam and Murugavalli Subramaniam, “An efficient retrieval using edge GLCM and
association rule mining guided IPSO based artificial neural network,” Springer, 2016.