
Detection of Arrows in On-line Sketched Diagrams using Relative Stroke Positioning

Martin Bresler, Daniel Průša, Václav Hlaváč


Czech Technical University in Prague, Faculty of Electrical Engineering,
Department of Cybernetics, 166 27, Praha 6, Technická 2, Czech Republic
{breslmar, prusapa1, hlavac}@cmp.felk.cvut.cz

Abstract

This paper deals with recognition of arrows in on-line sketched diagrams. Arrows have varying appearance and thus it is a difficult task to recognize them directly. It is beneficial to detect arrows after other symbols (which are easier to detect) have already been found. We proposed [4] an arrow detector which searches for arrows as arbitrarily shaped connectors between already found symbols. The detection is done in two steps: a) a search for a shaft of the arrow, b) a search for its head. The first step is relatively easy. However, it might be quite difficult to find the head reliably. This paper brings two contributions. The first contribution is a design of an arrow recognizer where the head is detected using relative stroke positioning. We embedded this recognizer into the diagram recognition pipeline proposed earlier [4] and increased the overall accuracy. The second contribution is an introduction of a new approach to evaluate the relative position of two given strokes with neural networks (LSTM). This approach is an alternative to the fuzzy relative positioning proposed by Bouteruche et al. [2]. We made a comparison between the two methods through experiments performed on two datasets for two different tasks. First, we used a benchmark database of hand-drawn finite automata to evaluate detection of arrows. Second, we used a database presented in the paper by Bouteruche et al. containing pairs of reference and argument strokes, where argument strokes are classified into 18 classes. Our method gave significantly better results for the first task and comparable results for the second task.

1. Introduction

This paper deals with on-line handwriting recognition, where the input consists of a sequence of strokes. A stroke is a sequence of points captured by an ink-input device (most commonly a tablet or a tablet PC) as the user writes with a stylus or a finger. In handwriting recognition, the research has already moved from recognition of plain text to recognition of more structured input such as diagrams. This work is focused on recognition of arrows in on-line sketched diagrams.

Arrows are the most important symbols in diagrams, since they bear the most valuable information about the diagram structure: which symbols are connected together. However, it is a difficult task to recognize them because of their varying appearance. We consider two diagram domains: finite automata (FA) and flowcharts (FC). There is a freely available benchmark database for each of the domains: the FA database [4] and the FC database [1]. Figure 1 shows examples of diagrams from these two domains. It is obvious that arrows can be arbitrarily directed and their shafts might be straight lines, curved lines, or polylines. Moreover, their heads have different shapes.

There exists an approach where arrows are detected first and the knowledge of arrows helps to naturally segment the rest of the symbols [14]. The problem is that the authors of this approach put very strict requirements on the way the arrow is drawn. It must consist of one or two strokes and the arrow's head must have only one predefined shape. Another approach is to detect arrows the same way as other symbols, using a classifier based on the symbol appearance. Since the arrows might be arbitrarily rotated and the heads might have different shapes, it is necessary to create several arrow sub-classes. This approach is more general, but the achieved accuracy is limited. The state-of-the-art methods in flowchart recognition consistently achieve low accuracy in arrow recognition [5, 3]. We already suggested [4] that it is better to detect arrows after the other symbols are detected. We proposed an algorithm which searches for arrows as arbitrarily shaped connectors between already found non-arrow symbols. It works in two stages: a) arrow shaft detection, b) arrow head detection. The detection of the arrow head is based on heuristics and does not achieve satisfactory precision. In this paper, we employ machine learning to improve the proposed arrow detector with an arrow head classifier based on relative stroke positioning.
(a) (b)
Figure 1. Examples of hand-drawn diagrams containing arrows connecting symbols with rigid bodies: (a) finite automata, (b) flowchart.

In many cases, appearance does not give us enough information to classify single strokes and we need some contextual information. The relative position of a stroke with respect to a reference stroke is the most intuitive kind of context. Bouteruche et al. [2] addressed this problem directly and proposed a fuzzy relative positioning method. The authors introduced a method evaluating the relative position of strokes based on how well pairs of strokes fulfil a set of relations such as "the second stroke is on the right of the first stroke" through defined fuzzy landscapes. They used this method to solve a prepared task, where pairs of reference and argument strokes are given and the argument strokes have to be classified into 18 classes corresponding to several types of accentuation or punctuation. The information about the appearance and the relative position of the argument stroke with respect to the reference stroke must be combined to achieve a good recognition rate. This task adequately demonstrates the need for a relative positioning system. They used Radial Basis Function Networks (RBFN) as a classifier. The method was further improved by Delaye et al. [7] through a better definition of fuzzy landscapes and the use of an SVM. Although fuzzy relative positioning is a powerful method useful for more complex tasks such as recognition of structured handwritten symbols (Chinese characters) [6], it gives poor results when applied to arrow head detection.

Our work brings two contributions. First, we define arrow head detection as a classification of possible arrow head strokes based on relative positioning. We used this arrow head classifier to significantly improve the proposed arrow detector. Second, we propose a new method for evaluation of the relative position of strokes, which exploits simple low-level features and uses a Bidirectional Long Short Term Memory (BLSTM) Recurrent Neural Network (RNN) as a classifier. The BLSTM RNN proved to be a good tool for classification of individual strokes [13].

The rest of the paper is organized as follows. Section 2 describes the proposed arrow detector and the way the relative positioning is exploited to determine which strokes represent the head of the arrow. Section 3 introduces our method for evaluation of the relative position. Experiments and their results are described in Section 4. Finally, we conclude in Section 5.

2. Arrow detector

Arrows are symbols with a non-rigid body. They consist of two parts: shaft and head. The head defines the orientation of the arrow. However, an arrow's appearance can change arbitrarily according to the given domain. Arrows can have various shapes, lengths, heads, and directions. Therefore, it is a difficult task to detect arrows with ordinary classifiers based on symbol appearance. However, each arrow connects two other symbols with a rigid body (see Figure 1). It is beneficial to detect these symbols first and leave the arrow detection to another classifier detecting arrows between pairs of these symbols. This new classifier must perform the following two steps:

1. Find a shaft of the arrow connecting the given two symbols. This shaft is just a sequence of strokes leading from a vicinity of the first symbol to a vicinity of the second symbol and it is undirected.

2. Find a head of the arrow, which is located around one of the end-points of the shaft. The head defines the orientation of the arrow (whether it is heading from the first symbol to the second symbol or vice versa).

The detection of an arrow's shaft can be done iteratively by simply adding strokes to a sequence such that the first stroke starts in a vicinity of the first symbol and the last stroke ends in a vicinity of the second symbol. A new stroke is added to the sequence only if the distance between the end-point of the last stroke and the end-point of the new stroke is smaller than a threshold. The algorithm must consider all possible combinations of strokes creating a valid connection between the given two symbols; these combinations form the search space.
[Figure 2 diagram. Blocks: Pairs of symbols; Detection of arrow shaft; Shaft; Extraction of reference strokes and points; Ref. point A and Ref. point B; Query strokes search and classification (once per reference point); Head A and Head B; Selection of the best arrow head; Arrow.]

Figure 2. Arrow recognition pipeline. The recognition process is illustrated on a simple example of two symbols from FC domain.
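
The shaft-detection stage of this pipeline can be sketched as a bounded depth-first search over chains of strokes. The following Python sketch is only an illustration under stated assumptions, not the authors' implementation: strokes are lists of (x, y) points, symbols are represented by axis-aligned bounding boxes, a stroke may be traversed in either direction, and the names find_shafts, near, gap, and max_strokes as well as the default values are hypothetical.

```python
import math


def dist(p, q):
    """Euclidean distance between two points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def dist_to_box(p, box):
    """Distance from point p to box (xmin, ymin, xmax, ymax); 0 if p is inside."""
    dx = max(box[0] - p[0], 0.0, p[0] - box[2])
    dy = max(box[1] - p[1], 0.0, p[1] - box[3])
    return math.hypot(dx, dy)


def find_shafts(strokes, box_a, box_b, near=10.0, gap=10.0, max_strokes=2):
    """Enumerate stroke chains leading from the vicinity of box_a to box_b.

    Each chain is a list of (stroke_index, reversed_flag) pairs.
    near        -- assumed maximal distance between a chain end and a symbol
    gap         -- assumed maximal distance between consecutive strokes
    max_strokes -- bound on the chain length that limits the search space
    """
    shafts = []

    def oriented(stroke, rev):
        # Return (start, end) of the stroke in the chosen drawing direction.
        return (stroke[-1], stroke[0]) if rev else (stroke[0], stroke[-1])

    def extend(chain, tip, used):
        if dist_to_box(tip, box_b) <= near:
            shafts.append(list(chain))       # valid connection found
        if len(chain) == max_strokes:
            return                           # do not grow the chain further
        for i, s in enumerate(strokes):
            if i in used:
                continue
            for rev in (False, True):
                start, end = oriented(s, rev)
                if dist(tip, start) <= gap:
                    chain.append((i, rev))
                    used.add(i)
                    extend(chain, end, used)
                    chain.pop()
                    used.discard(i)

    for i, s in enumerate(strokes):
        for rev in (False, True):
            start, end = oriented(s, rev)
            if dist_to_box(start, box_a) <= near:
                extend([(i, rev)], end, {i})
    return shafts
```

Because the shaft is undirected, searching from the first symbol towards the second is sufficient. Ranking conflicting candidates by their summed distances would be a separate step that this sketch leaves out.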

This search space can be reasonably reduced by setting a maximal number of strokes in the sequence. This number depends on the domain and on how many strokes users typically use to draw arrow shafts. Typically, it is four for flowcharts and two for finite automata. We can immediately remove some shafts which are in conflict with other shafts, and keep those with the smallest sum of the following distances: a) distance between the first symbol and the first stroke of the shaft, b) distance between the second symbol and the last stroke of the shaft, c) distances between individual strokes of the shaft.

Since we do not know the orientation of the arrow yet and the shaft is undirected, we have to consider both end-points of the shaft and try to find two heads (one in the vicinity of each end-point). Ideally we will be able to find just one head. In practice, it can happen that we find two heads and we have to decide which one is better. The detection of an arrow's head is not a trivial task, because there might be a lot of interfering strokes around the end-points of the shaft: heads of other arrows or text. Deciding which strokes represent the true arrow's head and which do not is a task where stroke positioning can be used beneficially. First, we define a reference stroke (a sub-stroke of the shaft) and a reference point (end-point of the shaft), which are used to express the relative position of query strokes (details follow in Section 2.1). Second, this information about relative position is given to a classifier making the decision. The query strokes are all strokes in a vicinity of a given end-point of the shaft which are part of neither the shaft itself nor the two given symbols. We make a classification into two classes: head and not-head. The evaluation of the relative position of strokes and the classification are explained in Section 3. Let us just note that the classifier returns the class into which the query stroke is classified along with a potential. We use this potential to decide which head is of better quality in the case we find two. We simply compute the sum of potentials of all strokes in each head and choose the head with the bigger value. This slightly favours heads consisting of a higher number of strokes, which is desirable in most cases. A pseudocode for the algorithm that we just described is divided into two procedures and presented in the supplementary material as Algorithm 1 and Algorithm 2. The arrow recognition pipeline is depicted in Figure 2.

It happens quite often that the user draws the shaft and the head of an arrow with one stroke. Our algorithm would fail in that case. Therefore, we make one important step before we try to find the arrow's head: we segment the last stroke of the shaft into smaller sub-strokes in such a way that the head is split from the shaft. The created sub-strokes are divided into two groups. One group is used to complete the shaft such that it reaches the symbol again. Sub-strokes of the second group are put into the set of query strokes possibly forming the head. Our splitting algorithm is described in Section 2.2. If the shaft and the head are not drawn by one stroke, the algorithm will ideally perform no segmentation and this step can be skipped.

2.1. Reference stroke and reference point

It is necessary to define a reference stroke. The position of all query strokes will be evaluated relative to it. Naturally, it seems that the arrow's shaft should be the reference stroke. However, it is better to use just a sub-stroke of the shaft for this purpose. The reason is that the shaft might be arbitrarily curved or bent, the whole arrow might be arbitrarily rotated, and we want to normalize the input in such a way that the reference stroke always has more or less the same appearance and the query strokes always have more or less the same relative position. Therefore, we create a sub-stroke beginning at the end-point of the shaft with the shape of a line segment. It is done iteratively by adding points to the newly created stroke until the value of a criterion, expressing how similar the stroke is to a line, is bigger than a threshold. The criterion is the ratio of the distance between the end-points of the stroke and the path length of the stroke (sum of distances between neighbouring points). We set the threshold empirically to 0.95. Another condition is that the distance between the end-points of the stroke must be bigger than a threshold empirically derived from the average length of strokes, because the possible presence of so-called hooks at the ends of strokes would cause a small value of the criterion for short strokes. Figure 3 illustrates how the reference stroke is determined as a sub-stroke of the shaft.

Then we rotate the reference stroke and all query strokes by such an angle that the vector given by the end-points of the reference stroke points in the direction of the x-axis. In other words, the true arrow heads should then point from left to right. For purposes
of our method for evaluation of relative position of strokes (described in Section 3), we have to define a reference point. Obviously, it is the end-point of the shaft.

[Figure 3: five panels, (a) to (e).]
Figure 3. Example showing a diagram and the way of choosing the reference point, the reference stroke, and the rotation. Individual pictures illustrate the following: (a) whole diagram with a highlighted (red) arrow to be detected, (b) detected arrow's shaft is blue and right end-point is considered to be the reference point, second point is green, the angle α used to rotate query strokes is marked, (c) rotation is done, the reference point is red as well as strokes of the real arrow's head, (d) analogously to (b) with the other end-point considered, (e) analogously to (c) with the exception that there is no real head, because the arrow's orientation is wrong.

Because we still do not know the orientation of the arrow, we have to consider both options: the arrow is heading to the first symbol or to the second symbol. Therefore, we define two reference points, the end-points of the shaft. A reference (sub)stroke is then associated with each of these two points. Figure 3 shows the whole process of the reference stroke extraction and rotation.

2.2. Stroke segmentation

Stroke segmentation is a very important field of research, because it is a frequently used preprocessing step. Therefore, there exist various papers dealing with this problem. The segmentation is done by defining a set of splitting points. The substantial information is the curvature and speed defined at each point and the geometric properties of stroke segments. The common approach is to find tentative splitting points with high curvature and low speed. The best subset of these points is selected according to an error function fitting the points of each segment to selected primitives. The most common primitives are line segments and arcs [8, 15]. It is also possible to use machine learning to train a classifier detecting the splitting points [9, 11].

The presented algorithms are sophisticated and make it possible to find segments fitting predefined primitives. However, using any of these methods seems to be overkill for our task. We do not need to split a stroke at any precisely defined point nor to create segments with particular geometrical properties (line segments or arcs). All we need is to split the arrow's head from its body and it is not important if both the body and the head are further split into several segments. Therefore, we suggest using a much simpler algorithm for stroke segmentation. Its description follows.

We compute a value $AA$, which we call "accumulated angle", associated with each point of the stroke $S = \{p_1, p_2, \ldots, p_n\}$ according to the following equation:

\[
AA_i = \operatorname{mean}\bigl(\operatorname{Rank3}\{A(i, 1), \ldots, A(i, \min(i-1,\, n-i,\, R))\}\bigr), \tag{1}
\]

where $i$ is the index of the point in the sequence, $\operatorname{Rank3}$ is an operator choosing up to the three smallest values of a given set, $R$ is the maximal radius, and $A$ is a function computing the angle between two vectors defined by the given reference point and its two neighbouring points chosen by the size of the radius. The function $A$ is defined as follows:

\[
A(i, r) = \arccos \frac{\overrightarrow{p_i p_{i-r}} \cdot \overrightarrow{p_i p_{i+r}}}{\lVert \overrightarrow{p_i p_{i-r}} \rVert \cdot \lVert \overrightarrow{p_i p_{i+r}} \rVert}. \tag{2}
\]

Let us note that $AA_i$ is computed according to Equation (1) only for $i \in \{2, \ldots, n-1\}$ and $AA_1 = AA_n = 0$. We define the initial set of splitting points by taking the points where $AA$ reaches a local minimum and the value is smaller than $mCoeff \cdot \operatorname{mean}\{AA_1, \ldots, AA_n\}$. In the case that there are two splitting points too close to each other ($\operatorname{dist}(p_i, p_j) < distThresh$), we remove the one with the smaller $AA$ value. We set $mCoeff = 0.5$ and $distThresh = 200$ empirically. After this removal, the segmentation is done. We tested the described algorithm on arrows from the FA database (see Section 4.1) which were drawn by one stroke and it turned out that the algorithm split the head from the body in 100 % of cases. Let us emphasize that the parameters $mCoeff$ and $distThresh$ are tunable, which makes it easy to adjust the algorithm to the demands of a given task.

3. Evaluation of relative position of strokes

The method by Bouteruche et al. evaluates a query stroke with respect to the whole reference stroke.
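[Figure 4: two panels, (a) Arrow domain and (b) Accent domain. Each panel shows the reference point R, the x-axis, and the query-stroke points p1, ..., pn with their angles α1, ..., αn and distances d1, ..., dn.]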

Figure 4. Example showing pairs of reference and query strokes and extracted sequences of features (angles and distances) for both
domains. Reference point R is marked red. In the case of arrow domain, both, the reference and the query, strokes are already rotated. The
query stroke is a sequence of points {p1 , p2 , . . . , pn }.
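
The feature sequences shown in Figure 4 pair every point of the query stroke with an angle and a distance measured from the reference point R. A minimal Python sketch of this extraction follows. It assumes that points are (x, y) tuples and that the angle is the signed angle against the x-axis as returned by atan2; the sign convention is an assumption made here, and the function name polar_features is hypothetical.

```python
import math


def polar_features(ref_point, stroke):
    """Return [(alpha_1, d_1), ..., (alpha_n, d_n)] for a query stroke.

    alpha_i is the signed angle (radians, in (-pi, pi]) between the vector
    from ref_point to p_i and the x-axis; d_i is the length of that vector.
    """
    rx, ry = ref_point
    features = []
    for x, y in stroke:
        dx, dy = x - rx, y - ry
        features.append((math.atan2(dy, dx), math.hypot(dx, dy)))
    return features
```

Note that tablet coordinates often have the y-axis pointing downwards, which mirrors the sign of the angles; a real implementation would have to use the same convention for training and test data.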

Rather than evaluating the fuzzy structuring element of the whole reference stroke, we propose to evaluate the relative position of a query stroke with respect to just a single point of the reference stroke. In the case of arrows, it is the end-point of the arrow's shaft. In the case of the task defined by Bouteruche et al., it can be an arbitrary fixed point. We propose to choose the center of the reference stroke's bounding box.

We are given a reference stroke, which is represented by its reference point $R$, and a query stroke $S$ defined by a sequence of its points: $S = \{p_1, p_2, \ldots, p_n\}$. To describe the relative position of $S$ with respect to $R$, we express the relative position of each point $p_i$ using polar coordinates. The position of each point is defined by the angle $\alpha_i = \angle(\overrightarrow{R p_i}, \vec{x})$ and the distance $d_i = \lVert \overrightarrow{R p_i} \rVert$. For each pair of a reference and a query stroke, we create a sample consisting of the sequence of the described features $\{[\alpha_1, d_1], [\alpha_2, d_2], \ldots, [\alpha_n, d_n]\}$ and a label indicating the class of the query stroke. For illustration, see Figure 4. We propose to use a (B)LSTM RNN as a classifier, because it reaches the best results in many applications. However, it is possible to use different tools for classifying sequences (e.g. Hidden Markov Models). When dealing with neural networks, it makes sense to normalize inputs:

\[
\hat{v}_k = \frac{v_k - m_k}{\sigma_k}, \tag{3}
\]

where $v_k$ is an input value, $\hat{v}_k$ is the normalized value, and $m_k$ and $\sigma_k$ are the mean and the standard deviation of all values of the same feature from the training database, respectively. We use this normalization to normalize the distance only.

The advantage of the proposed features is that they are simple and easy to extract (low time complexity). Moreover, they express the relative position of the query stroke with respect to the reference point as well as the shape of the query stroke. It is possible to reconstruct the trajectory of the query stroke from the sequence of features. This leads to a simple implementation and fast evaluation.

4. Experiments

We made experiments on two tasks. The first one is the task defined in this paper: classification of strokes into the two classes head and not-head when the reference stroke is a part of the arrow's shaft. The second task is to classify argument strokes representing accentuation or punctuation of their reference strokes into 18 classes. The task as well as the database called ACCENT was proposed by Bouteruche et al. In the case of arrows we additionally evaluated the whole process of arrow detection, where the stroke classification is a subtask. We used both positioning methods to solve both tasks and we made a comparison. All experiments were done on a standard tablet PC Lenovo X230 (Intel Core i5 2.6 GHz, 8 GB RAM) with the 64-bit Windows 7 operating system.

4.1. Arrows

We used the FA database for this experiment. Version 1.1 contains the annotation of heads and shafts of arrows. We extracted a reference point and a reference stroke for each arrow as described in Section 2.1. The only difference is that the shaft is known from the annotation. We created a set of query strokes and rotated these strokes according to the reference stroke. For each query stroke, we extracted features with respect to the reference point or the reference stroke, depending on the method used, and assigned a label based on the annotation from the database. We refer to the samples with the label head as positive and those with the label not-head as negative samples. The FA database consists of 12 diagram patterns drawn by several users and it is split into a training and a test dataset. The training dataset contains diagrams from 11 users (132 diagrams) and the test dataset diagrams from 7 users (84 diagrams). Each diagram is formed of 54 strokes and contains 5 symbols and 10 arrows on average. We extracted 1480/834 positive and 1263/1019 negative samples from the training/test dataset. Arrows drawn by one stroke are manually segmented in the
database. However, to demonstrate our segmentation algorithm from Section 2.2, we created a second test dataset (referred to as test2), where we further segmented query strokes. The obtained sub-strokes created new samples with the same label as the original ones. We used this dataset to show that possible oversegmentation will not lower the final precision. We created 1252 positive and 1876 negative examples this way.

For our method, we used LSTM and BLSTM RNNs implemented within the library JANNLab [12]. We tried different numbers of nodes in the hidden layer to get the best performance. We always trained the network for 200 epochs with the following parameters: learning rate 0.001, momentum 0.9. We achieved the best overall precision of 99.9 % with the BLSTM RNN with 32 nodes in the hidden layer. However, it might be important to find a trade-off between precision and time complexity and thus it might be better to use the LSTM RNN with only 8 nodes in the hidden layer, because it is significantly faster. It gives a precision of 99.6 % and the average time needed for classification is 0.79 ms. For details, refer to Figure 5. The best achieved precisions for individual classes are given in Table 1. The precision achieved on test2 with the best trained neural network did not decrease and reached 99.9 %.

For the method of Bouteruche et al., we used an RBFN implemented within the library Encog [10]. We set the number of nodes in the hidden layer to be a power of the number of features, which leads to equally spaced RBF centers. It is the setting giving the best performance. We tried two sets of features proposed by Bouteruche et al., referred to in their paper by numbers 4 and 5, and we achieved an accuracy of 95.4 % and 88.2 %, respectively. It is not surprising that feature set number 5 reached much worse results. It contains features expressing how much a query stroke fits into the structuring elements of all classes. However, in this case, we have just two classes and the class of negative samples contains arbitrarily shaped strokes and thus the structuring elements are too wide. We also implemented the method by Delaye et al. [7]. Their filtered fuzzy landscape is an improvement of Bouteruche's feature set 5 and thus it gives rather low precision for the very same reason. Feature set number 4 gives much better results. However, with its best overall precision of 95.36 %, it was still inferior to our method. For more detailed results see again Table 1.

Since we use an RNN in our method, the classification has higher time complexity (especially with increasing complexity of the net). The classification made by an RBFN is indeed very fast. On the other hand, it is much faster to extract the low-level features we use: 0.016 ms per sample. Feature extraction is slower in the case of fuzzy positioning: 2.89 ms per sample for feature set number 4 and 0.99 ms per sample for feature set number 5.

[Figure 5: two plots for LSTM and BLSTM over the number of nodes in the hidden layer (2, 4, 8, 16, 32). Top: "Precision of RNNs", precision [%]. Bottom: "Time needed to classify one sample", time [ms].]
Figure 5. Dependency of precision and time complexity on the number of nodes in the hidden layer of RNNs for the FA database.

Method                   positive    negative    overall
Ours                     99.91 %     99.85 %     99.88 %
Bouteruche et al. (4)    98.56 %     92.75 %     95.36 %
Bouteruche et al. (5)    94.24 %     83.32 %     88.24 %
Delaye et al.            95.17 %     86.07 %     90.17 %

Table 1. Comparison of precisions for arrow heads detection.

4.1.1 Arrow detector test

We took all annotated symbols with rigid bodies and tried to find arrows with the arrow detector we proposed (query strokes for arrow heads were classified with our best BLSTM RNN). We compared the detected arrows with the annotated arrows. Let us recall that all pairs of symbols were considered. Conflicting arrow shafts were removed immediately. However, adding arrow heads may cause further conflicts. The result of the arrow detector is a list of arrow candidates and a structural analysis should be done to resolve the conflicts. However, we tried to remove conflicts by simply keeping arrows with higher confidence to see how it affects recall and precision. The test dataset of the FA
database contains 796 arrows. We achieved a recall of 95.4 % / 94.2 % and a precision of 41.5 % / 95.4 % without / with conflict removal. Our arrow detector performs 106.5 stroke classifications on average per diagram while searching for arrow heads, while there are 10 arrows on average per diagram.

4.1.2 Diagram recognition pipeline test

We embedded our arrow detector into the diagram recognition pipeline proposed earlier [4] and made experiments on the FA and FC databases. The FC database does not contain annotation of arrow heads and shafts. Therefore, we used the arrow head classifier trained on the FA database in both cases. The results are shown in Tables 2 and 3. Although there is an improvement in both domains, it is more significant in the FA domain. The recognition accuracy increased in all symbol classes, which shows that misrecognized arrows can cause further errors in classification of other symbols.

                 Correct stroke             Correct symbol segmentation
                 labeling [%]               and recognition [%]
Class            Previous    Proposed       Previous    Proposed
Arrow            89.3        94.9           84.4        92.8
Arrow in         78.5        85.0           80.0        84.0
Final state      96.1        99.2           93.8        98.4
State            95.2        96.9           94.5        97.2
Label            99.1        99.8           96.0        99.1
Total            94.5        97.4           91.5        96.4

Table 2. Diagram recognition results for the FA domain.

                 Correct stroke             Correct symbol segmentation
                 labeling [%]               and recognition [%]
Class            Previous    Proposed       Previous    Proposed
Arrow            85.3        88.7           74.4        78.1
Connection       93.3        94.1           93.6        95.1
Data             95.6        96.4           88.8        90.6
Decision         90.8        90.9           74.1        75.3
Process          93.7        95.2           87.2        88.1
Terminator       89.7        90.2           88.1        88.9
Text             99.0        99.3           87.9        89.7
Total            95.2        96.5           82.8        84.43

Table 3. Diagram recognition results for the FC domain.

4.2. Accent

The Accent database consists of pairs of reference and argument strokes. The task is to classify the argument strokes into 18 graphic gestures. Two of them correspond to the addition of a stroke to a character. The 16 others (see Figure 6) correspond to an accentuation of their reference character (acute, grave, cedilla, etc.), to a punctuation symbol (comma, dot, apostrophe, etc.) or to an editing gesture (space, caret return, etc.). As several subsets of gestures have the same shape, the only way to discriminate them is to use spatial context, i.e. their relative position. The examples of the benchmark have been written on a PDA by 14 writers. The training database contains 4243 examples from 8 writers and the test database contains 2393 examples from 6 writers. None of the writers is common to both data sets.

[Figure 6: image of the gesture classes.]
Figure 6. Classes of the argument strokes in ACCENT database.

To apply our method, we set the center of each reference stroke's bounding box as the reference point and extracted features. We tried LSTM and BLSTM RNNs the same way as in the case of the arrows. However, we achieved a precision of only 91.9 %. It turned out that our features have a problem distinguishing very small argument strokes like acute, apostrophe, or dieresis. These strokes often consist of just one single point. Therefore, we decided to enrich the set of features and add local features describing the appearance of strokes. We used four features introduced by Otte et al. [13]: an index of the point to distinguish long and short strokes, sine and cosine of the angle between the current and the last line segment (zero for extreme points), and the sum of lengths of the current and the previous line segments. Let us note that the point indices and distances are normalized according to Equation (3). We refer to the two sets of features and the associated experiments as basic and extended. We achieved the best precision, 93.6 %, with the extended features and the BLSTM RNN with 32 nodes in the hidden layer. The training was done again with a learning rate of 0.001 and a momentum of 0.9. The precision and time complexities are shown in Figure 7.

In the case of the method of Bouteruche et al., we used our reimplementation and made the experiments. We confirm the results they stated: a precision of 95.75 %.

5. Conclusions

We have shown how important and difficult a task arrow recognition is for the whole process of diagram recognition. We designed an arrow recognizer which detects arrows in two steps: a) detection of an arrow's shaft, b) detection of an arrow's head. The first step is easy, because the search for a shaft is guided by the detected symbols connected by the arrow. For the second step, we proposed a novel arrow head
classifier based on relative stroke positioning. We presented a classification method based on low-level features using (B)LSTM RNNs. We embedded the proposed arrow detector into the diagram recognition pipeline and we increased the accuracy of the state-of-the-art diagram recognizer on the benchmark databases of finite automata and flowcharts.

We have also made a comparison with the state-of-the-art method for relative positioning. This method is unable to solve the proposed task adequately and reaches inferior precision. However, we have also made the comparison on the task for which this method was developed and it shows that our method gives slightly worse results in that case. It implies that fuzzy positioning might be a good solution for some sorts of tasks (data), but it is not a general tool. On the other hand, our method seems to be more general since it gave relatively good results in both cases. Even where it gives slightly worse results it might be a good alternative thanks to its simplicity and fast feature extraction.

[Figure 7: two plots for Basic LSTM, Basic BLSTM, Extended LSTM and Extended BLSTM over the number of nodes in the hidden layer (2, 4, 8, 16, 32, 64). Top: "Precision of RNNs", precision [%]. Bottom: "Time needed to classify one sample", time [ms].]
Figure 7. Dependency of precision and running time on the number of nodes in the hidden layer of RNNs for ACCENT database.

Acknowledgment

The first author was supported by the Grant Agency of the CTU under the project SGS13/205/OHK3/3T/13. The second and the third authors were supported by the Grant Agency of the Czech Republic under Project P103/10/0783 and the Technology Agency of the Czech Republic under Project TE01020197 Center Applied Cybernetics, respectively.

References

[1] A.-M. Awal, G. Feng, H. Mouchere, and C. Viard-Gaudin. First experiments on a new online handwritten flowchart database. In DRR 2011, pages 1–10, 2011.
[2] F. Bouteruche, S. Macé, and E. Anquetil. Fuzzy relative positioning for on-line handwritten stroke analysis. In Proceedings of IWFHR 2006, pages 391–396, 2006.
[3] M. Bresler, D. Průša, and V. Hlaváč. Modeling flowchart structure recognition as a max-sum problem. In Proceedings of ICDAR 2013, pages 1247–1251, August 2013.
[4] M. Bresler, T. V. Phan, D. Průša, M. Nakagawa, and V. Hlaváč. Recognition system for on-line sketched diagrams. In Proceedings of ICFHR 2014, pages 563–568, September 2014.
[5] C. Carton, A. Lemaitre, and B. Couasnon. Fusion of statistical and structural information for flowchart recognition. In Proceedings of ICDAR 2013, pages 1210–1214, 2013.
[6] A. Delaye and E. Anquetil. Fuzzy relative positioning templates for symbol recognition. In Proceedings of ICDAR 2011, pages 1220–1224, September 2011.
[7] A. Delaye, S. Macé, and E. Anquetil. Modeling relative positioning of handwritten patterns. In Proceedings of IGS 2009, pages 122–127, 2009.
[8] M. El Meseery, M. El Din, S. Mashali, M. Fayek, and N. Darwish. Sketch recognition using particle swarm algorithms. In Proceedings of ICIP 2009, pages 2017–2020, 2009.
[9] G. Feng and C. Viard-Gaudin. Stroke fragmentation based on geometry features and HMM. CoRR, 2008.
[10] Heaton Research, Inc. Encog Machine Learning Framework, 2013. http://www.heatonresearch.com/encog.
[11] J. Herold and T. F. Stahovich. Classyseg: A machine learning approach to automatic stroke segmentation. In Proceedings of SBIM 2011, pages 109–116, 2011.
[12] S. Otte, D. Krechel, and M. Liwicki. JANNLab Neural Network Framework for Java. In Proceedings of MLDM 2013, pages 39–46, 2013.
[13] S. Otte, D. Krechel, M. Liwicki, and A. Dengel. Local feature based online mode detection with recurrent neural networks. In Proceedings of ICFHR 2012, pages 531–535, 2012.
[14] A. Stoffel, E. Tapia, and R. Rojas. Recognition of on-line handwritten commutative diagrams. In Proceedings of ICDAR 2009, pages 1211–1215, 2009.
[15] A. Wolin, B. Paulson, and T. Hammond. Sort, merge, repeat: An algorithm for effectively finding corners in hand-sketched strokes. In Proceedings of SBIM 2009, pages 93–99, 2009.
