∑_{i=1,N} X_i ∧ Y_i = |X ∧ Y|, (1)

where X_i is a component of X, ∧ is componentwise conjunction, and |Z| is the number of
nonzero components in Z.
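As an illustration (ours, not the paper's code), equation (1) is just a count of the componentwise conjunction's 1s:

```python
# Illustrative sketch of equation (1): the similarity of two binary
# codevectors is |X ∧ Y|, the number of components where both have a 1.
def overlap(x, y):
    assert len(x) == len(y)
    return sum(xi & yi for xi, yi in zip(x, y))

x = [1, 0, 1, 1, 0, 0, 1, 0]
y = [1, 1, 0, 1, 0, 0, 1, 1]
print(overlap(x, y))  # common 1s at positions 0, 3, and 6 -> 3
```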
A set of items is represented by componentwise disjunction of their codevectors
(Rachkovskij and Kussul 2001). However, when the set of sets is produced by componentwise
disjunction of set codevectors, the information about original sets is lost, resulting in
114 COMPUTATIONAL INTELLIGENCE
superposition catastrophe (see, e.g., Rachkovskij and Kussul 2001 and references therein).
So, a binding operation is needed to preserve information about item grouping in hierarchical
structures, as well as about the sequential order of items. Also, componentwise disjunction
results in a vector that has more 1s than each input vector. So, to preserve sparseness,
normalization of the number of 1s is needed.
Let us provide a short description of one version of the context dependent thinning
(CDT) procedure that is used for binding and normalization in APNNs (see Rachkovskij and
Kussul 2001 for an extended description and discussion).
2.2.1. Binding by Context Dependent Thinning. First, the codevector Z is formed by
disjunction of the element codevectors X_i to be bound:

Z = ∨_i X_i. (2)

Then, the result Z′ of binding is formed as

Z′ = ∨_{k=1,K} (Z ∧ Z^(k)) = Z ∧ ∨_{k=1,K} Z^(k). (3)
Here Z^(k) is a version of Z obtained by permutation of its components. The representation
scheme requires that |V_v| > |V_u| if u is the child (argument, element) of v (or v is the
parent of u). This is achieved by controlling the thinning factor F_CDT defined as
F_CDT = |Z′|/|Z|. Obviously, F_CDT ≤ 1. For appropriate values of F_CDT the higher level
codevectors have the desired property of the increased number of 1s, where F_CDT can be
obtained experimentally or analytically for specific relational instances. Table 1 illustrates
the growth of an average number M of 1s in codevectors of the higher levels of the Solar
System analogue produced with the proposed representation scheme and particular parameter
instantiations. Note that the codevectors here have more hierarchical levels than order levels
identified for the analogue graph vertices above.
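A minimal sketch of CDT along the lines of equations (2) and (3) (our illustration: Z^(k) is taken as a randomly permuted version of Z, following Rachkovskij and Kussul 2001, and the parameter values are arbitrary):

```python
import random

def cdt(codevectors, K, N, seed=0):
    """Bind binary codevectors: disjunction (eq. 2), then thinning (eq. 3)."""
    rng = random.Random(seed)
    perms = [rng.sample(range(N), N) for _ in range(K)]     # fixed permutations
    z = [max(v[i] for v in codevectors) for i in range(N)]  # eq. (2): Z = OR_i X_i
    z_thinned = [0] * N
    for perm in perms:                                      # eq. (3): OR_k (Z AND Z^(k))
        for i in range(N):
            z_thinned[i] |= z[i] & z[perm[i]]
    return z, z_thinned

N = 10_000
rng = random.Random(1)
xs = [[1 if rng.random() < 0.01 else 0 for _ in range(N)] for _ in range(3)]
z, z1 = cdt(xs, K=2, N=N)
print(sum(z), sum(z1))  # |Z'| <= |Z|: the result is a thinned subset of Z
```

A larger K keeps more 1s (a larger F_CDT = |Z′|/|Z|), so K is one way to control the thinning factor.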
3. RETRIEVAL OF SIMILAR ANALOGUES
To retrieve the most similar analogues of the knowledge base, all its analogues (fragments,
episodes) are first represented as codevectors. The probe (input, target) analogue is also
represented by its codevector. Then, the most similar base analogue(s) are those having
codevectors with the maximal overlap (dot product) value with the probe codevector.
In particular,
(1) codevectors V_b, b ∈ B of all base B analogues are constructed;
(2) codevector V_in of the input (probe) analogue is constructed;
(3) overlaps of codevectors |V_b ∧ V_in| for all base analogues b and the probe are calculated;
TABLE 1. An Average Number M of 1s in Codevectors of the Solar System Analogue Elements versus
Their Vertex Order and Codevector Levels. The Codevector Dimensionality N = 100,000. For the Terminal
Codevectors, M = 1,000 for Objects and Attributes, M = 2,000 for Roles. The Thinning Factor F_CDT ≈ 0.33.
(4) base analogues most similar to the probe analogue are output.
For the single most similar analogue it may be summarized as

b′ = argmax_{b∈B} |V_b ∧ V_in|. (5)
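Steps (1)–(4) amount to a single argmax over codevector overlaps; a toy sketch (the analogue names and vectors are invented for illustration):

```python
def retrieve(base_codevectors, probe):
    """Step (3): overlaps |V_b ∧ V_in|; step (4): the best base analogue."""
    overlaps = {b: sum(pi & vi for pi, vi in zip(probe, v))
                for b, v in base_codevectors.items()}
    return max(overlaps, key=overlaps.get), overlaps

base = {"A": [1, 1, 0, 0, 1, 0],
        "B": [1, 0, 1, 1, 0, 1]}
probe = [1, 0, 1, 1, 0, 0]
best, ov = retrieve(base, probe)
print(best, ov)  # "B" shares 3 common 1s with the probe, "A" only 1
```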
The most similar analogues may also be selected as the top L′, or O(L1 log L′),
where n is the number of analogue elements, L′.
The ARCS complexity is usually dominated by the second (computationally expensive)
stage with the computational complexity O(L^2 n^4) as estimated by Thagard et al. (1990).
Because the ARCS complexity is worse than that of MAC/FAC, let us compare SBDR and
MAC/FAC.
TABLE 2. Estimations of the Retrieval Computational Complexity for SBDR and MAC/FAC.

Example knowledge   Think  Think  Medium Medium Medium Large  Large  Large
bases               Net    Net    KB1    KB2    KB3    KB4    KB5    KB6
L                   10^2   10^2   10^5   10^5   10^5   10^9   10^9   10^9
n                   10^2   10^2   10     10^2   10^4   10     10^2   10^4
L                   10     10     10^4   10^4   10^3   10^8   10^8   10^8
R                   10     10     10     10     10^4   10     10     10^3
M                   10^2   10^2   10^3   10^3   10^4   10^3   10^3   10^5
L′                  10     10     10^4   10^4   10^3   10^8   10^7   10^8
L″                  10     10     10^4   10^4   10^3   10^8   10^7   10^8
SBDR1 O(ML′)        10^3   10^4   10^7   10^7   10^7   10^11  10^10  10^13
SBDR2 O(L)          10^3   10^3   10^5   10^5   10^5   10^9   10^9   10^9
MAC1 O(RL″)         10^2   10^2   10^5   10^5   10^7   10^9   10^8   10^11
MAC2 O(L)           10^2   10^3   10^5   10^5   10^5   10^9   10^9   10^9
FAC O(n^2 L′)       10^5   10^5   10^6   10^9   10^11  10^10  10^12  10^16
SBDR total          10^3   10^4   10^7   10^7   10^7   10^11  10^10  10^13
MAC/FAC total       10^5   10^5   10^6   10^9   10^11  10^10  10^12  10^16

Note: Italic shows the results dominating a particular stage; bold face shows the best overall results.
To explore scaling of the models, we consider some knowledge base examples.
They include the real knowledge base ThinkNet used in the experiments later, as well
as possible parameters of larger, but unreal, bases. The results of the computational
complexity estimations are given in Table 2. They show that for the real and simulated
knowledge base examples, the computational complexity of SBDR is 0.001–10 of that of
MAC/FAC.
Note that the SBDR complexity in Table 2 is always dominated by the SBDR1
stage, whereas the MAC/FAC complexity is dominated by the FAC stage. The main
difference is that n is explicitly involved in MAC/FAC and is not involved in SBDR. In
SBDR, n was involved at the encoding step, when codevectors for all base analogues were
formed.
For the knowledge bases where all episodes include the same features with the same
corresponding frequencies but with different structure, MAC should return all L episodes,
so that FAC should work with the whole base, and its complexity becomes O(n^2 L). For the
bases where some groups of episodes have this property, MAC should return all the episodes
of the group.
Another major factor is the number of nonzero components R in the MAC feature vectors
and M in the SBDR codevectors, as well as L′ and L″. For the considered examples, the
MAC complexity is always less than the SBDR complexity. So, the MAC/SBDR hybrid may
seem beneficial for particular use cases.
The similarity estimation procedures can be naturally parallelized, e.g., as implemented
in search engines. Also, associative memory implementations are possible
(Frolov, Rachkovskij, and Husek 2002), as well as implementations using hashing
that provide even lower computational complexity (Frolov, Rachkovskij, and Husek
2006).
Using similarity clustering of analogues as preprocessing of the base could decrease
retrieval complexity compared to indexing, both for the codevector and for the graph
representations.
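The inverse-indexing idea behind the O(ML′) cost can be sketched as follows (our reading, not the paper's implementation): storing, for each component, the list of base analogues having a 1 there lets all overlaps be accumulated by visiting M probe components times roughly L′ postings each.

```python
from collections import defaultdict

def build_index(base):
    """Inverted index: component position -> base analogues with a 1 there."""
    index = defaultdict(list)
    for name, ones in base.items():     # codevectors stored as sets of 1-positions
        for i in ones:
            index[i].append(name)
    return index

def overlaps_via_index(index, probe_ones):
    """Accumulate |V_b ∧ V_in| for all b by scanning the probe's postings lists."""
    scores = defaultdict(int)
    for i in probe_ones:                # M probe 1s ...
        for name in index[i]:           # ... times ~L' base analogues per component
            scores[name] += 1
    return dict(scores)

base = {"A": {0, 4, 7}, "B": {0, 2, 3}, "C": {5, 6, 8}}
index = build_index(base)
print(overlaps_via_index(index, {0, 2, 9}))  # B shares components 0 and 2; A shares 0
```

Base analogues with no common 1s (here "C") are never touched, which is where the savings over a full scan come from.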
TABLE 3. Types of Analogue Similarity.

Similarity  Common 1st-order  Common high-     Common object
type        relations         order relations  attributes     Examples using animal stories
Base        —                 —                —              dog(Spot); human(Jane); cause(bite(Spot, Jane), flee(Jane, Spot))
LS          +                 +                +              dog(Fido); human(John); cause(bite(Fido, John), flee(John, Fido))
SF          +                 −                +              dog(Fido); human(John); cause(flee(John, Fido), bite(Fido, John))
AN          +                 +                −              mouse(Mort); cat(Felix); cause(bite(Felix, Mort), flee(Mort, Felix))
FOR         +                 −                −              mouse(Mort); cat(Felix); cause(flee(Mort, Felix), bite(Felix, Mort))
4. EXPERIMENTS
Let us investigate performance of the proposed approach for finding similar analogues
using the knowledge bases and experimental scheme that were previously applied in the
study of the leading analogy models MAC/FAC (Forbus et al. 1995) and ARCS (Thagard et al.
1990).
4.1. Experimental Scheme
4.1.1. Similarity Types of Analogues. It is known that humans retrieve some types of
analogues more readily than others. By analyzing various types of analogues, their similarity
types have been identified and presented in terms of retrievability order (Gentner 1983; Ross
1989; Wharton et al. 1994; Forbus et al. 1995).
Similarity types are considered relative to some base analogue and have varied types of
commonalities, summarized in Table 3 adapted from Forbus et al. (1995). All episodes share
first-order relations. The literal similarity (LS) episodes also share both higher order relations
and object attributes. The true analogy (AN) episodes share higher order relations but have
different attributes. The surface features (SF) episodes share attributes but have different
higher order relations. The first order relations (FOR) episodes differ both in attributes and
higher order relational structure. Examples using animal stories (Thagard et al. 1990; adapted
by Plate) are also given in Table 3. They have attributes dog and human, first-order relations
bite and flee, and higher order relation cause.
Generally, for analogical retrieval it is considered that semantic similarity is preferred to
relational similarity, so that the retrievability order can be expressed as LS ≥ SF > AN ≥
FOR (Gentner 1983; Ross 1989; Wharton et al. 1994; Forbus et al. 1995).
4.1.2. Experimental Setup and Test Knowledge Bases. The experimental knowledge
base is constructed so that it includes analogues with different and known type and degree of
similarity to each other. Some analogues are selected as the target (input, probe) analogues,
and some as the base ones. Because it is known in advance which analogues are more similar
to each other and should be retrieved by humans, performance of a retrieval system can be
estimated against that gold standard.
We have conducted experimental testing using the knowledge base employed for testing
of ARCS (Thagard et al. 1990; later it became available as ThinkNet, THNET 2010), with the
additions supplemented and used for testing of MAC/FAC (Forbus et al. 1995; see
acknowledgments). The knowledge base consists of formal descriptions of analogical episodes and
situations. These descriptions were constructed manually by experts on the basis of textual
material used in psychological experiments to elucidate characteristics of human analogical
reasoning.
ThinkNet includes Lisp-formalized descriptions of 100 Aesop's fables, 25 Shakespeare's
plays, 5 stories about Karla the Hawk, as well as the West Side Story musical and 4 Sour Grapes
fable variations. The MAC/FAC base additionally includes 45 episodes consisting of 9
base analogues and 4 versions of each with different similarity types (LS, SF, AN, FOR).
On average, an analogue in those knowledge bases contains 90 propositions, including
50 attributes and 40 relations encoded in Lisp.
The retrieval results are presented in terms of the retrievability order and compared
to the results of the most advanced model MAC/FAC. Also, for our system we calculated
(and estimated where possible for MAC/FAC) the standard information retrieval measures
of precision P and recall R:

R = n1/n2; P = n1/n3, (6)

where n1 is the number of correct analogues returned, n2 is the number of correct analogues
in the base, and n3 is the total number of analogues returned.
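A worked example of equation (6), with illustrative numbers (not from the experiments): suppose a query returns n3 = 5 analogues, n1 = 3 of them correct, out of n2 = 4 correct analogues in the base.

```python
def recall_precision(n1, n2, n3):
    """Equation (6): R = n1/n2, P = n1/n3."""
    return n1 / n2, n1 / n3

R, P = recall_precision(n1=3, n2=4, n3=5)
print(R, P)  # R = 3/4 = 0.75, P = 3/5 = 0.6
```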
For our SBDR model, we also investigated dependencies of recall and precision on the
dimensionality of codevectors.
4.1.3. Codevector Implementation and Parameters. Experiments were implemented
using SLANG (Slipchenko 2005a), a symbolic language for distributed representation that
combines a vivid description of analogues as predicate expressions with a simple description
of operations with distributed representations. SLANG was used to convert the knowledge
base descriptions to codevector internal representations using the representation scheme of
Section 2 and to program the analogical retrieval process itself.
Codevectors and their parameters have been chosen as follows. The terminal codevectors
for attributes, objects, and relational roles were randomly generated and then memorized
and used the same for any occurrence of the particular terminal item. Dimensionality of
codevectors was N = 10^5 if not stated otherwise. The average number of 1s M(attribute) =
M(object) = 1,000 has been chosen as in Rachkovskij (2001), whereas M(role) = 2,000 >
M(attribute) = M(object) = 1,000 was chosen to reflect the importance of relations.
Instantiations of those parameters and of F_CDT = {0.1, 0.2} were selected from the
interval that provided correct retrievability scores for the animal episodes presented in
Table 3.
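Terminal codevector generation can be sketched as follows (our illustration; the paper does not spell out the generator): N-dimensional binary vectors with exactly M randomly placed 1s, generated once and then memorized per item.

```python
import random

def make_terminal(N, M, rng):
    """Random binary codevector of dimensionality N with exactly M ones."""
    v = [0] * N
    for i in rng.sample(range(N), M):   # M distinct positions of 1s
        v[i] = 1
    return v

rng = random.Random(42)
N = 100_000                              # N = 10^5 as in the experiments
attr = make_terminal(N, 1_000, rng)      # M(attribute) = M(object) = 1,000
role = make_terminal(N, 2_000, rng)      # M(role) = 2,000 to stress relations
print(sum(attr), sum(role))
```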
All analogues were represented by codevectors, and for each target (probe) analogue
codevector from the experimental set the dot products (overlaps) with the codevectors of
the base analogues were calculated. One or several most similar analogues were selected as
the retrieval result. The results reported below were averaged over 100 instances of random
terminal codevectors used to construct the codevectors of analogues.
TABLE 4. Recall Values. Test 1: Retrieval of the Base-Type Analogues Given Different Versions of Probes
(Similarity Types LS, SF, and AN). The FOR Analogues Were Also in the Base, Serving as Distractors.

Probe                     LS    SF    AN
MAC/FAC (10%)             1.00  0.89  0.67
SBDR (10%) F_CDT = 0.1    1.00  1.00  0.76
SBDR (10%) F_CDT = 0.2    1.00  1.00  0.78
SBDR (1)   F_CDT = 0.1    1.00  0.79  0.37
SBDR (1)   F_CDT = 0.2    1.00  0.76  0.58
4.2. Results of Experiments
4.2.1. Retrievability Order and Recall. Test 1. For this test we used the scheme of
Cognitive simulation experiment 1 of Forbus et al. (1995). Nine base analogues of the Karla
the Hawk story and their FOR variants, which served as distractors, were placed in the base.
The LS, SF, and AN variants of the base analogues were in turn used as probes to the
retrieval system, and the number of correct base analogues in the returned list was counted.
As in Forbus et al. (1995), the returned list contained analogues within the 10% interval
of similarity value relative to the most similar base analogue. In addition, for SBDR we
considered the results when the single most similar base analogue was returned.
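The 10% selection rule can be sketched as follows (our reading of the setup of Forbus et al. 1995; the similarity values below are invented): return every base analogue whose similarity is within 10% of the best value.

```python
def select_within_10pct(overlaps):
    """Keep base analogues whose similarity >= 90% of the top similarity."""
    best = max(overlaps.values())
    return [b for b, s in overlaps.items() if s >= 0.9 * best]

overlaps = {"A": 100, "B": 95, "C": 89, "D": 40}
print(select_within_10pct(overlaps))  # A and B are within 10% of the top value 100
```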
The results are shown in Table 4. The values of recall were calculated for each type of
probe: LS, SF, and AN. Results for MAC/FAC and humans are from Forbus, Gentner, and
Law (taking into account that their proportion of correct retrievals actually provides recall
values). Forbus, Gentner, and Law also present results of related experiments with human
subjects, which showed recall of 0.56 for the LS analogues, 0.53 for SF, 0.12 for AN, and
0.09 for FOR.
As mentioned in Forbus et al. (1995), MAC/FAC performance is much better than that
of human subjects, perhaps partly because of the differences in the experimental setup.
However, the key point is that the MAC/FAC results show the same retrievability order
LS ≥ SF > AN.
The SBDR results also show the same retrievability order, with higher recall values.
Though we used more 1s in the codevectors of roles than in the codevectors of attributes
or objects (Section 4.1.3), the resulting similarity pattern showed preference for surface
similarity over relational similarity. The results for SBDR (1) with just 1 (single best)
selected analogue show that they are close to MAC/FAC (10%).
This deterioration of the MAC/FAC results is because of the MAC stage that uses attribute
vectors of analogues (frequencies of objects, attributes, and relations) to estimate similarity
by dot product. This stage is computationally efficient but does not consider structure, and
so misses some valid analogues.
Test 2. For the next test we used the scheme of Cognitive simulation experiment 2
of Forbus et al. (1995). The LS, SF, AN, and FOR variants of nine base Karla the Hawk
analogues were placed in the base. Each of the nine base analogues was used as a probe. As
Forbus et al. (1995) mention, this is almost the reverse of the task human subjects faced, and
more difficult.
For each analogue type, the number of its occurrences in the returned list was counted.
The recall values for each analogue type are given in Table 5. Again, they show the correct
retrievability order (matching the results for human subjects) both for MAC/FAC and SBDR
(except for AN at SBDR (10%) for the thinning factor F_CDT = 0.2). For the analogues that are
TABLE 5. Recall Values. Test 2: Retrieval of Different Analogue Similarity Types (LS, SF, AN, and FOR)
Given the Base Versions as Probes.

Retrievals                LS    SF    AN    FOR   Other
MAC (10%)                 0.78  0.78  0.33  0.22  1.33
MAC/FAC (10%)             0.78  0.44  0.22  0.00  0.22
SBDR (10%) F_CDT = 0.1    0.81  0.78  0.68  0.09  0.39
SBDR (10%) F_CDT = 0.2    0.87  0.78  0.91  0.22  0.84
SBDR (1)   F_CDT = 0.1    0.48  0.30  0.22  0.00  0.00
SBDR (1)   F_CDT = 0.2    0.50  0.26  0.22  0.00  0.02
TABLE 6. Precision of Retrieval Given Different Similarity Types (LS, SF, AN, and FOR) of Analogues as
Probes.

Probe                     LS    SF    AN    FOR
MAC (10%)                 0.5   0.1   0.08  0.09
MAC/FAC (10%)             1.00  1.00  1.00  0.5
SBDR (1)   F_CDT = 0.1    1.00  1.00  1.00  1.00
SBDR (1)   F_CDT = 0.2    1.00  1.00  1.00  1.00
SBDR (10%) F_CDT = 0.1    1.00  1.00  1.00  1.00
SBDR (10%) F_CDT = 0.2    1.00  1.00  1.00  0.63
similarity type variations of the base analogue used as a probe, the recall values for SBDR
are higher than those for MAC/FAC, and even than those for MAC alone. At the same time,
SBDR (10%) with F_CDT = 0.1 returns far fewer Other analogues (i.e., irrelevant analogues,
that is, any retrieval from a story set different from the one to which the base belongs) compared
to MAC (0.39 vs. 1.33), and comparable to MAC/FAC (0.39 vs. 0.22). SBDR (1) for
F_CDT = 0.1 returns no Other analogues at all.
4.2.2. Retrieval Precision. Test 3. To investigate precision, we used the MAC/FAC
experimental scheme and results from Computational experiment 4: Hawk stories of Forbus
et al. (1995). All fables and plays of the ThinkNet analogues and the Karla base analogue
were placed in the knowledge base in memory. The LS, SF, AN, and FOR versions of the
Karla base analogue were used as probes. For MAC and FAC, the numbers of returned
analogues in the result list were taken from table 16 in Forbus et al. (1995) and precision
values were calculated. The obtained results are shown in Table 6.
For all experiments, the Karla base story was returned as the most similar analogue. It
means that recall is always 1. It may be noted that the results of ARCS (Thagard et al. 1990)
reported in Forbus et al. (1995) provide the proper Karla base story as the most similar only
for the LS probe.
The precision results are as follows: SBDR demonstrates the maximal precision = 1
for all cases, except FOR at SBDR (10%) with F_CDT = 0.2, where precision = 0.63. This last
result is comparable to MAC/FAC (10%) for FOR, where precision = 0.5. For the MAC top 10%
similarity interval, the precision results are 5–10 times lower for SF, AN, and FOR; and for
LS, they are 30–50% lower. Low precision values for MAC lead to more candidates to be
processed at the second, computationally expensive stage of FAC (using SME, see Section 1.3
and Forbus et al. 1995). Recall that the SBDR results are single-stage.
(Figure 4 plot panels: (a) recall R versus N; (b) precision P versus N; curves for MAC/FAC
and for SBDR with F_CDT = 0.1, 0.2, 0.5, and 0.8, each for the LS, SF, and AN probe types.)
FIGURE 4. Reimplementing Test 1: Retrieval of the base-type analogues given different similarity types (LS, SF,
and AN) of probes. Codevector dimensionality N was varied, preserving a constant fraction of unit components
(1s). (a) recall; (b) precision.
4.2.3. Varying Codevector Size. Test 4. The computational complexity of SBDR is
determined by the number M of 1s in codevectors. So, we repeated the SBDR part of
Test 1 for varied values of M, preserving the M/N values indicated in Section 4.1.3
(and thus obtaining the corresponding values N of codevector dimensionality). The results
in Figure 4 show that for N decreasing from 10^5 to 10^3 (and proportionally decreasing
M) the recall values remain approximately the same and precision decreases. A noticeable
drop of recall and precision is observed at N = 500 and less, where M becomes 5 and
less, and so the CDT procedure becomes unstable (it may output zero codevectors or the
input codevectors, depending on their random realizations). Also, the similarity values (see
equation (1)) themselves become unstable for small M.
5. CONCLUSION
To increase performance of operating with structures, we develop distributed
representations of structures that carry immediate information on both the set of structural elements
of various hierarchical levels and their structural organization. Such representations allow
for holistic processing of structures, without the need to follow edges or to match individual
vertices of graph structures. However, decoding structure elements and operating with them
individually is also possible, if required, as discussed elsewhere (Rachkovskij and Kussul
2001; Rachkovskij 2004; Slipchenko, Rachkovskij, and Misuno 2005).
Being an instance of vector representations, distributed representations allow: an easy
estimation of complex object similarity by measures of vector similarity; a massively parallel
implementation; and application of the whole arsenal of methods elaborated for vector spaces.
Compared to symbolic and neural network localist representations of graphs, they provide:
a better account of semantic content; flexibility; and the ability to cope with noisy and
unexpected input. Compared to localist vector representations, distributed representations allow
a natural representation and calculation of gradual similarity between items; make a more
efficient use of representational resources; are more robust and neurobiologically plausible;
and may be considered as an idea of how related representations and processes are
implemented in the brain. Sparse distributed representations that we develop additionally allow an
effective use of inverse indexing and other efficient computer structures and algorithms, and a
simple and efficient parallel hardware implementation, and are even more neurobiologically
plausible.
Usually, when graph embeddings in vector spaces are investigated, of interest is how
accurately they approximate well-known graph similarity measures in the original space.
However, because those graph-theoretic measures may be inadequate for similarity-based
retrieval from knowledge bases, finding close analogues by vector space embeddings may
appear more appropriate for a particular application.
In this paper we have described a scheme for distributed representation of analogues
and experimentally investigated it on the standard knowledge base used for testing leading
analogical retrieval models. The results show that our codevector approach provided the same
retrievability order as the best available retrieval model MAC/FAC of Forbus et al. (1995)
and as the results of experiments with human subjects. They also show some increase in
recall and a noticeable increase in precision.
The reason is that MAC/FAC is a combined vector-symbolic model that uses a two-stage
approach to retrieval, where the first MAC stage does not take structure into account
and so degrades the retrieval quality. On the other hand, the second FAC stage that estimates
similarity taking structure into account has a high computational complexity, at least O(n^2 L′),
i.e., quadratic in the number of analogue elements n and linear in the number of candidates
L′ provided by the first stage. Note that the computational complexity of the ARCS retrieval
model of Thagard et al. (1990) is even higher: O(n^4 L^2). The reason for the high computational
complexity of traditional methods is their use of some kind of explicit procedure for finding
corresponding elements of analogues that introduces quadratic or higher degrees of n to the
complexity estimation.
In contrast, in codevector approaches the number of analogue elements n influences only
the complexity of their codevector construction. In the process of similarity estimation, the
corresponding elements automatically find each other as common 1s of codevectors. So, the
computational complexity dominated by O(ML′) is proportional to the number M of 1s in the
codevector (which may be considered constant) and the average number L′ of the base analogues
whose codevectors have a 1 in the particular component. Such a moderate computational cost
makes this approach particularly well suited for similarity-based retrieval from large-scale
knowledge bases having complex records with many elements in each. We believe that similar
results could be obtained with other (appropriately modified) structure-sensitive distributed
representation schemes, such as the HRRs of Plate (1991, 2003), the Binary Spatter Codes of
Kanerva (1996, 2009), etc.
Let us note that the knowledge base we used for testing has been previously used only
for testing of MAC/FAC and, partially, ARCS. Other known models of analogical retrieval
used (at most) small parts of this base or a small number of other episodes. Nevertheless, this
knowledge base is still rather small. Also, its attribute set is very limited, thus reflecting a poor
account of semantics.
We argue that the developed representation scheme is useful for large-scale knowledge
bases and free-structured database applications (see also Beal and Roberts 2009). This is both
because of a computationally efficient estimation of structure similarity and because of the
potential of taking into account the semantics of objects by naturally incorporating their descriptions
in terms of numerical feature vectors (Rachkovskij et al. 2005a, b, 2012; Slipchenko 2005b)
or context vectors reflecting the meaning of the concepts they represent (Misuno, Rachkovskij, and
Slipchenko 2005; Misuno et al. 2005; Jones and Mewhort 2007; Sahlgren et al. 2008).
Particular aspects of codevector representations and of the approach as a whole should be elaborated
taking into account specific challenges of demanding applications with knowledge bases and
free-structured databases.
ACKNOWLEDGMENTS
We thank Ken Forbus and Paul Thagard for generously providing their analogical
episodes that we used in the experiments. We are grateful to three anonymous reviewers
for their valuable comments on an earlier version of the manuscript. D.R. also thanks Pentti
Kanerva, Art Markman, and Tony Plate for helpful discussions during various stages of this
research.
REFERENCES
AAMODT, A., and E. PLAZA. 1994. Case-based reasoning: Foundational issues, methodological variations, and
system approaches. AI Communications, 7(1):39–59.
AHA, D. W. 1998. The omnipresence of case-based reasoning in science and application. Knowledge-Based
Systems, 11(5–6):261–273.
BEAL, J., and J. ROBERTS. 2009. Enhancing methodological rigor for computational cognitive science: Complexity
analysis. In Proceedings of the 31st Annual Conference of the Cognitive Science Society. Edited by N. A.
Taatgen and H. van Rijn. Cognitive Science Society: Austin, TX, pp. 99–104.
BERGMANN, R. 2002. Experience Management: Foundations, Development Methodology, and Internet-Based
Applications, Vol. 2432 of Lecture Notes in Computer Science. Springer: Berlin, New York, 393 pp.
BERGMANN, R., K.-D. ALTHOFF, S. BREEN, M. GÖKER, M. MANAGO, R. TRAPHÖNER, and S. WESS. 2009.
Developing industrial case-based reasoning applications: The INRECA methodology. In Lecture Notes in
Computer Science/Lecture Notes in Artificial Intelligence. Springer: Berlin.
BJORNESTAD, S. 2003. Analogical reasoning for reuse of object-oriented specifications. In Case-Based Reasoning
Research and Development, Vol. 2689 of Lecture Notes in Computer Science. Edited by K. D. Ashley and
D. G. Bridge. Springer: Berlin/Heidelberg, pp. 50–64.
BORNER, K. 1993. Structural similarity as guidance in case-based design. In Proceedings of the First European
Workshop on Case-Based Reasoning. Springer: Berlin, pp. 197–208.
BROWNE, A., and R. SUN. 2001. Connectionist inference models. Neural Networks, 14(10):1331–1355.
BUNKE, H. 1997. On a relation between graph edit distance and maximum common subgraph. Pattern Recognition
Letters, 18:689–694.
BUNKE, H. 1999. Error correcting graph matching: On the influence of the underlying cost function. IEEE
Transactions on Pattern Analysis and Machine Intelligence, 21(9):917–922.
BUNKE, H. 2000a. Graph matching: Theoretical foundations, algorithms, and applications. In Proceedings of
Vision Interface 2000, pp. 82–88.
BUNKE, H. 2000b. Recent developments in graph matching. In Proceedings of the Fifteenth International
Conference on Pattern Recognition, Vol. 2, pp. 117–124.
BUNKE, H., and B. T. MESSMER. 1993. Similarity measures for structured representations. In Proceedings of
the First European Workshop on Case-Based Reasoning. Springer: Berlin, pp. 106–118.
BUNKE, H., X. JIANG, and A. KANDEL. 2000. On the minimum common supergraph of two graphs. Computing,
65:13–25.
CHAMPIN, P. A., and C. SOLNON. 2003. Measuring the similarity of labeled graphs. In Proceedings of the Fifth
International Conference on Case-Based Reasoning. Springer: Berlin, pp. 80–95.
CHAUDHRI, A. B., A. RASHID, and R. ZICARI. 2003. XML Data Management: Native XML and XML-Enabled
Database Systems. Addison-Wesley Professional: Redwood City, CA.
DOYLE, D., P. CUNNINGHAM, D. BRIDGE, and Y. RAHMAN. 2004. Explanation oriented retrieval. In Proceedings
of the Seventh European Conference on Case-Based Reasoning. Springer: Berlin, pp. 157–168.
ELIASMITH, C., and P. THAGARD. 2001. Integrating structure and meaning: A distributed model of analogical
mapping. Cognitive Science, 25(2):245–286.
FALKENHAINER, B., K. D. FORBUS, and D. GENTNER. 1989. The structure-mapping engine: Algorithm and
examples. Artificial Intelligence, 41:1–63.
FELLBAUM, C. 1998. WordNet: An Electronic Lexical Database. The MIT Press: Cambridge, MA.
FERNÁNDEZ, M.-L., and G. VALIENTE. 2001. A graph distance metric combining maximum common subgraph
and minimum common supergraph. Pattern Recognition Letters, 22:753–758.
FORBUS, K. 2001. Exploring analogy in the large. In The Analogical Mind: Perspectives from Cognitive Science.
Edited by D. Gentner, K. Holyoak, and B. Kokinov. MIT Press: Cambridge, MA, pp. 23–58.
FORBUS, K. D., D. GENTNER, and K. LAW. 1995. MAC/FAC: A model of similarity-based retrieval. Cognitive
Science, 19(2):141–205.
FORBUS, K., and T. HINRICHS. 2006. Companion Cognitive Systems: A step towards human-level AI. AI Magazine,
27(2):83–95.
FORBUS, K., K. LOCKWOOD, and A. SHARMA. 2009. Steps towards a 2nd generation learning by reading system.
In Proceedings of the AAAI Spring Symposium on Learning by Reading.
FRASCONI, P., M. GORI, and A. SPERDUTI. 1998. A general framework for adaptive processing of data structures.
IEEE Transactions on Neural Networks, 9(5):768–786.
FROLOV, A. A., D. A. RACHKOVSKIJ, and D. HUSEK. 2002. On information characteristics of Willshaw-like
auto-associative memory. Neural Network World, 2:141–157.
FROLOV, A. A., D. HUSEK, and D. A. RACHKOVSKIJ. 2006. Time of searching for similar binary vectors in
associative memory. Cybernetics and Systems Analysis, 5:615–623.
GALLANT, S. I. 2000. Context vectors: A step toward a Grand Unified Representation. In Hybrid Neural
Systems, Vol. 1778 of Lecture Notes in Computer Science. Edited by S. Wermter and R. Sun. Springer:
Berlin, pp. 204–210.
GAYLER, R. 1998. Multiplicative binding, representation operators, and analogy. In Advances in Analogy
Research: Integration of Theory and Data from the Cognitive, Computational, and Neural Sciences. Edited by
K. Holyoak, D. Gentner, and B. Kokinov. New Bulgarian University: Sofia, Bulgaria.
GAYLER, R. W., and S. D. LEVY. 2009. A distributed basis for analogical mapping. In Proceedings of the Second
International Analogy Conference. NBU Press: Sofia, Bulgaria, pp. 165–174.
GENTNER, D. 1983. Structure-mapping: A theoretical framework for analogy. Cognitive Science, 7:155–170.
GENTNER, D., and A. B. MARKMAN. 2003. Analogy-based reasoning and metaphor. In The Handbook of Brain
Theory and Neural Networks. Edited by M. A. Arbib. The MIT Press: Cambridge, MA, pp. 106109.
GRAY, B., G. S. HALFORD, W. H. WILSON, and S. PHILLIPS. 1997. A neural net model for mapping hierarchically
structured analogs. In Proceedings of the Fourth Conference of the Australasian Cognitive Science Society,
University of Newcastle, NSW, Australia.
HOFSTADTER, D. R., and M. MITCHELL. 1988. Conceptual slippage and mapping: A report of the Copycat project.
In Proceedings of the Tenth Annual Conference of the Cognitive Science Society, pp. 601–607.
HOLYOAK, K. J., and P. THAGARD. 1989. Analogical mapping by constraint satisfaction. Cognitive Science,
13(3):295–355.
HUMMEL, J., and K. HOLYOAK. 1997. Distributed representations of structure: A theory of analogical access and
mapping. Psychological Review, 104:427–466.
JIANG, X., and H. BUNKE. 1999. Optimal quadratic-time isomorphism of ordered graphs. Pattern Recognition,
32:1273–1283.
JONES, M. N., and D. J. K. MEWHORT. 2007. Representing word meaning and order information in a composite
holographic lexicon. Psychological Review, 114:1–37.
KANERVA, P. 1996. Binary spatter-coding of ordered k-tuples. In Artificial Neural Networks, Proceedings of
ICANN 96. Edited by C. von der Malsburg, W. von Seelen, J. Vorbruggen, and B. Sendhoff. Springer-
Verlag: Berlin, pp. 869–873.
KANERVA, P. 2009. Hyperdimensional computing: An introduction to computing in distributed representation
with high-dimensional random vectors. Cognitive Computation, 1:139–159.
KEANE, M. T., T. LEDGEWAY, and S. DUFF. 1994. Constraints on analogical mapping: A comparison of three
models. Cognitive Science, 18:287–334.
KOKINOV, B. 1988. Associative memory based reasoning: How to represent and retrieve cases. In Artificial
Intelligence III: Methodology, Systems, Applications. Edited by T. O'Shea and V. Sgurev. Elsevier Science
Publishers B.V.: Amsterdam, the Netherlands, pp. 51–58.
KOKINOV, B., and R. FRENCH. 2003. Computational models of analogy-making. In Encyclopedia of Cognitive
Science. Edited by L. Nadel. Nature Publishing Group: London, pp. 113–118.
KUSSUL, E. M. 1992. Associative Neuron-Like Structures. Naukova Dumka: Kiev. (In Russian)
KUSSUL, E. M., T. N. BAIDYK, D. C. WUNSCH, O. MAKEYEV, and A. MARTIN. 2006. Permutation coding
technique for image recognition systems. IEEE Transactions on Neural Networks, 17(6):1566–1579.
LEVI, G. 1972. A note on the derivation of maximal common subgraphs of two directed or undirected graphs.
Calcolo, 9:341–354.
LOPEZ DE MANTARAS, R., D. MCSHERRY, D. BRIDGE, D. LEAKE, B. SMYTH, S. CRAW, B. FALTINGS, M. L.
MAHER, M. T. COX, K. FORBUS, M. KEANE, A. AAMODT, and I. WATSON. 2005. Retrieval, reuse, revision
and retention in case-based reasoning. Knowledge Engineering Review, 20(3):215–240.
MARKMAN, A. B., D. A. RACHKOVSKIJ, I. S. MISUNO, and E. G. REVUNOVA. 2003. Analogical reasoning techniques
in intelligent counterterrorism systems. Informational Theories and Applications, 10(2):139–146.
MARKMAN, A. B., and D. GENTNER. 2000. Structure mapping in the comparison process. American Journal of
Psychology, 113:501–538.
MCGREGOR, J. J. 1982. Backtrack search algorithms and the maximal common subgraph problem. Software:
Practice and Experience, 12:23–34.
MCSHERRY, D. 2003. Similarity and compromise. In Proceedings of the Fifth International Conference on Case-
Based Reasoning. Springer: Berlin, pp. 291–305.
MEDASANI, S., R. KRISHNAPURAM, and Y. S. CHOI. 2001. Graph matching by relaxation of fuzzy assignments.
IEEE Transactions on Fuzzy Systems, 9(1):173–182.
MESSMER, B. T., and H. BUNKE. 1998. A new algorithm for error-tolerant subgraph isomorphism detection. IEEE
Transactions on Pattern Analysis and Machine Intelligence, 20(5):493–504.
MISUNO, I. S., D. A. RACHKOVSKIJ, and S. V. SLIPCHENKO. 2005. Vector and distributed representations reflecting
semantic relatedness of words. Mathematical Machines and Systems, 3:50–66. (In Russian)
MISUNO, I. S., D. A. RACHKOVSKIJ, S. V. SLIPCHENKO, and A. M. SOKOLOV. 2005. Searching for text information
with vector representations. Problems in Programming, 4:50–59. (In Russian)
PAGE, M. 2000. Connectionist modelling in psychology: A localist manifesto. Behavioral and Brain Sciences,
23:443–512.
PELILLO, M. 1999. Replicator equations, maximal cliques, and graph isomorphism. Neural Computation,
11:1933–1955.
PLATE, T. A. 1991. Holographic Reduced Representations: Convolution algebra for compositional distributed
representations. In Proceedings of the 12th International Joint Conference on Artificial Intelligence (IJCAI).
Edited by J. Mylopoulos and R. Reiter. Morgan Kaufmann: San Mateo, CA, pp. 30–35.
PLATE, T. A. 1994a. Estimating structural similarity by vector dot products of Holographic Reduced Represen-
tations. In Advances in Neural Information Processing Systems 6 (NIPS 93). Edited by J. D. Cowan, G.
Tesauro, and J. Alspector. Morgan Kaufmann: San Mateo, CA, pp. 1109–1116.
PLATE, T. A. 1994b. Distributed representations and nested compositional structure. Ph.D. Thesis, Depart-
ment of Computer Science, University of Toronto, Toronto, Canada. Available at http://internet.cybermesa.
com/champagne/tplate/.
PLATE, T. A. 2000. Analogy retrieval and processing with distributed vector representations. Expert Systems:
The International Journal of Knowledge Engineering and Neural Networks, Special Issue on Connectionist
Symbol Processing, 17(1):29–40.
PLATE, T. A. 2003. Holographic Reduced Representation: Distributed Representation for Cognitive Science.
CSLI Publications: Stanford, CA, 300 pp.
RAMSCAR, M., and D. YARLETT. 2003. Semantic grounding in models of analogy: An environmental approach.
Cognitive Science, 27:41–71.
RACHKOVSKIJ, D. A. 2001. Representation and processing of structures with binary sparse distributed codes.
IEEE Transactions on Knowledge and Data Engineering, 13(2):261–276.
RACHKOVSKIJ, D. A. 2004. Some approaches to analogical mapping with structure sensitive distributed represen-
tations. Journal of Experimental and Theoretical Artificial Intelligence, 16(3):125–145.
RACHKOVSKIJ, D. A., and E. M. KUSSUL. 2001. Binding and normalization of binary sparse distributed represen-
tations by context-dependent thinning. Neural Computation, 13(2):411–452.
RACHKOVSKIJ, D. A., I. S. MISUNO, and S. V. SLIPCHENKO. 2012. Randomized projective methods for the
construction of binary sparse vector representations. Cybernetics and Systems Analysis, 48(1):146–156.
RACHKOVSKIJ, D. A., S. V. SLIPCHENKO, E. M. KUSSUL, and T. N. BAIDYK. 2005a. Properties of numeric codes
for the scheme of random subspaces RSC. Cybernetics and Systems Analysis, 4:509–520.
RACHKOVSKIJ, D. A., S. V. SLIPCHENKO, I. S. MISUNO, E. M. KUSSUL, and T. N. BAIDYK. 2005b. Sparse binary
distributed encoding of numeric vectors. Journal of Automation and Information Sciences, 11:47–61.
RIESEN, K., S. FANKHAUSER, H. BUNKE, and P. DICKINSON. 2009. Efficient suboptimal graph isomorphism. In
Graph-Based Representations in Pattern Recognition, Vol. 5534 of Lecture Notes in Computer Science.
Edited by A. Torsello, F. Escolano, and L. Brun. Springer: Berlin, pp. 124–133.
RIESEN, K., M. NEUHAUS, and H. BUNKE. 2007. Graph embedding in vector spaces by means of prototype
selection. In Graph-Based Representations in Pattern Recognition, Vol. 4538 of Lecture Notes in Computer
Science. Edited by F. Escolano and M. Vento. Springer: Berlin, pp. 383–393.
RISSLAND, E. L. 1983. Examples in legal reasoning: Legal hypotheticals. In Proceedings of IJCAI 1983. Morgan
Kaufmann: San Mateo, CA, pp. 90–93.
ROBLES-KELLY, A., and E. R. HANCOCK. 2007. A Riemannian approach to graph embedding. Pattern Recogni-
tion, 40:1024–1056.
ROSS, B. H. 1989. Distinguishing types of superficial similarities: Different effects on the access and use of
earlier examples. Journal of Experimental Psychology: Learning, Memory, and Cognition, 15(3):456–468.
SAHLGREN, M., A. HOLST, and P. KANERVA. 2008. Permutations as a means to encode order in word space. In
Proceedings of the 30th Annual Meeting of the Cognitive Science Society (CogSci'08), Washington, DC,
pp. 23–26.
SCHENKER, A., H. BUNKE, M. LAST, and A. KANDEL. 2005. Graph-Theoretic Techniques for Web Content Mining.
World Scientific: River Edge, NJ.
SCHANK, R. C. 1982. Dynamic Memory: A Theory of Reminding and Learning in Computers and People.
Cambridge University Press: New York.
SLIPCHENKO, S. V. 2005a. SLANG: A symbolic language for distributed representation. In Proceedings of the
Fourteenth International Conference on Neurocybernetics, Vol. 2, Rostov-on-Don, Russia, pp. 237–239.
SLIPCHENKO, S. V. 2005b. Distributed representations in the problems of processing structured numeric and
symbolic information. System Technologies, 6(41):134–141. (In Russian)
SLIPCHENKO, S. V., D. A. RACHKOVSKIJ, and I. S. MISUNO. 2005. Decoding binary distributed representations of
numerical vectors. Computer Mathematics, 3:108–120. (In Russian)
SMYTH, B., and M. T. KEANE. 1998. Adaptation-guided retrieval: Questioning the similarity assumption in
reasoning. Artificial Intelligence, 102(2):249–293.
SMYTH, B., and E. MCKENNA. 1999. Footprint based retrieval. In Proceedings of the Third International Confer-
ence on Case-Based Reasoning. Springer: Berlin, pp. 343–357.
SMYTH, B., and E. MCKENNA. 2001. Competence guided incremental footprint-based retrieval. Knowledge-Based
Systems, 14(3–4):155–161.
SOKOLSKY, O., S. KANNAN, and I. LEE. 2006. Simulation-based graph similarity. In Tools and Algorithms for the
Construction and Analysis of Systems, Vol. 3920 of Lecture Notes in Computer Science. Springer: Berlin/
Heidelberg, pp. 426–440.
SOWA, J. F. 2000. Knowledge Representation: Logical, Philosophical, and Computational Foundations. Brooks
Cole: Pacific Grove, CA.
STANFILL, C., and D. WALTZ. 1986. Toward memory-based reasoning. Communications of the ACM,
29(12):1213–1228.
SUN, Z., and G. R. FINNIE. 2004. Intelligent Techniques in E-Commerce: A Case Based Reasoning Perspective.
Springer: Berlin.
THAGARD, P., K. J. HOLYOAK, G. NELSON, and D. GOCHFELD. 1990. Analog retrieval by constraint satisfaction.
Artificial Intelligence, 46(1–2):259–310.
THNET. 2012. THNET: THinkNet connectionist software. Available at http://www.cs.cmu.edu/afs/cs/project/
ai-repository/ai/areas/neural/systems/thnet/0.html. Accessed February 2012.
THORPE, S. 2003. Localized versus distributed representations. In The Handbook of Brain Theory and Neural
Networks. Edited by M. A. Arbib. The MIT Press: Cambridge, MA, pp. 643–646.
TINKER, P., J. FOX, C. GREEN, D. ROME, K. CASEY, and C. FURMANSKI. 2005. Analogical and case-based
reasoning for predicting satellite task schedulability. In Proceedings of ICCBR 2005, Vol. 3620 of Lecture
Notes in Computer Science. Springer: Berlin/Heidelberg, pp. 566–578.
WATSON, I. 1997. Applying Case-Based Reasoning: Techniques for Enterprise Systems. Morgan Kaufmann: San
Francisco, CA.
WESS, S., K.-D. ALTHOFF, and G. DERWAND. 1993. Using K-D trees to improve the retrieval step in case-based
reasoning. In Proceedings of the First European Workshop on Case-Based Reasoning. Springer: Berlin,
pp. 167–181.
WETZEL, J., and K. FORBUS. 2009. Automated critique of sketched mechanisms. In Proceedings of the 21st
Innovative Applications of Artificial Intelligence Conference, Pasadena, CA.
WHARTON, C., K. HOLYOAK, P. DOWNING, T. LANGE, T. WICKENS, and E. MELZ. 1994. Below the surface:
Analogical similarity and retrieval competition in reminding. Cognitive Psychology, 26(1):64–101.
WILSON, R. C., E. R. HANCOCK, and B. LUO. 2005. Pattern vectors from algebraic graph theory. IEEE Transactions
on Pattern Analysis and Machine Intelligence, 27(7):1112–1124.
XML database. 2012. Available at http://en.wikipedia.org/wiki/XML_database. Accessed February 7, 2012.