
Computational Intelligence, Volume 28, Number 1, 2012

SIMILARITY-BASED RETRIEVAL WITH STRUCTURE-SENSITIVE
SPARSE BINARY DISTRIBUTED REPRESENTATIONS
DMITRI A. RACHKOVSKIJ AND SERGE V. SLIPCHENKO
Department of Neural Information Processing Technologies, International Research and Training Center for
Information Technologies and Systems, Kiev, Ukraine
We present an approach to similarity-based retrieval from knowledge bases that takes into account both the
structure and semantics of knowledge base fragments. Those fragments, or analogues, are represented as sparse
binary vectors that allow a computationally efficient estimation of structural and semantic similarity by the vector
dot product. We present the representation scheme and experimental results for the knowledge base that was
previously used for testing of the leading analogical retrieval models MAC/FAC and ARCS. The experiments show that
the proposed single-stage approach provides results comparable with or better than the results of the two-stage models
MAC/FAC and ARCS in terms of recall and precision. We argue that the proposed representation scheme is useful
for large-scale knowledge bases and free-structured database applications.
Key words: similarity-based retrieval, analogical access, binary distributed representation, codevector,
case-based reasoning, APNN.
1. INTRODUCTION
Growth of the volume, complexity, and diversity of information available in electronic
form requires ever more effective and intelligent technologies for information organization,
access, and processing. The majority of information produced or processed by humans or
business processes is free-form or structured text. However, traditional relational databases
work with data of a strictly specified format, which hampers their adaptation to changes in the
structure of dynamically evolving heterogeneous information.
The desire to use the best of both worlds of structured and unstructured information led to
the specification of the XML language and databases (Chaudhri, Rashid, and Zicari 2003; XML
database 2012). However, the issues of effective and efficient acquisition and exploitation
of XML information remain open. Related methods developed for knowledge management
systems require an explicit representation of knowledge in the appropriate form (e.g., KIF,
CycL and their XML counterparts RDF, DAML, and OWL, etc.) and demand a significant
investment of human effort.
On the other hand, ever increasing volumes of data and knowledge bases provide more
information for problem solving, decision making, action planning, and other intelligent
activities. A promising approach for using this information to solve a wide range of in-
formation management problems is case-based and analogy-based reasoning. This generic
approach is also known as memory-based reasoning (Stanfill and Waltz 1986; Kokinov 1988)
and example-based reasoning (Rissland 1983). It does not require an explicit construction
of complex domain models and is based on the fact that people widely use past solutions
of similar problems (or reasoning by known examples) to solve new problems (or make
decisions) when facing incomplete, inaccurate, or contradictory input information (e.g., Schank
1982; Gentner 1983).
1.1. Case-Based Reasoning and Analogy-Based Reasoning
The main part of an example-based reasoning system is a database of examples that
stores descriptions of problems with their solutions, situations with the predictions of their
Address correspondence to Dmitri A. Rachkovskij, Department of Neural Information Processing Technologies, Interna-
tional Research and Training Center for Information Technologies and Systems, Prospect Glushkova 40, Kiev, 03680, Ukraine;
e-mail: dar@infrm.kiev.ua
© 2012 Wiley Periodicals Inc.
development, etc.; i.e., examples of prior experience that are usually called cases, precedents,
or analogues. For a new input situation, the system retrieves from the knowledge base one
or several similar situations and makes decisions, predictions or inferences about the input
situation by adapting knowledge from the existing examples.
Example-based reasoning involves the following steps (Aamodt and Plaza 1994; Kokinov
and French 2003):
(1) Representation: Obtaining a description of the target problem or situation and repre-
senting it in an internal format for processing. For example, internal representations
may be produced by extraction of useful features or some other transformation of the
initial descriptions to increase system performance.
(2) Retrieval: Finding in the database of examples the most appropriate source case(s) or
analogue(s) to the given input (target, or probe). For example, those can be previous
problems stored with their known solutions that can be used to solve the input target
problem.
(3) Usage: An attempt to reuse the solution from the retrieved case, possibly after adapting
it to account for differences in problem descriptions. Usually this complex step is sub-
divided into fine-grained (sub)steps, including that of mapping (matching, i.e., finding
corresponding elements of source and target analogues), and others, such as subsequent
learning (modication of the case base).
The key task in example-based reasoning is retrieval (also known as access, recall, or
search). The reasons include the following:
(1) Retrieval should select one of few candidates for further processing. It requires some form
of analysis of the whole example database. So it is assumed to be a computationally
expensive step, especially for large databases, complex cases and representations, and
complex estimation of case pertinence.
(2) Retrieval results may already be the output of the system, because the further usage may
be accomplished by humans, as exemplied by Internet search, information retrieval in
document repositories, or case-based reasoning where humans manually adapt the found
case to the target one.
(3) The quality of retrieval results (that is, the adequacy or relevance of examples retrieved
from the base) influences the further usage steps, whether manual or automatic.
The efficiency of the retrieval step, and therefore the overall performance of the example-
based reasoning system, is basically dominated by the representation step for the
following reasons:
(1) The initial description of target problems or situations should contain enough informa-
tion for pertinent retrieval. This commonly should include both significant semantic
and structural information (see Sections 1.2 and 1.3).
(2) The internal format of the system affects both the computational efficiency and quality of
retrieval.
(3) The transformation process of the initial descriptions into the internal system format
affects computational efficiency, mainly at the step of representing all database cases.
The case-based reasoning instantiation of the example-based reasoning methodology
has been useful in a wide range of applications and domains (e.g., Watson 1997; Aha 1998;
Bergmann 2002; Sun and Finnie 2004; Bergman et al. 2009). However, case-based reasoning
is rather application-specific (Lopez De Mantaras et al. 2005): it requires a time-consuming
manual adjustment of a system to perform a specific task; knowledge of content theories that
reflect the knowledge required for particular task domains; and features specific to subject areas.
In addition, in recent years there has been a growing awareness of the importance of structure
in the area of example-based reasoning as well as in other related areas such as databases,
knowledge-bases, web, etc. It is reected in the appearance of example bases in the form
of databases and knowledge bases, including those in XML format. Also, simplification of
system development requires universal mechanisms for retrieval and processing of example
bases, rather than those specic for a particular system objective and a particular example
base.
This attracts more attention to analogy-based reasoning, which is focused on taking
structure into account (e.g., in the form of hierarchical systems of relations, see Section 1.2),
and models broadly general cognitive processes, allowing for more universal processing.
However, its methods of working with structures are computationally expensive and take
semantics into account poorly (see also Sections 1.2 and 1.3). Also, analogy-based reasoning
has not yet reached the maturity of case-based reasoning usage, although moving in this
direction (e.g., Forbus 2001; Bjornestad 2003; Markman et al. 2003; Tinker et al. 2005;
Forbus and Hinrichs 2006; Forbus, Lockwood, and Sharma 2009; Wetzel and Forbus 2009).
Let us consider the step of example retrieval and its challenges.
1.2. Representation of Examples and Similarity-Based Retrieval
A prominent role in the retrieval task is played by similarity-based retrieval. It should
be noted that other criteria for retrieval are also considered important (Lopez De Mantaras
et al. 2005), such as how effectively the solution space is covered by the retrieved cases
(McSherry 2003), how easily their solutions can be adapted to solve the target problem
(Smyth and Keane 1998), or explained (Doyle et al. 2004). Nevertheless, those criteria are
usually complementary and used in combination with similarity criteria.
In some applications of example-based reasoning, it may be adequate to assess similarity
of the stored cases in terms of their surface features. The surface features of a case are typically
represented using attribute-value pairs, which are provided as part of its description. It is a
common practice in case-based reasoning to represent a case as a simple feature vector that
may be considered as an instance of the surface feature vector. This evidently largely ignores
structure processing.
In yet other applications, cases are represented by complex structures such as graphs
and retrieval requires an assessment of their structural similarity. Similarity assessment of
such complex nested structures as analogues or XML database records is computationally
expensive. However, the advantage is that more relevant cases may be retrieved. This is
usually the case for analogy-based reasoning, because processing of analogies is heavily
based on manipulations with structured information.
Structural similarity of analogues reects how the elements of analogues (i.e., enti-
ties, relations, and substructures of various hierarchical levels) are arranged with respect to
each other. It is based on the notion of structural consistency (Falkenhainer et al. 1989;
Gentner and Markman 2003) or isomorphism (Thagard et al. 1990; Hummel and Holyoak
1997; Eliasmith and Thagard 2001). Analogues are also matched by the surface or su-
perficial similarity (Gentner 1983; Forbus, Gentner, and Law 1995) based on common
analogue elements or a broader semantic similarity (Thagard et al. 1990; Hummel and
Holyoak 1997; Eliasmith and Thagard 2001) based on, e.g., a joint membership in a tax-
onomic category (Fellbaum 1998; Sowa 2000), similarity of characteristic feature vectors
(Forbus et al. 1995), or context (co-occurrence) vectors (Gallant 2000; Ramscar and Yarlett
2003; Sahlgren, Holst, and Kanerva 2008).
Experiments based on human assessment of similarities and analogies have confirmed
that both surface and structural similarity are necessary for sound retrieval (Forbus et al.
Although structural similarity in analogical retrieval is considered less important than
in mapping, the models of retrieval that take into account only surface similarity are
considered inadequate.
The problems of similarity-based retrieval are common to the general problems of analogue
retrieval mentioned above, with the goal of improving retrieval performance in terms of
finding the base fragments similar to the target one. If we take the original description of
episodes as given, then the problems reduce to the efficiency of similarity estimation in
terms of similarity measure quality and computational efficiency. Computational efficiency
and quality, in turn, are determined by the adequacy of the original descriptions and their
similarity measures, those of the internal representations, and the example base organization.
1.2.1. Example Descriptions and Similarity Measures. As mentioned above, case-
based reasoning cases are usually initially described as feature vectors and may be repre-
sented as attribute-value pairs. Measures of vector (dis)similarity, such as distance or dot
product, have a near linear computational complexity in terms of vector dimensionality, so
they are acceptable even for large bases.
Most often, analogues in analogy-based reasoning are originally represented as directed
acyclic graphs (Frasconi, Gori, and Sperduti 1998). They include labeled graphs (with non-
unique node labels; Champin and Solnon 2003). So, it is tempting to use graph-theoretic
measures of graph similarity (Bunke 2000a, b; Schenker et al. 2005), such as
(1) subgraph isomorphism (Messmer and Bunke 1998; Bunke, Jiang, and Kandel 2000;
Medasani, Krishnapuram, and Choi 2001);
(2) maximum common subgraph (Levi 1972; McGregor 1982; Borner 1993; Bunke 1997;
Messmer and Bunke 1998);
(3) minimum common supergraph (Bunke et al. 2000; Fernández and Valiente 2001);
(4) graph edit distances (Bunke and Messmer 1993; Bunke 1997, 1999).
Regretfully, those methods typically rely on solving NP-complete problems or have a large
potential for combinatorial explosion (Schenker et al. 2005). So, because of computational
intractability, they are practically inapplicable to large graphs, and even more so to large
databases of such graphs. (Note, however, approximate approaches that provide good exper-
imental results on some graph collections, e.g., Pelillo 1999.) On the other hand, there is
evidence that even fully isomorphic graphs often do not meet the requirements that humans
apply to analogical similarity (Markman and Gentner 2000). This means that superficial,
semantic similarity is also important.
Intractability and inadequacy of graph-theoretical models and similarity measures for
structured examples are dealt with by various measures and models of similarity estimation.
To achieve this, the original descriptions are transformed and/or augmented with the internal
representations or models of operation.
Researchers of analogy-based reasoning have proposed a number of heuristic models of
similarity-based analogical retrieval. The most influential of them are still MAC/FAC, which
operates with symbolic structures (Forbus et al. 1995), and ARCS, which uses localist neural
network structures (Thagard et al. 1990); see also Section 1.3. Another approach stems
from graph theory. It uses approximate suboptimal graph matching (Schenker et al. 2005;
Riesen et al. 2009) or specific types of graphs (Jiang and Bunke 1999; Sokolsky, Kannan,
and Lee 2006). Those methods of similarity estimation have polynomial computational
complexity. However, polynomial complexity for large analogues and example bases is also
not acceptable, at least for polynomial powers exceeding 2-3.
1.2.2. Vector Representations of Examples. An alternative approach is to overcome
the shortcomings of high computational complexity and poor accounting of semantics by
embedding or projecting graphs into a space with computationally simple measures of
similarity. As mentioned above, this internal representation may be a vector space that allows
a computationally efficient retrieval of examples in case-based reasoning. However, here the
vectors should reflect not only semantics, but also structure.
Recently, some approaches to graph embeddings in vector spaces have been introduced
in graph theory (Wilson, Hancock, and Luo 2005; Riesen, Neuhaus, and Bunke 2007;
Robles-Kelly and Hancock 2007). Even before that, some interesting embeddings had been
proposed in the research area of distributed representations (Thorpe 2003) that stemmed
from distributed or holographic ideas for information representation in the brain, as opposed
to localist representations (Page 2000). In localist representations, each item (such as
a surface feature, object, relation, etc.) is represented by some node, symbol, or vector
component. So, each vector component has some meaning. In distributed representations,
each information item (feature, object, relation, scene, etc.) is represented by a number of
vector components, and each vector component may belong to representations of different
(many) items (Thorpe 2003). So, here the semantics of individual components of the representation
vector are undefined, in distinction to the usual (localist) vector representation of surface features.
Distributed representations provide a high information capacity (e.g., if a single item is
represented by M components of an N-dimensional representational vector, the number of
representable items for distributed representations is N choose M, as opposed to N/M
for localist representations); allow for efficient (dis)similarity measures (dot products or
distances); can be processed by well-developed methods for vector information processing
(e.g., support vector machines); and provide a rich semantic basis by allowing feature-based
representations (and even similarity of features).
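For example, with the parameters used later in this paper (N = 100,000, M = 1,000), a localist code distinguishes only N/M = 100 items, whereas the number of distinct sparse binary patterns, C(100,000, 1,000), exceeds 10^2000.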
It was considered that the main drawback of distributed representations is the inability to
represent structure (see, e.g., Rachkovskij and Kussul 2001 and references therein). However,
recently schemes for structure-sensitive distributed representations as vectors of various
formats have appeared, where binding operations are used for structure representation. Holo-
graphic Reduced Representations (HRRs) of Plate (1991, 2003) use real-valued vectors and
circular convolution for binding. Kanerva's (1996) Binary Spatter Codes use Boolean vectors
and componentwise exclusive-or. Gayler's (1998) Multiply, Add, Permute coding uses vectors
with components from {-1, 1} and componentwise multiplication. Associative-Projective
Neural Networks (APNNs) use sparse binary vectors {0, 1} and a special context-dependent
thinning operation for binding (see Section 2.2.1; Kussul 1992; Rachkovskij and Kussul
2001). Estimation of vector similarity is linear in the vector dimensionality and therefore
is computationally acceptable. So the question is how adequately distributed representations
represent structural similarity in combination with semantic similarity of examples.
Once the internal representation is selected, records (episodes, fragments) of the example
base are converted to it. Then standard means for effective access to databases can be used
to improve retrieval performance. For example, example-based reasoning has applied various
indexing schemes and efficient computer structures such as k-d trees (Wess, Althoff, and
Derwand 1993), some kinds of clustering and staged access (a two-stage retrieval approach)
through cluster-representative prototypes (Smyth and McKenna 1999, 2001), and the use of
massively parallel computers (Stanfill and Waltz 1986).
1.3. Computational Models and Representations for Analogical Retrieval
Implementation and efficiency of the operations on structures that are used for repre-
sentation of analogues (such as estimation of similarity, comparison, finding corresponding
elements, traversing, etc.) required by example-based reasoning depend essentially upon
the scheme for structure representation employed. Because, until recently, representation and
processing of structures were achievable only with hierarchical symbolic or localist neural
network representations, it is natural that those representations are used in the most influential
computational models of analogical reasoning: ARCS-ACME (Holyoak and Thagard 1989;
Thagard et al. 1990), MAC/FAC-SME (Falkenhainer, Forbus, and Gentner 1989; Forbus
et al. 1995), IAM (Keane, Ledgeway, and Duff 1994), and Copycat (Hofstadter and Mitchell
1988).
At the step of retrieval, computationally expensive structure matching with symbolic
and localist representations makes a structure-sensitive comparison of the input (probe)
to each of many potential analogues stored in memory prohibitive. This leads to a common
strategy of introducing a two-stage process for retrieving analogues, as in MAC/FAC (Forbus
et al. 1995) and ARCS (Thagard et al. 1990). At the first stage, a computationally cheap
process operating with vector representations is used to select the candidates on the basis of
surface similarity only. The second stage works with hierarchical structure representations
that take structure into account. This requires complex computations to estimate all aspects
of similarity between the probe and the candidates, similar to those used in mapping with
SME (Falkenhainer et al. 1989) and ACME (Thagard et al. 1990).
However, the quest to enhance the semantic basis of representations, scaling, and degree
of neural relevancy led to the attempts to augment the models of analogy with some share
of vector representations, e.g., in LISA (Hummel and Holyoak 1997), STAR2 (Gray et al.
1997), and DRAMA (Eliasmith and Thagard 2001), see also (Browne and Sun 2001; Gayler
and Levy 2009; Kanerva 2009) and discussion therein.
The structure-sensitive distributed representations mentioned in Section 1.2.2 have
been applied to the modeling of analogical retrieval by Plate (see, e.g., Plate 1994a, b, 2000,
2003 and references therein). In structure-sensitive distributed representations, both the set of
structure elements and their arrangement influence their similarity, so that similar structures
produce similar distributed representations. Plate suggested that it could be possible to
construct structure-sensitive distributed representations of analogues whose vector similarity
pattern corresponds to the experimental results observed for people (Ross 1989; Wharton
et al. 1994; Forbus et al. 1995, see also Section 4.1.1) and the modeling results reported for
MAC/FAC and ARCS. However, because the vector similarity measures are computationally
inexpensive, the two-stage comparison (retrieval) process of those models is not needed
for the structure-sensitive distributed representations. Plate has proposed such representation
schemes for analogical episodes and tested them on analogical retrieval tasks using HRRs
(Plate 1994a, b, 2000, 2003). He has shown that the results obtained by a single-stage
vector similarity estimation process for HRRs are consistent with the empirical results of
psychologists and the mentioned leading models of analogical retrieval (where different
episodes, but of the same similarity types, were used). Similar results were also reported for
the APNN distributed representations using Plate's episodes in Rachkovskij (2001).
1.4. Contributions and Paper Organization
Previous attempts to model retrieval of similar analogues with structure-sensitive dis-
tributed representations were fragmentary and mainly demonstrated proof of concept.
Comparison with the leading models of analogical retrieval was made using small fragments
of the knowledge bases and experimental schemes of those models. In this paper we describe
a particular scheme for distributed representations of analogues and apply it to similarity-
based analogical retrieval. Using the knowledge base that was previously used for testing the
most advanced model of analogical reasoning, we provide a comparison of results.
In Section 2, we describe the Sparse Binary Distributed Representation (SBDR) scheme
for analogues. The usage of SBDR for retrieval of similar analogues is considered in Section 3.
The setup and the results of experimental investigations are given in Section 4. Conclusions
are provided in Section 5.
2. STRUCTURED EXAMPLES AND THEIR DISTRIBUTED
REPRESENTATIONS
2.1. Structured Descriptions of Examples
In the leading theories and models of analogical reasoning (Gentner 1983; Holyoak and
Thagard 1989; Thagard et al. 1990; Hummel and Holyoak 1997; Gentner and Markman 2003;
Plate 2003) analogues are considered as hierarchically structured descriptions of situations,
models, and domains. Descriptions include entities, attributes and relationships. Objects
are entities of subject domains (for example, the Sun, Planet, etc.). Attributes describe the
properties of objects (e.g., mass, temperature, etc.). Relations determine the relationships
between the elements of analogues (e.g., attracts, more, cause, etc.). Arguments of relations
may be objects, attributes, and other relations. Attributes may be considered as relations with a
single argument, so hereafter we will mainly speak about relations.
Structured descriptions of analogues can also be seen as directed ordered acyclic graphs
(Frasconi et al. 1998). Consider the graph of an analogue G(V, E, <, T). Here V is the set of
vertices; it includes the set of objects (entities), attributes (attribute instances), and relations
(relational instances). E is the set of edges defining directed connections from attribute and
relation instances to their arguments. < specifies the sequential order of arguments. T is
the set of vertex labels (object, attribute, and relation names) and edge labels (role names).
Defining both the sequential order of arguments and edge labels may be redundant, because
the edge label usually defines the role of the child vertex (agent, recipient, etc.). The role
itself can also be inferred from the position of output edges relative to each other in the graph
picture. So we will usually omit edge labels and consider the sequential order of edges and
corresponding arguments as enumerated from left to right, as shown in Figure 1 for relations
GREATER and CAUSE.
In Figures 1-3, various representations of the analogue episode describing the Solar system
(from the well-known Rutherford Atom-Solar System analogy) are given. Figure 1 shows a
graph sketch description (adapted from Falkenhainer et al. 1989); Figure 2 shows the bracketed
notation (Forbus et al. 1995); Figure 3 shows the SBDR representation proposed in this
paper.
2.2. Binary Sparse Distributed Representations of Analogues
Let us consider the SBDR representations. Each item x (an element of an analogue: an attribute,
object, or relation) is represented by its codevector X. A codevector is a form of vector
representation that is binary (X \in {0,1}^N) and sparse (the fraction of nonzero vector
components M of codevector X with dimensionality N is small: M/N \ll 1/2). Similar items
(in the context of the application problem) should have similar codevectors, whereas items with
undefined similarity should have dissimilar codevectors.
FIGURE 1. Graph sketch description of the Solar System analogue: relations CAUSE, AND, GREATER, GRAVITY, ATTRACT, and REVOLVE over the entities SUN and PLANET with the attributes MASS and TEMPERATURE, annotated with order levels 0-4.
(CAUSE
  (GRAVITY (MASS SUN) (MASS PLANET))
  (ATTRACTS SUN PLANET))
(GREATER (TEMPERATURE SUN)
  (TEMPERATURE PLANET))
(CAUSE
  (AND (GREATER (MASS SUN) (MASS PLANET))
    (ATTRACTS SUN PLANET))
  (REVOLVE-AROUND PLANET SUN))
FIGURE 2. The Solar System analogue in bracketed notation.
SOLAR_SYSTEM =
CAUSE_1 GRAVITY_1 MASS SUN GRAVITY_2 MASS PLANET
CAUSE_2 ATTRACTS_1 SUN ATTRACTS_2 PLANET
GREATER_1 TEMPERATURE SUN
GREATER_2 TEMPERATURE PLANET
CAUSE_1
AND GREATER_1 MASS SUN GREATER_2 MASS PLANET
AND ATTRACTS_1 SUN ATTRACTS_2 PLANET
CAUSE_2
REVOLVE-AROUND_1 PLANET REVOLVE-AROUND_2 SUN
FIGURE 3. The SBDR distributed representation of the Solar System analogue.
Codevector similarity is defined based on the dot product, which for binary codevectors is
equal to the number of 1s in common, that is, the overlap of X and Y:

\sum_{i=1}^{N} X_i Y_i = |X \wedge Y|,   (1)

where X_i is the i-th component of X, \wedge is componentwise conjunction, and |Z| is the number of nonzero
components in Z.
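As an illustration, below is a minimal sketch (ours, not code from the original system) of generating random sparse binary codevectors and computing the overlap similarity of equation (1); the parameter values N = 100,000 and M = 1,000 follow Section 4.1.3.

import numpy as np

def random_codevector(n=100_000, m=1_000, rng=None):
    # Random sparse binary codevector: exactly m of n components are set to 1.
    rng = rng or np.random.default_rng()
    x = np.zeros(n, dtype=np.uint8)
    x[rng.choice(n, size=m, replace=False)] = 1
    return x

def overlap(x, y):
    # Equation (1): dot product = number of common 1s, |X AND Y|.
    return int(np.sum(x & y))

rng = np.random.default_rng(0)
sun, planet = random_codevector(rng=rng), random_codevector(rng=rng)
print(overlap(sun, sun))     # 1,000: a codevector fully overlaps itself
print(overlap(sun, planet))  # about m*m/n = 10 for independent random codevectors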
A set of items is represented by componentwise disjunction of their codevectors
(Rachkovskij and Kussul 2001). However, when a set of sets is produced by componentwise
disjunction of set codevectors, the information about the original sets is lost, resulting in the
superposition catastrophe (see, e.g., Rachkovskij and Kussul 2001 and references therein).
So, a binding operation is needed to preserve information about item grouping in hierarchical
structures, as well as about the sequential order of items. Also, componentwise disjunction
results in a vector that has more 1s than each input vector. So, to preserve sparseness,
normalization of the number of 1s is needed.
Let us provide a short description of one version of the context-dependent thinning
(CDT) procedure that is used for binding and normalization in APNNs (see Rachkovskij and
Kussul 2001 for an extended description and discussion).
2.2.1. Binding by Context-Dependent Thinning. First, the codevector Z is formed by
disjunction of the element codevectors X_i to be bound:

Z = \bigvee_i X_i.   (2)

Then, the result \langle Z \rangle of binding is formed as

\langle Z \rangle = \bigvee_{k=1}^{K} (Z \wedge \tilde{Z}(k)) = Z \wedge \bigvee_{k=1}^{K} \tilde{Z}(k).   (3)

Here \tilde{Z}(k) is Z with permuted components. For each k, a random independent permutation
is used, fixed for this k (there are versions with a single fixed permutation iteratively applied
to Z; Kussul et al. 2006).

This procedure may be implemented as follows. At each step k, a different permutation
of the input codevector Z is obtained and conjuncted with Z. The resulting codevector is
disjuncted with \langle Z \rangle (which is initially empty). So, the final number of 1s in \langle Z \rangle is controlled
by the number K of such steps.
The same dimensionality of \langle Z \rangle and X_i allows for a recursive application of binding
by CDT. Similarity of \langle Z \rangle and X_i allows retrieval of X_i by \langle Z \rangle. The subset of 1s of each
element codevector X_i preserved in \langle Z \rangle depends on Z and therefore on each and all X_i,
thus preserving information on the particular subset of elements that produced it, and so
providing the binding property. The binding procedure may be considered as a functional analogue
of the grouping brackets used to represent a set of elements in symbolic notation:
(a, b, c, ...) \to \langle A, B, C, ... \rangle, where a, b, c, ... are the group elements, A, B, C, ... are their
codevectors, and \langle ... \rangle is the binding procedure.
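A minimal sketch of this CDT variant follows (our illustration; the fixed random permutations are represented as index arrays, and the parameter k controls the density of the result):

import numpy as np

def make_permutations(n, k_max, seed=0):
    # Fixed random independent permutations, one per thinning step k.
    rng = np.random.default_rng(seed)
    return [rng.permutation(n) for _ in range(k_max)]

def cdt(codevectors, perms, k):
    # Context-dependent thinning, equations (2) and (3):
    # Z = OR_i X_i;  <Z> = OR_k (Z AND Z~(k)), where Z~(k) is Z with permuted components.
    z = np.bitwise_or.reduce(codevectors)
    out = np.zeros_like(z)
    for p in perms[:k]:
        out |= z & z[p]   # keep the 1s of Z that coincide with 1s of a permuted Z
    return out

Increasing the number of steps k retains more 1s of Z, so k gives direct control over the thinning factor F_CDT = |<Z>|/|Z| discussed below.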
On the basis of the CDT procedures, methods for representing various types of hier-
archical relational structures as codevectors have been proposed (Rachkovskij 2001, 2004;
Rachkovskij and Kussul 2001) that preserve information on the grouping and sequence of ob-
jects and relations. Structures with similar objects and relations produce similar codevectors.
Greater similarity of relations and their arguments leads to greater similarity of the re-
sulting codevectors. This provides the basis for computationally efficient and qualitatively
novel methods for similarity-based processing of structured information from knowledge bases
and databases that take into account both the structure and semantics of knowledge. Let us consider
codevector representations of knowledge fragments that will be used in the experiments on
similarity-based retrieval of analogues later.
2.2.2. Codevectors of Analogues. Let the order value of a vertex u (that is, of structure
element u) of graph G be the natural number calculated as follows:
(1) objects have order zero;
(2) the order of a relation is 1 plus the maximal order of its arguments.
An element's order determines the height of its hierarchical structure. This is an
analogue of the relational order in Forbus et al. (1995) and the level of codevector hierarchy in
Rachkovskij (2001).
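For illustration, the order value can be computed by a simple recursion (a sketch under our own encoding of analogue elements as nested tuples, which is not the notation of the paper):

def order(element):
    # Objects have order 0; a relation's order is 1 plus the maximal order of its arguments.
    name, *args = element if isinstance(element, tuple) else (element,)
    return 1 + max(order(a) for a in args) if args else 0

# attracts(sun, planet) has order 1; the enclosing cause(...) has order 2:
print(order(("cause", ("attracts", "sun", "planet"), ("revolve-around", "planet", "sun"))))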
Let us define top-level elements as vertices without input edges. In Figure 1 these
elements are GREATER, CAUSE, and CAUSE. The codevector of an analogue will be formed
by componentwise disjunction of the codevectors of its top-level elements.
Codevectors of top-level vertices V_v are recursively constructed from the codevectors
of their elements, starting from the terminal codevectors of objects. Here we use the role-
filler representation of relations (Rachkovskij 2001; Rachkovskij and Kussul 2001), so the
resulting codevector is

V_v = \bigvee_{(v,u) \in E} \langle V_{(v,u)} \vee V_u \rangle,   (4)

where (v,u) \in E denotes all edges from the parent vertex (relation) v to its child vertices
(arguments) u; V_{(v,u)} is the codevector of the particular edge (v,u) corresponding to the role
the vertex u takes in the relation v; V_u is the codevector of the element u that is the argument
of the relation v; and \langle \rangle denotes the CDT operation.

For example, attracts(sun, planet) is represented as \langle attracts_1 \vee sun \rangle \vee
\langle attracts_2 \vee planet \rangle. The resulting codevector Solar system of the analogue describing the Solar system
obtained using this representation scheme is shown in Figure 3.
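Continuing the sketches above, equation (4) for this relation instance might look as follows (our illustration; random_codevector, make_permutations, and cdt are from the earlier sketches, and all codevector names are hypothetical):

perms = make_permutations(n=100_000, k_max=8)
sun, planet = random_codevector(), random_codevector()
# Denser role codevectors, cf. the parameter choices in Section 4.1.3:
attracts_1, attracts_2 = random_codevector(m=2_000), random_codevector(m=2_000)

# Equation (4):  V = <attracts_1 OR sun>  OR  <attracts_2 OR planet>
v_attracts = cdt([attracts_1, sun], perms, k=2) | cdt([attracts_2, planet], perms, k=2)

The number of thinning steps k (and hence F_CDT) determines how many 1s each bound role-filler pair contributes to the relation codevector.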
As mentioned above, CDT allows controlling the number of 1s in codevectors. If we use
a fixed number M of 1s in codevectors representing vertices (subgraphs) of any order (as in
Rachkovskij 2001; Rachkovskij and Kussul 2001), then deep substructures will influence
similarity to the same degree as shallow ones. On the other hand, the systematicity principle
of analogical reasoning (Gentner 1983) says that connected systems of relations in deep
hierarchies are preferred to isolated relations. To take this into account, codevectors provide
natural means: controlling the number of 1s. For an adequate account of systematicity, let us
construct codevectors so that deeper structures have more 1s than their element codevectors:
|V_v| > |V_u| if u is the child (argument, element) of v (or v is the parent of u). This is
achieved by controlling the thinning factor F_CDT defined as F_CDT = |\langle Z \rangle|/|Z|. Obviously,
F_CDT \le 1. For F_CDT above a critical value F*_CDT, the higher-level codevectors have the
desired property of the increased number of 1s, where F*_CDT can be obtained experimentally
or analytically for specific relational instances. Table 1 illustrates the growth of the average
number M of 1s in codevectors of the higher levels of the Solar System analogue produced
with the proposed representation scheme and particular parameter instantiations. Note that the
codevectors here have more hierarchical levels than the order levels identified for the analogue
graph vertices above.
3. RETRIEVAL OF SIMILAR ANALOGUES
To retrieve the most similar analogues from the knowledge base, all its analogues (fragments,
episodes) are first represented as codevectors. The probe (input, target) analogue is also
represented by its codevector. Then, the most similar base analogue(s) are those having
codevectors with the maximal overlap (dot product) value with the probe codevector.
In particular,

(1) codevectors V_b, b \in B, of all analogues of base B are constructed;
(2) the codevector V_in of the input (probe) analogue is constructed;
(3) overlaps of codevectors |V_b \wedge V_in| for all base analogues b and the probe are calculated;
(4) the base analogues most similar to the probe analogue are output.

TABLE 1. An Average Number M of 1s in Codevectors of the Solar System Analogue Elements versus
Their Vertex Order and Codevector Levels. The Codevector Dimensionality N = 100,000. For the Terminal
Codevectors, M = 1,000 for Objects and Attributes, M = 2,000 for Roles. The Thinning Factor F_CDT \approx 0.33.

For the single most similar analogue this may be summarized as

b^* = argmax_{b \in B} |V_b \wedge V_in|.   (5)

The most similar analogues may also be selected as the top L^* most similar analogues, or
by a particular value of similarity:

(1) inside some similarity value interval t around the analogue b^* most similar to the probe
(Forbus et al. 1995):
{b : (|V_{b^*} \wedge V_in| - |V_b \wedge V_in|) / |V_{b^*} \wedge V_in| < t};
(2) or, with the similarity value above a threshold \theta:
{b : |V_b \wedge V_in| > \theta}.
3.1. Computational Complexity of Retrieval
The computational complexity of analogue retrieval by similarity estimation of the knowl-
edge base codevectors is determined by the particular representation and similarity estimation
methods and their implementations.
The binary and sparse nature of codevectors suggests using indexing as one of the
straightforward approaches for an efficient calculation of similarity and retrieval. Let us
estimate its complexity. Indexing N-dimensional codevectors is done by constructing a
set of N strings, so that the i-th string a_i (i = 1, ..., N) contains the numbers (IDs) of the
base codevectors that have a nonzero i-th component. An index of L codevectors with M nonzero
components in each requires O(LM) memory cells.
The similarity values of the probe codevector to the base codevectors are the components
of the L-dimensional similarity vector. They are calculated by taking each nonzero compo-
nent of the probe codevector and adding 1 to the components of the similarity vector that
correspond to the numbers in the index string corresponding to that component. The computa-
tional complexity is on average O(ML'), where L' is the average number of knowledge
base entries whose codevectors have 1 in the particular component. If the codevectors are
considered random, the complexity becomes O(LM^2/N), and often the codevector parameters
are chosen so that M^2/N is of the order of 1.
Here we consider M as the average number of 1s in codevectors of the most complex
(deep) analogues of the knowledge base. As Table 1 shows, for the Solar System analogue
example its M is about 2.5 times more than that of its terminal object codevectors. Anyway,
the maximal M or M^2/N is approximately constant for codevectors generated for a particular
knowledge base.
Considering L': though it depends on the peculiarities of the particular knowledge base
and the particular probe codevector, it also generally grows with the size L of the knowledge
base for fixed codevector parameters.
Obtaining a sorted list of the L^* maximal values of the similarity vector has complexity
O(L log L^*), or O(L1 log L^*) if the similarity vector has L1 nonzero components. If we
need not sort the similarity list, as in the experiments below, where all analogues inside the 10%
similarity interval from the most similar are output, the complexity is O(L1). Let us also take
L1 \le L. So, the overall complexity for SBDR is O(ML') + O(L).
Now, let us consider and compare the complexity for the known leading models of analogical
retrieval: MAC/FAC and ARCS.
For MAC, feature vectors for each analogue of the base are obtained. Unlike codevectors,
their components are the frequencies of objects, attributes, and relations of the analogue.
Again, the index of those vectors is obtained for the whole knowledge base. Let the probe
MAC feature vector have R nonzero components. Then, finding the similarity of the probe vector
to all feature vectors of the whole knowledge base has the complexity O(RL''), where L''
is the average number of the knowledge base entries in which a feature is present. So, the
overall MAC complexity is O(RL'') + O(L).
For FAC, because it is based on SME, which has the complexity O(n^2) (and worst-case
performance O(n!) as stated by Falkenhainer et al. 1989), the complexity is close to O(n^2 L^*),
where n is the number of analogue elements and L^* is the number of analogues output by MAC.
So, the overall MAC/FAC complexity is O(RL'') + O(L) + O(n^2 L^*).
The ARCS complexity is usually dominated by the second (computationally expensive)
stage with the computational complexity O(L^2 n^4) as estimated by Thagard et al. (1990).
Because the ARCS complexity is worse than that of MAC/FAC, let us compare SBDR and
MAC/FAC.
TABLE 2. Estimations of the Retrieval Computational Complexity for SBDR and MAC/FAC.

Example           ThinkNet  ThinkNet  Medium  Medium  Medium  Large   Large   Large
knowledge bases                       KB1     KB2     KB3     KB4     KB5     KB6
L                 10^2      10^2      10^5    10^5    10^5    10^9    10^9    10^9
n                 10^2      10^2      10      10^2    10^4    10      10^2    10^4
L*                10        10        10^4    10^4    10^3    10^8    10^8    10^8
R                 10        10        10      10      10^4    10      10      10^3
M                 10^2      10^2      10^3    10^3    10^4    10^3    10^3    10^5
L'                10        10        10^4    10^4    10^3    10^8    10^7    10^8
L''               10        10        10^4    10^4    10^3    10^8    10^7    10^8
SBDR1 O(ML')      10^3      10^4      10^7    10^7    10^7    10^11   10^10   10^13
SBDR2 O(L)        10^3      10^3      10^5    10^5    10^5    10^9    10^9    10^9
MAC1 O(RL'')      10^2      10^2      10^5    10^5    10^7    10^9    10^8    10^11
MAC2 O(L)         10^2      10^3      10^5    10^5    10^5    10^9    10^9    10^9
FAC O(n^2 L*)     10^5      10^5      10^6    10^9    10^11   10^10   10^12   10^16
SBDR total        10^3      10^4      10^7    10^7    10^7    10^11   10^10   10^13
MAC/FAC total     10^5      10^5      10^6    10^9    10^11   10^10   10^12   10^16

Note: Italic shows the results dominating a particular stage; bold face shows the best overall results.
To explore the scaling of the models, we consider some knowledge base examples.
They include the real knowledge base ThinkNet used in the experiments later, as well
as possible parameters of larger, hypothetical bases. The results of the computational
complexity estimations are given in Table 2. They show that for the real and simu-
lated knowledge base examples, the computational complexity of SBDR is 0.001-10 times that of
MAC/FAC.
Note that the SBDR complexity in Table 2 is always dominated by the SBDR1
stage, whereas the MAC/FAC complexity is dominated by the FAC stage. The main dif-
ference is that n is explicitly involved in MAC/FAC and is not involved in SBDR. In
SBDR, n was involved at the encoding step, when the codevectors for all base analogues were
formed.
For knowledge bases where all episodes include the same features with the same
corresponding frequencies but with different structure, MAC should return all L episodes,
so that FAC should work with the whole base, and its complexity becomes O(n^2 L). For
bases where some groups of episodes have this property, MAC should return all the episodes
of the group.
Another major factor is the number of nonzero components R in the MAC feature vectors
and M in the SBDR codevectors, as well as L' and L''. For the considered examples, the
MAC complexity is always less than the SBDR complexity. So, a MAC/SBDR hybrid may
seem beneficial for particular use cases.
The similarity estimation procedures can be naturally parallelized, e.g., as im-
plemented in search engines. Also, associative memory implementations are possi-
ble (Frolov, Rachkovskij, and Husek 2002), as well as implementations using hash-
ing that provide even lower computational complexity (Frolov, Rachkovskij, and Husek
2006).
Using similarity clustering of analogues as preprocessing of the base could decrease
retrieval complexity compared to indexing, both for the codevector and for the graph
representations.
TABLE 3. Types of Analogue Similarity.

Similarity  Common 1st-order  Common high-order  Common object  Examples using animal stories
type        relations         relations          attributes
Base                                                            dog(Spot); human(Jane); cause(bite(Spot, Jane), flee(Jane, Spot))
LS          +                 +                  +              dog(Fido); human(John); cause(bite(Fido, John), flee(John, Fido))
SF          +                 -                  +              dog(Fido); human(John); cause(flee(John, Fido), bite(Fido, John))
AN          +                 +                  -              mouse(Mort); cat(Felix); cause(bite(Felix, Mort), flee(Mort, Felix))
FOR         +                 -                  -              mouse(Mort); cat(Felix); cause(flee(Mort, Felix), bite(Felix, Mort))
4. EXPERIMENTS
Let us investigate the performance of the proposed approach for finding similar analogues
using the knowledge bases and experimental scheme that were previously applied in the
study of the leading analogy models MAC/FAC (Forbus et al. 1995) and ARCS (Thagard et al.
1990).
4.1. Experimental Scheme
4.1.1. Similarity Types of Analogues. It is known that humans retrieve some types of
analogues more readily than others. By analyzing various types of analogues, their similarity
types have been identified and presented in terms of retrievability order (Gentner 1983; Ross
1989; Wharton et al. 1994; Forbus et al. 1995).
Similarity types are considered relative to some base analogue and have varied types of
commonalities, summarized in Table 3 (adapted from Forbus et al. 1995). All episodes share
first-order relations. The literal similarity (LS) episodes also share both higher order relations
and object attributes. The true analogy (AN) episodes share higher order relations but have
different attributes. The surface features (SF) episodes share attributes but have different
higher order relations. The first-order relations (FOR) episodes differ both in attributes and
higher order relational structure. Examples using animal stories (Thagard et al. 1990; adapted
by Plate) are also given in Table 3. They have the attributes dog and human, first-order relations
bite and flee, and the higher order relation cause.
Generally, for analogical retrieval it is considered that semantic similarity is preferred to
relational similarity, so that the retrievability order can be expressed as LS \ge SF > AN \ge
FOR (Gentner 1983; Ross 1989; Wharton et al. 1994; Forbus et al. 1995).
4.1.2. Experimental Setup and Test Knowledge Bases. The experimental knowledge
base is constructed so that it includes analogues with different and known types and degrees of
similarity to each other. Some analogues are selected as the target (input, probe) analogues,
and some as the base ones. Because it is known in advance which analogues are more similar
to each other and should be retrieved by humans, the performance of a retrieval system can be
estimated against that gold standard.
We have conducted experimental testing using the knowledge base employed for testing
of ARCS (Thagard et al. 1990; later became available as ThinkNet, THNET 2010), with the
additions supplemented and used for testing of MAC/FAC (Forbus et al. 1995, see acknowl-
edgments). The knowledge base consists of formal descriptions of analogical episodes and
situations. These descriptions were constructed manually by experts on the basis of textual
material used in psychological experiments to elucidate characteristics of human analogical
reasoning.
ThinkNet includes Lisp-formalized descriptions of 100 Aesop's fables, 25 Shakespeare's
plays, and 5 stories about Karla the Hawk, as well as the West Side Story musical and 4 Sour Grapes
fable variations. The MAC/FAC base additionally includes 45 episodes consisting of 9
base analogues and 4 versions of each with different similarity types (LS, SF, AN, FOR).
On average, an analogue in those knowledge bases contains 90 propositions, including
50 attributes and 40 relations encoded in Lisp.
The retrieval results are presented in terms of the retrievability order and compared
to the results of the most advanced model MAC/FAC. Also, for our system we calculated
(and estimated where possible for MAC/FAC) the standard information retrieval measures
of precision P and recall R:
R = n1/n2;   P = n1/n3,   (6)
where n1 is the number of correct analogues returned, n2 is the number of correct analogues
in the base, and n3 is the total number of analogues returned.
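For instance (our own numbers, for illustration): if a probe's story set contains n2 = 4 correct analogues and the system returns n3 = 5 analogues of which n1 = 4 are correct, then R = 4/4 = 1.00 and P = 4/5 = 0.80.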
For our SBDR model, we also investigated dependencies of recall and precision on the
dimensionality of codevectors.
4.1.3. Codevector Implementation and Parameters. The experiments were implemented
using SLANG (Slipchenko 2005a), a symbolic language for distributed representations that
combines a vivid description of analogues as predicate expressions with a simple description
of operations with distributed representations. SLANG was used to convert the knowledge
base descriptions to codevector internal representations using the representation scheme of
Section 2 and to program the analogical retrieval process itself.
Codevectors and their parameters were chosen as follows. The terminal codevec-
tors for attributes, objects, and relational roles were randomly generated, then memorized
and reused for every occurrence of the particular terminal item. The dimensionality of
codevectors was N = 10^5 if not stated otherwise. The average number of 1s M(attribute) =
M(object) = 1,000 was chosen as in Rachkovskij (2001), whereas M(role) = 2,000 >
M(attribute) = M(object) = 1,000 was chosen to reflect the importance of relations. In-
stantiations of those parameters and of F_CDT = {0.1, 0.2} were selected from the in-
terval that provided correct retrievability scores for the animal episodes presented in
Table 3.
All analogues were represented by codevectors, and for each target (probe) analogue
codevector from the experimental set the dot products (overlaps) with the codevectors of
the base analogues were calculated. One or several most similar analogues were selected as
the retrieval result. The results reported below were averaged over 100 instances of random
terminal codevectors used to construct the codevectors of analogues.
TABLE 4. Recall Values. Test 1: Retrieval of the Base-Type Analogues Given Different Versions of Probes
(Similarity Types LS, SF, and AN). The FOR Analogues Were Also in the Base, Serving as Distractors.

Probe                    LS    SF    AN
MAC/FAC (10%)            1.00  0.89  0.67
SBDR (10%) F_CDT = 0.1   1.00  1.00  0.76
SBDR (10%) F_CDT = 0.2   1.00  1.00  0.78
SBDR (1) F_CDT = 0.1     1.00  0.79  0.37
SBDR (1) F_CDT = 0.2     1.00  0.76  0.58
4.2. Results of Experiments
4.2.1. Retrievability Order and Recall. Test 1. For this test we used the scheme of
Cognitive simulation experiment 1 of Forbus et al. (1995). Nine base analogues of the Karla
the Hawk story and their FOR variants, which served as distractors, were placed in the base.
The LS, SF, and AN variants of the base analogues were in turn used as probes to the
retrieval system, and the number of correct base analogues in the returned list was counted.
As in Forbus et al. (1995), the returned list contained analogues within the 10% interval
of similarity value relative to the most similar base analogue. In addition, for SBDR we
considered the results when the single most similar base analogue was returned.
The results are shown in Table 4. The values of recall were calculated for each type of
probe: LS, SF, and AN. Results for MAC/FAC and humans are from Forbus, Gentner, and
Law (taking into account that their proportion of correct retrievals actually provides recall
values). Forbus, Gentner, and Law also present results of related experiments with human
subjects, which showed recall of 0.56 for the LS analogues, 0.53 for SF, 0.12 for AN, and
0.09 for FOR.
As mentioned in Forbus et al. (1995), MAC/FAC performance is much better than that
of human subjects, perhaps partly because of the differences in the experimental setup.
However, the key point is that the MAC/FAC results show the same retrievability order
LS \ge SF > AN.
The SBDR results also show the same retrievability order, with higher recall values.
Though we used more 1s in the codevectors of roles than in the codevectors of attributes
or objects (Section 4.1.3), the resulting similarity pattern showed a preference for surface
similarity over relational similarity. The results for SBDR (1), with just one (single best)
analogue selected, are close to those of MAC/FAC (10%).
This deterioration of the MAC/FAC results is caused by the MAC stage, which uses attribute
vectors of analogues (frequencies of objects, attributes, and relations) to estimate similarity
by dot product. This stage is computationally efficient but does not consider structure, and
so misses some valid analogues.
Test 2. For the next test we used the scheme of Cognitive simulation experiment 2
of Forbus et al. (1995). The LS, SF, AN, and FOR variants of the nine base Karla the Hawk
analogues were placed in the base. Each of the nine base analogues was used as a probe. As
Forbus et al. (1995) mention, this is almost the reverse of the task human subjects faced, and
more difficult.
For each analogue type, the number of its occurrences in the returned list was counted.
The recall values for each analogue type are given in Table 5. Again, they show the correct
retrievability order (matching the results for human subjects) both for MAC/FAC and SBDR
(except for AN at SBDR (10%) for the thinning factor F_CDT = 0.2).
TABLE 5. Recall Values. Test 2: Retrieval of Different Analogue Similarity Types (LS, SF, AN, and FOR)
Given the Base Versions as Probes.

Retrievals               LS    SF    AN    FOR   Other
MAC (10%)                0.78  0.78  0.33  0.22  1.33
MAC/FAC (10%)            0.78  0.44  0.22  0.00  0.22
SBDR (10%) F_CDT = 0.1   0.81  0.78  0.68  0.09  0.39
SBDR (10%) F_CDT = 0.2   0.87  0.78  0.91  0.22  0.84
SBDR (1) F_CDT = 0.1     0.48  0.30  0.22  0.00  0.00
SBDR (1) F_CDT = 0.2     0.50  0.26  0.22  0.00  0.02
TABLE 6. Precision of Retrieval Given Different Similarity Types (LS, SF, AN, and FOR) of Analogues as
Probes.

Probe                    LS    SF    AN    FOR
MAC (10%)                0.5   0.1   0.08  0.09
MAC/FAC (10%)            1.00  1.00  1.00  0.5
SBDR (1) F_CDT = 0.1     1.00  1.00  1.00  1.00
SBDR (1) F_CDT = 0.2     1.00  1.00  1.00  1.00
SBDR (10%) F_CDT = 0.1   1.00  1.00  1.00  1.00
SBDR (10%) F_CDT = 0.2   1.00  1.00  1.00  0.63
For the analogues that are similarity-type variations of the base analogue used as a probe,
the recall values for SBDR are higher than those for MAC/FAC, and even than those for MAC
alone. At the same time, SBDR (10%) with F_CDT = 0.1 returns far fewer Other analogues
(i.e., irrelevant analogues, that is, any retrieval from a story set different from the one to which
the base belongs) compared to MAC (0.39 vs. 1.33), and a number comparable to MAC/FAC
(0.39 vs. 0.22). SBDR (1) with F_CDT = 0.1 returns no Other analogues at all.
4.2.2. Retrieval Precision. Test 3. To investigate precision, we used the MAC/FAC
experimental scheme and results from Computational experiment 4: Hawk stories of Forbus
et al. (1995). All fables and plays of the ThinkNet analogues and the Karla base analogue
were placed in the knowledge base in memory. The LS, SF, AN, and FOR versions of the
Karla base analogue were used as probes. For MAC and FAC, the numbers of returned
analogues in the result list were taken from table 16 in Forbus et al. (1995), and precision
values were calculated. The obtained results are shown in Table 6.
For all experiments, the Karla base story was returned as the most similar analogue. This
means that recall is always 1. It may be noted that the results of ARCS (Thagard et al. 1990)
reported in Forbus et al. (1995) provide the proper Karla base story as the most similar only
for the LS probe.
The precision results are as follows: SBDR demonstrates the maximal precision of 1
for all cases, except FOR at SBDR (10%) with F_CDT = 0.2, where precision = 0.63. This last
result is comparable to MAC/FAC (10%) for FOR, where precision = 0.5. For the MAC top 10%
similarity interval, the precision results are 5-10 times lower for SF, AN, and FOR; and for
LS, they are 30-50% lower. Low precision values for MAC lead to more candidates to be
processed at the second, computationally expensive stage of FAC (using SME, see Section 1.3
and Forbus et al. 1995). Recall that the SBDR results are single-stage.
FIGURE 4. Reimplementing Test 1: retrieval of the base-type analogues given different similarity types (LS, SF,
and AN) of probes, for F_CDT = 0.1, 0.2, 0.5, and 0.8, with MAC/FAC as a reference. Codevector dimensionality N
was varied from 100 to 100,000, preserving a constant fraction of unit components (1s). (a) recall R versus N;
(b) precision P versus N.
4.2.3. Varying Codevector Size. Test 4. The computational complexity of SBDR is
determined by the number M of 1s in codevectors. So, we repeated the SBDR part of
Test 1 for varied values of M, preserving the M/N values indicated in Section 4.1.3
(and thus obtaining the corresponding values N of codevector dimensionality). The results
in Figure 4 show that for N decreasing from 10^5 to 10^3 (and proportionally decreasing
M) the recall values remain approximately the same and the precision decreases. A noticeable
drop of recall and precision is observed at N = 500 and less, where M becomes 5 and
less, and so the CDT procedure becomes unstable (it may output zero codevectors or the
input codevectors, depending on their random realizations). Also, the similarity values (see
equation (1)) themselves become unstable for small M.
5. CONCLUSION
To increase performance of operating with structures, we develop distributed represen-
tations of structures that carry immediate information on both the set of structural elements
of various hierarchical levels and their structural organization. Such representations allow
for holistic processing of structures, without the need to follow edges or to match individual
vertices of graph structures. However, decoding structure elements and operating with them
individually is also possible, if required, as discussed elsewhere (Rachkovskij and Kussul
2001; Rachkovskij 2004; Slipchenko, Rachkovskij, and Misuno 2005).
Being an instance of vector representations, distributed representations allow an easy
estimation of complex object similarity by measures of vector similarity, a massively parallel
implementation, and application of the whole arsenal of methods elaborated for vector spaces.
Compared to symbolic and neural network localist representations of graphs, they provide
a better account of semantic content, flexibility, and the ability to cope with noisy and unex-
pected input. Compared to localist vector representations, distributed representations allow
a natural representation and calculation of gradual similarity between items; make a more
efficient use of representational resources; are more robust and neurobiologically plausible;
and may be considered as an idea of how related representations and processes are imple-
mented in the brain. The sparse distributed representations that we develop additionally allow an
effective use of inverse indexing and other efficient computer structures and algorithms and a
simple and efficient parallel hardware implementation, and are even more neurobiologically
plausible.
Usually, when graph embeddings in vector spaces are investigated, the question of interest
is how accurately they approximate well-known graph similarity measures in the original
space. However, because those graph-theoretic measures may be inadequate for similarity-based
retrieval from knowledge bases, finding close analogues directly by vector space
embeddings may prove more appropriate for a particular application.
In this paper we have described a scheme for distributed representation of analogues
and experimentally investigated it on the standard knowledge base used for testing leading
analogical retrieval models. The results show that our codevector approach provided the same
retrievability order as the best available retrieval model, MAC/FAC of Forbus et al. (1995),
and as the results of experiments with human subjects. They also show some increase in
recall and a noticeable increase in precision.
The reason is that MAC/FAC is a combined vector-symbolic model that uses a two-stage
approach to retrieval, where the first MAC stage does not take structure into account
and so degrades the retrieval quality. On the other hand, the second FAC stage, which
estimates similarity taking structure into account, has a high computational complexity, at
least O(n²L′), i.e., quadratic in the number of analogue elements n and linear in the number
of candidates L′ provided by the first stage. Note that the computational complexity of the
ARCS retrieval model of Thagard et al. (1990) is even higher: O(n⁴L²). The reason for
the high computational complexity of traditional methods is their use of some kind of explicit
procedure for finding corresponding elements of analogues, which introduces quadratic or
higher degrees of n into the complexity estimation.
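For a purely illustrative sense of scale (all figures invented): with analogues of n = 100
elements and L′ = 50 first-stage candidates, the FAC stage costs on the order of
n²L′ = 100² × 50 = 5 × 10⁵ elementary matching operations, and ARCS over a base of
L = 100 analogues on the order of n⁴L² = 100⁴ × 100² = 10¹². By contrast, as discussed
below, a codevector probe with M = 1,000 ones and an average of 10 base analogues listed
per codevector component costs only about 10⁴ operations.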
In contrast, in codevector approaches the number of analogue elements n influences only
the complexity of their codevector construction. In the process of similarity estimation, the
corresponding elements automatically find each other as common 1s of codevectors. So, the
computational complexity, dominated by O(ML′), is proportional to the number M of 1s in a
codevector (which may be considered constant) and the average number L′ of base analogues
whose codevectors have a 1 in a particular component. Such a moderate computational cost
makes this approach particularly well suited for similarity-based retrieval from large-scale
knowledge bases having complex records with many elements in each. We believe that similar
results could be obtained with other (appropriately modified) structure-sensitive distributed
representation schemes, such as the HRRs of Plate (1991, 2003), the Binary Spatter Codes of
Kanerva (1996, 2009), etc.
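A minimal Python sketch of the inverted-index retrieval that the O(ML′) estimate assumes
may be helpful. All names and toy codevectors below are invented; codevectors are given as
sets of their 1-positions.

    from collections import defaultdict

    def build_index(base):
        # Inverted index: component -> base analogues having a 1 there.
        index = defaultdict(list)
        for analogue, ones in base.items():
            for component in ones:
                index[component].append(analogue)
        return index

    def retrieve(index, probe_ones):
        # Each common 1 adds 1 to the score of every base analogue listed
        # under that component, so corresponding elements "find each other"
        # and the work is proportional to M (1s in the probe) times the
        # average list length L'.
        scores = defaultdict(int)
        for component in probe_ones:
            for analogue in index.get(component, ()):
                scores[analogue] += 1
        return sorted(scores.items(), key=lambda kv: -kv[1])

    base = {"Karla": {3, 17, 42, 96}, "other": {5, 17, 88, 96}}
    print(retrieve(build_index(base), probe_ones={3, 17, 42, 101}))
    # [('Karla', 3), ('other', 1)] -- dot products with the probe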
Let us note that the knowledge base we used for testing has previously been used only
for testing of MAC/FAC and, partially, ARCS. Other known models of analogical retrieval
used (at most) small parts of this base or a small number of other episodes. Nevertheless, this
knowledge base is still rather small. Also, its attribute set is very limited, thus providing a
poor account of semantics.
We argue that the developed representation scheme is useful for large-scale knowledge
bases and free-structured database applications (see also Beal and Roberts 2009). This is both
because of a computationally efficient estimation of structure similarity and because of the
potential of taking into account the semantics of objects by naturally incorporating their
descriptions in terms of numerical feature vectors (Rachkovskij et al. 2005a, b, 2012;
Slipchenko 2005b) or context vectors reflecting the meaning of the concepts they represent
(Misuno, Rachkovskij, and Slipchenko 2005; Misuno et al. 2005; Jones and Mewhort 2007;
Sahlgren et al. 2008). Particular aspects of codevector representations, and the approach as a
whole, should be elaborated taking into account the specific challenges of demanding
applications with knowledge bases and free-structured databases.
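As a hint at how such semantic descriptions might be folded in, the following Python sketch
(our illustration only, loosely in the spirit of the randomized projection methods of
Rachkovskij, Misuno, and Slipchenko (2012); all parameters invented) turns a dense numeric
feature or context vector into a sparse binary codevector and superimposes it on a structural
codevector, so that a single dot product reflects both structural and semantic overlap.

    import numpy as np

    def to_sparse_binary(features, n=10_000, m=100, seed=0):
        # Random projection of a dense feature/context vector, binarized
        # by keeping the m largest projections as 1s (one simple scheme;
        # the cited works study more refined variants).
        rng = np.random.default_rng(seed)  # fixed seed = one shared projection
        projection = rng.standard_normal((n, len(features))) @ np.asarray(features)
        code = np.zeros(n, dtype=np.uint8)
        code[np.argsort(projection)[-m:]] = 1
        return code

    def with_semantics(structural_code, features):
        # Component-wise OR superimposes the semantic 1s on the structural ones.
        return structural_code | to_sparse_binary(features)

Similar feature vectors then yield overlapping semantic 1s, so semantically related analogues
gain similarity even when their structures differ.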
ACKNOWLEDGMENTS
We thank Ken Forbus and Paul Thagard for generously providing their analogical
episodes that we used in the experiments. We are grateful to three anonymous reviewers
for their valuable comments on an earlier version of the manuscript. D.R. also thanks Pentti
Kanerva, Art Markman, and Tony Plate for helpful discussions during various stages of this
research.
REFERENCES
AAMODT, A., and E. PLAZA. 1994. Case-based reasoning: Foundational issues, methodological variations, and system approaches. AI Communications, 7(1):39–59.
AHA, D. W. 1998. The omnipresence of case-based reasoning in science and application. Knowledge-Based Systems, 11(5–6):261–273.
BEAL, J., and J. ROBERTS. 2009. Enhancing methodological rigor for computational cognitive science: Complexity analysis. In Proceedings of the 31st Annual Conference of the Cognitive Science Society. Edited by N. A. Taatgen and H. van Rijn. Cognitive Science Society: Austin, TX, pp. 99–104.
BERGMANN, R. 2002. Experience Management: Foundations, Development Methodology, and Internet-Based Applications, Vol. 2432 of Lecture Notes in Computer Science. Springer: Berlin, New York, 393 pp.
BERGMANN, R., K.-D. ALTHOFF, S. BREEN, M. GÖKER, M. MANAGO, R. TRAPHÖNER, and S. WESS. 2009. Developing industrial case-based reasoning applications: The INRECA methodology. In Lecture Notes in Computer Science/Lecture Notes in Artificial Intelligence. Springer: Berlin.
BJORNESTAD, S. 2003. Analogical reasoning for reuse of object-oriented specifications. In Case-Based Reasoning Research and Development, Vol. 2689 of Lecture Notes in Computer Science. Edited by K. D. Ashley and D. G. Bridge. Springer: Berlin/Heidelberg, pp. 50–64.
BORNER, K. 1993. Structural similarity as guidance in case-based design. In Proceedings of the First European Workshop on Case-Based Reasoning. Springer: Berlin, pp. 197–208.
BROWNE, A., and R. SUN. 2001. Connectionist inference models. Neural Networks, 14(10):1331–1355.
BUNKE, H. 1997. On a relation between graph edit distance and maximum common subgraph. Pattern Recognition Letters, 18:689–694.
BUNKE, H. 1999. Error correcting graph matching: On the influence of the underlying cost function. IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(9):917–922.
BUNKE, H. 2000a. Graph matching: Theoretical foundations, algorithms, and applications. In Proceedings of Vision Interface 2000, pp. 82–88.
BUNKE, H. 2000b. Recent developments in graph matching. In Proceedings of the Fifteenth International Conference on Pattern Recognition, Vol. 2, pp. 117–124.
BUNKE, H., and B. T. MESSMER. 1993. Structural similarity as guidance in case-based design. In Proceedings of the First European Workshop on Case-Based Reasoning. Springer: Berlin, pp. 106–118.
BUNKE, H., X. JIANG, and A. KANDEL. 2000. On the minimum common supergraph of two graphs. Computing, 65:13–25.
CHAMPIN, P. A., and C. SOLNON. 2003. Measuring the similarity of labeled graphs. In Proceedings of the Fifth International Conference on Case-Based Reasoning. Springer: Berlin, pp. 80–95.
CHAUDHRI, A. B., A. RASHID, and R. ZICARI. 2003. XML Data Management: Native XML and XML-Enabled Database Systems. Addison-Wesley Professional: Redwood City, CA.
DOYLE, D., P. CUNNINGHAM, D. BRIDGE, and Y. RAHMAN. 2004. Explanation oriented retrieval. In Proceedings of the Seventh European Conference on Case-Based Reasoning. Springer: Berlin, pp. 157–168.
ELIASMITH, C., and P. THAGARD. 2001. Integrating structure and meaning: A distributed model of analogical mapping. Cognitive Science, 25(2):245–286.
FALKENHAINER, B., K. D. FORBUS, and D. GENTNER. 1989. The structure-mapping engine: Algorithm and examples. Artificial Intelligence, 41:1–63.
FELLBAUM, C. 1998. WordNet: An Electronic Lexical Database. The MIT Press: Cambridge, MA.
FERNÁNDEZ, M.-L., and G. VALIENTE. 2001. A graph distance metric combining maximum common subgraph and minimum common supergraph. Pattern Recognition Letters, 22:753–758.
FORBUS, K. 2001. Exploring analogy in the large. In The Analogical Mind: Perspectives from Cognitive Science. Edited by D. Gentner, K. Holyoak, and B. Kokinov. MIT Press: Cambridge, MA, pp. 23–58.
FORBUS, K. D., D. GENTNER, and K. LAW. 1995. MAC/FAC: A model of similarity-based retrieval. Cognitive Science, 19(2):141–205.
FORBUS, K., and T. HINRICHS. 2006. Companion Cognitive Systems: A step towards human-level AI. AI Magazine, 27(2):83–95.
FORBUS, K., K. LOCKWOOD, and A. SHARMA. 2009. Steps towards a 2nd generation learning by reading system. In Proceedings of the AAAI Spring Symposium on Learning by Reading.
FRASCONI, P., M. GORI, and A. SPERDUTI. 1998. A general framework for adaptive processing of data structures. IEEE Transactions on Neural Networks, 9(5):768–786.
FROLOV, A. A., D. A. RACHKOVSKIJ, and D. HUSEK. 2002. On information characteristics of Willshaw-like auto-associative memory. Neural Network World, 2:141–157.
FROLOV, A. A., D. HUSEK, and D. A. RACHKOVSKIJ. 2006. Time of searching for similar binary vectors in associative memory. Cybernetics and Systems Analysis, 5:615–623.
GALLANT, S. I. 2000. Context vectors: A step toward a Grand Unified Representation. In Hybrid Neural Systems, Vol. 1778 of Lecture Notes in Computer Science. Edited by S. Wermter and R. Sun. Springer: Berlin, pp. 204–210.
GAYLER, R. 1998. Multiplicative binding, representation operators, and analogy. In Advances in Analogy Research: Integration of Theory and Data from the Cognitive, Computational, and Neural Sciences. Edited by K. Holyoak, D. Gentner, and B. Kokinov. New Bulgarian University: Sofia, Bulgaria.
GAYLER, R. W., and S. D. LEVY. 2009. A distributed basis for analogical mapping. In Proceedings of the Second International Analogy Conference. NBU Press: Sofia, Bulgaria, pp. 165–174.
GENTNER, D. 1983. Structure-mapping: A theoretical framework for analogy. Cognitive Science, 7:155–170.
GENTNER, D., and A. B. MARKMAN. 2003. Analogy-based reasoning and metaphor. In The Handbook of Brain Theory and Neural Networks. Edited by M. A. Arbib. The MIT Press: Cambridge, MA, pp. 106–109.
GRAY, B., G. S. HALFORD, W. H. WILSON, and S. PHILLIPS. 1997. A neural net model for mapping hierarchically structured analogs. In Proceedings of the Fourth Conference of the Australasian Cognitive Science Society, University of Newcastle, NSW, Australia.
HOFSTADTER, D. R., and M. MITCHELL. 1988. Conceptual slippage and mapping: A report of the copycat project. In Proceedings of the Tenth Annual Conference of the Cognitive Science Society, pp. 601–607.
HOLYOAK, K. J., and P. THAGARD. 1989. Analogical mapping by constraint satisfaction. Cognitive Science, 13(3):295–355.
HUMMEL, J., and K. HOLYOAK. 1997. Distributed representations of structure: A theory of analogical access and mapping. Psychological Review, 104:427–466.
JIANG, X., and H. BUNKE. 1999. Optimal quadratic-time isomorphism of ordered graphs. Pattern Recognition, 32:1273–1283.
JONES, M. N., and D. J. K. MEWHORT. 2007. Representing word meaning and order information in a composite holographic lexicon. Psychological Review, 114:1–37.
KANERVA, P. 1996. Binary spatter-coding of ordered k-tuples. In Artificial Neural Networks, Proceedings of ICANN 96. Edited by C. von der Malsburg, W. von Seelen, J. Vorbrüggen, and B. Sendhoff. Springer-Verlag: Berlin, pp. 869–873.
KANERVA, P. 2009. Hyperdimensional computing: An introduction to computing in distributed representation with high-dimensional random vectors. Cognitive Computation, 1:139–159.
KEANE, M. T., T. LEDGEWAY, and S. DUFF. 1994. Constraints on analogical mapping: A comparison of three models. Cognitive Science, 18:287–334.
KOKINOV, B. 1988. Associative memory based reasoning: How to represent and retrieve cases. In Artificial Intelligence III: Methodology, Systems, Applications. Edited by T. O'Shea and V. Sgurev. Elsevier Science Publishers B.V.: Amsterdam, the Netherlands, pp. 51–58.
KOKINOV, B., and R. FRENCH. 2003. Computational models of analogy-making. In Encyclopedia of Cognitive Science. Edited by L. Nadel. Nature Publishing Group: London, pp. 113–118.
KUSSUL, E. M. 1992. Associative Neuron-Like Structures. Naukova Dumka: Kiev. (In Russian)
KUSSUL, E. M., T. N. BAIDYK, D. C. WUNSCH, O. MAKEYEV, and A. MARTIN. 2006. Permutation coding technique for image recognition systems. IEEE Transactions on Neural Networks, 17(6):1566–1579.
LEVI, G. 1972. A note on the derivation of maximal common subgraphs of two directed or undirected graphs. Calcolo, 9:341–354.
LOPEZ DE MANTARAS, R., D. MCSHERRY, D. BRIDGE, D. LEAKE, B. SMYTH, S. CRAW, B. FALTINGS, M. L. MAHER, M. T. COX, K. FORBUS, M. KEANE, A. AAMODT, and I. WATSON. 2005. Retrieval, reuse, revision and retention in case-based reasoning. Knowledge Engineering Review, 20(3):215–240.
MARKMAN, A. B., D. A. RACHKOVSKIJ, I. S. MISUNO, and E. G. REVUNOVA. 2003. Analogical reasoning techniques in intelligent counterterrorism systems. Informational Theories and Applications, 10(2):139–146.
MARKMAN, A. B., and D. GENTNER. 2000. Structure mapping in the comparison process. American Journal of Psychology, 113:501–538.
MCGREGOR, J. J. 1982. Backtrack search algorithms and the maximal common subgraph problem. Software: Practice and Experience, 12:23–34.
MCSHERRY, D. 2003. Similarity and compromise. In Proceedings of the Fifth International Conference on Case-Based Reasoning. Springer: Berlin, pp. 291–305.
MEDASANI, S., R. KRISHNAPURAM, and Y. S. CHOI. 2001. Graph matching by relaxation of fuzzy assignments. IEEE Transactions on Fuzzy Systems, 9(1):173–182.
MESSMER, B. T., and H. BUNKE. 1998. A new algorithm for error-tolerant subgraph isomorphism detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(5):493–504.
MISUNO, I. S., D. A. RACHKOVSKIJ, and S. V. SLIPCHENKO. 2005. Vector and distributed representations reflecting semantic relatedness of words. Mathematical Machines and Systems, 3:50–66. (In Russian)
MISUNO, I. S., D. A. RACHKOVSKIJ, S. V. SLIPCHENKO, and A. M. SOKOLOV. 2005. Searching for text information with vector representations. Problems in Programming, 4:50–59. (In Russian)
PAGE, M. 2000. Connectionist modelling in psychology: A localist manifesto. Behavioral and Brain Sciences, 23:443–512.
PELILLO, M. 1999. Replicator equations, maximal cliques, and graph isomorphism. Neural Computation, 11:1933–1955.
PLATE, T. A. 1991. Holographic Reduced Representations: Convolution algebra for compositional distributed representations. In Proceedings of the 12th International Joint Conference on Artificial Intelligence (IJCAI). Edited by J. Mylopoulos and R. Reiter. Morgan Kaufmann: San Mateo, CA, pp. 30–35.
PLATE, T. A. 1994a. Estimating structural similarity by vector dot products of Holographic Reduced Representations. In Advances in Neural Information Processing Systems 6 (NIPS 93). Edited by J. D. Cowan, G. Tesauro, and J. Alspector. Morgan Kaufmann: San Mateo, CA, pp. 1109–1116.
PLATE, T. A. 1994b. Distributed representations and nested compositional structure. Ph.D. Thesis, Department of Computer Science, University of Toronto, Toronto, Canada. Available at http://internet.cybermesa.com/champagne/tplate/.
PLATE, T. A. 2000. Analogy retrieval and processing with distributed vector representations. Expert Systems: The International Journal of Knowledge Engineering and Neural Networks, Special Issue on Connectionist Symbol Processing, 17(1):29–40.
PLATE, T. A. 2003. Holographic Reduced Representation: Distributed Representation for Cognitive Science. CSLI Publications: Stanford, CA, 300 pp.
RAMSCAR, M., and D. YARLETT. 2003. Semantic grounding in models of analogy: An environmental approach. Cognitive Science, 27:41–71.
RACHKOVSKIJ, D. A. 2001. Representation and processing of structures with binary sparse distributed codes. IEEE Transactions on Knowledge and Data Engineering, 13(2):261–276.
RACHKOVSKIJ, D. A. 2004. Some approaches to analogical mapping with structure sensitive distributed representations. Journal of Experimental and Theoretical Artificial Intelligence, 16(3):125–145.
RACHKOVSKIJ, D. A., and E. M. KUSSUL. 2001. Binding and normalization of binary sparse distributed representations by context-dependent thinning. Neural Computation, 13(2):411–452.
RACHKOVSKIJ, D. A., I. S. MISUNO, and S. V. SLIPCHENKO. 2012. Randomized projective methods for the construction of binary sparse vector representations. Cybernetics and Systems Analysis, 48(1):146–156.
RACHKOVSKIJ, D. A., S. V. SLIPCHENKO, E. M. KUSSUL, and T. N. BAIDYK. 2005a. Properties of numeric codes for the scheme of random subspaces RSC. Cybernetics and Systems Analysis, 4:509–520.
RACHKOVSKIJ, D. A., S. V. SLIPCHENKO, I. S. MISUNO, E. M. KUSSUL, and T. N. BAIDYK. 2005b. Sparse binary distributed encoding of numeric vectors. Journal of Automation and Information Sciences, 11:47–61.
RIESEN, K., S. FANKHAUSER, H. BUNKE, and P. DICKINSON. 2009. Efficient suboptimal graph isomorphism. In Graph-Based Representations in Pattern Recognition, Vol. 5534 of Lecture Notes in Computer Science. Edited by A. Torsello, F. Escolano, and L. Brun. Springer: Berlin, pp. 124–133.
RIESEN, K., M. NEUHAUS, and H. BUNKE. 2007. Graph embedding in vector spaces by means of prototype selection. In Graph-Based Representations in Pattern Recognition, Vol. 4538 of Lecture Notes in Computer Science. Edited by F. Escolano and M. Vento. Springer: Berlin, pp. 383–393.
RISSLAND, E. L. 1983. Examples in legal reasoning: Legal hypotheticals. In Proceedings of IJCAI 1983. Morgan Kaufmann: San Mateo, CA, pp. 90–93.
ROBLES-KELLY, A., and E. R. HANCOCK. 2007. A Riemannian approach to graph embedding. Pattern Recognition, 40:1024–1056.
ROSS, B. H. 1989. Distinguishing types of superficial similarities: Different effects on the access and use of earlier examples. Journal of Experimental Psychology: Learning, Memory, and Cognition, 15(3):456–468.
SAHLGREN, M., A. HOLST, and P. KANERVA. 2008. Permutations as a means to encode order in word space. In Proceedings of the 30th Annual Meeting of the Cognitive Science Society (CogSci'08), Washington, DC, pp. 23–26.
SCHENKER, A., H. BUNKE, M. LAST, and A. KANDEL. 2005. Graph-Theoretic Techniques for Web Content Mining. World Scientific: River Edge, NJ.
SCHANK, R. C. 1982. Dynamic Memory: A Theory of Reminding and Learning in Computers and People. Cambridge University Press: New York.
SLIPCHENKO, S. V. 2005a. SLANG: A symbolic language for distributed representation. In Proceedings of the Fourteenth International Conference on Neurocybernetics, Vol. 2, Rostov-on-Don, Russia, pp. 237–239.
SLIPCHENKO, S. V. 2005b. Distributed representations in the problems of processing structured numeric and symbolic information. System Technologies, 6(41):134–141. (In Russian)
SLIPCHENKO, S. V., D. A. RACHKOVSKIJ, and I. S. MISUNO. 2005. Decoding binary distributed representations of numerical vectors. Computer Mathematics, 3:108–120. (In Russian)
SMYTH, B., and M. T. KEANE. 1998. Adaptation-guided retrieval: Questioning the similarity assumption in reasoning. Artificial Intelligence, 102(2):249–293.
SMYTH, B., and E. MCKENNA. 1999. Footprint based retrieval. In Proceedings of the Third International Conference on Case-Based Reasoning. Springer: Berlin, pp. 343–357.
SMYTH, B., and E. MCKENNA. 2001. Competence guided incremental footprint-based retrieval. Knowledge-Based Systems, 14(3–4):155–161.
SOKOLSKY, O., S. KANNAN, and I. LEE. 2006. Simulation-based graph similarity. In Tools and Algorithms for the Construction and Analysis of Systems, Vol. 3920 of Lecture Notes in Computer Science. Springer: Berlin/Heidelberg, pp. 426–440.
SOWA, J. F. 2000. Knowledge Representation: Logical, Philosophical, and Computational Foundations. Brooks Cole: Pacific Grove, CA.
STANFILL, C., and D. WALTZ. 1986. Toward memory-based reasoning. Communications of the ACM, 29(12):1213–1228.
SUN, Z., and G. R. FINNIE. 2004. Intelligent Techniques in E-Commerce: A Case Based Reasoning Perspective. Springer: Berlin.
THAGARD, P., K. J. HOLYOAK, G. NELSON, and D. GOCHFELD. 1990. Analog retrieval by constraint satisfaction. Artificial Intelligence, 46(1–2):259–310.
THNET. 2012. THNET: THinkNet connectionist software. Available at http://www.cs.cmu.edu/afs/cs/project/ai-repository/ai/areas/neural/systems/thnet/0.html. Accessed February 2012.
THORPE, S. 2003. Localized versus distributed representations. In The Handbook of Brain Theory and Neural Networks. Edited by M. A. Arbib. The MIT Press: Cambridge, MA, pp. 643–646.
TINKER, P., J. FOX, C. GREEN, D. ROME, K. CASEY, and C. FURMANSKI. 2005. Analogical and case-based reasoning for predicting satellite task schedulability. In Proceedings of ICCBR 2005, Vol. 3620 of Lecture Notes in Computer Science. Springer: Berlin/Heidelberg, pp. 566–578.
WATSON, I. 1997. Applying Case-Based Reasoning: Techniques for Enterprise Systems. Morgan Kaufmann: San Francisco, CA.
WESS, S., K.-D. ALTHOFF, and G. DERWAND. 1993. Using k-d trees to improve the retrieval step in case-based reasoning. In Proceedings of the First European Workshop on Case-Based Reasoning. Springer: Berlin, pp. 167–181.
WETZEL, J., and K. FORBUS. 2009. Automated critique of sketched mechanisms. In Proceedings of the 21st Innovative Applications of Artificial Intelligence Conference, Pasadena, CA.
WHARTON, C., K. HOLYOAK, P. DOWNING, T. LANGE, T. WICKENS, and E. MELZ. 1994. Below the surface: Analogical similarity and retrieval competition in reminding. Cognitive Psychology, 26(1):64–101.
WILSON, R. C., E. R. HANCOCK, and B. LUO. 2005. Pattern vectors from algebraic graph theory. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(7):1112–1124.
XML database. 2012. Available at http://en.wikipedia.org/wiki/XML_database. Accessed February 7, 2012.