
KNOWLEDGE REPRESENTATION FOR EXPERT SYSTEMS

MAREK PETRIK

Abstract. The purpose of this article is to summarize the state of the art of the expert systems research field. First, we introduce the basic notion of knowledge, and specifically of shallow knowledge and deep knowledge.
The first part of the document summarizes the history of the field. We analyze the differences among the first generation of expert systems, based primarily upon rule-based and frame-based representation of shallow knowledge. We concentrate mainly on the most important expert systems and their impact on subsequent research. These are the traditional Mycin and Prospector expert systems, but also less famous ones such as the General Problem Solver, the Logic Theory Machine, and others. Finally, we present some modern expert systems and shells, such as Gensym's G2, and also some light-weight Prolog-based expert systems, usually based on deep knowledge of the domain.
In the fourth section, we compare various knowledge representation languages. We briefly describe each of them, present some inference techniques, and discuss primarily their upsides and downsides. For each language, we also mention successful expert systems and shells that use it. As for shallow knowledge, we review mainly rule-based and frame-based knowledge representation languages. We argue why these are not very suitable for modeling the complex relationships present in many real-world applications and therefore not suitable for deep knowledge representation. Subsequently, we present early semantic networks as the first attempt to model deep knowledge. Then, in more depth, we analyze the approaches based on simplifications and extensions of traditional logic: in the first place propositional logic, first-order predicate logic, modal logic, and finally logic programming (Prolog). We further continue with the extension to constraint programming. Then we discuss nonmonotonic knowledge representation languages, such as answer set programming and default logic. At last we analyze the representation of knowledge for continuous domains, mostly addressed by qualitative and semi-qualitative simulation.
The fifth section presents various uncertainty measures and their combinations with the previously mentioned representation languages. First we explain why we need to represent uncertainty. Then we present the generally required properties to which such a measure should adhere. We follow with classic probability theory and its combinations with propositional logic, first-order predicate logic, modal logic, and logic programs. We discuss the most popular representation model, Bayesian belief networks. We point out the properties and reasons why probability is a measure that works perfectly for statisticians but is not completely satisfactory for many artificial intelligence domains. We continue with Dempster-Shafer theory and introduce the Transferable Belief Model that employs this measure. Next we present possibility theory and its combination with predicate logic, known as fuzzy logic. We explain why this seems to be a very popular choice for simple systems and why it seems unsuitable for large and complex expert systems. We also present systems that try to combine rule-based systems with neural networks.

1. Introduction
Even before the conception of the first computer, people dreamt of creating an intelligent machine. With the conception of the first computers, it seemed the idea would soon be realized. Despite the tremendous growth of the processing power of modern computers, their intelligence, as understood by most people, remains very limited. It is only in limited domains that computers have achieved the most success. We now live in a world where the best chess player is not a human, and hardly anyone would take the risk of building a bridge without the aid of a computer.
Key words and phrases. Expert Systems, Uncertainty, Knowledge Representation, Adaptive Systems,
Multiagent Systems.

The feature of computers people value the most is their precision. Computers do not make mistakes. Computers are made to deal with perfect information, and they do that well. Unfortunately, the world is not perfect; incomplete, noisy, or misleading data is present in every aspect of our lives. That is why computers are usually able to make decisions only with heavy support from human operators. The main research field that tries to overcome
this serious drawback is the domain of Expert Systems (ES). The crucial question for every
system is how to represent and acquire knowledge. This is the question we try to answer in
this article.
2. History
We will not try to introduce the general concept of expert systems; we only point interested readers to [Girratano, 1998], which provides an excellent overview. In what follows we use the following definition of an expert system.
An Expert System (Knowledge-based system, ES) is a reasoning system that performs comparably to or better than a human expert within a specified domain. We further broaden the term ES to also cover traditional AI agents that are more knowledge intensive than computationally intensive.
Expert systems have experienced a tremendous success in the last two decades. For example, one of the most successful companies developing ES is Gensym Corporation, the creator of the G2 system. By 1995 all of the 10 biggest world corporations were using the G2 expert system at some point of their operations [Jozef Kelemen, 1996]. In the following subsections we chronologically describe the characteristics of expert systems in the various eras. This overview is mostly based upon [Girratano, 1998] and [Eric J. Horvitz, 1988].
2.1. Beginnings. Probably the first expert systems were built in the early 1960s. One of the most famous ones was the General Problem Solver, built by Newell and Simon in 1961. It was built to imitate human problem-solving protocols. The system had a goal, which could be achieved by achieving a series of subgoals. These were chosen by a heuristic function. In fact, many of the subsequent expert systems were based on human cognition processes.
2.2. Probabilistic Systems. The first probabilistic ES were conceived for diagnostic problems. This was a very early approach influenced by the well-developed decision theory, which deals with probability and utility theory. The approach raised some concerns among its opponents because, due to the exponential complexity of exact inference, conditional independence among the variables had to be assumed in many cases. Despite the questionable assumptions, these systems actually outperformed human experts in some domains. For example, the system of de Dombal and his colleagues averaged over 90% correct diagnoses of acute abdominal pain, where expert physicians were averaging 65%-80% correct.
These results were very surprising, because the models not only used simplifying assumptions, but also were given only a fraction of knowledge available to human experts.
2.3. Production Systems. Unfortunately, the enthusiasm for ES based strictly on probability faded. The reasons were various: some of them were reasonable, others stemmed from the poor user interfaces of the early systems. The main reason was the computational explosion of systems using more complex representations, which confined these systems to very small and thus impractical domains.
In the early 1970s, a new and promising approach emerged. Since the exact methods are usually intractable, the requirement of precision was relaxed to make reasoning with extremely large bodies of expert knowledge possible. Simple if-then rules emerged as a very useful tool for expressing human cognitive processes. Research found it desirable to have a graded measure of uncertainty, which could be used to compare the strength of evidence and hypotheses. The most famous systems of this era were Mycin and Prospector. The same
trends could be observed in other domains, such as combinatorial optimization problems
[Anant Singh Jain, 1998, Jones and Rabelo, 1998].

2.4. Modern Systems. Expert systems are a reality today. They generate huge revenues and make companies successful [Jozef Kelemen, 1996]. The perception of most of the public is in many cases the strict opposite. Since diagnoses are still made almost exclusively by human doctors, this fact is taken to mean that expert systems have so far failed to fulfill their promise. This is not entirely accurate. Since the mid 1980s a shift away from the medical domain became apparent. Money is usually sparse in the ordinary medical domain, which leads to decreased interest from industry (cite: ???). Moreover, there are a lot of unanswered ethical questions and doubts. As a result, expert systems are almost invisible in medical domains. Business is taking the greatest advantage of expert systems, mostly financial, production, and research companies, respectively.
Despite the promising outlook in the beginning, rule-based ES achieved only limited success. Rule engines are still somewhat popular, mostly for simple domains. Some of the most popular are:
(1) CLIPS - a simple rule engine developed by NASA, with no uncertainty measure.
(2) Jess - a CLIPS descendant coded entirely in Java.
(3) ABLE - IBM's Agent Building and Learning Environment. Besides a standard rule engine with fuzzy measures, it offers the possibility to use neural networks, decision trees, Bayesian networks, and other common decision and learning techniques.
(4) iLog Rules - a commercial shell for creating business rules.
Rule systems suffer both from a lack of expressivity and from the increased performance of modern computer systems, which makes richer representations increasingly practical. This leads to their continually decreasing popularity. Other possibilities to represent and acquire knowledge will be presented in the following sections.
Most expert systems are built either in-house or by specialized contractors. Since each domain is very different, most companies specialize only in a part of the market. General expert system shells are very sparse and are developed only by very large companies. Probably the most successful producer of general expert systems is Gensym, the producer of the world-famous G2 ES shell. Unfortunately for research, their technologies are very well guarded. On the other hand, most of the simpler systems are built using standard tools, such as Prolog, Lisp, or even Visual Basic, Java, and .NET. This is also erasing the distinction between standard computer systems and expert systems. The main distinction is that expert systems are usually more free in managing their knowledge and in making decisions based on it. While standard computer systems are programmed to make a specific decision, expert systems are programmed how to make decisions from knowledge. This leads to a separation of the control and knowledge parts of the system.
3. Knowledge
As with most basic terms, it is hard to define precisely what knowledge means. In the scope of this work, we understand knowledge as the ability to reason about a consistent environment, and to forecast and hypothesize in it. A consistent environment means an environment whose rules are time-invariant. Rules may be either first-order or of any other order, enabling us to have rules about rules.
Some of the modern trends in Artificial Intelligence, such as multi-agent systems, criticize the failure of traditional artificial intelligence to achieve breakthrough results during its half-century of research. They claim that the world is the best representation of itself, and as an example of this approach they present the famous Subsumption Architecture. They also reason that artificial knowledge representation schemes are doomed to failure.
We think that using the real world as a representation of itself may be a very precise model, but it is too complex to be practical. Usually, the richer the representation is, the harder reasoning with it becomes. This is why we model human language with context-free grammars and why we use linear or polynomial functions to approximate functions that are much more complex. This is also the reason why we use simple functions as heuristics to direct us in search.

We look at our knowledge only as a heuristic that does not necessarily measure the world exactly, but instead offers an approximation. Obviously, there is hardly a representation that is superior to all others in all domains, because each domain requires a different quality, structure, and conditions of the representation scheme.
A basic concept in the ES domain is the term knowledge engineering. This is the process of encoding human knowledge into a formalized framework. Many of the knowledge
representation schemes were developed mainly to ease development and maintenance of
extensive knowledge bases. We will deal with these issues only marginally, and focus mostly
on the expressive power, stability, scalability and inference performance.
Generally, knowledge can be divided into the two following main categories. Procedural knowledge has been the kind exploited the most by computer systems. Every program represents procedural knowledge of how to achieve the desired result. Although this is enough for most simple applications, it is not satisfactory when dealing with partly or fully undefined situations. This is the reason why computers crash and cannot write an essay for you. Relational knowledge describes the relationships among events in the world. Some sources define a third kind, hierarchical knowledge, but we consider it only a subset of the relational knowledge framework. In the following text we deal mainly with relational knowledge representations.
Further, we can distinguish knowledge into two quantitatively different categories, that is, shallow knowledge and deep knowledge. The main distinction between these two is the precision of the representation. Shallow knowledge concentrates on relationships among perceivable attributes. Systems with shallow knowledge then concentrate on gathering as many facts as possible and inferring the result from these. Deep knowledge involves not only observable attributes but also hidden attributes resulting from modeling the domain. Such systems concentrate not only on gathering as much data as possible, but they also create and evaluate a number of hypotheses. These systems were in the past almost impossible to build, both because of a lack of supporting theory and algorithms and because of a lack of computational power. This has been changing recently.
There have been many successful expert systems based upon shallow knowledge of the domain. Most of the early statistical ES, as well as most of the rule-based ES, do not have a real model of the world, or the model is very simplified. On the other hand, we think that deep knowledge might be very helpful in the construction of flexible reasoning ES.
Expert systems usually have additional requirements on both the knowledge and the conclusions they provide. Since they are built to save money, it is imperative that their knowledge can be managed with as little effort as possible. Also, in many cases it is essential that they can support their conclusions with arguments. People making decisions based on the decisions of computer systems are more comfortable if they can understand the conclusions, check them, or find errors in the knowledge base. Therefore we prefer representations that are modular, predictable, and mainly contain symbolic information. These requirements make, for example, neural networks infeasible.
In the next two sections we summarize the existing representation schemes and describe their applicability and drawbacks. After that we try to point out how this knowledge could be automatically acquired.
4. Structural Knowledge
The most common representation of the world as a consistent environment is as a set of states and state transformation rules in time. Since an enumeration of all possible states and their transformations is infeasible even in small environments, we need a different method to represent which states are possible.
A common method helping to decrease problem complexity is decomposition. Fortunately, most practical environments may be decomposed into a dynamic set of variables, each representing a fraction of the state. This not only dramatically decreases the complexity of the representation, but also enables generalization and identification of a state based on a partial observation. Further, to process the variables, it is very useful to define constraints binding the values of the possible states and their transformations. A knowledge base is usually this decomposition together with the constraints binding the variables. We will also use this definition. An interpretation is generally a set of variables and the values assigned to them. We then say that an interpretation is a model, or a possible world, when it fulfills all required constraints of the knowledge base.
In the next part we analyze the possible methods. We start with the basic methods employed by the first expert systems. Since most of these methods have been applied in tens of commercial and research applications, there are various advanced extensions and heuristic inference methods. We do not try to describe all these newer methods; we only try to describe the principles and point out the differences.
These are methods that enable a representation of the world as states fulfilling specified constraints. It is worth adding that all these representations assume strictly consistent environments - the transformation rules are exactly the same for the state at any instant of time. This is a valid assumption for many domains, but there are strong reasons why this assumption should be weakened. We discuss these reasons in section 5.
4.1. Semantic Networks. A semantic network is a classical representation of hierarchical relational knowledge in AI (Artificial Intelligence). It is basically a graph where the nodes are labeled by atomic formulas and the arcs represent relations between them. The nodes of this graph represent entities and classes of entities. These classes may then be hierarchically ordered to represent the knowledge. This leads to two basic relations between the nodes, namely subclass-of and entity-of (instance-of). A simple example of such a network is in figure ????.
Semantic networks were first developed as a method of representing human knowledge. In fact, every semantic network can be represented in the language of first-order logic (see below). Even better, semantic nets can be directly translated to logic programs. Therefore, we will not deal much further with this representation until the sections devoted to logic and non-monotonic logic. Most of the principles of semantic networks are also employed in rule-based systems.
There is, however, one basic feature of semantic networks that is employed in many very large ES: the hierarchical classification of knowledge. For people this is a very natural reasoning process, but not as much for computers. Hierarchical classification leads to enhanced generalization and information reduction, and it can also dramatically increase performance. On the other hand, it can lead to reduced precision. For a classic example of this feature, consider having an expert system that classifies animal species. Here it is much more convenient to state that dogs bark than to state for each species of dog that it barks. Obviously, it is computationally much simpler to consider the hypothesis dog instead of considering a large number of dog species. Also, adding a new dog to the knowledge base is easier when all common features are specified for the class of dogs and not for each entity separately.
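To make the inheritance mechanism concrete, the following Python sketch (our own illustration, not code from any system cited here; all class and property names are invented) stores a semantic network as nodes linked by subclass-of/instance-of edges and looks a property up along the hierarchy.

class Node:
    def __init__(self, name, parent=None, **properties):
        self.name = name              # label of the entity or class
        self.parent = parent          # subclass-of / instance-of link
        self.properties = properties  # properties attached at this level

    def lookup(self, prop):
        # Walk up the hierarchy until the property is found.
        node = self
        while node is not None:
            if prop in node.properties:
                return node.properties[prop]
            node = node.parent
        return None

animal = Node("animal", alive=True)
dog = Node("dog", parent=animal, barks=True)   # "dogs bark" is stated once, for the class
beagle = Node("beagle", parent=dog)            # subclass-of dog
snoopy = Node("snoopy", parent=beagle)         # instance-of beagle

print(snoopy.lookup("barks"))   # True, inherited from the class "dog"
print(snoopy.lookup("alive"))   # True, inherited from "animal"

The single assertion on the class is found for any particular dog by walking up the hierarchy, which is exactly the information reduction described above.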
4.2. Frames. The following description of frames is based on many sources, but mainly on [Vladimir Marik, 1993]. Frames were among the first attempts to mimic human reasoning and the hierarchical representation of knowledge. They were first proposed by Minsky and found tremendous application in early expert systems. Frames are groupings of slots that represent semantically close knowledge. Despite their wide-spread application, their background is mostly technical; they simplify development for humans more than they offer a solid basis for sound inference. The principle of frames has been further enhanced and refined in the Object Oriented Programming paradigm and in Multi-Agent Systems.
4.3. Rules. Rules represent a very human-friendly knowledge representation. They are composed of simple if-then clauses that are activated, usually according to a custom heuristic function. Among the often cited advantages of rule-based systems are their modularity, simplicity, and good performance [Girratano, 1998].
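As a small illustration of this style of representation, the following Python sketch (a hypothetical toy, not any of the engines discussed earlier; rule names and facts are invented) implements a forward-chaining production system in which a numeric priority stands in for the heuristic activation function.

class Rule:
    def __init__(self, name, conditions, conclusion, priority=0):
        self.name = name
        self.conditions = set(conditions)  # facts required in working memory
        self.conclusion = conclusion       # fact added when the rule fires
        self.priority = priority           # stands in for a heuristic activation function

def forward_chain(rules, facts):
    facts = set(facts)
    while True:
        # Conflict set: rules whose conditions hold and whose conclusion is new.
        applicable = [r for r in rules
                      if r.conditions <= facts and r.conclusion not in facts]
        if not applicable:
            return facts
        # Conflict resolution by the heuristic priority.
        rule = max(applicable, key=lambda r: r.priority)
        facts.add(rule.conclusion)

rules = [
    Rule("r1", ["has_fever", "has_rash"], "suspect_measles", priority=2),
    Rule("r2", ["suspect_measles"], "recommend_specialist", priority=1),
]
print(forward_chain(rules, ["has_fever", "has_rash"]))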

Rule engines are generally not suitable for modeling complex world relationships or for the creation of world models, but they can be used to represent procedural or shallow knowledge. Reasoning in a partially observable domain requires a measure of certainty in the propositions. There have been many more or less successful approaches to representing uncertainty in a rule-based framework. The early trials, and also the recent research, have shown that once uncertainty measures are introduced, the system is no longer modular [Eric J. Horvitz, 1988], and in the case of Mycin, the ad-hoc certainty factors could lead to disastrous results [Stuart Russel, 2003]. Some of the most successful measures were fuzzy sets and graded logic values. Both offer a similar approach to reasoning, but while graded logic offers discrete values of membership, in fuzzy logics the uncertainty is usually measured by values from an infinite set of real numbers. For more information refer to the section dedicated to fuzzy logic below.
This approach proved very efficient in simple domains, also thanks to the very fast Rete matching algorithm, but its use in complex systems is at least very questionable [Stuart Russel, 2003]. We point readers interested in this knowledge representation model to [Girratano, 1998], and for an example of a more advanced approach to [Poli and Brayshaw, 1995].
4.4. Logic. The basic notion of logic was known already to the ancient Greeks. Logic is a system that defines a framework for representing relational knowledge and reasoning about it. Unlike rule systems, logic is a very suitable tool for representing real-world models. It can represent very complex relationships among objects, it can represent hierarchies, and it is very extensible. The main problem of reasoning with logic is that inference is usually intractable (NP-hard or worse), and there have not been many successful methods of expressing heuristic shallow knowledge using logic.
The reasoning is performed according to strictly defined rules of inference. We will first introduce propositional logic, then move on to first-order logic, and also introduce modal logics.
4.4.1. Propositional Logic. The syntax of propositional logic defines the allowed sentences. The atomic sentences, indivisible syntactic elements, consist of single propositional symbols. From these, complex sentences are composed using the unary operator $\neg$ and the binary operators $\wedge$, $\vee$, and $\rightarrow$. Please note that the logic can also be defined using only a subset of these operators. For a detailed description of the semantics, please see any of the books dealing with logic representation, for example [Stuart Russel, 2003] or [Vladimir Marik, 1993].
Further important concepts are the following.
(1) A model is an assignment of truth values to every propositional symbol.
(2) Entailment: $\alpha \models \beta$ holds if $\beta$ is true in all models in which $\alpha$ is true.
(3) A valid sentence is true in all models.
(4) Two sentences are logically equivalent if they are true in the same set of models (their logical value is the same in all models).
(5) A satisfiable sentence is true in at least one model.
The aim of inference is to determine whether $KB \models \alpha$ for some sentence $\alpha$, where $KB$ is the knowledge available to the agent. We say that an entailment procedure is sound when everything it infers from $KB$ is entailed, and complete when everything that is entailed is inferred. The main inference rules are Modus Ponens and And-Elimination.
Resolution is a single inference rule that yields both sound and complete inference. It is usually applied to sentences in Conjunctive Normal Form (CNF) - a conjunction of disjunctions of literals (a literal is an atom or a negated atom). It can be shown that every logical sentence can be expressed in CNF.
Theorem 4.1. Every logical sentence can be expressed in CNF.

Proof. Take a sentence $\varphi$ and express its negation $\neg\varphi$ in Disjunctive Normal Form (DNF). DNF is a disjunction of conjunctions, and therefore for each model in which $\neg\varphi$ is true we can write a single conjunction that is true in this model only. Therefore $\neg\varphi = (\alpha_1 \wedge \alpha_2 \wedge \ldots) \vee \ldots \vee (\alpha_k \wedge \ldots \wedge \alpha_n)$, where the $\alpha_i$ are propositional literals. Then $\varphi = \neg\neg\varphi = (\neg\alpha_1 \vee \neg\alpha_2 \vee \ldots) \wedge \ldots \wedge (\neg\alpha_k \vee \ldots \vee \neg\alpha_n)$, which is CNF.
Now, resolution is a simple procedure in which we join two disjunctive clauses $\beta = (\alpha_i \vee \gamma_1)$ and $\delta = (\neg\alpha_i \vee \gamma_2)$, where $\alpha_i$ is a propositional atom and $\gamma_1, \gamma_2$ are disjunctive clauses. The result is $(\gamma_1 \vee \gamma_2)$. We can use this method to determine entailment because $\varphi \models \psi$ is equivalent to the unsatisfiability of $\varphi \wedge \neg\psi$. Therefore, if $\varphi$ entails $\psi$, we obtain the empty clause by resolution; otherwise there are models in which $\varphi$ is true and $\psi$ is not.
Theorem 4.2. Resolution is both sound and complete (as a refutation procedure).
Complexity. While resolution offers a workable method of theorem proving, it runs in $O(2^n)$ time in the worst case, where $n$ is the number of clauses. The following theorem says that we can hardly expect an algorithm that is asymptotically faster in the worst case.
Theorem 4.3. Deciding entailment in propositional logic is co-NP-complete in the number of clauses in the knowledge base (equivalently, deciding satisfiability is NP-complete).
Some of the algorithms that use heuristic or probabilistic principles to decide satisfiability, and hence entailment, are the following.
(1) DPLL (Davis-Putnam-Logemann-Loveland) - essentially a heuristic depth-first enumeration of possible models; a minimal sketch is given below.
(2) WalkSAT - a probabilistic Monte Carlo algorithm, which uses the min-conflicts heuristic from constraint satisfaction problems (CSP) to find a model satisfying the formula. In most cases it is more than twice as fast as DPLL.
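The following Python sketch illustrates only the core DPLL idea - unit propagation plus depth-first branching on literals. The clause encoding as sets of signed integers is our assumption for the example, and real solvers add much stronger heuristics.

def dpll(clauses, assignment=()):
    """clauses: list of sets of integer literals (-x means the negated atom x)."""
    assignment = set(assignment)
    # Simplify: drop satisfied clauses and remove falsified literals.
    simplified = []
    for clause in clauses:
        if clause & assignment:
            continue
        reduced = {l for l in clause if -l not in assignment}
        if not reduced:
            return None          # empty clause: conflict under this assignment
        simplified.append(reduced)
    if not simplified:
        return assignment        # every clause satisfied: a model was found
    # Unit propagation: a clause with a single literal forces that literal.
    for clause in simplified:
        if len(clause) == 1:
            return dpll(simplified, assignment | clause)
    # Branch on some literal (depth-first enumeration of possible models).
    literal = next(iter(simplified[0]))
    return (dpll(simplified, assignment | {literal})
            or dpll(simplified, assignment | {-literal}))

# KB = (a or b) and (not a or c) and (not c); KB entails b, checked by refutation.
kb = [{1, 2}, {-1, 3}, {-3}]
print(dpll(kb + [{-2}]))   # None: KB with (not b) is unsatisfiable, hence KB entails b
print(dpll(kb))            # a model of KB, e.g. {2, -1, -3}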
4.4.2. First Order Predicate Logic. Predicate logic is enriched by two additional quantifiers, for-all $\forall$ and exists $\exists$, where the exists quantifier may also be defined as $\exists x\,\varphi \equiv \neg\forall x\,\neg\varphi$. Also, atoms are now predicates - relations on the universe of possible values. Unlike in propositional logic, we allow functions that map several terms into another term. The logic without functions is called Datalog.
The concepts of first-order logic and inference are too complex and too well known to be covered here, so we only point interested readers to the classical literature. We only mention that inference in first-order logic is only semidecidable: algorithms exist that say yes to every entailed sentence, but they may never terminate on a sentence that is not entailed.
The key concepts are:
(1) Lifting refers to the generalized (lifted) Modus Ponens.
(2) Unification is the process of finding a substitution of terms for variables that makes two sentences identical. If the sentences can be unified, there is a most general unifier, which is unique up to renaming of variables; a sketch is given after this list.
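A minimal Python sketch of the unification step, assuming a home-grown term encoding (variables are strings starting with '?', compound terms are tuples) and omitting the occurs check for brevity:

def is_var(t):
    return isinstance(t, str) and t.startswith("?")

def walk(t, subst):
    # Follow variable bindings to their current value.
    while is_var(t) and t in subst:
        t = subst[t]
    return t

def unify(x, y, subst=None):
    """Return a (triangular) most general unifier as a dict, or None if none exists."""
    if subst is None:
        subst = {}
    x, y = walk(x, subst), walk(y, subst)
    if x == y:
        return subst
    if is_var(x):
        return {**subst, x: y}
    if is_var(y):
        return {**subst, y: x}
    if isinstance(x, tuple) and isinstance(y, tuple) and len(x) == len(y):
        for xi, yi in zip(x, y):
            subst = unify(xi, yi, subst)
            if subst is None:
                return None
        return subst
    return None

# Knows(John, ?x) unifies with Knows(?y, Mother(?y)):
# ?y is bound to John and ?x to Mother(?y), i.e. Mother(John) after resolving ?y.
print(unify(("Knows", "John", "?x"), ("Knows", "?y", ("Mother", "?y"))))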
The concepts of resolution, Horn clauses (definite clauses), and forward and backward chaining are defined in the same way as in propositional logic. The only principal difference is the need for unification between the clauses. Also, in line with the semidecidability mentioned above, resolution is only refutation-complete.
4.4.3. Modal Logic. Modal logic extends propositional logic with the operators of possibility $\Diamond$ and necessity $\Box$. It was invented to represent not only a model of the world, but also an agent's beliefs about it. It was introduced by Lewis in 1913 in an attempt to avoid the paradoxes of implication, where a false proposition implies anything.
There are many families of modal logic. Many of them are based upon a weak logic called K (after Saul Kripke), an extension of propositional logic. The symbols of the logic K include $\neg$ for not, $\rightarrow$ for if-then, and $\Box$ for necessity. Possibility can be expressed through necessity in a similar fashion as the exists quantifier through for-all: $\Diamond\varphi$ is defined as $\neg\Box\neg\varphi$. A good reference for the modal logics framework is Modal Logic.
In propositional logic, a valuation of the atomic sentences (a row of a truth table) assigns a truth value (T or F) to each propositional variable p. The truth values of complex sentences are then calculated with truth tables. In modal semantics, a set W of possible worlds is introduced. A valuation then gives a truth value to each propositional variable for each of the possible worlds in W. This means that the value assigned to p for world w may differ from the value assigned to p for another world w'. The possible worlds are connected by an accessibility relation which conforms to certain rules; the details are beyond the scope of this article.
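The following Python sketch evaluates modal formulas over a small hand-made Kripke model; the worlds, accessibility relation, and valuation are invented for the example.

# Worlds, an accessibility relation, and a per-world valuation of atoms.
worlds = {"w1", "w2", "w3"}
access = {("w1", "w2"), ("w1", "w3"), ("w2", "w3"), ("w3", "w3")}
valuation = {"w1": {"p"}, "w2": {"p"}, "w3": {"p", "q"}}

def holds(formula, world):
    kind = formula[0]
    if kind == "atom":
        return formula[1] in valuation[world]
    if kind == "not":
        return not holds(formula[1], world)
    if kind == "implies":
        return (not holds(formula[1], world)) or holds(formula[2], world)
    if kind == "box":        # necessity: true in every accessible world
        return all(holds(formula[1], v) for (u, v) in access if u == world)
    if kind == "diamond":    # possibility, defined as not-box-not
        return not holds(("box", ("not", formula[1])), world)
    raise ValueError(kind)

print(holds(("box", ("atom", "p")), "w1"))       # True: p holds in w2 and w3
print(holds(("diamond", ("atom", "q")), "w1"))   # True: q holds in the accessible w3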
Some of the extensions of modal logic are:
(1) Temporal Logic - a modal logic where the operators are understood as specifications of validity in time, e.g. that a proposition holds at some time moments or that it holds at all time moments.
(2) Deontic Logic - defines obligatory and permissible operators.
(3) Conditional Logic - tries to avoid the above mentioned paradoxes of implication, such as $\neg a \rightarrow (a \rightarrow b)$.
4.4.4. Non-monotonic Logic. Non-monotonic logic is the result of a synthesis of the cognitive sciences and traditional logic representation. One of the findings of the cognitive sciences, as well as of daily life, is that people tend to come to conclusions that are not valid in all models of their knowledge. This leads to behavior in which additional information can cause a previously drawn conclusion to be rejected - hence non-monotonic.
There have been many disputes about how such cognitive models are built and used. As a consequence, a number of possible representations have been proposed. Some of these are default logics, the closed/open world assumption in logic systems, and answer set programming. Answer set programming is a very interesting and dynamically growing representation similar to the constraint programming mentioned in subsection 4.6. The two main systems are DLV and Smodels. There are no widely known expert systems in this paradigm, but it seems a promising, though still immature, field.
4.5. Logic Programming. Since inference in logic systems is in most cases intractable, researchers sought ways to improve the reasoning process. The general approach is to restrict the expressivity of the knowledge representation in order to boost performance. In this section we present such an approach for both propositional and first-order logic.
As mentioned above, entailment is computationally hard even in a logic as weak as propositional logic. One way to restrict the expressive power and achieve much better performance is to permit only Horn clauses. A Horn clause is a disjunction of literals of which at most one is positive. With such rules we are still able to represent very complex structures, with very fast inference.
A definite logic program is a set of Horn clauses (here from first-order predicate logic), among which conjunction is assumed. The main methods of inference in these programs are forward chaining and backward chaining, which both have the same asymptotic computational complexity but differ in their approach to the data and the query. Their principles are beyond the scope of this article; a good introduction may be found in [Stuart Russel, 2003], and another good source of information is [Ulf Nilsson, 1990].
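To illustrate the flavor of backward chaining, the following Python sketch proves goals against a propositional definite program. It is a deliberately simplified setting - no variables, no unification, and no cycle detection - and the atoms are invented for the example.

# Each definite clause is (head, [body atoms]); facts have an empty body.
program = [
    ("mammal", ["has_fur"]),
    ("dog", ["mammal", "barks"]),
    ("has_fur", []),
    ("barks", []),
]

def backward_chain(goal, program):
    # Try every clause whose head matches the goal and prove its body recursively.
    # Warning: a cyclic program would make this naive version loop forever.
    for head, body in program:
        if head == goal and all(backward_chain(b, program) for b in body):
            return True
    return False

print(backward_chain("dog", program))      # True
print(backward_chain("meows", program))    # False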
The application of logic programs in the ES domain is probably even wider than that of rule engines. Logic programs can be used to express even very complex structures. Like rule-based systems, they are very modular and stable. Their main drawback is their lack of uncertainty representation. Though there are ways to deal with this problem, none of them has proved successful in a general sense.

4.6. Constraint Programming. Constraint programming is a paradigm for solving constraint satisfaction and optimization problems just by specifying them. In principle, a programmer only needs to know what a solution to the problem looks like. The specification of the problem is accomplished by a set of constraints over a set of variables. Both of these sets may be dynamic and unbounded. The programming environment then uses standard methods for solving constraint satisfaction problems. These usually employ constraint propagation, value distribution, and branch and bound.
There are a number of different domain and constraint specifications. The most widely used domain is a finite set of integers, but real numbers and tree structures are also common. The most popular approach to specifying the constraints is a logic programming language that is very similar to Prolog. A common feature is the possibility to specify meta-constraints, i.e. constraints on constraints. This can also be addressed very elegantly by standard logic programming. This combination is known as Constraint Logic Programming - CLP(D), where D stands for the domain of the variables.
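The following Python sketch shows only the bare skeleton of such a solver - plain backtracking with consistency checking over finite integer domains. Real constraint programming systems add constraint propagation and smarter value distribution; the toy variables and constraints below are our own.

def solve(domains, constraints, assignment=None):
    """Backtracking search over finite domains.
    domains: {var: set of values}; constraints: list of (vars, predicate)."""
    if assignment is None:
        assignment = {}
    if len(assignment) == len(domains):
        return assignment
    var = next(v for v in domains if v not in assignment)
    for value in sorted(domains[var]):
        assignment[var] = value
        # Check every constraint whose variables are all assigned so far.
        if all(pred(*[assignment[v] for v in vs])
               for vs, pred in constraints
               if all(v in assignment for v in vs)):
            result = solve(domains, constraints, assignment)
            if result is not None:
                return result
        del assignment[var]
    return None

# X, Y, Z in 1..4 with X < Y, Y < Z and X + Z = 5 (a toy CLP(FD)-style problem).
domains = {"X": set(range(1, 5)), "Y": set(range(1, 5)), "Z": set(range(1, 5))}
constraints = [(("X", "Y"), lambda x, y: x < y),
               (("Y", "Z"), lambda y, z: y < z),
               (("X", "Z"), lambda x, z: x + z == 5)]
print(solve(domains, constraints))   # e.g. {'X': 1, 'Y': 2, 'Z': 4}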
The applicability of pure constraint programming to the ES domain is somewhat limited due to the large scale of these systems. Still, there have been some efforts to use this paradigm in the domain of expert systems, most notably in [Bartak, 1999]. The employed strategy implements the Hierarchical Constraint Logic Programming (HCLP) discussed in [Wilson and Borning, 1993]. HCLP uses a partial preference ordering on the constraints, identifying how preferred it is that each constraint be satisfied.
4.7. Mathematical Programming. Mathematical programming is a subset of the constraint programming model in which the domains are real numbers and the constraints are specified as functions. The standard model is:
$$\min g_0(x) \quad \text{subject to} \quad g_i(x) \le 0,\; i = 1, \ldots, m, \quad x \in X \subseteq \mathbb{R}^n$$
There are the following common variations of mathematical programming:
(1) Convex programming, if both $g_0$ and the $g_i$ are convex functions and $X$ is a convex set (informally, a convex set is one in which any two interior points can be connected by a straight line lying entirely inside the set).
(2) Linear programming - convex programming in which all constraint functions are linear.
(3) Integer linear programming - linear constraints with $X \subseteq \mathbb{Z}^n$.
(4) Mixed integer linear programming - a combination of linear and integer programming.
Linear programming, as the simplest of the presented models, is also the one most widely used. The first method that addressed this type of problem was the simplex method. Though the method is exponential in the worst case, it has been applied with great success. The more recent polynomial-time methods belong to the family of interior-point methods [Karmarkar, 1984]. The complexities of the other problems are much less optimistic. For instance, the integer programming problem has been proved to be NP-complete (this can be shown by a transformation from the SAT problem). A prominent approach to such problems is Lagrangian relaxation.
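As a small illustration (assuming the SciPy library is available; the numbers are invented), the following snippet solves a toy linear program with scipy.optimize.linprog, which relies on simplex- and interior-point-style methods internally.

from scipy.optimize import linprog

# Maximize 3x + 5y subject to x + 2y <= 14, 3x - y >= 0, x - y <= 2, x, y >= 0.
# linprog minimizes, so the objective is negated; the ">=" row is negated into "<=".
c = [-3, -5]
A_ub = [[1, 2], [-3, 1], [1, -1]]
b_ub = [14, 0, 2]
result = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print(result.x, -result.fun)   # approximately [6. 4.] and 38.0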
Mathematical programming is very often employed in economic and optimization systems, but it is questionable whether these are expert systems. Since most economic knowledge can be written in the form of mathematical equations, this is a field where standard logic usually fails.


5. Uncertain Knowledge
Most of the models mentioned in section 4 are not able to deal with uncertainty. This section describes common methods for expressing uncertainty, and also their applications in combination with the knowledge representation languages above.
As mentioned in section 3, we assume that the environment is consistent. Yet there are many reasons that lead us to represent uncertainty. The most common of these reasons are probably the need to simplify the representation of the world, to represent noisy or biased measurements of the world, and to represent the weight of knowledge in the KB.
Three different models of representing uncertainty... [Halpern, 89]
During the study of general uncertainty measures, various requirements for the measure
were introduced. The ones we present here are from [Eric J. Horvitz, 1988].
(1) Clarity Propositions should be well defined.
(2) Scalar continuity A single real number is both necessary and sufficient for representing a degree of belief in a proposition.
(3) Completeness A degree of belief can be assigned to any well-defined proposition.
(4) Context dependency The belief assigned to a proposition can depend on the belief
in other propositions.
(5) Hypothetical conditioning There exists some function that allows the belief in a conjunction of propositions, $B(X \wedge Y)$, to be calculated from the belief in one proposition and the belief in the other proposition given that the first proposition is true. That is, $B(X \wedge Y) = f(B(X \mid Y), B(Y))$.
(6) Complementarity The belief in the negation of a proposition is a monotonically
decreasing function of the belief in the proposition itself.
(7) Consistency There will be equal belief in propositions that are logically equivalent.
These principles model the general requirements well, but many measures defy some of them. For each of the measures below we note which of these properties are satisfied and which are not.
We will further concentrate mainly on methods that are interesting but generally not very
well known.
5.1. Probability. The notion of probability was first introduced no later than the 17th century in the works of Pascal, Bernoulli, and Fermat [Eric J. Horvitz, 1988]. It has ever since been the most widely used and the most thoroughly developed system for representing uncertain beliefs.
Ramsey and De Finetti argued, in the famous Dutch book argument, that any uncertainty measure must fulfill the following properties. The presentation of the argument here is based upon [Freedman, 2003].
(1) $P(E) \in [0, 1]$.
(2) If an event $E$ is certain, then $P(E) = 1$.
(3) If $E = E_1 \cup E_2$ and $E_1$ and $E_2$ are exclusive events, then $P(E) = P(E_1) + P(E_2)$.
De Finetti showed that an agent that does not adhere to these rules can be forced into an all-negative-gain situation. The proof interprets a degree of certainty $p$ as the willingness to bet money at odds $(1/p - 1) : 1$, meaning that the agent is willing to pay one unit to win $1/p - 1$ units.
Probability is defined as an additive measure on a $\sigma$-algebra; additive means that for exclusive sets $E_1, E_2$ we have $P(E_1 \cup E_2) = P(E_1) + P(E_2)$. Some of the more recent methods try to define probability as a non-additive measure. We will describe these in later sections.
The probability model assumes that the world is consistent. Usually it is used to represent uncertainty, not imprecision. There has been an ongoing dispute about what probabilities actually mean. Some researchers hold that probability is a fundamental property of the world; others prefer a fully Bayesian approach and assume that probabilities are subjective and represent only the view of the agent.

Unfortunately, probability does not appear to be as fundamentally good a concept for uncertain reasoning as it is for statistics. The reason might be its inability to distinguish between a lack of knowledge and conflicting knowledge. For example, if we know nothing about $E$, we can only assume the prior probability, which might not be a very good approach in many cases. Though probability is more suitable for representing uncertainty and inconsistency, it might also be possible to use it to represent imprecision. There is also no standard tool for meta-reasoning, that is, for reasoning about the reasoning and the uncertainties themselves.
Probability has been very widely applied in the ES domain. We discuss some of the
combinations that proved to be most successful.
5.2. Bayes Networks and Influence Diagrams. A Bayesian network is a graphical model that depicts the conditional dependencies among the random variables of interest. It takes advantage of a powerful property of probability theory: conditional probability can be inverted through Bayes' rule. This means that we can calculate $\Pr[a \mid b] = \Pr[b \mid a] \Pr[a] / \Pr[b]$ knowing only the atomic probabilities. Instead of representing the probability of each variable as dependent on all others, we condition only on the ones that are the most influential.
Inference on the model is an NP-complete or harder problem [Park and Darwiche, 2004], depending on the questions asked. On the other hand, there are a number of algorithms applicable to structures fulfilling special criteria, e.g. polytrees, as well as statistical sampling algorithms such as Markov Chain Monte Carlo. A very nice introduction to Bayesian networks can be found in [Heckerman, 1996, Stuart Russel, 2003].
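As a small worked example of the inversion above (the numbers describe a hypothetical diagnostic test and are invented), the prior and the two conditional probabilities suffice to compute the posterior:

# Pr[disease | positive test] from Pr[positive | disease], the prior, and the evidence.
p_d = 0.01                      # prior Pr[a]
p_pos_given_d = 0.95            # Pr[b | a]
p_pos_given_not_d = 0.05        # false positive rate Pr[b | not a]
p_pos = p_pos_given_d * p_d + p_pos_given_not_d * (1 - p_d)   # Pr[b] by total probability
p_d_given_pos = p_pos_given_d * p_d / p_pos                   # Bayes' rule: Pr[a | b]
print(round(p_d_given_pos, 3))  # about 0.161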
This is a model that was put to use by the first expert systems and has been very thoroughly studied ever since. There have been many applications of Bayesian networks to expert systems and to knowledge-intensive agents. Applications range from sorting spam out of incoming e-mail correspondence to deciding which design of a car would be more beneficial. Compared to the traditional rule-based expert systems, this approach is much more precise and offers a well-founded model for dealing with uncertain data. Since the model is based on the traditional probability measure, the Dutch book argument does not apply against it and the standard decision theory methods can be applied.
5.3. First Order Probability Languages. While Bayesian networks have proved very beneficial to the whole AI community, they suffer from the same problem as propositional logic: they can only express finite knowledge, which leads to very static problem representations and clumsy knowledge bases. Imagine the following example: The colt John has been born recently on a stud farm. John suffers from a life-threatening hereditary disease carried by a recessive gene. The disease is so serious that John is removed instantly, and since the stud farm wants the gene out of production, his parents are taken out of breeding. What are the probabilities for the remaining horses to be carriers of the unwanted gene? The only possible representation with Bayesian networks is to create a random variable for each horse and its probability of carrying the gene. Obviously this is too complex and inflexible. Some research has therefore been devoted to extending the standard Bayesian network model with the expressivity of first-order representations. Some of these representations are Bayesian Logic Programs [Kersting and Raedt, 2000], Stochastic Logic Programs [Muggleton, 1995], and Probabilistic Logic Programs. None of the presented approaches has proved as successful as ordinary Bayesian networks.
5.4. Stochastic Programming.
5.5. Dempster-Shafer Theory. This theory tries to address the main problem of probability theory, the required prior beliefs. It was pioneered by Dempster and later refined by Shafer. It is sometimes called the Theory of Evidence.
We assume a finite set of mutually exclusive and exhaustive elements called the frame of discernment and symbolized by $\Theta$. We also define a mass function $m$. Its basic properties are:
(5.1) $\qquad m(\emptyset) = 0, \qquad \sum_{A \in 2^{\Theta}} m(A) = 1$

This setup allows us to assign a quasi-probability to each set from the power set of the frame. This is different from probability theory, where we assign probabilities only to atomic events. We then need to define one more function to obtain a measure on any subset of the frame.
Definition 5.1. Let $m$ be a mass function defined over the frame $\Theta$. Then we define the belief function Bel as
$$\mathrm{Bel}(A) = \sum_{B \subseteq A} m(B), \qquad A \subseteq \Theta.$$

Unlike probability, where the measure is a single scalar value, this theory introduces another measure, plausibility, which measures the weight of the evidence that does not contradict the given event. This concept is also used in possibility theory, which is discussed below.
Definition 5.2. Let $m$ be a mass function defined over the frame $\Theta$. Then we define the plausibility function Pl as
$$\mathrm{Pl}(A) = \sum_{B :\, A \cap B \neq \emptyset} m(B), \qquad A \subseteq \Theta.$$

The important, and also questionable, part of the theory is the combination of evidence. This is done by Dempster's rule of combination. In the theory of probability, the analogous operation is the combination of prior beliefs and evidence.
Definition 5.3. Assume that $m_1$ and $m_2$ are mass functions such that $\sum_{A_i \cap B_j \neq \emptyset} m_1(A_i)\, m_2(B_j) \neq 0$. The combination $m_1 \oplus m_2$ of these mass functions according to the Dempster rule is, for $A \neq \emptyset$,
$$(m_1 \oplus m_2)(A) = \frac{\sum_{A_i \cap B_j = A} m_1(A_i)\, m_2(B_j)}{\sum_{A_i \cap B_j \neq \emptyset} m_1(A_i)\, m_2(B_j)}$$
and we set $(m_1 \oplus m_2)(\emptyset) = 0$.
There are some applications of this theory, but very few of them are used in practice. Its main drawback is the computational complexity of combining the beliefs.
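The following Python sketch implements the definitions above directly - mass functions as dictionaries over focal sets, Bel, Pl, and Dempster's combination rule. The frame and masses are invented, and the implementation is deliberately naive; its cost grows with the number of focal sets, which is exactly the complexity problem mentioned above.

def bel(m, A):
    # Belief: total mass of subsets of A.
    return sum(mass for B, mass in m.items() if B <= A)

def pl(m, A):
    # Plausibility: total mass of focal sets intersecting A.
    return sum(mass for B, mass in m.items() if B & A)

def combine(m1, m2):
    # Dempster's rule: intersect focal elements, renormalize away the conflict.
    joint, conflict = {}, 0.0
    for B, x in m1.items():
        for C, y in m2.items():
            inter = B & C
            if inter:
                joint[inter] = joint.get(inter, 0.0) + x * y
            else:
                conflict += x * y
    return {A: v / (1.0 - conflict) for A, v in joint.items()}

theta = frozenset({"flu", "cold", "allergy"})
m1 = {frozenset({"flu", "cold"}): 0.8, theta: 0.2}   # first piece of evidence
m2 = {frozenset({"flu"}): 0.6, theta: 0.4}           # second piece of evidence
m12 = combine(m1, m2)
print(bel(m12, frozenset({"flu"})), pl(m12, frozenset({"flu"})))   # about 0.6 and 1.0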
5.5.1. Transferable Belief Model. The Transferable Belief Model is one of the most researched models based on the Dempster-Shafer theory (DST). The research is mostly led by professor Philippe Smets. It introduces an interesting idea of having two levels of representation. The first is the credal level, which is used to model one's beliefs; this level is built upon a slightly modified DST. As mentioned in section 5.1, if the measure used for decision making does not conform to the laws of probability, then the agent can be forced into an all-lose position. Therefore, since DST does not conform to these laws, it cannot be used directly to make decisions. For this purpose a second, pignistic level is introduced, with a regular probability distribution over the evidence. The transformation from the credal to the pignistic level of representation is based upon the generalized insufficient reason principle. For more details please see the thorough description in [Smets et al., ].
5.6. Fuzzy Logic. An alternative approach to representing uncertain or indeterministic knowledge was proposed by Lotfi Zadeh in 1965. It is an attempt to enrich the traditional two-valued logic with graded truth values. Instead of being able to say only that a statement is either true or false, we can say that it is true to a certain degree. We use the definitions from [Vladimir Marik, 2003].
Fuzzy logic is defined in the same manner as first-order logic. The first notable difference is the definition of predicates. Predicates in fuzzy logic are not ordinary relations, but instead functions mapping terms from the universe of possible values to a real number from $[0, 1]$. The second is that the logical operators are defined as functions $f : [0,1] \times [0,1] \rightarrow [0,1]$. Finally, we add a default atom $\bar{0}$, which always has truth value 0. Truth value 1 represents something that is absolutely true, and on the other hand the truth value 0 represents something that is absolutely false.
The most usual way of defining the logical operators is by using a t-norm and its residuum. A t-norm, or triangular norm, is a function $\star$ which obeys the following properties:
$$x \star y = y \star x$$
$$x \star (y \star z) = (x \star y) \star z$$
$$x \le x' \wedge y \le y' \implies x \star y \le x' \star y'$$
$$1 \star x = x$$
Some of the most widely used t-norms are
$$x \star y = \min(x, y), \qquad x \star y = \max(0, x + y - 1),$$
known as the minimum (Gödel) t-norm and the Lukasiewicz t-norm, respectively.
We understand this operation to represent the strong conjunction &, which is not equal to the standard conjunction operation $\wedge$. Every t-norm has a residuum, which is used as the definition of the implication operation $\Rightarrow$. We define it as
(5.2) $\qquad x \Rightarrow y = \max\{z \mid x \star z \le y\}$

We define the values of the other logical operators from this t-norm as follows:
$$\neg\varphi \equiv \varphi \Rightarrow \bar{0}$$
$$\varphi \wedge \psi \equiv \varphi \,\&\, (\varphi \Rightarrow \psi)$$
$$\varphi \vee \psi \equiv ((\varphi \Rightarrow \psi) \Rightarrow \psi) \wedge ((\psi \Rightarrow \varphi) \Rightarrow \varphi)$$
For all t-norms we then get that the truth value of $\varphi \wedge \psi$ equals $\min(\varphi, \psi)$ and that of $\varphi \vee \psi$ equals $\max(\varphi, \psi)$. Specifically, for the widely used Lukasiewicz t-norm we get $\neg\varphi = 1 - \varphi$.
The only things left to be defined are the quantifiers. For these we use the following (we deliberately do not explicitly mention the interpretations, and thereby commit a small crime against the precise formal definition; this is to focus on the principle rather than on the technical difficulties - for precise definitions, please consult the references):
$$\forall x\, \varphi(x) \;=\; \inf_{\text{interpretations}} \varphi(x), \qquad \exists x\, \varphi(x) \;=\; \sup_{\text{interpretations}} \varphi(x)$$

Now, the truth value of a formula is calculated in the same manner as it is in the ordinary
predicate logic.
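The following Python sketch evaluates the two t-norms above together with their residua (using the standard closed forms: min(1, 1 - x + y) for Lukasiewicz, and 1 if x <= y else y for Gödel); the truth values are invented, and the last line checks that strong conjunction with the residuum reproduces min, as claimed above.

def luk_and(x, y):          # Lukasiewicz t-norm (strong conjunction &)
    return max(0.0, x + y - 1.0)

def luk_implies(x, y):      # its residuum, used as the implication
    return min(1.0, 1.0 - x + y)

def luk_not(x):             # negation defined as x => 0
    return luk_implies(x, 0.0)

def goedel_and(x, y):       # minimum (Gödel) t-norm
    return min(x, y)

def goedel_implies(x, y):   # Gödel residuum
    return 1.0 if x <= y else y

a, b = 0.7, 0.4
print(luk_and(a, b), luk_implies(a, b), luk_not(a))   # about 0.1, 0.7, 0.3
print(goedel_and(a, b), goedel_implies(a, b))         # 0.4, 0.4
print(luk_and(a, luk_implies(a, b)))                  # about 0.4 = min(a, b)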
Fuzzy logic has experienced a tremendous growth in applications in the last decades. It is used in anything from transmissions in cars to washing machines. According to [Elkan, 1993], however, there have been no successful ES applications based on fuzzy logic. He argues that under some circumstances fuzzy logic collapses to the ordinary two-valued logic. The reason is that if we take two logically equivalent formulas, $A = \neg(\alpha \wedge \neg\beta)$ and $B = \beta \vee (\neg\alpha \wedge \neg\beta)$, their truth values represented by $t$ should be equal. For fuzzy logic we then get that either $t(\alpha) = t(\beta)$ or $t(\beta) = 1 - t(\alpha)$. Therefore the only possible truth values for $\alpha$ and $\beta$ are 0 and 1, which is the same as in ordinary logic. He explains the success in small devices as due to very short inference chains and empirically learned coefficients.
5.7. Possibility. Possibility theory is, like probability theory, a measure theory. It was developed by Lotfi Zadeh in the early 1980s and is based upon fuzzy sets. One of the motivations for its conception was the fact that human reasoning does not correspond to the rules of probability.
The principal difference from the theory of probability is that the measure of a union is not defined as the sum of the measures of the subsets, but as their maximum. The basic measure is the possibility Pos, and as in Dempster-Shafer theory it has a conjugate measure, the necessity Nec. The basic properties of possibility are:
$$\mathrm{Pos}(\emptyset) = 0, \qquad \mathrm{Pos}(\Omega) = 1, \qquad \mathrm{Pos}(A \cup B) = \max(\mathrm{Pos}(A), \mathrm{Pos}(B))$$
For the necessity measure we then get:
$$\mathrm{Nec}(A) = 1 - \mathrm{Pos}(\bar{A})$$
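A minimal Python sketch of these measures over a finite universe (the universe and the possibility distribution are invented; the distribution is normalized so that the possibility of the whole universe is 1):

# Possibility of a set is the maximum of the distribution over its elements;
# necessity of a set is 1 minus the possibility of its complement.
universe = {"sunny", "cloudy", "rain", "snow"}
pi = {"sunny": 1.0, "cloudy": 0.8, "rain": 0.4, "snow": 0.1}   # possibility distribution

def possibility(A):
    return max((pi[w] for w in A), default=0.0)

def necessity(A):
    return 1.0 - possibility(universe - A)

wet = {"rain", "snow"}
print(possibility(wet), necessity(wet))                        # 0.4 and 0.0
print(possibility(universe - wet), necessity(universe - wet))  # 1.0 and 0.6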
The basic advantage of this theory over the theory of probability is the possibility to combine the measures of overlapping sets by simply taking the maximum of the measures. The theory has been rigorously axiomatized, but there is no Dutch book argument supporting it. This is a very active area of research.
We are not aware of any applications of possibility theory. There is a lot of theoretical
research, but specific applications are rarely discussed. Some authors claim that this theory
better represents indeterministic processes, and therefore is more suitable for representing
uncertainty. Since these claims are mostly unsupported, we believe that the only advantage
of possibility theory lies in its computational simplicity. Therefore we consider it to be useful
mostly for heuristic approximations of probabilistic models.
References
[Anant Singh Jain, 1998] Anant Singh Jain, S. M. (1998). A state-of-the-art review of job-shop scheduling
techniques. Technical report.
[Bacchus, 1991] Bacchus, F. (1991). Probabilistic belief logic.
[Bartak, 1999] Bartak, R. (1999). Expert Systems Based On Constraints. PhD thesis.
[Elkan, 1993] Elkan, C. (1993). The paradoxical success of fuzzy logic. IEEE Expert.
[Eric J. Horvitz, 1988] Eric J. Horvitz, J. S. B. (1988). Decision theory in expert systems and artificial intelligence. International Journal of Approximate Reasoning, 2:247-302.
[Freedman, 2003] Freedman, D. A. (2003). Notes on the Dutch book argument. Unpublished.
[Girratano, 1998] Girratano, R. (1998). Expert Systems, Principles and Programming. PWS.
[Halpern, 89] Halpern, J. Y. (89). An analysis of first-order logics of probability. Artificial Intelligence, 46:311-350.
[Heckerman, 1996] Heckerman, D. (1996). A tutorial on learning with bayesian networks. Technical report,
Microsoft Co.
[Jones and Rabelo, 1998] Jones, A. and Rabelo, L. C. (1998). Survey of job-shop scheduling techniques.
[Jozef Kelemen, 1996] Jozef Kelemen, M. L. (1996). Expertne Systemy pre Prax. SOFA.
[Karmarkar, 1984] Karmarkar, N. (1984). A new polynomial-time algorithm for linear programming. Combinatorica, 4.
[Kass and Raftery, 1994] Kass, R. E. and Raftery, A. E. (1994). Bayes factors. Technical report.
[Kersting and Raedt, 2000] Kersting, K. and Raedt, L. D. (2000). Bayesian logic programs. In Cussens, J. and Frisch, A., editors, Proceedings of the Work-in-Progress Track at the 10th International Conference on Inductive Logic Programming, pages 138-155.
[Kuipers, 2001] Kuipers, B. (2001). Qualitative simulation.
[McCarthy, 1987] McCarthy, J. (1987). Generality in artificial intelligence. Commun. ACM, 30(12):1030
1035.
[Muggleton, 1995] Muggleton, S. (1995). Stochastic logic programs. In De Raedt, L., editor, Proceedings
of the 5th International Workshop on Inductive Logic Programming, page 29. Department of Computer
Science, Katholieke Universiteit Leuven.
[Park and Darwiche, 2004] Park, J. D. and Darwiche, A. (2004). Complexity results and approximation strategies for MAP explanations. Journal of Artificial Intelligence Research, 1:101-133.
[Poli and Brayshaw, 1995] Poli, R. and Brayshaw, M. (1995). A hybrid trainable rule-based system. Technical Report CSRP-95-3.
[Smets et al., ] Smets, P., Hsia, Y., Saffiotti, A., Kennes, R., Xu, H., and Umkehrer, E. The transferable belief model. pages 91-98.
[Stuart Russel, 2003] Stuart Russel, P. N. (2003). Artificial Intelligence: A Modern Approach. ???
[Ulf Nilsson, 1990] Ulf Nilsson, J. M. (1990). Logic, Programming and Prolog. John Wiley and Sons Ltd.
[Vladimir Marik, 1993] Vladimir Marik, e. a. (1993). Umela Inteligence 1. ACADEMIA.
[Vladimir Marik, 2003] Vladimir Marik, e. a. (2003). Umela Inteligence 4. ACADEMIA.

[Wilson and Borning, 1993] Wilson, M. and Borning, A. (1993). Hierarchical constraint logic programming.
Technical Report TR-93-01-02.
E-mail address: marekpetrik@zoznam.sk
