Knowledge-Based Systems 16 (2003) 47-57

www.elsevier.com/locate/knosys

Rule-based schema evolution in object-oriented databases


Reda Alhajj a,*, Faruk Polat b

a Department of Computer Science, University of Calgary, 2500 University Drive NW, Calgary, AB, Canada T2N 1N4
b Department of Computer Engineering, Middle East Technical University, 06531 Ankara, Turkey
Received 1 August 2000; revised 18 March 2002; accepted 3 April 2002

Abstract
In this paper, a rule-based mechanism for schema evolution in object-oriented databases is presented. We have benefited from having an
object algebra maintaining closure that makes it possible to have the output from a query persistent in the hierarchy. The actual class
hierarchy and the corresponding hierarchy which reflects the relationship between operands and results of queries are utilized. In order to
have query results reflected into the class hierarchy and classes reflected into the operands hierarchy, we also define mappings between the
two hierarchies. As a result, it is possible to maximize reusability in object-oriented databases. The object algebra is utilized to handle basic
schema evolution functions without requiring any special set of built-in functions. The invariants and the conflict resolving rules are
specified. It is also shown how other schema functions are derivable from the basic ones. © 2003 Elsevier Science B.V. All rights reserved.
Keywords: Rule-based systems; Object-oriented data model; Schema evolution; Conflict resolution; Reusability

1. Introduction
Object-oriented systems evolved to satisfy the needs of
application areas where information about the domain is
incomplete or becomes available incrementally or even
highly subject to change. This requires flexibility in
changing the database schema. Consequently, existing
object-oriented data models allow a wide variety of schema
changes [29]. O2 [31] provides two modes for running an
application. Schema changes are allowed in the development mode, but not in the execution mode where the schema
is frozen and changes to it are forbidden. Schema changes in
ORION [6] are based on multiple inheritance while in
GemStone [21] simple inheritance based schema changes
are treated. The approach used in Ref. [27] is based on
keeping versions to maintain a consistent view of the type
hierarchy after a schema change. Monk and Sommerville
[16] described a model for the versioning of classes.
Programs written to access an old version of the schema can
use the new schema version. Lerner and Habermann [14]
discussed a method for schema evolution that keeps versions
of the whole schema rather than of individual classes.
Lautemann [13] uses a propagation-based algorithm to
handle schema versions in a coherent way. Sjoberg [26]
studied how frequently schema evolution occurs in practical
database applications. None of these approaches employs a
query model in implementing schema changes.

* Corresponding author.
E-mail addresses: alhajj@cpsc.ucalgary.ca (R. Alhajj), polat@ceng.metu.edu.tr (F. Polat).
A basic consideration in schema evolution is how to
bring existing objects in line with a modified definition of an
existing class. Either all instances of a modified class are
instantaneously changed or they are modified only when
used, otherwise remain unchanged. ORION [6] and
ObServer [27] follow the approach known as screening
where the change is delayed and values are either filtered or
corrected as they are used. Another approach, known as
conversion, is used by GemStone [21] where all instances of
a class are modified in accordance with the change. The first
approach sounds more sensible, as there is no need to convert
instances that will never be used. Furthermore, Nestorov [18,19]
described a method for deriving and maintaining a
schema from semistructured data. It has been proven that the
general problem of finding an optimal typing is NP-hard, but
a heuristic based on clustering is used to find near-optimal
solutions.
1.1. The motivation and contributions
After analyzing the existing approaches, we realized that
the ever growing need for schema evolution in databases
can be met by utilizing knowledge base support, as there is
no mathematically optimal and efficient solution to this
problem. A knowledge-based approach can guide this
process, as seen in many application areas such as
verification and validation of software systems, medical
diagnosis, engineering design, and multi-agent systems
[9,17,20,22-24,30]. This paper demonstrates the effectiveness of
rule-based schema evolution based on a novel object
algebra.

0950-7051/03/$ - see front matter © 2003 Elsevier Science B.V. All rights reserved.
PII: S0950-7051(02)00051-5

The work described in this paper is important for several
reasons. First, schema evolution is necessary for object-oriented databases to handle incomplete and incrementally
available data. Second, none of the approaches enumerated
earlier employs a query model to achieve schema evolution;
instead, new constructs have been defined. This is simply because
the query models underlying such approaches are mainly
based on the nested relational model and hence do not
consider reusability maximization [1,7,12,15,25]. To the
best of our knowledge, none of the query languages
developed for object-oriented database systems can
handle schema changes.
In this paper, we introduce a model that maximizes
reusability through proper handling of schema evolution
functions using algebra operations. This goal can be
achieved only when an object-oriented query model
enforces closure. Classifying the result of a query so that
it occupies a meaningful position in the hierarchy that does
not violate the rules of inheritance is essential to preserve
the integrity of the database. However, enforcing closure
into a query model does not lead to the total achievement of
the goal. It is only a step in the right direction. The goal is
achieved when reusability is maximized via the proper
handling and placement of query results in the hierarchy.
The actual location of a class in the hierarchy should not be
dependent on whether the class is the result of a query or
else created by a user away from the query model [11,28].
Our first step towards achieving this goal is to classify
and distinguish query operands and results to be Object
Algebra Expressions (OAEs). An OAE is a pair of sets; a set
of objects and a set of message expressions, as detailed in
our previous work on query models [2-5]. The second step
is to distinguish between two hierarchies, a class hierarchy
and an OAEs hierarchy, where queries are evaluated against
the OAEs hierarchy as operands and results are supposed to
be OAEs. Therefore, we look at methods of having results
properly placed in the OAEs hierarchy. Thus, for a class to
participate in a query, it should have some corresponding
OAE present in the OAEs hierarchy.
As a result, we developed a rule-based system, built upon
our query model, to handle basic schema changes in an
effective way. Constraints and rules are specified aiming at
detecting and preventing any inconsistency in the database
due to a schema change. The main motivation is the
recognition that some algebra operations perform the
desired schema changes and that our query model maintains
closure and reusability by handling the proper placement of
the result in the hierarchy [2,3]. Consequently, it is proven
that schema changes can be achieved without having a
stand-alone language developed solely to serve the purpose.
The rest of the paper is organized as follows. The basic
model relevant for this paper is discussed in Section 2. In
Section 3, invariants and conflict resolving rules are
presented. The achievement of schema evolution functions using the object algebra is the subject of Section 4.
Section 5 presents the conclusions.

2. The object data model


Our approach towards the maximization of reusability is
based on the distinction between two hierarchies, a class
hierarchy and an OAEs hierarchy. In this section, we
elaborate more on this classification by giving the basic
constructs related to each of the two hierarchies.
To start with, any object, with an object identifier oid,
qualifies to be considered in the set of objects of any class c,
denoted by $L_{instances}(c)$, if and only if oid understands nothing
more than the behavior defined for objects of class c.
The behavior is defined as a set of functions. Each
function has a header (called its signature) and a body, which is
executed by sending the corresponding message to an object
from the class in which the function is defined. So, the
syntax of each message must match the signature of a
corresponding function. The signature consists of a function
name followed by a list of domains. Each domain covers the
set of all possible values for a particular variable that is used in the
body of the function and expects a value to be supplied
when the function is executed. The body of each
function includes a sequence of statements to be executed on
calling the function with the expected parameters supplied.
Each function $f(p_1, p_2, \ldots, p_n)$ is defined in a particular class c
and can be applied on objects of c, as well as on objects of
the subclasses of c. However, f may be redefined in any of
the subclasses $c_1$ of c, and it is not necessary for all copies of
f to have the same number of parameters and the same body.
This way, each class gives priority to its own copy of f,
according to the conflict resolution function described in
Section 3.
Any object $oid \in L_{instances}(c_1)$ can be accessed from
within both classes c and $c_1$. Consequently, applying
function f on oid leads to executing the copy of f defined
either in c or in $c_1$, depending on whether oid is accessed
within c or $c_1$, respectively. Function f is expected to modify
and/or utilize some values from the internal state of the
receiver object oid. If function f can be successfully applied
on object oid, then we say oid understands f. This is true if
and only if function f is defined in class c of object oid, or in
any of the direct or indirect superclasses of c. Otherwise,
applying f on oid will lead to a run-time error (as we enforce
late binding) because f is undefined along the path(s) that
connect c to the root of the class hierarchy.
The behavior defined for objects of a given class consists
of two parts, inherited behavior and locally defined
behavior. So, let $L_{behavior}(c)$ denote the local behavior of
class c. Then, the whole behavior of class c, denoted by
$W_{behavior}(c)$, is recursively defined to include the whole
behavior of its direct superclasses, say $[c_{p_1}, c_{p_2}, \ldots, c_{p_n}]$.$^{1}$
Formally,

$$W_{behavior}(c) = L_{behavior}(c) \cup \bigcup_{i=1}^{n} W_{behavior}(c_{p_i})$$

Fig. 1. An example class hierarchy.

All objects that understand at least the behavior in
$W_{behavior}(c)$ constitute what is called the set of total
instances of class c, denoted by $W_{instances}(c)$. This set is
recursively defined in terms of the total instances of the direct
subclasses of class c, say $\{c_{b_1}, c_{b_2}, \ldots, c_{b_t}\}$:$^{2}$

$$W_{instances}(c) = L_{instances}(c) \cup \bigcup_{i=1}^{t} W_{instances}(c_{b_i})$$
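The two recursive definitions can be illustrated with a minimal Python sketch. The `ClassDef` container, the `link` helper, and the two-class sample data are hypothetical names introduced only for illustration; they are not part of the model itself, and the contents of Fig. 1 are not reproduced here.

```python
from dataclasses import dataclass, field


@dataclass
class ClassDef:
    """Hypothetical container for one class definition."""
    name: str
    supers: list = field(default_factory=list)     # ordered direct superclasses
    subs: list = field(default_factory=list)       # direct subclasses
    l_behavior: set = field(default_factory=set)   # locally defined messages
    l_instances: set = field(default_factory=set)  # OIDs held directly by the class


def w_behavior(c):
    """Local behavior plus the whole behavior of every direct superclass."""
    result = set(c.l_behavior)
    for parent in c.supers:
        result |= w_behavior(parent)
    return result


def w_instances(c):
    """Local instances plus the total instances of every direct subclass."""
    result = set(c.l_instances)
    for child in c.subs:
        result |= w_instances(child)
    return result


def link(sub, sup):
    """Record sup as the last direct superclass of sub (illustrative helper)."""
    sub.supers.append(sup)
    sup.subs.append(sub)


# Assumed sample data, not taken from Fig. 1.
person = ClassDef("person", l_behavior={"name", "age"}, l_instances={"oid1"})
student = ClassDef("student", l_behavior={"year"}, l_instances={"oid4"})
link(student, person)
```

Behavior flows downwards (a subclass sees everything its superclasses define), whereas instances flow upwards (a superclass sees every object of its subclasses), which is why the two functions recurse in opposite directions.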

Of course, by enforcing polymorphism and overriding, an
object oid which qualifies to be in both $W_{instances}(c_i)$ and
$W_{instances}(c_j)$ may respond differently to the same message
depending on whether oid has been accessed from within
$W_{instances}(c_i)$ or $W_{instances}(c_j)$. This is true because having a
message m in both $W_{behavior}(c_i)$ and $W_{behavior}(c_j)$ does not
restrict the function underlying m to have the same
implementation in both $c_i$ and $c_j$.
However, for an object to be able to understand and deal
with a certain behavior, it should have some predefined
basic knowledge, i.e. possess a certain state (instance
variables). The set of instance variables which determine the
state of each object in $L_{instances}(c)$ is denoted by $W_{attributes}(c)$ and defined recursively in terms of the instance variables
related to objects in the superclasses of c. Formally,

$$W_{attributes}(c) = L_{attributes}(c) \cup \bigcup_{i=1}^{n} W_{attributes}(c_{p_i})$$

where $L_{attributes}(c)$ denotes the additional instance variables
defined locally in c.

$^{1}$ A list notation is used for the superclasses because their order is
important for conflict resolution due to polymorphism and overriding.
$^{2}$ Conflict resolution is not applicable here because only objects are
concerned; hence the set notation is used.
So, based on what has been defined so far, a class is
distinguished by some properties and constructs constituting
its definition. Consider a general class, Class, that includes all such
class definitions. By definition, each object in $L_{instances}(Class)$
holds the definition of at least one class c. Such an object is
defined to be a tuple of the form $(C_p, C_b, L_{attributes}(c), L_{behavior}(c))$, where $C_p$ is a list of the
direct superclasses of class c, and $C_b$ is a set of the direct
subclasses of class c.
It is obvious that the definition of Class itself is an
object in Class with the values

$C_p = \emptyset$,
$C_b = \emptyset$,
$L_{attributes} = \{C_p : list(classes),\ C_b : set(classes),\ L_{attributes}(c) : set(attributes),\ L_{behavior}(c) : set(messages)\}$,
$L_{behavior} = \{AppendC_p(class, position),\ DropC_p(class),\ AddC_b(class),\ DropC_b(class),\ AddL_{attributes}(attribute, domain),\ DropL_{attributes}(attribute),\ AddL_{behavior}(message, function),\ DropL_{behavior}(message)\}$
After introducing these basic constructs, let us define the
other requirements for the class hierarchy. A class c is
formally defined as a pair $(P_d, P_o)$, where $P_d$ is an identifier;
it is either the OID of an object in Class or the identifier of
another existing class, say $c_i$. For the former case, $P_o$ refers
to the set of objects in $L_{instances}(c)$. For the latter case, on the
other hand, $P_o$ refers to a list of predicates the conjunction of
which filters objects from $W_{instances}(c_i)$ to be in $W_{instances}(c)$.
This way, we differentiate between two kinds of classes
according to the instantiation of $P_d$: base classes, each of
which directly points to a class definition in Class, i.e. its $P_d$
is an object in Class, and brother classes, which indirectly
refer to the same object in Class via a common base class.
In other words, brother classes have the same class
definition, but their objects are determined based on some
filtering predicates.

Fig. 2. Example objects from the classes given in Fig. 1.

To illustrate what has been introduced above, consider
the class hierarchy shown in Fig. 1 and the objects shown in
Fig. 2. Next to each class in Fig. 1, there are three sets which
include $L_{attributes}$, $L_{behavior}$ and $L_{instances}$, respectively.
Consequently, $P_d$ for every class shown in Fig. 1 is defined
to be an object in Class. Hence, $P_o$ refers to a set of OIDs,
which is explicitly given in the figure. Concerning brother
classes, some examples are enumerated next.

MaleStudents is a brother class of student with $P_d = student$ and $P_o = [(p\ sex) = M]$.$^{3}$
StaffBrothersOfSusan is a brother class of staff with $P_d = staff$ and
$P_o = [(p\ sex) = M,\ \exists p_1 (p_1 \in W_{instances}(person)),\ (p_1\ name) = Susan,\ \exists p_2 (p_2 \in W_{instances}(person)),\ \{p, p_1\} \subseteq (p_2\ children)]$.

As is obvious from the examples and detailed further in
Refs. [2,3,5], brother classes serve to hold the result of a
Selection operation. Their introduction helps in reusability maximization, as they share the same class definition
with a certain base class. Any changes to a base class are
dynamically reflected in all its brother classes. This way,
database consistency is guaranteed.
Having covered the class hierarchy definition, let us
now introduce the basic constructs of the OAEs
hierarchy. To start with, it has already been mentioned that
an OAE is a pair: a set of objects and a set of message
expressions. The set of objects includes all objects in
$W_{instances}(c)$ for some class c. On the other hand, the set of
message expressions, denoted by $M_{expression}(c)$, is defined to
include sequences of messages derived starting with
$W_{behavior}(c)$, as detailed next in Definition 2.1.

Definition 2.1
Message expressions. Given a class c, $M_{expression}(c)$ is
recursively defined by:
$W_{behavior}(c)$ is included in $M_{expression}(c)$, i.e. $W_{behavior}(c) \subseteq M_{expression}(c)$;
if $x \in M_{expression}(c)$ and x returns a value from
$W_{instances}(c_i)$, for some class $c_i$, then $(x\ W_{behavior}(c_i)) \subseteq M_{expression}(c)$.$^{4}$

$^{3}$ Variable p is used as a pointer to objects in the target class, here
MaleStudents.
A message expression, when received by an object,
returns a value from a particular domain, which is the range
of the last message in the message expression. A returned
value is either a stored or a derived value. Related to the
classes given in Fig. 1, the following are example sets of
message expressions derived by invoking Definition 2.1:$^{5}$

$$M_{expression}(person) = W_{behavior}(person) \cup (children^{+}\ W_{behavior}(person)) = (children^{*}\ W_{behavior}(person))$$

$$M_{expression}(student) = M_{expression}(person) \cup \{year, courses, student\text{-}in\} \cup (courses\ M_{expression}(course)) \cup (student\text{-}in\ M_{expression}(department))$$
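Since a message such as children returns objects of the class it is defined in, the set of message expressions is in general infinite. The following Python sketch enumerates it only up to a chosen length. The function name, the two dictionaries, and the length bound `max_len` are assumptions made for illustration; the sample data covers just two messages of person.

```python
def message_expressions(cls, behavior, ranges, max_len):
    """Enumerate the message expressions of `cls` with at most `max_len` messages.

    behavior: class name -> set of messages (stands in for the whole behavior)
    ranges:   (class name, message) -> class name of the returned objects;
              an entry exists only when the message returns objects of a class
    """
    # Base case: every message of the class is itself a message expression.
    current = [((m,), ranges.get((cls, m))) for m in sorted(behavior[cls])]
    found = set()
    for _ in range(max_len):
        next_level = []
        for expr, target in current:
            found.add(expr)
            # Recursive case: an expression returning objects of class `target`
            # can be postfixed with every message of `target`.
            if target is not None:
                for m in sorted(behavior[target]):
                    next_level.append((expr + (m,), ranges.get((target, m))))
        current = next_level
    return found


# Assumed fragment of the person class: children returns person objects.
behavior = {"person": {"name", "children"}}
ranges = {("person", "children"): "person"}
exprs = message_expressions("person", behavior, ranges, 2)
```

With a bound of 2, the result contains the two single messages plus children followed by each of them, i.e. the first two unfoldings of the starred form given above.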


After introducing message expressions, we can elaborate
more on OAEs and their relationships with existing classes.
By definition, for every class in the class hierarchy, there is a
corresponding OAE in the OAEs hierarchy. Consequently,
the relationship between OAEs is deduced directly from the
relationship between the corresponding classes. Based on
that, as we have brother classes, we also have brother OAEs.
In other words, if two classes (OAEs) are brothers, then the
corresponding OAEs (classes) are brothers too. Brother
OAEs share the same set of message expressions, but their
objects are filtered the same way as with brother classes.
Formally, an OAE e is defined as a pair $(E_m, E_o)$, where $E_m$ is
either a reference to a set of message expressions or a
reference to another OAE $e_i$ to share its set of message
expressions (here e and $e_i$ are brother OAEs). For the former
case, $E_o$ is a reference to $W_{instances}(c)$, where c is the class
that corresponds to e. For the latter case, on the other hand,
$E_o$ is a reference to a list of predicates the conjunction of
which filters objects in $W_{instances}(e_i)$ to derive the objects in
$W_{instances}(e)$. To illustrate this, consider the following OAEs
that correspond to some classes of Fig. 1 together with their
brother classes mentioned earlier in this section.

Person is an OAE with $E_o = W_{instances}(person)$ and $E_m = M_{expression}(person)$.
Student is an OAE with $E_o = W_{instances}(student)$ and $E_m = M_{expression}(student)$.
Staff is an OAE with $E_o = W_{instances}(staff)$ and $E_m = M_{expression}(staff)$.
MaleStudents is a brother OAE of Student with $E_m = Student$ and $E_o = [(p\ sex) = M]$.
StaffBrothersOfSusan is a brother OAE of Staff with $E_m = Staff$ and
$E_o = [(p\ sex) = M,\ \exists p_1 (p_1 \in W_{instances}(person)),\ (p_1\ name) = Susan,\ \exists p_2 (p_2 \in W_{instances}(person)),\ \{p, p_1\} \subseteq (p_2\ children)]$.

$^{4}$ The notation $x\ W_{behavior}(c_i)$ means that x is postfixed with every element
of $W_{behavior}(c_i)$, the set of messages of class $c_i$. For example, let
$W_{behavior}(c_i) = \{m_1, m_2\}$; then $x\ W_{behavior}(c_i) = \{x_{11}, x_{21}\}$, where $x_{11} = x\ m_1$ and $x_{21} = x\ m_2$.
$^{5}$ Notice that $a^{*}$ is used to indicate zero or more concatenations of a with
itself, i.e. $\epsilon, a, aa, \ldots$, while $a^{+}$ indicates one or more concatenations of a
with itself, i.e. $a, aa, aaa, \ldots$
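The filtering behind a brother OAE such as MaleStudents can be sketched in Python as follows. The function `brother_instances` and the dictionary representation of object states are assumptions made for illustration, and the two student objects are abbreviated to the name and sex values that appear for them in the examples of Section 4.

```python
def brother_instances(base_instances, predicates):
    """Keep the objects of the base OAE that satisfy every predicate in the list."""
    return {
        oid: state
        for oid, state in base_instances.items()
        if all(p(state) for p in predicates)
    }


# Abbreviated states of the two student objects (only name and sex are kept).
students = {
    "oid4": {"name": "Susan", "sex": "F"},
    "oid6": {"name": "George", "sex": "M"},
}

# The predicate list of MaleStudents holds a single predicate: sex is M.
male_students = brother_instances(students, [lambda s: s["sex"] == "M"])
```

Because the brother stores only the predicate list and a reference to its base, re-evaluating `brother_instances` after a change to the base picks up that change, which mirrors the dynamic reflection described for brother classes.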
Finally, it is worth mentioning that, in our query model
detailed in Refs. [2,3,5], we enforce closure by having
operands and results of queries as OAEs. However, to have
the result of a query persistent, a corresponding class
should be found in the class hierarchy. So, a mapping from
the OAEs hierarchy to the class hierarchy is required.
The mapping from the class hierarchy to the OAEs
hierarchy is straightforward and follows directly from the
definitions of classes and OAEs. Recall that for every class
c, there is a corresponding OAE e such that $W_{instances}(e) = W_{instances}(c)$ and $M_{expression}(e) = M_{expression}(c)$; this is the
relevant information related to any OAE e; other information indicated next can be implicitly deduced when needed.
Moreover, if c has a brother class, say $c_i$, then
OAE e has a brother OAE $e_i$; here $e_i$ is the OAE that
corresponds to class $c_i$. So, let $e_{b_1}, e_{b_2}, \ldots, e_{b_t}$ be the
OAEs that correspond to classes $c_{b_1}, c_{b_2}, \ldots, c_{b_t}$,
respectively. Having $C_b(c) = \{c_{b_1}, c_{b_2}, \ldots, c_{b_t}\}$ means that
$e_{b_1}, e_{b_2}, \ldots, e_{b_t}$ are the only subOAEs of OAE e. Finally,
let $e_{p_1}, e_{p_2}, \ldots, e_{p_n}$ be the OAEs that correspond to classes
$c_{p_1}, c_{p_2}, \ldots, c_{p_n}$, respectively. Given that $C_p(c) = [c_{p_1}, c_{p_2}, \ldots, c_{p_n}]$, $e_{p_1}, e_{p_2}, \ldots, e_{p_n}$ are the
only superOAEs of OAE e.
Concerning the other mapping, i.e. to find a class c for a
given OAE e, first it is necessary to determine which object
in Class holds the definition of class c or else to find a
brother class, say $c_i$, for class c. For the latter case, it is
enough to specify a list of predicates, the conjunction of
which filters objects of class $c_i$ to decide on objects of class
c.
Let us look at the former case of finding an object in Class.
By definition, each object of Class is a tuple
$(C_p, C_b, L_{attributes}, L_{behavior})$. If $e_{p_1}, e_{p_2}, \ldots, e_{p_n}$ are superOAEs of OAE e, then $C_p(c) = [c_{p_1}, c_{p_2}, \ldots, c_{p_n}]$, i.e. their
corresponding classes constitute the list of superclasses of
class c. On the other hand, if $e_{b_1}, e_{b_2}, \ldots, e_{b_t}$ are subOAEs
of OAE e, then $C_b(c) = \{c_{b_1}, c_{b_2}, \ldots, c_{b_t}\}$, i.e. their
corresponding classes constitute the set of subclasses of
class c.
In general, $L_{behavior}(c)$ and $L_{attributes}(c)$ are determined as
follows.

$$L_{behavior}(c) = \Big( W_{behavior}(c) - \bigcup_{i=1}^{n} W_{behavior}(c_{p_i}) \Big) \cup \bigcap_{i=1}^{t} L_{behavior}(c_{b_i})$$

where $W_{behavior}(c)$ is defined depending on $M_{expression}(e)$
and $L_{behavior}(c)$ is determined through polymorphism and
overriding. Finally, $L_{attributes}(c)$ is determined depending on
$L_{behavior}(c)$.
2.1. The object algebra: an overview
In this section we briefly describe the algebraic
operations to be utilized while dealing with the schema changes
described in Section 4. Consider each of $e_i$ and $e_j$ to be either
an OAE or a query expression, which is a sequence of one or
more query operators applied on some operands to produce
an OAE. Further, let $M_{e_i}$ be a subset of $M_{expression}(e_i)$.
The Selection operation produces an OAE, which is a
brother of the operand.

$$Select(e_i, p) = \langle \{o \mid o \in W_{instances}(e_i) \wedge p(o)\},\ M_{expression}(e_i) \rangle$$

where p is a predicate expression built using object
variables, message expressions and constants;
quantifiers may also be present in a predicate. Given an
object o, p(o) denotes the evaluation of predicate
expression p with o substituted for an object variable in p.
To restrict the accessibility of objects in $W_{instances}(e_i)$, the
Project operation is utilized, with the result being a
superOAE of the operand.

$$Project(e_i, M_{e_i}) = \langle W_{instances}(e_i),\ M_{e_i} \rangle$$

Only message expressions in $M_{e_i}$ can be applied to
objects in the result; this hides some values of the
accessible objects. On the other hand, the inverse of the
Project operation adds new elements to the set of
message expressions of an OAE; it is defined at the
end of this section, after introducing the other operations
in terms of which it is represented.
The One-Level-Project operation is defined to
decrease the depth of nesting along the class
composition hierarchy.$^{6}$

$$
\begin{aligned}
OLproject(e_i, M_{e_i}) = \langle\, & \{o \mid \exists o_1 \in W_{instances}(e_i) \wedge value(o) = (o_1\ M_{e_i})\}, \\
& \{x \mid \exists x_1 \in M_{e_i} \text{ with } x_1 \text{ returning a stored value},\ x_1 = (x_2\ m) \wedge len(x_1) = len(x_2) + 1 \\
& \quad \wedge\ \exists x_3 \in M_{expression}(e_i) \wedge x_3 = (x_2\ x) \wedge x = (m\ x_4)\} \\
& \cup\ \{x \mid \exists x_1 \in M_{e_i} \text{ with } x_1 \text{ returning a derived value},\ len(x) = 1 \\
& \quad \wedge\ \forall o_1 \in W_{instances}(e_i)\ \exists o \in W_{instances}(OLproject(e_i, M_{e_i})) \text{ such that } (o_1\ x_1) = (o\ x)\}\ \rangle
\end{aligned}
$$

The result of OLproject is in general a direct subOAE of
the root.
The Nest operation is defined to introduce new
relationships.

$$Nest(iv, e_i, e_j) = \langle \{o \mid \exists o_1 \in W_{instances}(e_i)\ \exists o_2 \in W_{instances}(e_j) \wedge value(o) = value(o_1)\ identity(o_2)\},\ M_{expression}(e_i) \cup (m\ M_{expression}(e_j)) \rangle$$

where $W_{instances}(e_j)$ is the domain of message m(), which
handles the value of the new instance variable iv added to
$L_{attributes}(e_i)$. The result of Nest is a subOAE of $e_i$.
On the other hand, to drop an existing relationship, we
project on all message expressions of the operand except
those related to the relationship to be dropped, as follows:

$$Unnest(e_i, e_j) = Project(e_i,\ M_{expression}(e_i) - (m\ M_{expression}(e_j))) = Project(e_i,\ W_{behavior}(e_i) - \{m\})$$

where m() in $L_{behavior}(e_i)$ corresponds to the instance
variable iv in $L_{attributes}(e_i)$, with domain(iv) being objects
in $e_j$.
Finally, the inverse of the Project operation, Iproject, is
defined in terms of other operations. To add a subset $M_{e_j}$
of $M_{expression}(e_j)$ to $M_{expression}(e_i)$, we first Nest $e_i$ and $e_j$,
then do an OLproject to have all of $M_{expression}(e_j)$ and
$M_{expression}(e_i)$ together forming one set; after that we
Project on $M_{expression}(e_i) \cup M_{e_j}$ to get the target set of
message expressions in the result OAE. Formally,

$$Iproject(e_i, e_j : M_{e_j}) = Project(OLproject(Nest(iv, e_i, e_j),\ M_{expression}(e_i) \cup (m\ M_{expression}(e_j))),\ M_{expression}(e_i) \cup M_{e_j})$$

where m() is a message in the result of $Nest(iv, e_i, e_j)$, with
the domain of m() being $W_{instances}(e_j)$.
This formulation of the Iproject operation is valid for the
case of adding some existing methods to a class. However,
for the case of having $M_{e_j}$ consisting of new methods, the
following is done:

$$Iproject(e_i, \{m_1 : f_1, m_2 : f_2, \ldots, m_n : f_n\})$$

where $m_i : f_i$ specifies that message $m_i$ is to be used to invoke
the method that implements function $f_i$.

$^{6}$ Related to an object o, value(o) and identity(o) are used to denote the
state and the identity of the object o, respectively. $(o_1\ M_{e_i})$ returns the set of
the results of applying elements of $M_{e_i}$ to $o_1$.
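To make the closure property concrete, the following Python sketch models an OAE as a pair and implements simplified versions of Select, Project and Nest that each return another OAE. The `OAE` tuple and the three functions are hypothetical names. As simplifying assumptions, message expressions are reduced to plain strings, the message that handles the new instance variable is named after it, and Nest initializes the new instance variable to nil (`None`), as described for Example 4.2.

```python
from typing import NamedTuple


class OAE(NamedTuple):
    objects: dict        # oid -> state (attribute name -> value)
    messages: frozenset  # message expressions, simplified to strings


def select(e, p):
    """Brother of e: same message expressions, only the objects satisfying p."""
    kept = {oid: s for oid, s in e.objects.items() if p(s)}
    return OAE(kept, e.messages)


def project(e, keep):
    """SuperOAE of e: same objects, only the message expressions in keep."""
    return OAE(e.objects, e.messages & frozenset(keep))


def nest(iv, e_i, e_j):
    """SubOAE of e_i: every object gains instance variable iv, initially nil."""
    objects = {oid: {**s, iv: None} for oid, s in e_i.objects.items()}
    # The message for iv is prefixed to every message expression of e_j.
    extended = e_i.messages | {f"{iv} {m}" for m in e_j.messages}
    return OAE(objects, extended)


# Assumed sample operand, abbreviated to two attributes.
person = OAE(
    {"oid1": {"name": "Jack", "sex": "M"}, "oid2": {"name": "Mary", "sex": "F"}},
    frozenset({"name", "sex"}),
)
males = select(person, lambda s: s["sex"] == "M")
names_only = project(person, {"name"})
with_parents = nest("parents", person, person)
```

Because every operation consumes and produces the same pair structure, results can be fed back as operands, which is what allows composite operations such as Iproject to be written as nested calls.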

3. Constraints of schema evolution and conflict resolving rules
In this section, we identify the properties of the class
hierarchy that must be preserved upon schema modifications in order not to leave the database in an
inconsistent state. In addition to the constraints, there
are some other rules that are used to resolve conflicts
due to schema changes. All such constraints and rules
have been formally adapted into the model to be
triggered to detect whether a schema change is allowed
or not. A schema change which leaves the database in
an inconsistent state is ignored. Knowledge encoded for
schema evolution is triggered on event basis, i.e. when
a schema change occurs the relevant constraints and
rules are fired.
3.1. Class lattice invariant
The class hierarchy is defined to be a connected directed
acyclic graph with a single root, the class OBJECT. Any schema
change to the class hierarchy should maintain this property.
This is detected by rule R0, where $c_i$ is a class to be added to
$C_p(c_j)$:
R0: if (there exists at least one path p from $c_i$ to OBJECT
such that path p includes $c_j$) then ignore
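The check behind rule R0 amounts to asking whether $c_j$ is reachable from $c_i$ by following superclass links. A minimal Python sketch is given below; the function name `violates_r0`, the dictionary encoding of the superclass lists, and the small sample hierarchy are assumptions made for illustration.

```python
def violates_r0(supers, c_i, c_j):
    """True if c_j lies on some path from c_i up to the root.

    supers maps a class name to the ordered list of its direct superclasses.
    Adding c_i as a superclass of c_j would then close a cycle.
    """
    stack, seen = [c_i], set()
    while stack:
        current = stack.pop()
        if current == c_j:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(supers.get(current, []))
    return False


# Assumed sample hierarchy rooted at OBJECT.
supers = {
    "OBJECT": [],
    "person": ["OBJECT"],
    "student": ["person"],
    "staff": ["person"],
}
```

Here, making student a superclass of person is rejected because person already lies above student, whereas making staff a superclass of student is accepted.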
3.2. Non-empty class invariant
A class whose instance and class variables and methods
are deleted is considered empty. The class hierarchy should
not contain any empty user-defined classes. If a schema
change results in an empty class, this class is automatically
dropped. Formally, this is detected by rule R1:
R1: if $L_{instances}(c) = \emptyset \wedge L_{behavior}(c) = \emptyset$ then for any
class $c_i$ such that $c \in C_p(c_i)$:
$C_p(c_i) \leftarrow C_p(c_i) - \{c\} + C_p(c)$

3.3. Distinct names invariant
For any class c, $L_{behavior}(c)$ and $L_{attributes}(c)$ are defined
to be sets, and each class must have a unique name.
Name conflicts resulting from the inheritance of attributes
and methods are resolved according to the name conflict
resolving rule and the inheritance priority rule, given next
in this section. Formally, the distinct names requirement is
enforced by rules R2, R3 and R4:
R2: if $m_i \in L_{behavior}(c) \wedge m_i = \text{newmessage}$ then
ignore newmessage
R3: if $iv_i \in L_{attributes}(c) \wedge iv_i = \text{newinstance variable}$ then
ignore newinstance variable
R4: if $c_i \in schema \wedge c_i = \text{newclass}$ then ignore newclass
3.4. Name conflict resolving rule
A name conflict occurs when a message (instance
variable) in a superclass of a class c has the same name as
a message (instance variable) in class c or in any of the other
superclasses of class c.
On name conflicts, priority is given to class c, and priority
decreases while going away from class c towards the
OBJECT class. For classes that have the same priority
according to this rule, i.e. classes found in the same
superclass list with respect to class c, the conflict is resolved
according to the order in the list from head to tail. Formally,
name conflicts due to messages and instance variables are
resolved according to rules R5 and R6, respectively.
Let m and iv be the message and the instance variable in
class c related to which a name conflict is to be detected
between class c and classes in $C_p(c)$.
R5:
if $c_i, c_j \in C_p(c) \wedge m \in L_{behavior}(c_i) \wedge m \in L_{behavior}(c_j)$
then
  if $length(path(c, c_i)) > length(path(c, c_j))$ then $m_{c_j}$ is inherited
  elseif $length(path(c, c_i)) < length(path(c, c_j))$ then $m_{c_i}$ is inherited
  elseif $length(path(c, c_i)) = length(path(c, c_j))$ then
    if $c_i$ precedes $c_j$ then $m_{c_i}$ is inherited
    else $m_{c_j}$ is inherited
R6:
if $c_i, c_j \in C_p(c) \wedge iv \in L_{attributes}(c_i) \wedge iv \in L_{attributes}(c_j)$
then
  if $length(path(c, c_i)) > length(path(c, c_j))$ then $iv_{c_j}$ is inherited
  elseif $length(path(c, c_i)) < length(path(c, c_j))$ then $iv_{c_i}$ is inherited
  elseif $length(path(c, c_i)) = length(path(c, c_j))$ then
    if $c_i$ precedes $c_j$ then $iv_{c_i}$ is inherited
    else $iv_{c_j}$ is inherited
3.5. Inheritance priority rule
The addition of an instance variable (method) to a class
and the deletion of an instance variable (method) from a
class trigger rule R6 (R5). Accordingly, an instance variable
(method) in name conflict with another one may be inherited
on deleting the latter. Further, currently inherited instance
variables (methods) may cease to be considered after the
addition of an instance variable (method) with the same
name to a class with a higher priority.
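The priority scheme of rules R5 and R6 (shorter path first, then position in the superclass list) can be approximated by a breadth-first search that visits direct superclasses in list order. The Python sketch below is one possible reading under that assumption; the function `resolve`, the dictionary encodings, and the class `ta` with its sample definitions are hypothetical.

```python
from collections import deque


def resolve(supers, local, c, name):
    """Return the class whose definition of `name` is seen from class c.

    supers: class name -> ordered list of direct superclasses
    local:  class name -> set of locally defined names
            (messages for R5, instance variables for R6)
    """
    queue, seen = deque([c]), {c}
    while queue:
        current = queue.popleft()
        if name in local.get(current, set()):
            return current
        # Superclasses are queued in list order, so that among classes at
        # the same distance the one nearer the head of the list wins.
        for parent in supers.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return None


# Assumed sample: ta has two direct superclasses, student before staff.
supers = {"ta": ["student", "staff"], "student": ["person"],
          "staff": ["person"], "person": []}
local = {"student": {"id"}, "staff": {"id", "salary"},
         "person": {"id", "name"}}
```

The same function serves both rules because they differ only in whether the names being looked up are messages or instance variables.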

3.6. Full inheritance invariant


A class inherits all the instance variables and methods
from its superclass(es). No selection is done but name
conflicts are resolved according to rules R5 and R6.
3.7. Homogeneous domain invariant
The domain of each instance variable should be bound
to a specific class. When a class c is specified as a domain,
all its direct and indirect subclasses may be used in the same
context because $W_{instances}(c)$ subsumes those of the direct and
indirect subclasses of class c. Formally, this is detected and
resolved by rules R7 and R8:
R7:
if $domain(iv) = W_{instances}(c)$ then for every class $c_i \in C_b(c)$, $domain(iv) \leftarrow W_{instances}(c_i)$ is legal
R8:
if $domain(iv) = 2^{W_{instances}(c)}$ then for every class $c_i \in C_b(c)$, $domain(iv) \leftarrow 2^{W_{instances}(c_i)}$ is legal

3.8. Class addition/deletion invariant
A new superclass $c_i$ of an existing class c should be
appended at the end of $C_p(c)$. Moreover, on deleting $c_i$ from
the hierarchy, classes in $C_p(c_i)$ should replace $c_i$ as
immediate superclasses of classes in $C_b(c_i)$. Classes from
$C_p(c_i)$ have lower priority than the existing superclasses of
classes in $C_b(c_i)$. Formally, this is detected and resolved
according to rules R9 and R10:
R9:
if $c_i$ is added to $C_p(c)$ then $C_p(c) \leftarrow C_p(c) + \{c_i\}$
R10:
if $c_i$ is deleted from $C_p(c)$ then for every class $c_j \in C_b(c_i)$:
$C_p(c_j) \leftarrow C_p(c_j) + C_p(c_i)$
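A minimal Python sketch of rules R9 and R10 over ordered superclass lists follows. The function names, the dictionary encoding, and the class `ta` are assumptions made for illustration; in addition, the sketch removes the deleted class from each affected superclass list, a step that is implied by the deletion rather than spelled out in R10.

```python
def add_superclass(supers, c, c_i):
    """R9: append c_i at the end of the superclass list of c."""
    if c_i not in supers[c]:
        supers[c].append(c_i)


def delete_class(supers, c_i):
    """R10: subclasses of c_i take over its superclasses at the lowest priority."""
    inherited = supers.pop(c_i)
    for parents in supers.values():
        if c_i in parents:
            parents.remove(c_i)
            # Appending at the tail keeps existing superclasses ahead in priority.
            parents.extend([p for p in inherited if p not in parents])


# Assumed sample hierarchy.
supers = {
    "OBJECT": [],
    "person": ["OBJECT"],
    "student": ["person"],
    "staff": ["person"],
    "ta": ["student"],
}
add_superclass(supers, "ta", "staff")   # ta: [student, staff]
delete_class(supers, "student")         # ta: [staff, person]
```

Appending at the tail in both functions is what gives newly acquired superclasses a lower priority than the ones already present, so previously resolved name conflicts keep their outcome.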

4. Handling schema evolution functions using the derived rule-base
In this section, we show how basic schema evolution
functions can be handled using the rule set given above and
the object algebra described in Section 2.1. Although we
assume operations on classes, the actual operations are
executed against the corresponding OAEs to yield OAEs.
This does not restrict our approach, as we have already
proven in Section 2 that the OAEs hierarchy and the class hierarchy
are equivalent.
(1) Add to class c an instance variable iv with domain
objects in class $c_i$.
To achieve this, the following query is executed with
rules R3 and R6 triggered. After rule R3 guarantees name
uniqueness, rule R6 handles name conflicts and inheritance
priority.

$$c \leftarrow Nest(iv, c, c_i)$$

Notice that the result of the Nest operation is considered to
be a subclass of the first operand c. However, the assignment
is used to have this result replace class c itself. This way,
instance variable iv, with domain(iv) being objects in class $c_i$,
is added to $L_{attributes}(c)$. This is illustrated in the following
examples.
Example 4.1. Add the attribute brothers to the person class
to show the brothers of each person instance.
$$Nest(brothers,\ person\%p,\ Select(person\%p_1,\ p_1\ sex = M \land p_2 \in W_{instances}(person) \land \{p, p_1\} \subseteq p_2\ children))$$
where % indicates that the variables p and p1 are
bound to and range over objects of the operand, here the
person class. More than one variable may range over
objects of an operand. For example, person%p1%p2
indicates that p1 and p2 range over objects of the person
class.
In this example, the value of the brothers instance variable
is assigned the OIDs of those persons who are children of
the same person as p, if any; otherwise it is nil. As a result
of this query, rules R3 and R6 are triggered and the person
class is adjusted into:
$$L_{attributes}(person) = \{name : string,\ age : integer,\ sex : (F, M),\ children : \{person\},\ brothers : \{person\}\}$$
and objects in $W_{instances}(person)$ are extended to include the
value of the brothers instance variable:
$$
\begin{aligned}
oid_1 &= \langle Jack, 21, M, \emptyset, \{oid_4\} \rangle \\
oid_2 &= \langle Mary, 42, F, \{oid_1, oid_4\}, \emptyset \rangle \\
oid_3 &= \langle John, 65, M, \{oid_5\}, \emptyset \rangle \\
oid_4 &= \langle Susan, 25, F, \emptyset, \{oid_1\}, 3, \{oid_9, oid_{10}\}, oid_7 \rangle \\
oid_5 &= \langle Smith, 45, M, \{oid_1, oid_4\}, \emptyset, 50K, oid_7 \rangle \\
oid_6 &= \langle George, 22, M, \emptyset, \emptyset, 5, \{oid_{11}\}, oid_7, 15K, oid_7 \rangle
\end{aligned}
$$
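The combination of Nest with a selection can be sketched in a few lines. This is only an illustration under stated assumptions: objects are plain dictionaries keyed by OID, `nest` and `brothers_of` are hypothetical helper names, the toy population is not the set of instances listed above, and the `p1 != p` test is an added assumption that does not appear in the predicate of Example 4.1.

```python
from __future__ import annotations

from typing import Callable, Optional


def nest(instances: dict[str, dict], iv: str,
         select: Optional[Callable[[str], set[str]]] = None) -> None:
    """Nest sketch: add `iv` to every object; nil (None) unless `select` yields OIDs."""
    for oid, obj in instances.items():
        chosen = select(oid) if select is not None else set()
        obj[iv] = chosen if chosen else None


def brothers_of(people: dict[str, dict], p: str) -> set[str]:
    """Male persons p1 that share some parent p2 with p."""
    return {
        p1
        for p1, o1 in people.items()
        if o1["sex"] == "M"
        and p1 != p  # assumption: a person is not his own brother
        and any({p, p1} <= o2["children"] for o2 in people.values())
    }


# Hypothetical toy population, not the instances listed in the paper.
people = {
    "a": {"name": "Ann", "sex": "F", "children": {"c", "d"}},
    "c": {"name": "Carl", "sex": "M", "children": set()},
    "d": {"name": "Dina", "sex": "F", "children": set()},
    "e": {"name": "Eli", "sex": "M", "children": set()},
}
nest(people, "brothers", lambda p: brothers_of(people, p))
```

Calling `nest` without a selector gives every object the value nil for the new instance variable, which corresponds to a Nest used without an accompanying selection.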

Example 4.2. To each person instance, add his or her parents.
$$person \leftarrow Nest(parents,\ person,\ person)$$
With this query, nil is returned for the value of the new
instance variable parents in every object found in
$W_{instances}(person)$. This is because, by definition, the
Nest operation assigns nil to the value of the new instance
variable unless that value is explicitly set using the
Selection operation in conjunction with the Nest operation.
Thus, to have the actual value of the parents instance
variable assigned automatically for every person, as is the
case with the brothers instance variable in Example 4.1,
the Selection operation is used together with the Nest
operation as follows:
$$person \leftarrow Nest(parents,\ person\%p,\ Select(person\%p_1\%p_2,\ p_1\ children = p_2\ children \land p \in p_1\ children \land p_1\ sex \neq p_2\ sex))$$
In this formulation, the value of the parents instance
variable is automatically set either to a pair of object
OIDs for the couple having the same set of children
including the given person, or to nil if there is no such
couple. Notice that $p_1\ sex \neq p_2\ sex$ is included in
the predicate expression to prevent p1 and p2 from taking
the same value.
Example 4.3. Assume that both the student and staff classes
have an instance variable field specifying the field of
interest. It is required to assign to every student the set
of staff members that he or she can consult, assuming that a
student can consult staff members sharing his or her field
of interest.
$$student \leftarrow Nest(consult,\ student\%s_1,\ Select(staff\%s_2,\ s_1\ field = s_2\ field \land s_1 \neq s_2))$$
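When the selection ranges over a different class than the nested operand, as in Example 4.3, the same idea becomes a simple join on the shared attribute. The sketch below assumes dictionary-based objects keyed by OID; the function name and the sample data are hypothetical.

```python
from __future__ import annotations


def nest_consult(students: dict[str, dict], staff: dict[str, dict]) -> None:
    """Assign to each student the OIDs of staff sharing the same field."""
    for s1, student in students.items():
        advisers = {
            s2
            for s2, member in staff.items()
            if member["field"] == student["field"] and s1 != s2
        }
        student["consult"] = advisers or None  # nil when nobody matches


# Hypothetical data; "t1" is both a student and a staff member.
students = {
    "s1": {"field": "databases"},
    "s2": {"field": "graphics"},
    "t1": {"field": "databases"},
}
staff = {
    "t1": {"field": "databases"},
    "t2": {"field": "databases"},
}
nest_consult(students, staff)
```

The `s1 != s2` test only matters for objects that belong to both classes, such as `t1` here.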
(2) Drop from $L_{attributes}(c)$ the instance variable whose
domain is specified as objects in class ci.
The query that does this is given next. Rules R5 and R6
are triggered to decide on any change in name conflicts and
inheritance priority due to the deletion.
$$c \leftarrow Project(c,\ W_{behavior}(c) - \{m\})$$
where m() is the message that handles, within objects in
$W_{instances}(c)$, the value of the instance variable to be deleted.
Notice that only locally defined attributes can be deleted;
inherited attributes may be redefined with a compatible
domain. Due to encapsulation, all references to a deleted
attribute are simply refused by the class that formerly defined it.

Example 4.4. Drop prerequisites.
$$course \leftarrow Project(course,\ W_{behavior}(course) - \{prerequisites()\})$$
It is an implementation issue to decide on whether the
value of the deleted instance variable is to be physically
dropped from objects in Winstances(course ) or not. Due to
strict encapsulation, in our model the only means which
could be used to access the state of an object are the
corresponding messages. Thus, after dropping the
message that could access the prerequisites of a course,
it will be impossible to access that value inside any of the
objects in Winstances(course ), although it is there. In other
words, we incorporate screening.
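Screening can be illustrated with a small sketch in which state is reachable only through messages, so that dropping a message hides the stored value without physically removing it. The `ScreenedObject` class, its method names and the sample course are assumptions for illustration, not the paper's implementation.

```python
from __future__ import annotations


class ScreenedObject:
    """State is reachable only through the messages listed in the behavior."""

    def __init__(self, state: dict, behavior: set[str] | frozenset[str]) -> None:
        self._state = state
        self._behavior = frozenset(behavior)

    @property
    def behavior(self) -> frozenset[str]:
        return self._behavior

    def understands(self, message: str) -> bool:
        return message in self._behavior

    def send(self, message: str):
        if not self.understands(message):
            raise LookupError(f"message {message!r} is not understood")
        return self._state[message]

    def project(self, keep: set[str] | frozenset[str]) -> ScreenedObject:
        """Project sketch: narrow the behavior, leave the stored state untouched."""
        return ScreenedObject(self._state, self._behavior & frozenset(keep))


# Hypothetical course object with two messages.
course = ScreenedObject(
    {"code": "CS101", "prerequisites": ["CS100"]},
    {"code", "prerequisites"},
)
course = course.project(course.behavior - {"prerequisites"})
```

After the projection, `course.send("prerequisites")` raises `LookupError` even though the value is still held in the underlying state.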
(3) Add to the methods of class c1 one or more methods
from class c2, with their corresponding messages being
$\{m_1, m_2, \ldots, m_i\}$:
$$c_1 \leftarrow Iproject(c_1,\ c_2 : \{m_1, m_2, \ldots, m_i\})$$
However, when $m_1, m_2, \ldots, m_n$ are new methods with
corresponding functions $f_1, f_2, \ldots, f_n$, respectively,
the following formulation is valid:
$$c_1 \leftarrow Iproject(c_1,\ \{m_1 : f_1,\ m_2 : f_2,\ \ldots,\ m_n : f_n\})$$
Here, the distinct names invariant should be preserved by
forcing the names of the new methods to be distinct from
existing ones. Moreover, inheritance priority may change
following both the inheritance priority rule and the name
conflict resolving rule. Consequently, rules R2 and R5 are
triggered due to this operation.
Example 4.5. Add to the staff class the method net-salary(i)
which deducts taxes at rate i from the salary.
$$staff \leftarrow Iproject(staff,\ \{net\text{-}salary(i) : f(o, i) = o\ salary * (1 - i)\})$$
The message net-salary(i), with $0 \le i \le 1$, could be used to
invoke the new method added to the staff class to
implement the function f(o, i), where o is an object
variable bound to objects in $W_{instances}(staff)$, i.e. it indicates
the receiver of the message. The method is implemented
automatically; how this is done is beyond the scope of this paper.
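A minimal sketch of adding a new method under the distinct names invariant follows. It models a class's behavior as a dictionary from message names to functions; `iproject`, `net_salary` and the sample behavior are hypothetical names used only to mirror Example 4.5.

```python
from __future__ import annotations

from typing import Callable


def iproject(behavior: dict[str, Callable],
             new_methods: dict[str, Callable]) -> dict[str, Callable]:
    """Return the behavior extended with new methods; names must be distinct."""
    clash = behavior.keys() & new_methods.keys()
    if clash:
        raise ValueError(f"method names already defined: {sorted(clash)}")
    return {**behavior, **new_methods}


def net_salary(o: dict, i: float) -> float:
    """f(o, i) = o salary * (1 - i), with the tax rate i in [0, 1]."""
    if not 0 <= i <= 1:
        raise ValueError("tax rate i must satisfy 0 <= i <= 1")
    return o["salary"] * (1 - i)


# Hypothetical existing behavior of the staff class.
staff_behavior = {"salary": lambda o: o["salary"]}
staff_behavior = iproject(staff_behavior, {"net-salary": net_salary})
```

Sending the new message then amounts to `staff_behavior["net-salary"](o, i)` for a receiver `o`.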
(4) Drop from class c one or more methods, with corresponding messages being $\{m_1, m_2, \ldots, m_i\}$:
$$c \leftarrow Project(c,\ W_{behavior}(c) - \{m_1, m_2, \ldots, m_i\})$$
This way, all message expressions that have their first
message drawn from the set $\{m_1, m_2, \ldots, m_i\}$ cease to
be considered in $M_{expression}(c)$. Also, rule R5 is triggered to
activate any change in inheritance priority.
Finally, it is important to indicate that the instance
variable deletion schema evolution function is recognized to
be a special case of this schema evolution function.
(5) Add to the class hierarchy a class c with instance
variables $iv_1, iv_2, \ldots, iv_n$ and their corresponding domains
being objects in classes $c_1, c_2, \ldots, c_n$, respectively.
A new class may either have OBJECT as a direct
superclass or else other existing classes in its superclass list.
Further, a new class may have zero, one or more subclasses.
The non-empty class constraint and the class lattice invariant
should be maintained. According to the distinct names
invariant, the name of the new class should not match that of
any existing class. So, rules R4, R5 and R6 are
triggered to guarantee the correctness of this operation.
Thus,
$$c \leftarrow Nest(iv_n,\ \ldots Nest(iv_2,\ Nest(iv_1,\ OBJECT,\ c_1),\ c_2) \ldots,\ c_n)$$
The OBJECT class is used to have the new class c as a direct
subclass of the root. If class c is desired to be a direct
subclass of an existing class, say cp, OBJECT is replaced by
cp in the above formulation. For the case of having more
than one superclass, schema evolution function (7), given
below, is utilized. All the example classes given in
Section 2 could be defined by utilizing this schema
evolution function. This is illustrated in the following
example, where it is shown how the department class is
defined.
Example 4.6. Assume that the class department was not in
Fig. 1. The following query shows how to add department
with instance variables name and head having domains
string and staff, respectively.
$$department \leftarrow Nest(head,\ Nest(name,\ OBJECT,\ string),\ staff)$$
Notice that the department class is a subclass of the root
OBJECT class with:
$L_{attributes}(department) = \{name : string,\ head : staff\}$
$L_{behavior}(department) = \{name, head\}$, to handle
instance variables name and head, respectively.
$M_{expression}(department) = L_{behavior}(department) \cup head\ M_{expression}(staff)$
$W_{instances}(department) = L_{instances}(department) = \emptyset$,
because department is considered a new class with
no subclasses.
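The nested application of Nest used to define a class can be sketched as a fold over (instance variable, domain) pairs. This is a deliberately simplified model: a class is a dictionary holding its superclass name and its locally defined attributes, inherited attributes are ignored, and `nest` and `define_class` are hypothetical names.

```python
from __future__ import annotations

from functools import reduce


def nest(iv: str, operand: dict, domain: str) -> dict:
    """Return the operand's description extended with iv : domain."""
    if iv in operand["attributes"]:
        raise ValueError(f"instance variable {iv!r} already exists")
    return {
        "superclass": operand["superclass"],
        "attributes": {**operand["attributes"], iv: domain},
    }


def define_class(superclass: str, ivs: list[tuple[str, str]]) -> dict:
    """Fold Nest over the pairs, innermost first, as in function (5)."""
    start = {"superclass": superclass, "attributes": {}}
    return reduce(lambda acc, pair: nest(pair[0], acc, pair[1]), ivs, start)


# Mirrors Nest(head, Nest(name, OBJECT, string), staff).
department = define_class("OBJECT", [("name", "string"), ("head", "staff")])
```

Replacing `"OBJECT"` with the name of an existing class corresponds to making the new class a direct subclass of that class instead of the root.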
(6) Drop an existing class c from the hierarchy:
When all the definition and contents of class c are
dropped, class c should be deleted as an empty class, to
preserve the non-empty class invariant. The immediate
superclasses of class c replace it in the inheritance mechanism,
so as not to leave its subclasses dangling and violate the class
lattice invariant. After the deletion, the inheritance priority
may change according to the inheritance priority rule and
the name conflict resolving rule. Rules R1, R5 and R6 are
triggered to guarantee the correctness of this operation. So,
$$c \leftarrow Project(c,\ \{\})$$
Class c will be automatically dropped according to rule R1,
because it is empty, in order to maintain the non-empty
class invariant.
(7) Add a class ci to Cp(c); ci has instance variables
$iv_1, iv_2, \ldots, iv_n$ with corresponding domains being objects in
classes $c_1, c_2, \ldots, c_n$, respectively.
A class added to the superclass list of another existing
class should not violate the class addition/deletion invariant.
Consequently, to guarantee the correctness of this operation,
rules R0, R5, R6 and R9 are triggered. This is handled as:
$$c \leftarrow Nest(iv_n,\ \ldots Nest(iv_2,\ Nest(iv_1,\ c,\ c_1),\ c_2) \ldots,\ c_n)$$
$$c_i \leftarrow Project(c,\ \{m_1, m_2, \ldots, m_n\})$$
where $\{m_1, m_2, \ldots, m_n\}$ are the messages corresponding
to the instance variables of class ci.
This is true for the case of ci being a new class. However,
if ci is an existing class, the following is done:
$$c \leftarrow OLproject(Nest(iv,\ c,\ c_i),\ W_{behavior}(c) \cup \{m\ W_{behavior}(c_i)\})$$
where m() is the message added to the result of the Nest to
handle the value of the instance variable whose domain is
objects in class ci.
$$c_i \leftarrow Project(c,\ W_{behavior}(c_i))$$
(8) Remove class ci from Cp(c):
$$c \leftarrow Project(c,\ W_{behavior}(c) - W_{behavior}(c_i))$$
Due to this operation and to adjust inheritance priority, rules
R5 and R6 are triggered.
The eight schema evolution functions introduced so far
illustrate how to utilize the object algebra. Any other basic
functions can be represented and handled in the same way.
In addition, some other schema evolution functions are
derivable in terms of the basic ones, as illustrated next.
Changing the name of an instance variable of a class is derivable as (2) followed by (1).
Changing the domain of an instance variable of a class is derivable as (2) followed by (1).
Changing the name of an existing method is derivable as (4) followed by (3).
Changing the name of a class is derivable as (6) followed by (5).
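The derivations of renaming and domain change from the drop and add functions can be sketched at the schema level as follows. The attribute dictionary and the function names are assumptions for illustration; the sketch covers only the schema and says nothing about how stored values would be carried over.

```python
from __future__ import annotations


def drop_iv(attributes: dict[str, str], iv: str) -> dict[str, str]:
    """Function (2): drop an instance variable from the local attributes."""
    return {name: dom for name, dom in attributes.items() if name != iv}


def add_iv(attributes: dict[str, str], iv: str, domain: str) -> dict[str, str]:
    """Function (1): add an instance variable with the given domain."""
    if iv in attributes:
        raise ValueError(f"instance variable {iv!r} already exists")
    return {**attributes, iv: domain}


def rename_iv(attributes: dict[str, str], old: str, new: str) -> dict[str, str]:
    """Derived function: (2) followed by (1), keeping the domain."""
    domain = attributes[old]
    return add_iv(drop_iv(attributes, old), new, domain)


def change_domain(attributes: dict[str, str], iv: str, domain: str) -> dict[str, str]:
    """Derived function: (2) followed by (1), keeping the name."""
    return add_iv(drop_iv(attributes, iv), iv, domain)


# Hypothetical local attributes of the department class.
department = {"name": "string", "head": "staff"}
renamed = rename_iv(department, "head", "chair")
retyped = change_domain(department, "head", "person")
```

Method renaming and class renaming would compose functions (4), (3) and (6), (5) in the same drop-then-add fashion.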

5. Conclusions
The approach emphasized in this paper is based on an
object algebra developed for object-oriented databases.
The basic properties that led to the study described in this
paper are that the object algebra properly maintains closure
and facilitates the proper placement of a query result as a
class in the hierarchy. The reusability information
provided by the query model forms the basis for further
adjustments of the class hierarchy towards the maximization
of reusability. Furthermore, enforcing multiple inheritance
in a data model helps to increase reusability by giving the
flexibility to adjust superclasses, when possible, so as
to increase the facilities that a class inherits and hence
decrease its locally defined facilities. Thus, it has been shown
that a wide variety of schema changes can be handled using
the object algebra without requiring a stand-alone language
to serve the purpose. Different rules and constraints were
derived to detect and resolve conflicts and inconsistencies
due to any schema change. Hence, the approach followed
differs from other approaches described in the literature.

References
[1] S. Abiteboul, A. Bonner, Objects and views, Proceedings of ACM-SIGMOD (1991) 238–247.
[2] R. Alhajj, F. Polat, View maintenance in object-oriented databases,
Lecture Notes in Computer Science, Springer, New York, 1996,
DEXA Conference, Zurich.
[3] R. Alhajj, F. Polat, Closure maintenance in an object-oriented query
model, Proceedings of ACM CIKM, MD (1994).
[4] R. Alhajj, M.E. Arkun, A query model for object-oriented database
systems, Proceedings of IEEE ICDE, Vienna (1993).
[5] R. Alhajj, F. Polat, Proper handling of query results towards
maximizing reusability in object-oriented databases, Information
Sciences 107 (1998) 247–272.
[6] J. Banerjee, W. Kim, H.-J. Kim, H.F. Korth, Semantics and
implementation of schema evolution in object-oriented databases,
Proceedings of ACM-SIGMOD, San Francisco, CA (1987).
[7] E. Bertino, M. Negri, G. Pelagatti, L. Sbattella, Object-oriented query
languages: the notion and the issues, IEEE Transactions on Knowledge and Data Engineering (1992) 4.
[9] V.M. Crestana-Jensen, A.J. Lee, Consistent schema version removal:
an optimization technique for object-oriented views, IEEE Transactions on Knowledge and Data Engineering 12 (2000) 261–280.
[11] M. Kifer, W. Kim, Y. Sagiv, Querying object-oriented databases,
Proceedings of ACM-SIGMOD, San Diego, CA (1992).
[12] W. Kim, Object-oriented databases: definition and research directions,
IEEE Transactions on Knowledge and Data Engineering 2 (1990)
327–341.
[13] S.-E. Lautemann, A propagation mechanism for populated schema
versions, Proceedings of IEEE International Conference on Data
Engineering, Birmingham, UK (1997) 67–78.
[14] B. Lerner, A. Habermann, Beyond schema evolution to database
reorganization, SIGPLAN Notes (1990) 25.
[15] F. Manola, U. Dayal, PDM: an object-oriented data model,
Proceedings of the International Workshop on Object-Oriented
Databases, Pacific Grove, CA (1986) 18–25.
[16] S. Monk, I. Sommerville, A model for versioning of classes in object-oriented DBMS, Proceedings of the British National Conference on
Databases (1992).

[17] J. Mylopoulos, V.K. Chaudhri, D. Plexousakis, A. Shruji, T.
Topaloglou, Building knowledge base management systems, The
VLDB Journal 5 (1996) 238–263.
[18] S. Nestorov, S. Abiteboul, R. Motwani, Extracting schema from
semistructured data, Proceedings of ACM-SIGMOD (1998) 295–306.
[19] S. Nestorov, J.D. Ullman, J.L. Wiener, S.S. Chawathe, Representative
objects: concise representations of semistructured, hierarchical data,
Proceedings of IEEE ICDE, Birmingham, UK (1997) 79–90.
[20] J. Peckham, F. Maryanski, Towards the correctness and consistency of
update semantics in semantic database schema, IEEE Transactions on
Knowledge and Data Engineering 8 (1996) 503–508.
[21] D.J. Penney, J. Stein, Class modification in the GemStone object-oriented
database management systems, Proceedings of ACM
International Conference on Object-Oriented Programming Systems,
Languages and Applications, Orlando, FL (1987).
[22] F. Polat, A. Guvenir, A Unification-based Approach for Knowledge
Base Verification, Expert Systems 8 (1991) 251–259.
[23] F. Polat, R. Alhajj, A Multi-agent tuple-space based problem solving
framework, Journal of Systems and Software 47 (1999) 11–17.
[24] Y.-G. Ra, E.A. Rundensteiner, A transparent schema-evolution
system based on object-oriented view technology, IEEE Transactions
on Knowledge and Data Engineering 9 (1997) 600–624.
[25] M.A. Roth, H.F. Korth, A. Silberschatz, Extending algebra and
calculus for nested relational databases, ACM Transactions on
Database Systems 13 (1988) 389–417.
[26] D. Sjoberg, Measuring schema evolution, Technical Report, Glasgow
University, UK (1992).
[27] A.H. Skarra, S.B. Zdonik, Type evolution in an object-oriented
database, in: B. Shriver, P. Wegner (Eds.), Research Directions in
Object-Oriented Programming, MIT Press Series in Computer
Systems, 1987, pp. 393–415.
[28] S.L. Vandenberg, D.J. DeWitt, Algebraic support for complex objects
with arrays, identity and inheritance, Proceedings of ACM-SIGMOD
(1991).
[29] A. Woodruff, M. Stonebraker, Supporting fine-grained data lineage in
a database visualization environment, Proceedings of IEEE ICDE
(1997) 91–102.
[30] M. Wooldridge, N. Jennings, Intelligent agents: theory and practice,
Knowledge Engineering Review (1995) 10.
[31] R. Zicari, A framework for schema updates in an object-oriented
database system, Proceedings of IEEE ICDE, Kobe (1991).

Further Reading
M.J. Carey, D.J. Dewitt, The architecture of the EXODUS extensible
DBMS, Proceedings of IEEE International Workshop on Object-Oriented Database Systems, Pacific Grove, CA (1986) 52–65.
D.H. Fishman, et al., IRIS: an object-oriented database management
system, ACM Transactions on Office Information Systems 5 (1987)
48–69.
