
Parallel Controlled Conspiracy Number Search *

Ulf Lorenz, Valentin Rottmann


Department of Mathematics and Computer Science
University of Paderborn
Germany
(extended abstract)

ABSTRACT
This paper deals with a parallelization of our Controlled Conspiracy Number Search (CCNS) algorithm.
CCNS is marked by the fact that there are two kinds of pieces of information: values are updated
bottom up with minimax rules, and security demands, so-called targets, control the selective search
top down.
As CCNS is a best-first search procedure, all nodes are kept in memory. We present a method that
maps these nodes onto several processors. The resulting load and space sharing problem is solved half
dynamically and half statically. Our solution leads to good speedup results.

1 Introduction
The purpose of our CCNS algorithm is to examine game trees which cannot be explored completely.
Such game trees arise in games like chess, for example. For several reasons chess forms an excellent field
of application. Firstly, chess defines a strictly limited 'world' with a few easy rules. Secondly, it is complex
enough that it will never be completely examined. Last but not least, it is thirdly regarded as a test
of a person's ability for strategic and tactical thinking, and thus it is considered a refined challenge
in the field of artificial intelligence.
1.1 Conspiracy Numbers
If a game tree is built in a way that leaves may be expanded and become internal nodes, some of the
nodes may change their values. Some of these changes may affect the minimax value or the decision
at the root. That is the subject of McAllester's (1988) conspiracy number theory. The aim is to fight
some drawbacks of the α-β algorithm: the α-β algorithm in its basic form cuts the search at a
certain level inside the tree, independent of the importance of the current path in the tree. Moreover,
at the worst, the decision at the root is based on a single evaluation, i.e. if one special leaf has got a
wrong value, the decision may be disastrous. There is no guarantee of fault tolerance.
Definition 1-1
The conspiracy number (cn or conspiracy) of the root of a game tree T = (V, E, h) for some value
x is defined as the least number of terminal nodes of T that must change their value to x in order to
change the minimax value (with regard to T) of the root to x. □
Examples and a simple method for computing the conspiracy numbers can be found in papers by
Schaeffer (1990) and van der Meulen (1990).
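As an illustration, McAllester-style bottom-up rules for conspiracy numbers can be written down directly for an explicitly given tree. The nested-tuple encoding, the function names, and the toy tree below are our own; only the rules follow the definition:

```python
# Sketch: conspiracy numbers computed bottom-up on an explicit game tree.
# Internal nodes are tuples of subtrees; leaves are their static values.

def minimax(tree, maximizing=True):
    if not isinstance(tree, tuple):
        return tree                      # leaf: static value
    vals = [minimax(c, not maximizing) for c in tree]
    return max(vals) if maximizing else min(vals)

def cn(tree, x, maximizing=True):
    """Least number of leaves that must change their value to x in order
    to change the minimax value of the (sub)tree to x."""
    if not isinstance(tree, tuple):
        return 0 if tree == x else 1     # a single leaf conspires alone
    m = minimax(tree, maximizing)
    if x == m:
        return 0
    if maximizing:
        if x > m:                        # one successor must rise to x
            return min(cn(c, x, False) for c in tree)
        return sum(cn(c, x, False)       # all successors above x must drop
                   for c in tree if minimax(c, False) > x)
    if x < m:                            # dual rules at a min node
        return min(cn(c, x, True) for c in tree)
    return sum(cn(c, x, True) for c in tree if minimax(c, True) < x)

# max root with two min children: child values min(5,7)=5 and min(3,8)=3
T = ((5, 7), (3, 8))
```

For this tree the root's minimax value is 5; a single leaf suffices to raise it to 6, whereas two leaves must conspire to lower it to 2.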
1.2 Conspiracy Number Search (CNS)
Now we are going to describe how conspiracy numbers are used in a search. As they represent
probabilities, or rather a degree of fault tolerance, the idea is to search (i.e. expand suitable leaves) until
we are sure that the value of the root (respectively the resulting decision) is of good quality. The
aim of the search is not only to compute a minimax value of the root in a fixed, given game tree, but
to find a suitable subtree (which we call the search tree) of the theoretical game tree and to evaluate this
subtree. Moreover, the root value should be stable with a certain security.
* This work was partly supported by the DFG research project Selektive Suchverfahren under grant Mo 476/99 and
by the Leibniz award fund of B. Monien from the DFG (German Research Association) under grant Mo 476/99.
A conventional CNS algorithm can be described by the following three basic steps:
• Selection: The task of the selection is to find a suitable leaf for expansion in a given game tree.
It finds this leaf by following a path P = (v1, …, vn) from the root of the game tree to a
leaf, where v1 is the root. We call this path the selection path. When the selection decides that a node vi is part of P,
it will find vi+1 of P in the set of the successors of vi. Whether a successor of vi becomes part
of P or not is decided locally at vi. The rules for finding vi+1 only depend on information about the
successors of vi. Thus a leaf for further expansion is found.
• Expansion: A leaf vn of the game tree is given; all successors of vn must be generated and
evaluated. We presume that the values of the newly generated successors of vn are determined
by a quiescence search.
• Backup: The results of an expansion have to be incorporated into the information found in the
tree searched so far.


The search is guided in a best-first manner; that is why the search tree is kept in memory.
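The three phases above can be sketched as a toy loop. Note that this is a deliberately simplified illustration: real CNS selection is guided by conspiracy numbers, while the stand-in rule below merely follows a child whose value explains the current minimax value; the example tree, values, and all names are invented.

```python
# Toy selection/expansion/backup loop on a lazily revealed game tree.
GAME = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1", "b2"]}
EVAL = {"a": 4, "b": 6, "a1": 3, "a2": 5, "b1": 6, "b2": 7}  # static values

class Node:
    def __init__(self, name, maximizing, parent=None):
        self.name, self.maximizing, self.parent = name, maximizing, parent
        self.children, self.value = [], EVAL.get(name, 0)

def select(v):
    # Selection: descend to a leaf along a child that explains v's value.
    while v.children:
        v = min(v.children, key=lambda c: abs(c.value - v.value))
    return v

def expand(v):
    # Expansion: generate and statically evaluate all successors of a leaf.
    for name in GAME.get(v.name, []):
        v.children.append(Node(name, not v.maximizing, parent=v))
    return bool(v.children)

def backup(v):
    # Backup: propagate new minimax values from v up to the root.
    while v is not None:
        if v.children:
            vals = [c.value for c in v.children]
            v.value = max(vals) if v.maximizing else min(vals)
        v = v.parent

root = Node("root", maximizing=True)
for _ in range(3):                 # a few search iterations
    leaf = select(root)
    if expand(leaf):
        backup(leaf)
```

After these iterations the root carries the minimax value of the revealed subtree, here 6, while the less promising branch stays shallow.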
Different versions of the CNS algorithm have been implemented by Schaeffer (1990) and by van der
Meulen (1990). Their algorithms search very selectively: deep at some forced variations, less deep
at others. Unfortunately, the algorithms often expand lines to unnecessary depths, trying to show
something which is not possible to prove. So the convergence can be quite slow, or, if you allow trees
with unlimitedly long paths, the algorithm may never converge.
Both implementations show good tactical play but suffer from some drawbacks:
• It is quite expensive to use an α-β quiescence search for the evaluation of leaves because no
α-β bounds are available for the quiescence search.
• At each node of the tree, information about the conspiracy numbers has to be stored. The size
of these data grows linearly either in the number of possible values of the evaluation function
or in the maximum conspiracy.
• Positional play is bad, since the CNS algorithms become unstable when using an evaluation
function of fine granularity.
• Resources are wasted on determining the exact root value when only a move decision is
required.
• Both implementations are inherently sequential, because at each step only one single leaf node is
selected for expansion.
• As for all best-first search algorithms, the search tree is kept in memory, which results in a space
requirement linear in the search time.
We state the following observations about conventional CNS.
Observation 1-2
Let P = (v1, …, vn) be the selection path at time t, where v1 is the root. Then vn is expanded next. Very often there
is a vj in the path P (1 ≤ j ≤ n) whose conspiracy is not influenced by the expansion of vn.
Therefore v1, …, vj will be part of the selection path of the next selection phase. This local trait is not
completely used in the scheme of the phases: selection, expansion, backup. □
Observation 1-3
The aim of conspiracy number search is to find a subtree of the game tree so that the value of the
root is secure with a given conspiracy. That can be achieved by a subtree which forces all cn's (for
all possible values in the root) to be bigger than the used threshold. Nevertheless, after the search has
ended, for most of the possible values the cn in the root is bigger than necessary. Obviously, a) expansions
generate successors which are superfluous, and b) many leaves give an exact value
to their predecessors although a bound would be sufficient. □
Observation 1-4
Often only a decision is needed at the root, e.g. when you search for a good move in the game of chess. The
absolute quantity of the minimax value is not important in such a case. It is sufficient that the minimax
value of one root successor is relatively better than the minimax values of the other root successors.
This is well known from algorithms like B* (Berliner 1979) and PB* (Palay 1985). □
The observations are important for the improvement of CNS to CCNS.
1.3 Sequential Controlled Conspiracy Number Search
1.3.1 The CCNS Scheme
In this section we shortly present our CCNS algorithm. It is decisively based on the observations about
the conventional CNS algorithms given at the end of the last section.
The good quality (which is shown by experimental results in Lorenz, Rottmann, Feldmann, and Mys-
liwietz 1995) is achieved by eliminating nearly all drawbacks of the CNS algorithms.
Before we give a schematic description of the CCNS algorithm, we shortly define the so-called targets.
Definition 1-5
Let ℕ0 be the set of natural numbers including 0. Let n ∈ ℕ be the number of possible values of a
node and w1, …, wn the possible values. The class of general CN targets of dimension n is defined as

Cn := { u ∈ ℕ0^n | u = (l, …, l, 0, r, …, r) = (l^i 0 r^j), i, j ∈ ℕ0, i + j = n − 1 },

where the entry l occurs i times and the entry r occurs j times. □
E.g. 0^4 1^3 ∈ C7 and 2^2 0 3^5 ∈ C8. When a CN target u = (l^i 0 r^(n−i−1)) is assigned to a node v, the search
below v is done in order to fulfill u for v. I.e. we want to find a tree below v such that, for all
i ∈ {1, …, n}, at least u_i leaves of the very same tree have to change their values in order to change
the value of v to w_i. We introduced this kind of target because it is close to the use of conspiracy in
the conventional sense: for each possible value you have a certain threshold. However, as we want to
force all cn's to be bigger than one prescribed threshold, all cn's to the left (to the right) of the minimax
value are the same, namely l (r). When we additionally presume that the number of possible values is
fixed, we can abbreviate the target (l^i 0 r^(n−i−1)) by (l, w_(i+1), r).
We use these targets in order to inform each node about the purpose of its examination. So most of the
time it will be possible to use fast α-β quiescence searches for expanding a leaf. Furthermore, internal
nodes often do not need to examine all their successors.
A target is given from a father to a child. E.g. (0,5,3) means that the father says to his child: "I have
the impression that your value is less than or equal to 5. Please let me know whether that is correct.
If NOT, tell me as soon as you get another value. If YES, tell me as soon as you are sure with cn 3."
For further details on the use of such targets, and especially on how a target is split among the successors of
a node (by the so-called splitting function or splitting heuristic), the reader should take a look at
Lorenz, Rottmann, Feldmann, and Mysliwietz 1995.
Observation 1-4 leads us to our extended definition of conspiracy numbers:
Definition 1-6 (Extended Conspiracy Numbers)
The extended conspiracy number of the root of a game tree T = (V, E, h) for the best move m is
defined as the least number of terminal nodes of T that must change their value in order to change the
decision at the root to another move m′. □

1.3.2 Description of the CCNS Scheme


Scheme of the CCNS algorithm:

int control-loop()
  do iterations {
    do {
      guess a CN target for v = root(T);
      ccns(v, CN target of v);
    } while the CN target at root(T) is not fulfilled;
  } until the result is secure enough;

ccns(v, u)
  /* Let v be a node and u its CN target. */
  if u is directly fulfilled at v {
    update value information;
    return 'OK'; }
  while (u looks fulfillable) {
    split target u wisely among the successors of v;
    for all successors v.i of v do ccns(v.i, u.i);
    update value information;
    if all successors said 'OK' return 'OK'; }
  return 'NOT OK';

Figure 1: The Scheme of CCNS


In order to give an insight into this algorithm, let us have a look at a small example. For a more
detailed description we refer to (Lorenz, Rottmann, Feldmann, and Mysliwietz 1995).
We try to find a move which is secure with cn 2. Let us start at the root v and expand it. The
values are shown inside the nodes.

[Figure 2: Expanded Root. The root v (value 5) has three successors, reached by the moves s1, s2, s3: v.1 (value 5, target (2,5,0)), v.2 (value 5, target (0,5,2)), and v.3 (value 3, target (0,5,2)).]


This little search gives us the impression that s1 is a good move. Nevertheless, we are not sure about
it. We would be sure if we knew that s1 is a correct move even if we had incorporated one faulty leaf value.
In other words: we are satisfied when at least two (however badly selected) leaves below v.1 must change their
values in order to decrease the value of v.1 below 5, and at least two leaves must change their values to
increase the value of v.2 above 5, and the same for v.3. Thus we assign the target (2,5,0) to v.1 and the
target (0,5,2) to v.2 and v.3. In order to keep the example small we start the search at node v.2. Let v.2
have three successors. We examine the first, i.e. we generate it and evaluate it with an α-β quiescence
search, α = 5, β = ∞. Let the resulting value be less than or equal to 5. Then we examine the second
successor of v.2, and if its value is less than or equal to 5, too, we have the following situation:

[Figure 3: v.2 Fulfills Its Target. Node v.2 (value 5, target (0,5,2)) has the successors v.2.1 (value 5), v.2.2 (value 5), and v.2.3 (not generated, value unknown).]


That is all: at least two leaves must change their values in order to increase the value of v.2 above 5.
Moreover, relating to Observation 1-3, v.2.3 is not generated, and v.2.1 and v.2.2 could be evaluated with
the help of fast quiescence searches. The relationship to Observations 1-2 and 1-4 is clear.
However, if e.g. the value of v.2.2 is 6, the further search depends on the value of v.2.3. At least v.2.3
must then be examined. Now let us suppose that neither v.2.1 nor v.2.2 nor v.2.3 gets a value of less than or
equal to 5. As we have evaluated them with a quiescence search with window α = 5 and β = ∞, the
nodes v.2.1, v.2.2, v.2.3 might have got the values 6, 7 and 8. Thus v.2 gets the value 6 (its minimax
value) and sends the answer NO to its father v. New targets must be constructed for the successors
of v. That is done by the splitting heuristic.
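A root-level splitting rule consistent with this example can be sketched as follows; this is a hypothetical minimal version (the paper's actual splitting heuristic is described in Lorenz, Rottmann, Feldmann, and Mysliwietz 1995):

```python
# Hypothetical root-level target splitting at a max node: to secure the move
# to the best successor with conspiracy c, the best child must defend its
# value against c conspirators, while every alternative must be refuted with
# the same security.

def split_root_target(values, c):
    """values: minimax values of the root's successors.
    Returns one (l, w, r) target per successor."""
    best = max(values)
    best_index = values.index(best)
    targets = []
    for i, v in enumerate(values):
        if i == best_index:
            targets.append((c, best, 0))   # keep the value from dropping below best
        else:
            targets.append((0, best, c))   # show the value stays at most best
    return targets
```

Applied to the successor values (5, 5, 3) of Figure 2 with threshold c = 2, this yields exactly the targets (2,5,0), (0,5,2), (0,5,2) assigned there.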
Instead of saying "the value is less than or equal to x" we call the value (LEQ, x) (and analogously
(GEQ, x)), or (EXT, x) if we think the value is exactly x. This has formal reasons and only becomes
clear if you examine a large example. Then e.g. (EXT,5) means: there exists a subtree with root v whose minimax value is
less than or equal to 5, and there exists a subtree with root v whose minimax value is greater than or equal to 5.
These subtrees need not be the same. Moreover, they need not be explicitly visible.
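One way such bounds might be represented and combined is sketched below; the tuple encoding and the combination rule are our own assumptions (the paper only explains what the three labels mean):

```python
# Bounds on a node's value: ('LEQ', x), ('GEQ', x), ('EXT', x).

def combine(b1, b2):
    """If one subtree witnesses value <= x and another witnesses value >= x
    for the same x, together they witness ('EXT', x); otherwise no
    combination is defined in this sketch."""
    (k1, x1), (k2, x2) = b1, b2
    if x1 == x2 and {k1, k2} == {"LEQ", "GEQ"}:
        return ("EXT", x1)
    return None
```

For example, combining (LEQ, 5) with (GEQ, 5) yields (EXT, 5), matching the reading of (EXT, 5) given above.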
Observation 1-7
Obviously the quality of every CCNS algorithm mainly depends on the quality of the splitting of
targets among the successors of a node. □

2 Parallel Controlled Conspiracy Number Search


Let {P1, …, PN} be a set of N so-called (working) processors, and let H be the host processor. The working
processors are connected according to an undirected communication graph G = ({P1, …, PN}, L) to form
the working network. The host processor H is connected to P1.
A single processor Pi provides enough memory capacity to hold at most k nodes of the search tree. It
uses this capacity to keep several subtrees of the search tree in memory. Each processor Pi can perform
a CCNS algorithm and is allowed to communicate with the processors it is connected with.
2.1 Important Terms
Definition 2-8 (present variation)
Let T = (V, E, h) be a game tree. Let a processor P work at T after a call
of the procedure ccns, and let t be the present stack pointer of the recursion. Each stack entry a_i of
S(τ) = (a1, a2, …, at) includes a move list and a node v_i which is examined at time τ. The sequence
consisting of the nodes v1, …, vt of the stack entries a1, …, at at time τ is called the present variation of
P at time τ. □
Definition 2-9 ((sub)problem)
Let T = (V, E, h) be a game tree, let v ∈ V and u ∈ Cn. A (sub)problem p with root v and target u is
specified as follows:
Given: a node v and a target u.
Searched for: a subtree T(v) = (V(v), E(v), h|V(v)) of T with root v such that u is fulfilled at v, or the
answer that u is unlikely to be fulfilled. □

E.g. ccns solves a problem, i.e. a call of the ccns function solves such a problem.
Definition 2-10 (task)
A task t is a (5 + x)-tuple (p, S, s, line, result, g1, …, gx). p is a problem, S a stack+ structure, s ∈
{active, waiting}, line is the value of the system program counter, and result and g1, …, gx are the global
variables which are used to work out the problem p. Thus a task represents a static configuration of a
CCNS algorithm at an arbitrary point of time.
When s = active, the algorithm of which t is a configuration can do useful work; otherwise it cannot. □
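Such a task might be represented as follows (a hypothetical sketch; only the field meanings come from the definition, the encoding is our own):

```python
# A task as a record: problem, stack+ structure, state, resumption point,
# result, and the remaining global variables g1, ..., gx.
from dataclasses import dataclass, field

@dataclass
class Task:
    problem: tuple                                # p, e.g. (node, target)
    stack: list = field(default_factory=list)     # S, the stack+ structure
    state: str = "active"                         # s in {active, waiting}
    line: int = 0                                 # system program counter
    result: object = None
    globals_: dict = field(default_factory=dict)  # g1, ..., gx

t = Task(problem=("v", (0, 5, 2)))                # a freshly created task
```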

2.2 Description of the Parallel CCNS


2.2.1 Distribution of CN-Nodes
The Idea
The procedure ccns uses an artificial stack (a so-called stack+) for controlling the search. We are able
to manipulate such a stack+ from outside in order to share work and to integrate results
from other processors. Let us assume that ccns examines a move list from left to right. To achieve
this, a present variation is stored on the stack at any point of time. All nodes left of the present
variation are already examined or need not be examined. The idea of parallelizing such a tree search
is to give away as many right siblings of the present variation for parallel work as possible. Thus
many processors start a tree search on a subproblem. These processors create stacks by themselves,
each stack holding a present variation. The new stacks themselves lead to sets of right siblings,
which are suited for external examination. So we get a scheme of a parallel algorithm, the efficiency
of which mainly depends on efficient solutions of the following problems:
1. Initiating a worker/employer relationship:
Each working processor creates subproblems. These new subproblems can be given away for
external treatment. Therefore such a processor is a potential employer. A processor that is not
busy is a candidate for being a worker. Now it must be achieved that a working processor provides
an idle one with a subproblem: a worker/employer relation must be initiated. The efficiency
with which this happens mainly determines the average load of the processors.
2. The splitting heuristic:
As already described, our sequential CCNS algorithm mainly works in a depth-first search manner,
with the help of a stack. On this stack there are some nodes of the present variation which are
marked by the splitting heuristic for further re-search. These nodes are supplied with a non-trivial
target. If the heuristic were perfect and never failed, all marked nodes would be re-searched with
the applied target. However, it is not perfect. Therefore the quality of the splitting heuristic will
mainly determine the parallel search overhead.
The parallel algorithm has to embed the search tree (called Tm) in the network of processors. Since
we do not want to send complete subtrees from one processor to another, it seems reasonable to map
nodes to processors in the following way: a leaf v of a tree, which is kept in memory by a processor P,
can become the root of a subtree. This subtree can be established by any processor. When,
however, a node v that is not a leaf is once mapped onto a processor Q, this node v must be examined
by Q whenever v must be examined. Hence there are two kinds of initiating worker/employer relations:
1. An unemployed processor tries to find some work. Receiving a REQUEST from such an unemployed
processor, a working processor can send a subproblem (v, u) to it, v being a leaf and u being the
CN-target assigned to v. This is the idea of work stealing.
2. A processor P assigns a target to a remote node v. A node is remote from a processor P when it is
placed on another processor and the predecessor of v is fixed on P. P sends a subproblem
(v, u) to the processor which holds v, v being an internal node of Tm and u being a target.
Thus we have got a dynamic load sharing mechanism for newly generated nodes and a so-called static
embedding of the game tree in the network of processors for internal nodes. In other words: newly
generated nodes are placed onto the processors dynamically, and this first placement gives a functional
connection for further accesses to the nodes. Unfortunately, this implies that only leaves of Tm supplied
with non-trivial targets are candidates for a new employer/worker relation.
Observation 2-11
Due to the combination of a requesting load sharing system and a static embedding of Tm it may
occur that a processor P must work at several problems at the same time, i.e. problems may reach a
processor P when P is already at work. □
Handling of Several Problems
In order to avoid deadlocks we have implemented a method that is able to manage several problems at
the same time. It uses the stack+ structure. Each processor gets a set of tasks; for ordering the
tasks we use a list L whose basic type consists of tasks.
Let us presume that the procedure ccns does not use the system stack for procedure calls and local
variables, but a stack+ structure which is managed by our algorithm. Moreover, we reorganize the
procedure ccns in such a way that it is possible to perform single steps of the ccns. Thus we have
a procedure available, called ccns-step.
In co-operation with the procedure of Figure 4

forever {
  if there is a non-waiting task available {
    select a task t;
    t.line := ccns-step( t.line, t );
  }
  communicate();
}

Figure 4: Stepping Loop


we are able to step back and forth through the tasks of the task list L.
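The control structure of this stepping loop can be illustrated with resumable computations; here Python generators stand in for the ccns-step/stack+ machinery, and the names as well as the round-robin task selection are our own assumptions:

```python
# Cooperative stepping over a task list: each generator plays the role of a
# resumable ccns computation, and next(task) performs one 'ccns-step'.

def ccns_like(name, steps):
    for i in range(steps):
        yield f"{name}:{i}"            # one step of useful work

tasks = [ccns_like("t1", 2), ccns_like("t2", 3)]
trace = []
while tasks:
    for task in list(tasks):           # select each non-waiting task in turn
        try:
            trace.append(next(task))   # t.line := ccns-step(t.line, t)
        except StopIteration:
            tasks.remove(task)         # the task's problem is finished
    # communicate() with other processors would go here
```

The trace shows the steps of the two tasks interleaved, just as the stepping loop interleaves the problems held by one processor.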
Messages for Dynamic Distribution of the Game Tree
1. Start of a worker/employer relation:
In the initial state no processor has got any work. The host processor H sends the initial problem
to a special processor P1 of the network. All processors which are not employed send a REQUEST-
message (for work) to another, arbitrarily chosen processor.
Let us assume that processor Ps, which has already got some work, receives this REQUEST.
Now Ps tries to find a suitable free node v to the right of its present variation.
A successor v.j of a node v ∈ Tm of the present variation is called a free node if no processor is
examining v.j, v.j has not been searched yet, j > 1, v.j is a leaf of Tm in the memory of a processor P,
v.j is supplied with a non-trivial target, and, last but not least, the game-theoretical value of v.j
is not known.
If Ps finds such a node, it creates a subproblem from v and the target
which belongs to v. Then Ps sends this subproblem to the sender of the REQUEST message.

With a PROBLEM-message a sender s supplies a receiver r with an important subproblem rooted
at a node v, inspected at stack level d of processor Ps.
If a receiver of a REQUEST cannot dispatch work, it sends back a NO-WORK-message.
2. Finishing the worker/employer relation by the worker:
A processor Ps, having solved a subproblem all by itself or with the help of other processors, sends
the result (in form of a RESULT-message) to its employer. The RESULT-message consists of all
pieces of information which the employer needs to integrate the result.
Now this worker/employer relation is finished. The processor Ps, which has finished it, tries to
select another non-waiting task. If there is no such task it is without work again. Note: in
parallel CCNS a processor is able to send away a subproblem without getting a REQUEST.
With the help of an identification code, the receiver of the RESULT checks whether the result
still belongs to a valid subproblem of its own problem. It may be that another result has already
made the subproblem obsolete. If the result is not outdated, the result indicates an unsuccessful
computation, and the root of the dispatched subproblem is at level d, the receiver updates its stack elements
from the level of the present stack pointer (which is at level ≥ d) up to level d − 1. This leads to
an unsuccessful result at level d − 1 or at least to a re-splitting at level d − 1.
3. Finishing the worker/employer relation by the employer:
If a value x causes a re-splitting or an unsuccessful result at a processor Ps at level d, all workers
of levels greater than or equal to d get a SHUT-DOWN-message, which indicates that they have
to interrupt the computation of their problem if it is supplied with a still valid subproblem of
the re-split problem. In this case the workers do not send any results.
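The free-node conditions listed in item 1 of the dynamic-distribution messages can be collected into a single predicate; this is a sketch, and the node representation is invented:

```python
from types import SimpleNamespace as _NS

def is_free(node, j):
    """True iff the j-th successor (0-indexed) of `node` is a free node:
    nobody examines it, it is unsearched, it is not the leftmost successor,
    it is a locally held leaf, it carries a non-trivial target, and its
    game-theoretic value is unknown."""
    s = node.successors[j]
    return (s.worker is None and not s.searched and j > 0
            and s.is_leaf and s.target is not None
            and s.game_theoretic_value is None)

free_child = _NS(worker=None, searched=False, is_leaf=True,
                 target=(0, 5, 2), game_theoretic_value=None)
busy_child = _NS(worker="P2", searched=False, is_leaf=True,
                 target=(0, 5, 2), game_theoretic_value=None)
node = _NS(successors=[busy_child, free_child])
```

With this encoding, only the second successor qualifies as free; the first fails both because it is already being worked on and because it is the leftmost successor.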
Messages for Controlling the Statically Embedded Game Tree
1. When a processor Ps must examine a node v which is already placed on processor Pr, Ps sends an
ACTIVATE-message to Pr. An ACTIVATE-message is just the same as a PROBLEM-message,
but for sending an ACTIVATE-message the processor must know the address of v at Pr. This
address is part of the message.
2. With a MEM-NR-message a processor Ps sends the address of its problem to its employer.
We give two examples of the communication structure:
Example
Figure 5 a) shows how two processors P1 and P2 work at a problem rooted at v. Let P1 have supplied
the nodes v.1, v.2, and v.3 with non-trivial targets. Now it receives a REQUEST-message from P2. As
v.2 is a free node, P1 sends a subproblem p, mainly consisting of v.2 and the target (0,5,3), to
P2. While p is on its way to P2, P1 starts a search below v.3. Then P2 receives p and initializes the
search. After that it sends the address a of v.2 at P2 to P1. Now P1 and P2 work at v.2 and v.3
simultaneously.
When P2 finishes the search at v.2, it sends a RESULT-message to P1, the message consisting of a new
value for v.2 and an acknowledgment that the search below v.2 has ended successfully. At the moment
when the result reaches P1, this processor is still searching below v.3. P1 integrates the result into its
own v.2. The search below v.3 runs on.
Figure 5 b) shows the mechanism of activating and shutting down a remote problem p. Let v.2
be already mapped to P2. P1 has supplied v.2 with the target (2,5,0). Therefore P1 sends p to P2.
Before P2 can finish the search below v.2, P1 ends the search below v.1 unsuccessfully. Therefore a
re-splitting must be done at v, and the computation of P2 at v.2 is without importance. P1 sends a
SHUT-DOWN-message to P2 and performs a re-splitting at v. P2 ends the search immediately without
giving a result to P1. After the execution of the re-splitting, v.2 is supplied with the target (4,5,0). Again
P1 sends an activation, in form of an ACTIVATE-message, to P2.

[Figure 5: Examples of Communication. Part a): P2 sends REQUEST(P2,P1); P1 answers with PROBLEM p = (P1, P2, v.2, (0,5,3)); P2 reports its local address a via a MEM-NR-message and, after solving v.2, sends RESULT(..., (4,LEQ), YES, ...); P1 updates the value of v. Part b): v.2 is already mapped to P2; P1 sends ACTIVATE p = (P1, P2, v.2, (2,5,0)); after an unsuccessful result at v.1, P1 sends a SHUT-DOWN-message, performs a re-splitting at v, and re-activates v.2 with ACTIVATE p = (P1, P2, v.2, (4,5,0)).]
(x, LEQ) means: there exists a subtree whose minimax value is less than or equal to x.
(x, GEQ) means: analogously (greater than or equal to x).
(x, EXT) means: there exists a subtree whose minimax value is less than or equal to x,
and there exists a subtree whose minimax value is greater than or equal to x.

The Load Sharing Mechanism of the Best Version


We tested some modifications of the load sharing mechanism, and our best version uses an additional
modification: one disadvantage of the version described is that free nodes must be leaves of the search
tree. Let P be a processor, let v be a free node, and let w be a node which is not a leaf but fulfills all other
demands of a free node. The main version often examines v before w and so prevents v from being
examined by another processor. Now the parallel algorithm is allowed to create a new task rooted at w.
Thus the algorithm first works at w and keeps v as a free node for a while. When a request reaches P, it
can send v to the requesting processor. To avoid an ineffective number of tasks, such a procedure
is only allowed at certain instants.
2.2.2 Distributed Expansions
When the number of available subproblems is too small, the load sharing mechanism does not work.
Then we are forced to distribute quiescence searches and expansions, too.
In quiescence searches we use parallelism without care. The aim is to increase the load (definition
follows) without regard for the so-called search overhead.
In distributed expansions we are a bit more careful: let v be a node which is to be expanded next.
Furthermore, an α-β quiescence search shall be used for the evaluation of nodes. Let u be a target for
node v. This target gives us a hint how many successors of v must be evaluated, and for all moves it
offers the initial windows [α, β] which are to be used for the evaluating quiescence searches.
All these successors are examined in parallel with the help of a parallel α-β quiescence search. The
distributed expansion is based on R. Feldmann's and P. Mysliwietz's work (Feldmann et al. 1991,
Feldmann 1993).
This combination led to our best results.
2.3 Experimental Results
We tested our parallel CCNS algorithm on a transputer system consisting of T805 processors from
INMOS. Transputers have been developed for use in Multiple Instruction Multiple Data (MIMD)
architectures. Every transputer runs at a frequency of 30 MHz and is supplied with 4 MB main memory.
It performs 12.5 MIPS and 1.5 MFLOPS. We used up to 127 processors, connected as a
two-dimensional grid. The program is written in C, so we were able to compile it on a Sparc10/60 as
well, without communication functions.
2.3.1 Preliminaries
Let Φ be a set of test positions, n the number of processors, and p a problem. Let wn(p) be the sum of all times
which n processors need for carrying out all the subproblems which they get while working at
p. Let k1(p) be the number of nodes (CN-nodes plus quiescence search nodes) the sequential algorithm
examines, and let kn(p) be the number of nodes (CN-nodes plus quiescence search nodes) which are
examined by n processors during their work at p. We judge the efficiency of our parallel algorithm
with the help of the common definitions of speedup, load, search overhead, and performance (cf. Feldmann
1993).
Definition 2-12 (speedup)
SPE(n) := (Σ_{p∈Φ} t1(p)) / (Σ_{p∈Φ} tn(p)). □

Definition 2-13 (load of the network)
LOAD(n) := 100 · (Σ_{p∈Φ} wn(p)) / (n · Σ_{p∈Φ} tn(p)). □

Definition 2-14 (search overhead)
SOVD(n) := 100 · ((Σ_{p∈Φ} kn(p)) / (Σ_{p∈Φ} k1(p)) − 1). □

Definition 2-15 (performance)
PERF(n) := 100 · (Σ_{p∈Φ} kn(p) · Σ_{p∈Φ} t1(p)) / (Σ_{p∈Φ} wn(p) · Σ_{p∈Φ} k1(p)). □
The most important definition is that of the speedup. It is influenced and caused by the others. The
search overhead puts into relation the number of nodes examined by the sequential version
and the number of nodes examined by the parallel version. Additionally, the average load
gives (in percent) the pure working time of the processors relative to the total computing time. The
average performance provides the number of nodes searched per second by the sequential version in
relation to the nodes searched per second by a parallel version, concerning the pure working time only.
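Written out as code over per-position data, the four measures read as follows; this is a direct transcription of Definitions 2-12 to 2-15, while the function names and the toy data are invented:

```python
# Each dict maps a test position to its measurement; sums run over all
# positions of the test set, as in the definitions.

def speedup(t1, tn):
    return sum(t1.values()) / sum(tn.values())

def load(wn, tn, n):
    return 100.0 * sum(wn.values()) / (n * sum(tn.values()))

def search_overhead(k1, kn):
    return 100.0 * (sum(kn.values()) / sum(k1.values()) - 1)

def performance(k1, kn, t1, wn):
    return (100.0 * sum(kn.values()) * sum(t1.values())
            / (sum(wn.values()) * sum(k1.values())))

# toy data for two positions and n = 4 processors
t1 = {"B02": 40.0, "B03": 60.0}   # sequential times
tn = {"B02": 12.0, "B03": 18.0}   # parallel wall-clock times
wn = {"B02": 40.0, "B03": 56.0}   # summed busy times of all processors
k1 = {"B02": 1000, "B03": 2000}   # nodes examined sequentially
kn = {"B02": 1200, "B03": 2400}   # nodes examined in parallel
```

On these toy numbers the speedup is 100/30 ≈ 3.33 on 4 processors, with a load of 80% and a search overhead of 20%.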
2.3.2 Behaviour of the Best Version
Now we examine the behaviour of the parallel CCNS algorithm using up to 127 processors. All results
are taken from the Bratko-Kopec test set (Φ := {B01, …, B24}).
Since all examined CN-nodes must be stored, our parallel CCNS algorithm can only run for a few
minutes on each problem when we use T-805 processors. Thus it is not possible to measure the speedup
by comparing the speed of one T-805 processor to the speed of several ones. Therefore we measured
the speedup with the help of a Sparc10/60 supplied with more than 400 MByte RAM.
We compare our sequential CCNS algorithm (cf. Lorenz and Rottmann 1995) to the parallel version.
The sequential and the parallel program use the same source files concerning ccns, quiescence searches,
etc. So the main difference is that the sequential program has no communication functions and is
compiled for another machine. The 'speedup' of a problem p now results from the ratio of the time t1(p)
the sequential algorithm needs for p (on a Sparc10/60) and the time tn(p) a parallel algorithm needs,
running on n transputers.
This causes new problems: the programs run on different processors, are compiled by different compilers,
and run under different operating systems. E.g. we could not find a loss of performance which usually
occurs in parallel programs and is caused by so-called communication overhead. Nevertheless, we think
that our results are realistic.
There are two aspects which support our measurements of the speedup: one is the correlation between
speedup, load, and search overhead. The other one is that we can compare, at least in a small time
window, e.g. a 2-processor result with a 5-processor result.
We divide the test set into four classes of time:
- On the test set T(1) the Sparc10/60 needs 9.2 seconds on average per position.
- On the test set T(2) it needs 34.3 seconds on average per position.
- On the test set T(3) it needs 70.1 seconds on average per position.
- On the test set T(4) it needs 157.7 seconds on average per position.
Using T(1) we measured that the Sparc10/60 is 41.2 times faster than one T-805 transputer in performing
CCNS. We estimated the time one processor would need (if it were possible) for a problem
by taking the Sparc10/60 time multiplied by 41.2. This does not falsify the result data but leads to a
more intuitive presentation of the data. We are aware of the fact that we did not really use a T-805
processor, but a Sparc10/60 for our one-processor measurements.
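As a sketch, this estimation reads as follows in Python. Only the factor 41.2 comes from the measurement above; the parallel time in the example is invented:

```python
# Speedup estimate via the measured Sparc10/60-to-T-805 factor.
SPARC_TO_T805 = 41.2  # measured on test set T(1)

def estimated_speedup(t_sparc, t_parallel):
    """speedup(p) = t1(p) / tn(p), with t1(p) estimated as 41.2 * t_sparc(p)."""
    return (SPARC_TO_T805 * t_sparc) / t_parallel

# E.g. 9.2 s on the Sparc10/60 and a hypothetical 12.6 s on n transputers:
print(round(estimated_speedup(9.2, 12.6), 1))  # 30.1
```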
With T(1),...,T(4) we avoid the problems one gets when trying to use a single conspiracy threshold for all
test set positions. Each single position of each test set is supplied with its own conspiracy threshold, and
all these demands are proved to be computable in reasonable time.
No.   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
T(1)  -  3  2  2  1  5  1  5  3  2  5  4  3  2  5  3  2  2  1  3  4  1  5  3
T(2)  -  5  2  4  2  7  3  7  4  4  5  5  4  3  5  3  2  3  3  5  5  3  5  4
T(3)  -  5  3  5  2  7  4  7  4  5  5  6  4  3  5  4  3  3  5  5  7  4  5  5
T(4)  -  5  3  5  3  9  5  9  4  5  5  6  5  5  6  5  3  4  5  5  8  4  6  5

Table 1: Detailed CNs for T(1),...,T(4)
Unfortunately, that does not guarantee that all used processor networks are able to fulfill their demanded
conspiracy for all positions. So some positions are taken out of consideration. Position B01 is taken out
of the test sets, too, because the CCNS algorithm only searches up to conspiracy 1, i.e. one quiescence
search. Positions are also taken out of consideration when the sequential and the parallel versions
do not prove the same move to be the best one. We are of the opinion that in such a case we cannot
compare a parallel run to a sequential one, because in general we do not know anything about the
qualities of the different moves. Without these restrictions to the test sets we think that the results
would be of no worth. The minimum number of considered positions is 18 of 24.
2.4 Qualitative Results
Now we are going to explain the measured speedup by observing load, search overhead and performance.
Moreover, we try to give some reasons for the observed data.
2.4.1 Search Overhead
Figure 7a) shows the search overhead for the different numbers of processors concerning T(1),...,T(4).
Obviously the search overhead increases within a fixed test set with a growing number of processors, but
decisively decreases with growing time when the number of processors is fixed. It is not directly obvious
why the search overhead decreases with growing time. We think that this behaviour mainly depends
on two reasons:
1. When a processor works for a short time only, in general the achieved conspiracy threshold is
small. Therefore the security of the decision at the root and the security of the value information of
the nodes are low. Low security of node information can increase the search overhead.
Figure 6 shows the results investigated for 1, 2, 5, 8, 15, 31, 63, and 127 processors. The small numbers
at the graphs give the average time that a processor network needs for the test set. E.g. 63 processors
examine T(4) in 180.4 seconds per position on average. Not surprisingly, the speedup achieved grows for
each number of processors with increasing average working time. The reason becomes clear by figure 7a)
and figure 7b): With growing time the average load increases and the search overhead decreases.

Figure 6: Speedups for T(1),...,T(4), plotted over 2 to 127 processors
Figure 7: Position B12: Search Overhead and Load for T(1),...,T(4). Panel a) shows the search overhead in percent, panel b) the load in percent, each plotted over 2 to 127 processors.
2. When many processors compete for only a few problems, the processors increasingly try to share
quiescence search problems. In quiescence searches, however, the parallel CCNS algorithm uses
parallelism less carefully.
2.4.2 Load
Larger problems make load sharing easier. That is why the average load increases with
growing working time. That, however, does not clarify why the average load is surprisingly low
in all classes of time T(1),...,T(4). E.g. 31 processors do not exceed the 80 percent load line although
the problems examined seem to be large enough. In addition, we are not under the impression
that 63 processors will ever exceed the 70 percent load line.
Observation 2-16
Figure 8a) shows a typical course of the load, computed on the Bratko-Kopec position B12, examined
by 15 processors. (There are two graphs in the diagram. The upper graph belongs to the sum over 15
processors, the graph below belongs to processor P1.) A negative peak that leads to the bottom line
(at about 2, 20, 35, 60, 80 seconds) shows a finished iteration. Obviously it is difficult to share the load
during the first five iterations. Then we observe a nearly perfect load, up to the moment when the next
iteration comes to an end. Of course there is a time of decreasing load, but that time is surprisingly
long. □
Observation 2-17
The average load in the network is distributed fairly evenly over the processors. On average all
processors have a load between 70 and 83 percent when examining B12-T(3) with 15 processors. Obviously
the processors which have work and delay the others are not the same ones all the time. □
Figure 8: Position B12: Load of 15 Processors and Number of Free Nodes. Panel a) shows the load, panel b) the number of free problems, each over 600 seconds and each with one curve for the sum over 15 processors and one for processor P1 alone.
Observation 2-18
Figure 8b) shows the number of free nodes, i.e. the number of available CN-subproblems in the network
in the course of time. (There are two graphs in the diagram. The upper one belongs to the sum over
15 processors, the one below belongs to processor P1.) When problems arise they are distributed over
the network, but at the end of an iteration there is no work left for distribution. □
We trace the long lasting low load at the end of an iteration to the static embedding of the search tree.
For illustration we offer figure 9: Let P1 and P2 be processors, and let p1, p2, and p3 be problems on
P1 and P2. In an earlier computation P2 got p2 from P1. As p2 was small, P2 got p3 as well. Now p2
becomes large. A good load sharing mechanism would make P1 work at p3 and would make P2 work
at p2. Unfortunately p3 is fixed to P2. Hence P2 must completely examine p2 before p3 can be
examined. Although P1 helps P2 at its work, a good load sharing is not achieved, because only small
parts of p2 are examined by P1. Later many small pieces of p2 will be found on P1. Some of them may
become large, and the situation will become worse than before.

Figure 9: Load Sharing Problem (p1 on P1; p2 and p3 on P2)
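The effect can be reproduced with a toy calculation. The sketch below is our own illustration, not the original scheduler: the problem sizes are invented work units, and the only point is that pinning p3 to P2 lengthens the finishing time once p2 grows.

```python
# Toy makespan model for the situation of figure 9.
# Sizes are hypothetical work units; both processors run at unit speed.

def makespan_static(p1, p2, p3):
    """p2 and p3 stay pinned to P2, so P2 must finish p2 before p3."""
    return max(p1, p2 + p3)

def makespan_dynamic(p1, p2, p3):
    """An ideal scheduler moves p3 back to P1 once p2 turns out large."""
    return max(p1 + p3, p2)

p1, p2, p3 = 20, 90, 30  # p2 looked small when it was handed over, then grew
print(makespan_static(p1, p2, p3))   # 120: P2 is the bottleneck
print(makespan_dynamic(p1, p2, p3))  # 90: P1 takes over p3
```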
Conclusion
In this paper we presented the parallelization of our Controlled Conspiracy Number Search (CCNS)
algorithm. CCNS is marked by the fact that there are two kinds of pieces of information: Values are
updated bottom up with the help of minimax rules, and security demands, so called targets, control
the selective search top down.
As CCNS is a best-first search procedure, all nodes are kept in memory. We presented a method that
maps these nodes onto several processors. The resulting load and space sharing problem is solved half
dynamically and half statically. This solution leads to good speedup results. Further research continues
in order to refine the conspiracy idea itself. A task of more general interest is to find an improved space
balancing mechanism.
Acknowledgements
We thank Burkhard Monien for his long-lasting support and backing. In addition, thanks to Rainer
Feldmann and Peter Mysliwietz for a lot of good hints in many discussions.
References
H. Berliner. The B* tree search algorithm: A best-first proof procedure. Artificial Intelligence,
12(1):23-40, 1979.
I. Bratko and M. Gams. Error analysis of the minimax principle. In M.R.B. Clarke, editor,
Advances in Computer Chess 3, pages 1-15. Pergamon Press, 1982.
R. Feldmann. Spielbaumsuche mit massiv parallelen Systemen. Doctoral thesis, University
of Paderborn, Germany, 1993.
R. Feldmann, P. Mysliwietz, and B. Monien. A fully distributed chess program. In D.F. Beal,
editor, Advances in Computer Chess 6, pages 1-27. Ellis Horwood, 1991.
U. Lorenz and V. Rottmann. Parallel Controlled Conspiracy Number Search. Master's thesis,
University of Paderborn, Germany, 1995.
U. Lorenz, V. Rottmann, R. Feldmann, and P. Mysliwietz. Controlled Conspiracy Number Search.
ICCA Journal, 1995.
D.A. McAllester. Conspiracy Numbers for Min-Max Searching. Artificial Intelligence, 35(1):287-310, 1988.
A.J. Palay. Searching with Probabilities. 1985.
J. Schaeffer. Conspiracy numbers. Artificial Intelligence, 43(1):67-84, 1990.
M. van der Meulen. Conspiracy number search. ICCA Journal, 13(1):3-14, March 1990.