
A SOLUTION TO THE PRISONER'S DILEMMA USING AN ECLECTIC GENETIC ALGORITHM

Angel Kuri M.

CENTRO DE INVESTIGACIÓN EN COMPUTACIÓN, INSTITUTO POLITÉCNICO NACIONAL, Blvd. Adolfo López Mateos, Col. Lindavista, México D.F. 729-6000 ext. 56547, akuri@pollux.cenac.ipn.mx
Abstract. In this paper we describe the solution of the so-called Prisoner's Dilemma with a genetic algorithm which incorporates a number of innovations that allow us to approach the Idealized Genetic Algorithm reported in the literature. In part 1 we discuss the problem to solve; in part 2 we describe, in general terms, the way we solved it; in part 3 we briefly discuss those cases where Genetic Algorithms have been found to be lacking in terms of deception and spurious correlation, and how we attempted to solve them via an Eclectic Genetic Algorithm; in part 4 we describe how we utilized the EGA to solve the Prisoner's Dilemma; in part 5 we give our conclusions.

The Prisoner's Dilemma. The problem we address in this paper is the so-called Prisoner's Dilemma, which was discovered in 1950 by Dresher and Flood of the RAND Corporation [1]. It may be described as follows: assume that two persons have committed a crime and that both have been apprehended and are awaiting trial. The prosecutor offers both prisoners the following deal. If both prisoners plead innocent, both will receive 2 years of jail. If one of the prisoners testifies to the other prisoner's guilt, then the confessing prisoner will be set free, while the other prisoner will receive a 5-year sentence. If both prisoners testify to each other's guilt, then both will receive a 4-year sentence. The two prisoners are not able to communicate. Furthermore, both prisoners know that they have been offered the same deal. Therein lies the dilemma: if both prisoners claim innocence, then both will receive, for certain, a sentence. If, on the other hand, one of the prisoners decides to betray the other one, he will be set free. However, if, being overambitious, both prisoners choose to betray their accomplice, then both will be condemned. What should one do when confronted with this situation? The problem may be described by the following matrix:

                 B: Cooperate    B: Defect
A: Cooperate       -2, -2         -5,  0
A: Defect           0, -5         -4, -4

Original Payback Matrix for the Prisoner's Dilemma

In this matrix, the inner cells hold the pair (payoff to A, payoff to B) the prisoners will get in each of the four possible situations. The first column refers to prisoner A, while the first row corresponds to prisoner B. The Cooperate heading means that the subject is not betraying the other subject; the Defect heading means the opposite: that the subject is willing to chance a longer sentence in exchange for the possibility that the other party will not betray him.
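For concreteness, the payback matrix can be held in a small lookup table. The following sketch (our illustration, not part of the original system) scores a single joint play under the loss-framed payoffs above.

```python
# Payback table for one play of the Prisoner's Dilemma. Keys are
# (move_A, move_B); values are (payoff_A, payoff_B), expressed as
# negative years of jail. 'C' = cooperate, 'D' = defect.
PAYOFF = {
    ('C', 'C'): (-2, -2),
    ('C', 'D'): (-5,  0),
    ('D', 'C'): ( 0, -5),
    ('D', 'D'): (-4, -4),
}

def play_once(move_a: str, move_b: str) -> tuple:
    """Return the (A, B) payoffs for a single joint play."""
    return PAYOFF[(move_a, move_b)]

print(play_once('C', 'D'))  # (-5, 0): A is betrayed and serves five years
```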

The problem is usually posed not in terms of one instance of the problem (which will be termed a "play") but rather in terms of a series of consecutive instances ("plays"). That is, we ask for the best strategy to follow given that we are presented with a sequence of similar situations. We want to minimize our losses. In 1979, Robert Axelrod of the University of Michigan conducted a survey to search for the best possible strategy with the following modified payback matrix [2]:

                 B: Cooperate    B: Defect
A: Cooperate        3, 3           0, 5
A: Defect           5, 0           1, 1

Axelrod's Payback Matrix for the Prisoner's Dilemma

In Axelrod's matrix, the problem is posed in terms of gain, rather than loss. That is, the participants are said to benefit from their cooperation, rather than be punished for their defection. In such a case our strategy should attempt to maximize our benefits. Axelrod sent invitations to a number of professional game theorists telling them that he wanted to pit many strategies against one another in a round-robin Prisoner's Dilemma tournament with the overall goal of amassing as many points as possible. He asked for strategies to be encoded as computer programs that could respond to the C or D of another player, taking into account the remembered history of previous interactions with that same player. A program should always answer with a C or a D, of course, but its choice need not be deterministic.

Proposed Solution. The first issue in attempting to solve this problem is to figure out a way to encode the strategies. This we did as follows. To every strategy we associate a string which represents the move (C or D) which we are to perform given a past history. For instance, given that the last three moves were $m_1 r_1 m_2 r_2 m_3 r_3$, we have to establish the move that we select on our next turn. Here $m_i$ stands for our "opponent's" i-th move and $r_i$ stands for our reply to the i-th move of our opponent. For the purpose of this example, let us assume that, indeed, we are keeping a record of the last three moves. In such a case, we have $2^{2n} = 2^{2 \cdot 3} = 2^6 = 64$ possible "histories" (where n = 3 stands for the number of past moves we consider). Representing a defection with "D" and a cooperation with "C", we have the following alternatives:

if past history = DDDDDD then $r_0$
if past history = DDDDDC then $r_1$
...
if past history = CCCCCC then $r_{63}$

Clearly, then, we have 64 possible histories, and the number of possible strategies is given by $2^{2^{2n}} = 2^{64} \approx 1.84 \times 10^{19}$. This number is much too large for an exhaustive search. We therefore encode one possible strategy with a 64-bit string. Thereafter, we have to evaluate the performance of the proposed strategies (represented in a 64-bit genome) when confronted with other possible strategies.

During Axelrod's experiment, it was found that a very simple strategy performed remarkably well when pitted against the others. This strategy is called TIT-FOR-TAT and is noteworthy for its simplicity. TIT-FOR-TAT is as follows: a) cooperate the first time, b) thereafter, repeat the opponent's last move. In other words, TIT-FOR-TAT "trusts" its opponent the first time; thereafter it simply responds by repeating its opponent's behavior. In Axelrod's case, reported above, he selected the best proposed strategies and confronted them against each other to establish the validity of each one in, as already pointed out, a round-robin tournament. Here we do not assume any prior knowledge of which strategies to begin with. Therefore, our method was as follows.
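The encoding lends itself to a direct implementation: a six-character history is mapped to an index in 0..63, and the strategy's reply is the character stored at that position. The sketch below is illustrative only; the helper names are ours.

```python
import random

# Map a history string such as "DCDDCC" (m1 r1 m2 r2 m3 r3, where m is the
# opponent's move and r our reply) to an index in 0..63, reading 'D' as 0
# and 'C' as 1, so that DDDDDD -> 0 and CCCCCC -> 63.
def history_to_index(history: str) -> int:
    index = 0
    for move in history:
        index = (index << 1) | (move == 'C')
    return index

# A strategy is a 64-character string over {'C', 'D'}: position i holds the
# reply r_i to the history whose index is i.
def reply(strategy: str, history: str) -> str:
    return strategy[history_to_index(history)]

strategy = ''.join(random.choice('CD') for _ in range(64))
print(reply(strategy, 'DCDDCC'))  # the move this strategy plays next
```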

Algorithm PD.
1. Set S = the number of strategies to include in the round-robin tournament.
2. Generate 6S-1 random strategies.
3. Add the TIT-FOR-TAT strategy. The resulting strategies are indexed as $\sigma_1, \sigma_2, \ldots, \sigma_{6S}$.
4. Do $6 \cdot 6S$ times:
   Select strategies $\sigma_\alpha$ and $\sigma_\beta$, where $\alpha$ and $\beta$ are set at random.
   Play strategy $\sigma_\alpha$ vs. strategy $\sigma_\beta$ a specified number of times.
   Grade $\sigma_\alpha$ and $\sigma_\beta$ as per their gains.
   enddo
5. Select the best S strategies.

The set of best strategies constitutes the "opponents" against whom the proposed strategies are to be tested to extract the best-fit individual of the Genetic Algorithm.

An Eclectic Genetic Algorithm. Genetic Algorithms are evolutionary processes where a solution is evolved by applying the so-called genetic operators [3]. In various works [4], [5], [6] a Canonical Genetic Algorithm has been described and discussed. Here we describe a non-standard GA. We call this GA eclectic; it incorporates the following techniques.

a) Elitism. When one keeps a copy of the best individual one may guarantee global convergence for an optimization problem. There are variations on elitist models which we have denoted partial elitism and full elitism. By partial elitism we mean that in a population of size n we keep a copy of the m < n best individuals up to generation k. By full elitism we mean that we keep a copy of the best n individuals up to generation k. In other words, given that we have tested nk individuals by generation k, our population will consist of the best n up to that point. In figure 1 we depict the case of full elitism. Notice that in generation k we have selected the best possible n individuals out of the nk total individuals considered up to that point.

b) Deterministic Selection. In deterministic selection we do not rely on individual fitness to determine the most desirable descendants. Rather, we propose to emphasize genetic variety by imposing a strategy which enforces crossover of predefined individuals. There are two contrasting points of view: in one, we encourage the genes of the best individuals to cross among themselves; in the other, we encourage the best individuals to cross with the worst ones. The first of these strategies is called the Nietzsche model (NM), where the best elements of the population intermix in an effort to preserve the "best" genes. The other strategy is called the Vasconcelos model (VM), where the "best" individuals are intermixed with the "worst" individuals in an effort to explore the widest range of the solution space.

c) Vasconcelos Model. In this model we adopt the strategy of deterministically selecting individual i to cross it with individual n-i+1. As in the NM, we assume full elitism. Hence, we adopt a strategy which, superficially, destroys the good traits of the best individual by deliberately crossing it with the worst individual. However, taken in conjunction with full elitism, this strategy leads to the implicit analysis of a wider variety of schemas (i.e. it maximizes the exploration of the solution landscape). The exploration of such a vast landscape is focused via the full elitism implicit in the model. A sketch of one VM generation is given after figure 1.

Figure 1. Full Elitism. (The population at generation k consists of the best n of the nk individuals evaluated up to that point.)
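The following sketch shows one generation under the Vasconcelos model with full elitism, assuming a fitness function and a crossover operator are supplied elsewhere; all names are ours, not the paper's.

```python
def vasconcelos_generation(population, fitness, crossover, n):
    """One generation of the Vasconcelos model under full elitism.

    population: list of genomes sorted from best to worst.
    fitness:    genome -> float, to be maximized.
    crossover:  (genome, genome) -> (genome, genome).
    n:          population size maintained by full elitism.
    """
    offspring = []
    # Deterministic pairing: individual i crosses with individual n-i+1
    # (best with worst, second best with second worst, and so on).
    for i in range(n // 2):
        child_a, child_b = crossover(population[i], population[n - 1 - i])
        offspring.extend([child_a, child_b])
    # Full elitism: retain the best n of everything evaluated so far.
    pool = population + offspring
    pool.sort(key=fitness, reverse=True)
    return pool[:n]
```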

d) Self-Adaptation. When running a GA there are several parameters to be set a priori. Three of these are the most common: a) the crossover rate ($p_c$), b) the mutation rate ($p_m$), and c) the size of the population (N). In many cases the user tries to fine-tune these parameters by making a number of runs on different "representative" case problems (see, for instance, [7], [8]). In a self-adaptive GA the three parameters are included as an extension of the genome in such a way that the parameters evolve along with the individual. The idea behind self-adaptation is that the GA explores not only the solution landscape but the parameter landscape as well. The genome is thus divided into two sub-genomes: a strategy genome and a solution genome. In terms of a classical GA, the solution genome corresponds to what has simply been referred to as the genome. Both sub-genomes are subject to the genetic operators. We should consider the parameter sub-genome as a set of three functionally independent sub-genomes. The self-adaptive genome is shown in figure 2. Notice that the size of the population N is implicitly dealt with by considering not the population's size itself, but rather the number of descendants produced by crossover. In this self-adaptive scheme, the way the operators affect the performance of the GA takes the population (for any given generation) as a whole into consideration. In a sense, this self-adaptive GA is aimed at improving the mean values of the population rather than the values of each individual.

a) Probability of Mutation. As in the individual self-adaptive scheme, $p_m$ is encoded in every individual. Here, however, the mutation rate for the whole population in the k-th generation $g_k$ is calculated as

$(p_m)_k = \frac{1}{N} \sum_{i=1}^{N} (p_m)_i$

Therefore, the mutation operator's rate is fixed for all the individuals during $g_k$.

Figure 2. Self-Adaptive Genome. (Strategy sub-genome: mutation probability $p_m$, crossover probability $p_c$, number of descendants n; solution sub-genome: the individual's encoding. Together they form the full genome.)

b) Probability of Crossover. $p_c$ is, as before, encoded in the genome of every individual. Now, in generation $g_k$ the crossover rate is given by

$(p_c)_k = \frac{1}{N} \sum_{i=1}^{N} (p_c)_i$
Here, again, the crossover operator's rate is fixed for all the individuals during $g_k$.

c) Number of Descendants. As before, n, the number of descendants, is encoded in the genome of every individual and, as before, the number of descendants $n_k$ in the k-th generation is given by

$n_k = \frac{1}{N} \sum_{i=1}^{N} n_i$
As in the two preceding cases, the number of descendants is fixed for all the individuals during $g_k$. Utilizing a self-adaptive strategy, one is freed, to a certain extent, from arbitrary parameter selection. It is usual to set upper bounds on the possible values encoded in $p_m$, $p_c$ and n; in that sense, there is still an arbitrary selection of initial parameter values. However, the individuals which represent the better parameters for the particular problem are allowed to learn from the problem that is being solved. The self-adaptive alternative has been shown to compare favorably [9] with traditional fixed-parameter GAs. At this point we have all the elements to propose what we shall call a Universal Eclectic Genetic Algorithm. Eclecticism here refers to the fact that we are willing to adopt the strategies we consider best regardless of the problem; in fact, we arrive at a mixed algorithm: strictly speaking, no longer merely a GA. Universal is meant to stress the fact that the variation of the GA to be discussed is applicable to a wide range of problems without the need for special considerations.
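The three population-level equations above reduce to taking means over the strategy sub-genomes. A minimal sketch, assuming each individual exposes pm, pc and n fields (our naming):

```python
def generation_parameters(population):
    """Average the strategy sub-genomes to obtain the single set of
    parameters used by the whole population during generation g_k."""
    N = len(population)
    pm_k = sum(ind.pm for ind in population) / N        # mutation rate
    pc_k = sum(ind.pc for ind in population) / N        # crossover rate
    n_k = round(sum(ind.n for ind in population) / N)   # descendants
    return pm_k, pc_k, n_k
```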

e) Annular Crossover. In annular crossover, the genome (as shown in figure 3) is no longer seen as a linear collection of bits but rather as a ring whose leftmost bit is contiguous to its rightmost bit. When applying annular crossover, there are two parameters to consider for each interchange: a) the starting crossover locus, that is, where the segment to be extracted from the individual starts; and b) the length of the semi-ring, that is, how many genes of the individual are to be extracted. Clearly, for a genome of length l there are l possible loci and l-1 possible lengths.

Figure 3. Annular Genome.

Figure 4. Annular Crossover.
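Annular crossover is straightforward to state over strings: choose a starting locus and a semi-ring length, then swap that segment between the two parents, wrapping around the ring if necessary. A sketch under the description above; parameter names are ours.

```python
import random

def annular_crossover(parent_a: str, parent_b: str):
    """Swap a ring segment between two genomes of equal length l.
    The genome is treated as a ring: position l-1 is adjacent to 0."""
    l = len(parent_a)
    start = random.randrange(l)        # starting crossover locus
    length = random.randrange(1, l)    # length of the semi-ring
    a, b = list(parent_a), list(parent_b)
    for k in range(length):
        pos = (start + k) % l          # wrap around the ring
        a[pos], b[pos] = b[pos], a[pos]
    return ''.join(a), ''.join(b)
```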

With annular crossover, moreover, we need no longer concern ourselves with position-encoding dependencies when extracting the first individual's genes. An example of annular crossover is shown in figure 4. As already mentioned, this algorithm approaches the behavior of an idealized Genetic Algorithm: by including self-adaptive behavior it modifies itself without impairing its desirable characteristics.

f) Adaptive Hill Climbing. The Random Mutation Hill Climber (RMHC) is capable of outperforming the simple GA on certain functions [10]. To take advantage of the hill climber in cases such as this one, we include an RMHC as part of the algorithm. That is, the EGA consists of a self-adaptive GA plus a hill climber (HC). How do we determine when the HC should be activated instead of the GA proper? We do this with the self-adaptive mechanism we describe next.

a) The first step is to define two bounds: 1) the minimum Hill Climber percentage ($\phi_{\min}$), and 2) the maximum Hill Climber percentage ($\phi_{\max}$), where

$0 < \phi_{\min} \le \phi_{\max} < 1$

The HC algorithm will then be activated at least $\phi_{\min}$ of the time and, at most, $\phi_{\max}$ of the time.

b) As a second step, we must define two other bounds: 1) the minimum number of function evaluations to be performed by the HC upon invocation ($N_{\min}$), and 2) the maximum number of such function evaluations ($N_{\max}$). These two bounds are given as a percentage of the population's size N. The actual percentage of HC function evaluations (relative to N) is included in the strategy sub-genome and is subject to the genetic operators. The actual number of evaluations upon HC invocation in generation k is given by

$(N_{HC})_k = \frac{1}{N} \sum_{i=1}^{N} (N_{HC})_i$

c) At generation $g_k$ the hill climber's effectiveness is evaluated from

$\bar{\lambda}_k = \frac{1}{N} \sum_{i=1}^{N} \lambda_i$, where $\lambda_i = \begin{cases} 1 & \text{if individual } i \text{ was found by the HC} \\ 0 & \text{otherwise} \end{cases}$

To be able to determine $\lambda$'s value we must include a new element in the genome. This element is of type boolean: it is set when the individual has been found by the HC and reset otherwise. The genome for the EGA is shown in figure 5; in it we may find all the elements for the self-adaptive scheme and the HC algorithm.

Figure 5. Self-Adaptive Genome for Eclectic Genetic Algorithm. (Strategy sub-genome: mutation probability $p_m$, crossover probability $p_c$, number of descendants n, an "originated by HC" flag, and the number of HC function evaluations $N_{HC}$; solution sub-genome: the individual's encoding.)

d) Denoting the probability of invoking the HC by $\pi$, we have:

$\pi = \begin{cases} \phi_{\min} & \text{if } \bar{\lambda} < \phi_{\min} \\ \bar{\lambda} & \text{if } \phi_{\min} \le \bar{\lambda} \le \phi_{\max} \\ \phi_{\max} & \text{if } \bar{\lambda} > \phi_{\max} \end{cases}$

e) Generate a random number K, where $0 < K \le 1$. Prepare to invoke the HC if $K \le \pi$.

f) Once the HC is scheduled to start, the string upon which it will operate is selected randomly from the first five in the population. Recall that the individuals in the population are ordered from best (individual 1) to worst (individual N). Therefore, to select the string $\mu$ for the HC to operate upon, we make

$\mu = \lceil 5R \rceil$

where R is a uniformly distributed random number and $0 < R \le 1$.

The result of the strategy just outlined is to guarantee that the HC will be active whenever it has proved to be effective. The probability that the HC will override the GA proper is, however, bounded above by $\phi_{\max}$; this prevents the HC from taking over the whole process. On the other hand, the HC will be invoked with probability at least $\phi_{\min}$, which, in turn, avoids the possibility that, due to poor performance of the HC at some generation $g_k$, the HC is shut out for the rest of the process. In essence, therefore, the EGA incorporates an HC process which is self-adaptive on two accounts: a) because its activity is determined by its effectiveness, and b) because the adequate number of function evaluations is evolved as the GA unfolds. The name "Hill Climber" refers to the fact that these processes are thought to zero in on optimality points once the algorithm has reached a neighboring region of the solution landscape. Here, however, the HC serves a double purpose: a) it does indeed zero in on close local maxima, and b) it enforces population variety by exploring new schemas which the GA would otherwise pass by. The most striking fact about this mixed-mode (GA-HC) algorithm is that it is the GA which actually does the fine search of local optima, with the HC serving as a triggering agent which locates suboptimal solutions quite efficiently.

In the literature two causes are recognized for a given GA to perform poorly: a) deception, and b) spurious correlation. We refer the interested reader to the references for a discussion of these two elements [11], [12]. Mitchell et al. [13] have discussed a GA called an "Ideal" GA (IGA), from which she was able to conclude that, for a GA to perform adequately in the vast majority of cases, it should comply with certain characteristics. The VM allows the algorithm to trap the best schemas without appreciably restricting the search space. In this fashion we are able to approximate the said IGA. We now briefly review the demands for any GA to approach an IGA:

1) No single locus is fixed at a single value in a large majority of the strings of the population. This is clearly achieved because we are deterministically disrupting the troublesome loci by best-worst crossover.

2) Selection has to be strong enough to preserve desired schemas that have been discovered, but it should also prevent significant hitchhiking on some highly fit schemas. Because we are working with full elitism we preserve desired schemas and, as before, we avoid hitchhiking by crossing over dissimilar individuals.

3) The crossover rate has to be such that the time for a crossover that combines two desired schemas is small with respect to the discovery time for such schemas. Full elitism guarantees that, even in the Vasconcelos strategy, the worst individual of population k is in the top 1/k fraction. For example, if n = 50 and we look at the 20th population, the individuals in that population are among the best 5% of the total individuals analyzed. This characteristic is incrementally exposed as the GA proceeds.

4) In the analysis of the hill climber it was assumed that the fitness function consists of N adjacent blocks of K ones each. The expected number of function evaluations $\mathcal{E}(K, N)$ for RMHC to find the optimal string of all ones is $2^K N \ln N$. On the other hand, the expected number of function evaluations for the IGA is $2^K \ln N$. The string has to be long enough so that the speedup factor N between the two expressions is significant.
This last demand is the only one we cannot guarantee: for relatively small values of N, hill climbers may outperform our Genetic Algorithm. However, by introducing the mixed-mode GA-HC algorithm as already described, we are able to circumvent this limitation. A sketch of the HC gating mechanism follows.
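Steps c) through f) amount to clamping the hill climber's measured effectiveness into $[\phi_{\min}, \phi_{\max}]$ and using the result as an invocation probability. A minimal sketch, assuming each individual carries an hc_flag bit (the boolean element described above; the name is ours):

```python
import random

def hill_climber_gate(population, phi_min, phi_max):
    """Decide whether the RMHC runs this generation and on which string.

    Each individual carries hc_flag = 1 if it was produced by the HC
    (the lambda of the text) and 0 otherwise. The population is sorted
    from best (index 0) to worst. Returns the index of the string the
    HC should improve, or None if the GA proper runs instead.
    """
    N = len(population)
    effectiveness = sum(ind.hc_flag for ind in population) / N
    # Clamp the effectiveness into [phi_min, phi_max] to obtain pi.
    pi = min(max(effectiveness, phi_min), phi_max)
    K = random.random()          # uniform in [0, 1)
    if K <= pi:
        # Select one of the first five (best) individuals at random.
        return random.randrange(min(5, N))
    return None
```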

Implementation. A Universal Eclectic Genetic Algorithm was implemented along the lines already described. The system consists of three steps: a) definition of the Prisoner's Dilemma parameters, b) definition of the EGA parameters, and c) interpretation of the results. The main windows for the aforementioned processes are shown in the following four figures.

In the figure we can see the settings: a) The number of back plays. Here we may specify how many plays back we want to consider; in this case we have restricted them to three, as stated in the main text. The length of the genome is calculated automatically; as pointed out, it is 64. b) The rewards are specified below. These have been kept as in Axelrod's experiments, but may vary depending on our settings. c) We also specify the number of strategies to test; that is, we may vary the size of the round robin. These strategies are calculated as described in the text. d) Finally, the seed for the random number generator is set. Here we arbitrarily chose the number 3.1416.

Figure 6. Prisoner's Dilemma Parameter Settings.

Figure 7. Eclectic Algorithm Parameter Settings.

In the figure on the left we may see how the parameters for the GA were set. In the left column we established lower and upper bounds for the crossover probability (Pc), the mutation probability (Pm), the number of descendants (n), and the hill climber function evaluations ($N_{HC}$). For crossover and hill climber the displayed numbers should be divided by 100; for mutation, by 1000. The parameters in the right column are self-explanatory, except for Windows/DOS and the Data Format. These refer to whether the external function (in this case the Prisoner's Dilemma program) is to run in DOS or Windows mode; the data format specifies whether the information passed from the main module of the EGA is of type ASCII or type .DBF (.DBF files conform to the dBase standard).

Once the program runs, we have to interpret the genome of the best individuals. In this case the encoding was simple enough: every D ("Defect") play was encoded as 0; every C ("Cooperate") play was encoded as 1. In the window on the left we have specified the name of the file where the results are to be found. The best strategy's fitness is a reported 2.50. We must take into consideration that this fitness, as described above, is a function of the selected canonical strategies resulting from Algorithm PD.
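Interpreting the winning genome is the inverse of the encoding step: a 0 bit stands for a "D" reply and a 1 bit for a "C" reply, indexed by history number. A hypothetical sketch:

```python
def decode_genome(bits):
    """Turn a 64-bit genome (iterable of 0/1) into a strategy string,
    where position i holds the reply to history number i."""
    return ''.join('C' if b else 'D' for b in bits)

def describe(strategy: str, i: int) -> str:
    """Spell out rule i as 'history -> reply' (DDDDDD is history 0)."""
    history = format(i, '06b').replace('0', 'D').replace('1', 'C')
    return f"{history} -> {strategy[i]}"

best = decode_genome([0] * 32 + [1] * 32)   # placeholder genome
print(describe(best, 0))                     # DDDDDD -> D
```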

Figure 8. Prisoner's Dilemma Main Results.

Figure 9. The Best Strategy.

The best strategy is depicted in the figure above. The play for the sequence "DDDDDD" corresponds to the leftmost character in the string (in this case, a "D"); the play for the sequence "DDDDDC" corresponds to the next character to the right (in this case, also a "D"), and so on.

Conclusions. As shown, a non-numerical problem is susceptible to solution using a genetic algorithm. Furthermore, the applied methodology avoids the need for an a priori selection of subjectively determined "best" strategies.



In this regard, it is interesting to point out that one alternative, yet to be explored, is that of forcing the strategies to co-evolve. This has been explored in [14], but in a rather different setting. Here, the setting would be as follows (see the sketch after this paragraph): a) Select a set of strategies as per Algorithm PD; call this set C. b) Trigger the genetic algorithm. c) From the evolving population P and C, select the best N strategies (where N is the number determined in the first parameter window). d) Go to (b) while convergence is not reached. In this fashion, the contesting strategies co-evolve along with the individuals of the population of the genetic algorithm. This methodology proved, in the cited paper, to be an effective way to increase the adaptive capacities of both the host (the best strategy) and the parasite (the set C). In any case, an eclectic algorithm such as the one above guarantees that the whole process will converge efficiently, without undue concern for the particularities of the Prisoner's Dilemma.
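The co-evolutionary loop just outlined might be arranged as below; every name here (including the convergence test) is hypothetical and follows only the four steps listed above.

```python
def coevolve(algorithm_pd, run_ega, best_of, converged, N):
    """Co-evolve the contest set C with the EGA population P.

    algorithm_pd: () -> initial set C of canonical strategies (step a).
    run_ega:      C -> evolved population P (step b).
    best_of:      (P, C, N) -> best N strategies of P union C (step c).
    converged:    C -> bool, the stopping test (step d).
    """
    C = algorithm_pd()               # a) initial opponents
    while not converged(C):          # d) loop until convergence
        P = run_ega(C)               # b) evolve against current C
        C = best_of(P, C, N)         # c) refresh the opponent set
    return C
```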

References.
1. Hofstadter, D., "The Prisoner's Dilemma, Computer Tournaments and the Evolution of Cooperation", in Metamagical Themas, p. 715, Bantam Books, 1983.
2. Mitchell, M., An Introduction to Genetic Algorithms, Ch. 1, MIT Press, 1996.
3. Goldberg, D., Genetic Algorithms in Search, Optimization and Machine Learning, Addison-Wesley, 1989.
4. Mitchell, M., An Introduction to Genetic Algorithms, Complex Adaptive Systems Series, MIT Press, 1996.
5. Vose, M., "Generalizing the notion of schema in genetic algorithms", Artificial Intelligence, 50 (1991), 385-396.
6. Goldberg, D., op. cit., Ch. 2.
7. De Jong, K. & Spears, W., "An Analysis of the Interacting Roles of Population Size and Crossover in Genetic Algorithms", Naval Research Laboratory Article Repository, 1996.
8. Kuri, A. & Galaviz, J., "A Self-Adaptive Genetic Algorithm for Function Optimization", Proceedings ISAI/IFIPS, Nov. 1996.
9. Spears, W. & Anand, V., "A Study of Crossover Operators in Genetic Programming", MIT Press, 1995.
10. Mitchell, M., "Royal Roads in Genetic Algorithms", San José State University, 1996.
11. Mühlenbein, H., "How Genetic Algorithms Really Work I: Mutation and Hillclimbing", Proc. of the Fourth International Conference on Genetic Algorithms, 1991.
12. Goldberg, D., op. cit., Ch. 5.
13. Mitchell, M., op. cit., p. 153.
14. Hillis, W. D., "Co-evolving parasites improve simulated evolution as an optimization procedure", Physica D, 42 (1990), 228-234.

