
Full Paper

Int. J. on Recent Trends in Engineering & Technology, Vol. 05, No. 01, Mar 2011

A Dynamic Reinforcement Learning Strategy for Seller Agents in e-market


Vibha Gaur, Neeraj Kumar Sharma, Punam Bedi
Department of Computer Science, University of Delhi, Delhi, India
3.vibha@gmail.com, neerajraj100@gmail.com, pbedi@du.ac.in

Abstract: Agent mediated e-commerce provides a platform where buyers and sellers exchange goods and services in an e-market through software agents. The success of an agent mediated e-commerce system depends on the underlying reputation system that provides mechanisms to improve the quality of services in the e-market. Reputation systems induce sellers to exert high effort to show honesty for increased buyer satisfaction, but there is scant focus on improving seller satisfaction. This paper proposes a strategy where seller agents allocate reputation rating to buyer agents using reinforcement learning in order to increase the satisfaction of seller as well as buyer agents by building long term relationships with buyers. The seller's strategy is dynamic as it monotonically increases the reputation rating of a buyer agent with the value of a transaction and the seller's profit from that transaction. This strategy leads to improved buyer satisfaction because, during a transaction, a buyer is allowed to redeem a part of its accumulated reputation rating as an incentive for being a faithful buyer. It also improves the seller's satisfaction as it encourages long term buyer-seller relationships by offering larger redemption value to buyers who are involved in repeated transactions, thus helping a seller to maximise its long term gains.

Index Terms: Software Agent, Reputation, e-market, Reinforcement Learning

I. INTRODUCTION

Reputation mechanisms are an important component of electronic markets, helping build trust and elicit cooperation among loosely connected and geographically dispersed economic agents. Reputations are transmitted in electronic markets where players repeat transactions but rarely with the same player [5]. Hence, it is a challenge to induce buyers into repeated transactions with the same seller, which would result in long term gains for the seller. The objective of reputation systems is to induce sellers to exert high effort to show honesty as often as possible [6]. The definition and meaning of reputation varies with different applications and contexts. From an objective view, reputation is expressed as a quantity derived from the underlying social network which is globally visible to all members of the network [11] or as a perception that an agent has of another's intentions and norms [5]. Existing reputation systems ensure satisfaction of buyer agents by exerting high effort to protect buyers from
2011 ACEEE

dishonest sellers [7, 8, 9, 10] but there is scant focus on maintaining a balance by ensuring the satisfaction of both buyer and seller agents. The e-market environment in which these agents operate is open, meaning that agents can join or leave the marketplace at any time; uncertain, because the true worth of a good can be judged only after purchase; and un-trusted, that is, the e-market comprises honest as well as dishonest agents [14]. The e-market is populated with self-interested buyer and seller agents that try to maximise their respective gains.

The process of buying and selling in the e-market can be viewed as an auction which is initiated with a buyer announcing the need for a good. In response, a set of sellers bid to sell that good. A buyer then calculates the expected value of the good offered by each seller that places a bid [1]. On the basis of the reputation of seller agents and the expected value of the good offered by each seller, a buyer decides to purchase the good from the seller offering the good with the highest expected value. Once a good is purchased, the buyer and seller update the reputation of each other on the basis of their respective experience during the transaction.

The seller's strategy for buyer-seller satisfaction proposed in this paper uses reinforcement learning (RL) techniques, which provide a general framework for sequential decision making problems and have proved efficient for many important applications [15]. RL deals with how an agent should take actions, in every state it can be in, in order to obtain the maximum long term reward. RL is learning what to do, that is, how to map situations to actions, so as to maximize a scalar reward signal. The learner is not told which action to take, but instead must discover which actions yield the most reward by trying them. In some cases, actions may affect not only the immediate reward, but also the next situation and all subsequent rewards. These two characteristics, i.e. trial-and-error search and delayed reward, are the two most important distinguishing features of RL.

This paper proposes an RL strategy for sellers to increase the satisfaction of both buyer and seller agents by building long term relationships with buyers. Using RL, a seller allocates to a buyer a reputation rating that reflects its experience with that buyer during previous transactions. Initially,

DOI: 01.IJRTET.05.01.142

every new buyer is allocated a reputation rating of zero. The seller's strategy involves the following activities. Firstly, a seller updates the reputation rating of the buyer on the basis of its experience of the current transaction. Secondly, it gives an incentive to the buyer by allowing the buyer to redeem a part of its accumulated reputation rating during future transactions with the same seller. With each successive transaction, a seller makes a greater percentage of accumulated profit available for redemption to a buyer as a reward for being a faithful customer. This encourages a long term relationship of the seller with that buyer, culminating in an increased chance of repeated business opportunities with the buyer. Thirdly, the reputation rating assigned to a buyer is monotonically related to the value of a transaction and the amount of profit earned by the seller from that transaction. Finally, in case a seller loses a particular bid to sell a product to a buyer, then in future it offers the same good with reduced price or enhanced attributes to that buyer in order to meet the expectations of that buyer and to win future bids.

In the proposed seller's strategy, the reputation of a buyer increases monotonically with the seller's profit from the transaction and the value of the transaction. Therefore, a high reputation rating allocated by a seller to a buyer reflects a long term buyer-seller association that has resulted in greater economic gains for the seller in the form of high accumulated profit from that buyer during previous transactions. According to the proposed strategy, a seller can offer various versions of the same product to different buyers. Further, in case of an unsuccessful bid, a seller can offer the same good to the same buyer but with enhanced attributes and/or reduced price in the quest to win future bids with that buyer. Thus seller agents learn to maximise long term profit by consolidating their relationship with buyers.

The rest of this paper is organized as follows. Section II gives an overview of related work. Section III presents the dynamic seller's strategy for improved buyer-seller satisfaction. Section IV includes a case study. Finally, Section V concludes with directions for future work.

II. RELATED WORK

A number of reputation models [7, 8, 9, 10] in the literature focus primarily on the satisfaction of buyer agents only, by allocating reputation to seller agents. A few reputation models that also focus on the satisfaction of seller agents by giving reputation to buyer agents are discussed briefly in this section. Reputation models [1, 3] are based only on direct evidence to maximise the gains of buyer and seller agents. In the reputation model for increasing user satisfaction [1, 2], buyer agents assign the reputation to seller agents and seller
agents improve their satisfaction by adjusting the price and quality of their goods to maximise their profit. A reputation model [3] is described for an e-market that computes the reputation of both buyer and seller agents by using a multi-facet reputation concept, where reputation is computed using quality, price and delivery time of goods. The most popular online auction site, eBay, includes a feedback forum as a reputation mechanism. In this system, a buyer can rate a seller in terms of feedback as positive, negative or neutral, i.e. +1, -1 or 0 respectively [5]. A seller can also rate a buyer using positive or neutral feedback [13]. In addition, buyers or sellers can also leave comments about their satisfaction. In the reputation model [4], buyers and sellers allocate reputation rating to each other, with the purpose of making effective bids in the marketplace. It proposes a seller's bidding strategy that provides more attractive goods to reputable buyers, in order to build the seller's own reputation.

The seller algorithms used in the reputation systems described above [1, 2, 3, 4, 5, 13] are relatively static because, in allocating reputation to buyers, they do not incorporate the value of a transaction, which is an important parameter in the e-market environment [14]. For a seller's strategy to be dynamic, the reputation gain in a larger transaction should exceed that in a smaller transaction, as a large transaction leads to increased business and possibly greater profit for the seller agent. Further, in the user satisfaction reputation model [1, 2], a seller penalises a faithful buyer, i.e. the buyer from whom the seller is getting repeated business, by offering the same good to the buyer in future transactions with lower quality. As it is extremely difficult to get repeated business in the e-market environment, a dynamic seller's strategy must reward faithful buyers instead of penalising them.

III. REINFORCEMENT BASED DYNAMIC SELLER'S STRATEGY FOR IMPROVED BUYER-SELLER SATISFACTION

The seller's strategy proposed in this section to increase buyer-seller satisfaction is dynamic, as the reputation of a buyer agent increases monotonically with the value of a transaction as well as with the amount of profit earned by the seller from the transaction. The proposed strategy makes it attractive for buyers to engage in repeated transactions with the same seller in future, as the buyer gets an opportunity to redeem a part of its accumulated reputation points as a reward for being a faithful customer. Further, the proposed strategy offers a greater percentage of accumulated profit as redemption value to customers who engage in repeated transactions, with the purpose of enhancing long term gains by building buyer-seller relationships.

The proposed seller's strategy is based on the e-market model having a set of buyers and sellers. In this model, sellers are divided into four categories, namely, reputed, non-reputed, dis-reputed and new sellers [1]. Buyers allocate reputation rating to sellers in the range [0,1]. At any given time a buyer preferably selects a seller from the list of reputed sellers, but in no case does it select a dis-reputed seller [14]. Before purchasing a good, the buyer computes the expected value of the good offered by each seller and places an order with the seller that is offering the good with the highest expected value. Once the buyer receives the good after purchase, it computes the actual value of that good. If the actual value of the good is more than its expected value, the buyer increases the seller's reputation; otherwise it decreases the seller's reputation.

As per the proposed seller's strategy, after each transaction, a seller agent allocates reputation rating to the buyer. The amount of increase in the reputation of a buyer is monotonically related to the value of the transaction and the profit earned from the current transaction, with the intention to increase the satisfaction of the seller by maximising its long term gains. The effect of the change in value of a transaction and the seller's profit from a transaction is illustrated in Fig. 1. This paper does not allocate negative reputation ratings to buyers, based on the assumption that all buyers are honest and no sellers are interested in losing their customers.

Let S be the set of sellers, G be the set of goods and B be the set of buyers, and let only a single good be bought or sold in one transaction. Each good g with attributes a has a cost price and a selling price for the buyer b; the selling price is used ahead in abbreviated form as p. Assume that the seller s has received a request from buyer b for good g.
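The buyer-side selection rule of this e-market model can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Bid` record, the `choose_seller` function and the category strings are hypothetical names, the expected value is taken as a given number because its computation is not specified here, and "preferably selects a reputed seller" is read as "restrict the choice to reputed bidders whenever at least one of them has bid".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Bid:
    seller: str
    category: str          # "reputed", "non-reputed", "dis-reputed" or "new"
    expected_value: float  # buyer's own estimate, supplied from outside this sketch


def choose_seller(bids: List[Bid]) -> Optional[Bid]:
    """Pick the bid with the highest expected value.

    Dis-reputed sellers are never selected. Reputed sellers are preferred:
    if any reputed seller has bid, only reputed bids are compared
    (an assumed reading of "preferably").
    """
    eligible = [b for b in bids if b.category != "dis-reputed"]
    if not eligible:
        return None
    reputed = [b for b in eligible if b.category == "reputed"]
    pool = reputed if reputed else eligible
    return max(pool, key=lambda b: b.expected_value)


if __name__ == "__main__":
    bids = [
        Bid("s1", "non-reputed", 9.0),
        Bid("s2", "reputed", 7.0),
        Bid("s3", "dis-reputed", 10.0),
    ]
    winner = choose_seller(bids)
    print(winner.seller if winner else "no eligible seller")
```

A stricter or looser preference for reputed sellers (for example, a weighted comparison) would fit the same description; the hard exclusion of dis-reputed sellers is the only part stated without qualification.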
A seller can produce different versions of the good g based on the buyer's requirements. These versions may differ in terms of the price offered and the attribute set of the good g. A seller sets a minimum price of an item over and above the cost price to cover operational costs. The expected profit of a buyer b from good g is represented by a function based on the existing market factors, and on the basis of this expected profit function, buyer b sets a maximum price for a good g. The minimum and maximum selling prices for a good g are calculated in (1) and (2) respectively, where (1) uses the lower bound, and (2) the upper bound, on the percentage of profit expected over the cost price of good g for the seller s.

The seller s allocates a reputation rating to each buyer b. Initially, this rating is zero, and with each successive transaction with the buyer b, using RL, seller s updates the reputation rating of buyer b at time t + 1 as given in (3), (4) and (5). In these equations, x is the value of the transaction, which in case of a single good being purchased in a transaction is equal to the price p of the good g; the update also uses a constant in the range [0,1] and a constant e with an assumed value of 1.01. Equation (5) defines a quantity that is monotonically increasing with x in the range [0,1]; as a result, the quantity computed in (4) and the reputation rating in (3) also increase monotonically with the value of the transaction. Table I shows the monotonic increase in the reputation rating of a buyer with the value of a transaction x, and the greater increase for higher profit, assuming a given previous reputation rating and minimum and maximum prices 5% and 30% above the cost price.

Figure 1. Monotonic increase in reputation rating of a buyer with the increase in value of a transaction (x) and higher seller's profit

Once a seller s offers to sell a good g to buyer b, there can be two situations: either the seller wins the auction and gets the order to supply good g, or the seller loses its bid in the auction, in which case it would try to attract the same buyer in future auctions for that good with an improved offer.

In the first situation, if the buyer b purchases good g from seller s, the seller s awards an increase in the reputation rating of buyer b. After the transaction, buyer b is allowed to redeem a part of its accumulated reputation rating with the seller s. The maximum reputation rating open for redemption in a single transaction is based on the satisfaction of three conditions.

First, the part of the reputation rating open for redemption should be more than a minimum threshold. The minimum threshold for reputation rating that must be maintained, and hence cannot be redeemed in a single transaction, depends on the type of good being traded and is decided by the domain experts. This threshold, below which redemption is not allowed, facilitates long term buyer-seller relationships because, to avail the maximum benefit of its existing reputation ratings, buyer b must purchase again from seller s. If buyer b decides to purchase from a different seller, it must first earn from that seller the minimum amount of reputation rating over which it can avail the benefit of redemption.

Second, a buyer is allowed to redeem only up to a certain fraction of the value of the current transaction. The computation of this fraction is shown in (6), to ensure that the seller recovers at least the cost price of the good in each transaction.

Third, to get the benefit of redemption, buyer b should be involved in at least one transaction with seller s in a maximum allowed time period, which can be decided by the domain experts. This condition encourages more frequent transactions between the buyer-seller pair.
Moreover, the maximum limit on redemption allowed in the current transaction ensures that even after offering redemption, the seller is able to cover the cost price of the good in each transaction. The reputation rating that is allowed to be redeemed as discount is shown in (7), where rep_units represent reputation units and the conversion uses the fraction of total profit which can be redeemed as discount. Once the reputation rating of a buyer increases beyond the minimum threshold, this fraction is initially fixed to a minimum (say 0.05). For each successive transaction, the fraction increases by a small amount (say 0.001) up to a maximum value that is equivalent to a certain percentage of the transaction value and is decided by domain experts. This ensures that a faithful buyer having a large number of previous transactions with the same seller is allocated a greater percentage of accumulated profit for redemption as compared to a relatively new buyer. The monotonic increase in the percentage of accumulated profit available for redemption with the increase in number of transactions between a buyer-seller pair is shown in Table II.

Fig. 2 illustrates how the percentage of accumulated profit offered for redemption by a seller increases monotonically with the number of transactions, which encourages long term buyer-seller relationships.

Figure 2. Increase in percentage of accumulated profit available for redemption to encourage long term buyer-seller relationships
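The growth of the redeemable fraction with repeated business can be sketched as below. The starting value 0.05 and the step 0.001 are the illustrative values quoted in the text; the function name `redeemable_fraction`, the ceiling `cap = 0.10` and the convention that transactions are counted from the point where the buyer's rating crosses the minimum threshold are assumptions made only for this sketch, since the actual ceiling is left to domain experts.

```python
def redeemable_fraction(n_transactions: int,
                        start: float = 0.05,
                        step: float = 0.001,
                        cap: float = 0.10) -> float:
    """Fraction of accumulated profit open for redemption.

    n_transactions counts transactions since the buyer's rating crossed the
    minimum threshold (an assumed reading); `cap` is a hypothetical ceiling
    standing in for the value chosen by domain experts.
    """
    if n_transactions < 1:
        return 0.0  # nothing is redeemable before the first qualifying transaction
    # Linear growth by `step` per transaction, never exceeding the ceiling.
    return min(start + step * (n_transactions - 1), cap)


if __name__ == "__main__":
    for n in (1, 2, 11, 51, 200):
        print(n, round(redeemable_fraction(n), 3))
```

Because the schedule is non-decreasing in the number of transactions, a buyer with a longer history with the same seller is never offered a smaller fraction than a newer buyer, which is the property Fig. 2 depicts.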

The accumulated profit earned over all the preceding transactions is computed in (8). As per the condition of retaining a minimum reputation threshold, only the rep_units in excess of that threshold are available for redemption, and (7) is used to convert rep_units to price units and vice versa. The buyer b chooses the part of its reputation rating that it wants to surrender for redemption, subject to the three necessary conditions for redemption, i.e. the minimum threshold of reputation rating, the upper bound on the percentage of the value of the current transaction, and at least one previous transaction in the maximum allowed past time. Once a part of the reputation rating is redeemed, the buyer b gets a discount equivalent to the value of the redeemed units; the discounted price that the buyer b has to pay is shown in (10).
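The redemption arithmetic described above can be sketched as follows. This is an illustration under stated assumptions rather than the paper's equations: the function name `redeem` and all parameter names are hypothetical, and the conversion between rep_units and price units follows the way (7) is applied in the case study of Section IV, where the buyer's whole current rating is equated to the redeemable fraction of the accumulated profit. The example values are those of the case study.

```python
def redeem(rating, accumulated_profit, fraction, price,
           min_threshold, max_share, requested_units,
           recent_transaction=True):
    """Return the discount and remaining rating for one transaction.

    rating            : buyer's reputation rating after the current update (rep_units)
    accumulated_profit: seller's profit from this buyer so far (price units)
    fraction          : fraction of accumulated profit open for redemption
    min_threshold     : rep_units that must be retained
    max_share         : upper bound on the discount as a share of the price
    requested_units   : rep_units the buyer is willing to surrender
    """
    if rating <= 0 or not recent_transaction:
        # Third condition fails (or there is nothing to redeem): no discount.
        return {"price_per_unit": 0.0, "allowed": 0.0, "discount": 0.0,
                "price_paid": price, "rating_left": rating}

    # Conversion as applied in the case study: rating <-> fraction * profit.
    price_per_unit = fraction * accumulated_profit / rating

    # First condition: only the rating above the threshold is redeemable.
    cap_threshold = max(rating - min_threshold, 0.0) * price_per_unit
    # Second condition: at most a fixed share of the transaction value.
    cap_share = max_share * price
    allowed = min(cap_threshold, cap_share)

    # The buyer's request is honoured up to the allowed limit.
    discount = min(requested_units * price_per_unit, allowed)
    redeemed_units = discount / price_per_unit
    return {"price_per_unit": price_per_unit, "allowed": allowed,
            "discount": discount, "price_paid": price - discount,
            "rating_left": rating - redeemed_units}


if __name__ == "__main__":
    result = redeem(rating=0.4014, accumulated_profit=3560, fraction=0.1,
                    price=460, min_threshold=0.2, max_share=0.25,
                    requested_units=0.1)
    print(round(result["price_paid"], 2))   # 371.31
    print(round(result["rating_left"], 4))  # 0.3014
```

Taking the minimum of the two caps first and only then comparing with the buyer's request mirrors the order in which the limits are applied in the case study.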

This algorithm ensures greater satisfaction of both buyer and seller agents for the following reasons. Buyer agents are encouraged to buy from the same seller again as they can get the benefit of redemption by using their surplus reputation rating, i.e. the amount of reputation rating above the minimum threshold. Further, the satisfaction of a seller agent is increased because, to get the maximum benefit of redemption, a buyer is encouraged to purchase from the same seller repeatedly. This helps the seller agent in building a long term relationship with a buyer agent, resulting in increased business, greater profits and increased satisfaction for the seller.

In the second situation, i.e. if the seller loses a bid, the seller agent has two choices. It may choose either to reduce the price of the good g for buyer b or to improve the attributes of the good by incurring an increased cost price in future transactions. If a seller decides to offer a good at a reduced price to the buyer, it results in reduced profit for the seller as shown in (12). In (12), a reduced price factor lowers the current price p of the good g to a reduced price for a future transaction with that buyer. Alternatively, the seller may decide to provide the good g with enhanced attributes to buyer b in future transactions, resulting in greater cost as shown in (13). In (13), an enhanced attribute factor increases the cost of the good g, and hence reduces the profit of the seller s. With either choice, the seller has to be satisfied with reduced profit, but in turn it increases its chances of gaining business from buyer b in future transactions.

The proposed seller's strategy to enhance the satisfaction of both buyers and sellers is summarized below:
1. If the seller wins a bid, perform steps 2 to 9; otherwise go to step 10.
2. After selling the good g, compute the updated reputation of buyer b after incorporating the increase in reputation using reinforcement learning as per (3), (4) and (5).
3. Compute the value of each reputation unit, i.e. rep_unit, in terms of price units according to (7), where the percentage of accumulated profit open for redemption is monotonically increasing with the number of buyer-seller transactions.
4. Compute the reputation units open for redemption based on satisfaction of the following three conditions:
   a. Reputation units open for redemption should be over and above the specified minimum reputation threshold.
   b. Reputation units open for redemption should be less than or equal to the reputation units equivalent to the maximum percentage of the value of transaction open for redemption after covering its cost price.
   c. There should be at least one previous transaction in the maximum allowed time period.
5. Compute the maximum transaction value open for redemption in the current transaction, which is the minimum of the values allowed as per conditions 4a and 4b above.
6. Obtain the amount of reputation units that the buyer is willing to surrender in the current transaction.
7. Set the final redemption value for the current transaction as the minimum of the values found in steps 5 and 6 above.
8. Offer the good to buyer b at the discounted price after subtracting the redemption value from the value of transaction as per (10).
9. According to (11), update the reputation value of the buyer b after subtracting the redeemed reputation units from the previous reputation value.
10. If the seller loses a bid, offer the good to buyer b in future bids with reduced price and/or enhanced attributes as per (12) and (13).

The seller's strategy proposed above is relatively dynamic as it increases the reputation rating of a buyer agent monotonically with the value of a transaction and the percentage of profit earned by the seller from that transaction. This strategy leads to improved buyer-seller satisfaction because, during a transaction, a buyer is allowed to redeem a part of its accumulated reputation rating as an incentive for being a faithful buyer. It also improves the seller's satisfaction by helping a seller to maximise its long term gains, as it encourages long term buyer-seller relationships by offering larger redemption value to buyers who are involved in repeated transactions and hence are instrumental in providing greater profit to that seller.

IV. CASE STUDY

To illustrate the application of the proposed reputation framework, a case study was conducted by simulating an electronic marketplace having four buyers and six sellers, i.e. B = {bi where i = 1...4} and S = {si where i = 1...6}, where B is the set of buyers and S is the set of sellers in the marketplace for a good g. Some scenarios in the marketplace are described ahead. In a particular scenario, buyer b1 wanted to purchase a good g. Sellers s1 and s3 responded to sell good g to buyer b1. Let =400, =428, =520, =0.03, =0.36 and, =3500, =0.1, =0.001,
=0.4. The minimum threshold of reputation rating above which redemption was allowed was 0.2, and the upper limit on the percentage of the value of transaction open for redemption was 25 percent, i.e. 0.25. The buyer b1 selected seller s3 as the winner of the auction at price p = 460 and sent a message to all sellers who had sent their bid to sell good g to buyer b1. The strategic behaviour of s1 and s3 is shown below.
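For a seller that loses the auction, as s1 does in this scenario, the two adjustments referred to in (12) and (13) can be sketched as below. The function and parameter names are hypothetical. The multiplicative form of `enhanced_cost` matches the way (13) is applied in this case study, (1 + 0.06) * 400 = 424; the multiplicative form of `reduced_price` is an assumption made by analogy, and the factor 0.05 in the usage lines is an arbitrary illustrative value.

```python
def reduced_price(price: float, reduced_price_factor: float) -> float:
    """Assumed form of (12): lower the current price proportionally."""
    return (1.0 - reduced_price_factor) * price


def enhanced_cost(cost: float, enhanced_attribute_factor: float) -> float:
    """Form of (13) as used in the case study: raise the cost proportionally."""
    return (1.0 + enhanced_attribute_factor) * cost


if __name__ == "__main__":
    cost, price = 400.0, 460.0
    # Option 1: keep the attributes, cut the price (hypothetical factor 0.05).
    print(round(reduced_price(price, 0.05) - cost, 2))
    # Option 2: keep the price, enhance the attributes (factor 0.06 from the case study).
    print(round(price - enhanced_cost(cost, 0.06), 2))
```

Either option narrows the seller's margin on the next offer to the same buyer, which is the trade-off the strategy accepts in exchange for a better chance of winning that buyer's future bids.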

Using (7), 0.4014 rep_units = 0.1 * 3560 price units, so 1 rep_unit = 356/0.4014 = 886.9 price units, and 1 price unit = 0.4014/356 = 0.00113 rep_units. According to the minimum reputation threshold condition, reputation units open for redemption were 0.4014 - 0.2 = 0.2014 rep_units, which as per (7) was equivalent to 0.2014 * 886.9 = 178.6 price units. Further, 25 percent of the value of transaction open for redemption in the current transaction was 0.25 * 460 = 115 price units, which was equivalent to 115 * 0.00113 = 0.12995 rep_units. Choosing the lower value among the two conditions, buyer b1 was allowed to redeem a maximum of 0.12995 rep_units or equivalently 115 price units in the current transaction. The buyer b1 wanted to surrender 0.1 rep_units for redemption, so the equivalent discount it received was 0.1 * 886.9 = 88.69 price units, which was less than the allowed limit of 115 price units. Therefore, the discounted price that buyer b1 paid was 460 - 88.69 = 371.31 price units. Using (11), the reputation units left with the buyer b1 were 0.4014 - 0.1 = 0.3014 rep_units.

Seller s1 could not succeed in selling good g to buyer b1 at price p, leading to the conclusion that the bid of s1 was not the most satisfactory for the buyer b1 among all the bids which buyer b1 received. So, seller s1 was left with the option of either decreasing the price (depending on the profit margin it was charging in the last bid) or enhancing the attributes of the good g, or both. In this situation, by using (13), seller s1 decided to enhance the attributes in future offers of good g to buyer b1, resulting in an increased cost corresponding to the enhanced attribute factor of 0.06, i.e. (1 + 0.06) * 400 = 424 price units. This decision resulted in reduced profit for seller s1 in possible future transactions with buyer b1 for good g, but at the same time increased the chances of seller s1 being chosen by buyer b1.

V. CONCLUSION AND FUTURE WORK

This paper proposed a reinforcement learning strategy for seller agents to increase the satisfaction of both buyer and seller agents by maximising their long term gains. The seller strategies in the literature [1, 3] are relatively static in nature as they do not give any importance to the value of a transaction and make no effort to gain repeated transactions from a particular buyer in the e-market while allocating reputation to that buyer. The proposed strategy ensures long term buyer-seller association by assigning reputation rating to a buyer where the change in reputation rating is monotonically related to the value of transaction and the accumulated profit from all the successful transactions with that buyer. A buyer in a long term relationship with a particular seller is allowed to redeem a larger percentage of the accumulated profit as compared to a relatively new buyer, thus again encouraging long term buyer-seller relationships. The proposed strategy is dynamic, as a larger transaction and more profit for a seller lead to a greater increase in the reputation rating of a buyer as compared to a smaller transaction and lesser profit. In this strategy, a seller can adjust the price and attributes of the good to enhance its chances of being selected by a particular buyer. Future directions for this work involve computing the best selling price between the maximum and minimum price range and developing a seller selection algorithm for a buyer to decide from which seller it should purchase a good.

REFERENCES

[1] Tran T., and Cohen R., Improving user satisfaction in agent-based electronic marketplaces by reputation modelling and adjustable product quality, Proceedings of AAMAS'04, pp. 828-835, (2004).
[2] Tran T., and Cohen R., A Strategy for Improved Satisfaction of Selling Software Agents in E-Commerce, Springer-Verlag Berlin, LNAI 2671, pp. 434-446, (2003).
[3] Omid Roozmand, Mohammad Ali Bematbaksh, Ahmad Barrani, An Electronic marketplace based on Reputation and Learning, Journal of Theoretical and Applied Electronic Commerce Research, ISSN 0718-1876, Vol. 2, Issue 1, April 2007, pp. 1-17.
[4] Jie Zhang, Robin Cohen, Seller Bidding in a Trust-Based Incentive Mechanism for Dynamic E-Marketplaces, Springer-Verlag, LNAI 5032, pp. 368-379, (2008).
[5] Resnick P., Zeckhauser R., Friedman E., Kuwabara K., Reputation Systems, Comm. ACM 43-12, pp. 45-48, (2000).
[6] Chrysanthos Dellarocas, Reputation Mechanism Design in Online Trading Environments with Pure Moral Hazard, Information Systems Research, Vol. 16, No. 2, pp. 209-230, (2005).
[7] Patel J., Teacy W.T.L., Jennings N. R., and Luck M., A Probabilistic Trust Model for Handling Inaccurate Reputation Sources, Springer-Verlag Berlin Heidelberg 2005, LNCS 377, 193-209, (2005).
[8] Sabater J., Sierra C., REGRET: Reputation in Gregarious Societies, Proceedings of the Fifth International Conference on Autonomous Agents, Montreal, Canada, 194-195, (2001).
[9] Yu B. and Singh M.P., An evidential model of Distributed Reputation management, Proceedings of AAMAS'02, New York: ACM Press, 294-301, (2002).
[10] Zong B., Xu F., Jiao J. and Lv J., A Broker-Assisting Trust and Reputation System Based on ANN, IEEE, (2009).
[11] Josang A., Ismail R., and Boyd C., A survey of Trust and Reputation Systems for Online service provision, Decision Support Systems, vol. 43, no. 2, 618-644, (2007).
[12] Huang H., Zhu G., Jin S., Revisiting Trust and Reputation in Multi-Agent Systems, Proceedings of ISECS International Colloquium on Computing, Communications, Control, and Management, 424-429, (2008).
[13] http://www.ebay.com/.
[14] Vibha Gaur, Neeraj Kumar Sharma, Punam Bedi, Evaluating Reputation Systems for Agent Mediated e-Commerce, ACEEE International Conference on Advances in Computer Science, ACS 2010, Kerala, India, December 2010.
[15] George Boulougaris, Kostas Kolomvatsos, and Stathes, Building the Knowledge Base of a Buyer Agent Using Reinforcement Learning Techniques, WCCI 2010 IEEE World Congress on Computational Intelligence, July 18-23, 2010, CCIB, Spain.
