
2010 IEEE 21st International Symposium on Personal, Indoor and Mobile Radio Communications Workshops

On Spectrum Sharing With Underlaid Femtocell Networks

Mehdi Bennis
Centre for Wireless Communications
University of Oulu
Oulu, Finland
Email: bennis@ee.oulu.fi

Merouane Debbah
Alcatel-Lucent Chair
SUPELEC
Gif-sur-Yvette, France
Email: merouane.debbah@supelec.fr

Abstract: In this paper, we investigate the problem of spectrum sharing where a macro base station (MBS) is underlaid with multiple femto base stations (FBSs). The problem is studied from a game theoretic perspective through two games. First, in the non-cooperative case, the MBS and FBSs (i.e., players) behave selfishly, aiming at improving their respective payoffs (achievable rates). In the second case, a hierarchy is introduced among players as a means to improve the overall network efficiency. This problem is cast as a hierarchical game with a leader-follower approach in which the MBS is designated as the leader and the FBSs are the followers. Furthermore, in the case of incomplete game information, a learning mechanism based on information exchange among players is investigated, in which the leader builds its own estimate of the other players' strategies and strategically adapts its decisions to maximize its expected utility.
Numerical results show that the hierarchical model outperforms the non-cooperative approach in terms of achievable rate and optimal number of deployed small cells (or femtocells) in the network. Additionally, learning mechanisms based on public information exchange are shown to outperform the private information case as well as the selfish/myopic case.

I. INTRODUCTION
Due to the ever-increasing need for higher data rates, two-tier cellular networks, comprising a central macrocell underlaid with short-range femtocell hotspots, are envisaged to offer an economical way to improve cellular capacity [1]. Therefore, wireless operators are on the verge of augmenting the macrocell network with femtocell networks, through which higher throughput, better indoor coverage and lower transmit power are expected.
Before femtocells can be rolled out by mobile operators, a number of technical issues must be tackled, such as interference avoidance, security and billing. Interference is particularly detrimental in femtocell networks due to cross-tier interference: macrocell users experience interference coming from different femto-BSs (in the downlink), while a femtocell user perceives significant interference from loud macrocell users in the vicinity of the femtocell. Moreover, two types of femtocell deployment are envisioned: closed access (CA), where a macrocell user cannot be served by a femtocell base station, and open access (OA), where a macrocell user can be handed in to the femtocell. This work assumes CA, which means that only licensed home users within radio range can communicate with their own femtocell.

978-1-4244-9116-2/10/$26.00 ©2010 IEEE

In [2], the authors propose a power control game for CA, where macro-users are not allowed to connect to femto base stations and thus interfere with them. The authors derive a fundamental relation providing the largest feasible macrocell Signal-to-Interference-plus-Noise Ratio (SINR) given any set of feasible femtocell SINRs. Each femtocell maximizes its individual utility, consisting of an SINR-based reward less an incurred cost (interference to the macrocell). In [9], [10], the authors propose that secondary users limit their transmission powers to reduce interference to primary users. Hoven and Sahai [9] propose a no-talk region to protect primary users and cooperative sensing amongst neighboring cognitive radios. In contrast, enforcing a protected region for a macro-user or inducing cooperation amongst neighboring femtocells may be difficult due to scalability and security issues. In [10], the authors propose a joint power and admission control scheme, but provide little insight into how a cognitive radio's data rate is influenced by a primary user's rate. From a system-level perspective, the authors of [4] study the impact of an underlay femtocell deployment sharing radio frequency resources with urban macrocells. Due to the random and uncoordinated deployment of femtocells, femtocell entities potentially cause destructive interference to macrocell entities and vice versa. The authors consider both OA and CA, and note that power control and more intelligent resource scheduling in femtocells are needed to further reduce the interference that femtocell entities cause to macrocell entities.
Game Theory (GT) [5] is a mathematical tool to analyze the strategic interaction between (rational) decision makers. GT has been used in several areas under the umbrella of wireless communication problems. In this paper, a specific branch of GT, namely hierarchical (also known as Stackelberg) games, is used to analyze the spectrum sharing problem. Stackelberg games [19] have been applied in the context of cognitive radios, where the desirability of outcomes depends not only on a radio's own actions but also on those of other cognitive radios. The Stackelberg formulation naturally arises in some contexts of practical interest, such as when users access the medium in an asynchronous manner. It is based on a leader-follower approach in which the leader plays its strategy before the followers and commits to it. In [13], a game theoretic framework was proposed for fading multiple access channels


where the base station is the designated game leader, with the purpose of having a distributed allocation strategy (power vectors) approaching all corners of the capacity region. In [14], a two-level Stackelberg game is studied for distributed relay selection and power control in multi-user cooperative networks. The objective is to jointly consider the benefits of source and relay nodes, where the source node is modeled as a buyer and the relay nodes as sellers. In [25], inter-operator spectrum sharing is investigated from a game theoretic perspective, in which operators are shown to benefit from the concept of hierarchy through which better payoffs are obtained. In [3], the authors study a similar problem in the context of small cells, in which multiple transmitters communicate with multiple receivers. Their paper assumes that a particular user receives information from several base stations simultaneously, an assumption that is hard to justify in the context of femtocells. In addition, the authors study only the non-cooperative setting and do not investigate learning mechanisms.
The main contributions of this paper are as follows: (a) the strategic interaction of a macrocell underlaid with femtocells in the same spectrum band is modeled¹ as a hierarchical game; (b) the optimal number of base stations maximizing the sum-rate of the network with and without hierarchy is computed, lending evidence to a paradox (discussed in the sequel); (c) to alleviate the burden of channel state information in the case of a game with incomplete information, the leader of the hierarchical game learns the strategies of the other players through efficient learning mechanisms. Finally, in this paper, we focus on the downlink scenario, with an emphasis on the so-called dead zones or coverage holes, which have received a lot of attention from academia, industry and standardization bodies.
Figure 1 depicts a cell with a macro-BS and two underlaid femtocells. Closed Access (CA) deployment is assumed, where macro-user $m_1$ (at the cell edge of femto-cell 1) interferes with femto-BSs $B_2$ and $B_3$ on the uplink. Moreover, two types of interference are considered:
1) Interference at macro-BS $B_1$: femto-users $f_2$ and $f_3$ create interference at $B_1$ when communicating with their respective femto-base stations.
2) Inter-femtocell interference: since we operate under the CA scenario, a remote macro-user $m_1$ located indoors is not allowed to connect to the femto-BS and hence interferes with femto-base stations $B_2$ and $B_3$. On the other hand, each femto-BS receives interference from the neighboring femto-users $f_2$ and $f_3$.
The remainder of this paper is organized as follows: Section II presents the system model used throughout the paper. Section III presents the hierarchical game theoretic framework used to model the spectrum sharing problem. Section IV investigates different types of learning mechanisms for macrocell and femtocell coexistence. Section V presents numerical results. Finally, conclusions are drawn in Section VI, highlighting future work.

Fig. 1. Closed Access (CA) scenario in which the interference inflicted by macro-users on femto-BSs is accounted for, and vice versa. Here, only K = 2 femto-cells are illustrated.


II. SYSTEM MODEL
We suppose that one macro base station is underlaid with $K-1$ femto base stations sharing a frequency band composed of $N$ carriers, where each base station transmits in any combination of channels, as illustrated in Figure 2. On each carrier $n = 1, \ldots, N$, base station $i = 1, \ldots, K$ sends the information $x_i^n = \sqrt{p_i^n}\, s_i^n$, where $s_i^n$ represents the transmitted symbol and $p_i^n$ denotes the corresponding transmit power of BS $i$ on carrier $n$.
The received signal at user $i$ on carrier $n$ can be expressed as:
$$ r_i^n = \sum_{j=1}^{K} h_{ji}^n x_j^n + w_i^n, \quad i = 1, \ldots, K, \quad n = 1, \ldots, N \tag{1} $$

where $h_{ji}^n$ is the fading channel on the $n$th carrier from transmitter $j$ to receiver $i$. In addition, the noise process $w_i^n$ is characterized by its received noise power $\sigma^2$ on each carrier $n$.
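As a minimal illustration of model (1), the sketch below forms $x_j^n = \sqrt{p_j^n}\, s_j^n$ and the received samples $r_i^n$ from given channels, powers and symbols. The function name `received_signal`, the nested-list layout `h[n][j][i]` (carrier, transmitter, receiver) and the circularly-symmetric complex Gaussian noise model are our assumptions for illustration, not details taken from the paper.

```python
import math
import random


def received_signal(h, p, s, sigma2, rng=None):
    """Return r[n][i] = sum_j h[n][j][i] * x[n][j] + w[n][i], with x[n][j] = sqrt(p[n][j]) * s[n][j].

    h[n][j][i]: channel from transmitter j to receiver i on carrier n (real or complex)
    p[n][j]:    transmit power of BS j on carrier n
    s[n][j]:    transmitted symbol of BS j on carrier n
    sigma2:     noise power per carrier (0 gives a noiseless check)
    """
    rng = rng or random.Random(0)
    std = math.sqrt(sigma2 / 2)  # per real dimension, so that E|w|^2 = sigma2
    r = []
    for n in range(len(p)):
        K = len(p[n])
        x = [math.sqrt(p[n][j]) * s[n][j] for j in range(K)]
        row = []
        for i in range(K):
            w = complex(rng.gauss(0.0, std), rng.gauss(0.0, std))
            row.append(sum(h[n][j][i] * x[j] for j in range(K)) + w)
        r.append(row)
    return r
```

With `sigma2 = 0`, the output is simply the channel-weighted superposition of all transmitted signals, which is a convenient sanity check.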
For base station $i$, the transmit power $p_i^n$ is subject to its power constraint:
$$ \sum_{n=1}^{N} p_i^n \le P_i, \quad i = 1, \ldots, K \tag{2} $$

At the receiver, the signal-to-interference-plus-noise ratio (SINR) on carrier $n$ is expressed as:
$$ \mathrm{SINR}_i^n = \frac{p_i^n |h_{ii}^n|^2}{\sigma^2 + \sum_{j=1, j \neq i}^{K} p_j^n |h_{ji}^n|^2} \tag{3} $$
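Equation (3) translates directly into code. The sketch below assumes the same hypothetical `h[n][j][i]` layout (carrier, transmitter, receiver) and stores powers as `p[n][j]`. Summing $\log_2(1 + \mathrm{SINR}_i^n)$ over the carriers then gives the rate of BS $i$.

```python
def sinr(h, p, sigma2):
    """SINR of (3): sinr[n][i] = p[n][i]|h[n][i][i]|^2 / (sigma2 + sum_{j != i} p[n][j]|h[n][j][i]|^2).

    h[n][j][i]: channel from transmitter j to receiver i on carrier n
    p[n][j]:    transmit power of BS j on carrier n
    """
    out = []
    for n in range(len(p)):
        K = len(p[n])
        row = []
        for i in range(K):
            signal = p[n][i] * abs(h[n][i][i]) ** 2
            # Every other BS transmitting on the same carrier counts as interference.
            interference = sum(p[n][j] * abs(h[n][j][i]) ** 2 for j in range(K) if j != i)
            row.append(signal / (sigma2 + interference))
        out.append(row)
    return out
```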

¹ Due to their multi-tier hierarchical deployment nature, Stackelberg games are well suited to model the macro-femtocell spectrum sharing paradigm.

Furthermore, assuming a Gaussian codebook, the maximum achievable rate at receiver $i$ is given by:
$$ R_i = \sum_{n=1}^{N} \log_2\left(1 + \mathrm{SINR}_i^n\right) \tag{4} $$
When each BS selfishly maximizes (4), the solution is the Nash Equilibrium (N.E.), obtained through the water-filling technique [27].
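The water-filling route to the N.E. can be sketched as a Gauss-Seidel iterative water-filling loop. Each BS in turn water-fills its budget against the noise plus the interference generated by the others' current powers. The helper names, the uniform starting point and the fixed number of sweeps are our assumptions. Convergence of this loop is not guaranteed for arbitrary interference levels; the example below uses weak cross-channels. In this sketch, powers are stored per player as `p[i][n]`.

```python
import math


def water_fill(floors, budget, iters=100):
    """Single-user water-filling: p_n = (mu - floors[n])^+ with sum_n p_n = budget.

    floors[n] = (noise + interference on carrier n) / (direct channel gain on carrier n).
    The water level mu is found by bisection; a fixed iteration count avoids float stalls.
    """
    lo, hi = min(floors), min(floors) + budget  # total power is 0 at lo and >= budget at hi
    for _ in range(iters):
        mu = 0.5 * (lo + hi)
        if sum(max(mu - f, 0.0) for f in floors) > budget:
            hi = mu
        else:
            lo = mu
    return [max(lo - f, 0.0) for f in floors]


def interference(h, p, i, n, sigma2):
    """Noise plus interference seen by receiver i on carrier n; p[j][n] is the power of BS j."""
    return sigma2 + sum(p[j][n] * abs(h[n][j][i]) ** 2 for j in range(len(p)) if j != i)


def iterative_water_filling(h, P, sigma2, sweeps=200):
    """Gauss-Seidel iterative water-filling for the non-cooperative game G1.

    h[n][j][i]: channel from transmitter j to receiver i on carrier n
    P[i]:       power budget of BS i
    Returns (p, rates): p[i][n] is BS i's power on carrier n, rates[i] is R_i from (4).
    """
    N, K = len(h), len(P)
    p = [[P[i] / N] * N for i in range(K)]  # uniform starting point
    for _ in range(sweeps):
        for i in range(K):
            floors = [interference(h, p, i, n, sigma2) / abs(h[n][i][i]) ** 2 for n in range(N)]
            p[i] = water_fill(floors, P[i])  # best response of BS i to the others' current powers
    rates = [
        sum(math.log2(1 + abs(h[n][i][i]) ** 2 * p[i][n] / interference(h, p, i, n, sigma2))
            for n in range(N))
        for i in range(K)
    ]
    return p, rates
```

At a fixed point of this loop, no BS can increase its rate by re-allocating its own power, which is the N.E. property.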
Fig. 2. The spectrum sharing problem boils down to an interference channel with $K$ base stations and $N$ carriers.

III. GAME THEORETIC FORMULATION

In what follows, two spectrum sharing scenarios are investigated. First, in a non-cooperative game $G_1$, every BS (macro and femto) selfishly maximizes its own achievable rate. In the second game, $G_2$, the concept of hierarchy is accounted for: the macro-BS acts as the leader and the femto-BSs as followers.
A. Non-cooperative spectrum sharing game
In the non-cooperative game $G_1$, each BS $i \in \{1, \ldots, K\}$ chooses its own power vector $\mathbf{p}_i = [p_i^1, p_i^2, \ldots, p_i^N]^T$, subject to its total power constraint (2), in order to selfishly maximize its sum-rate $R_i$. The non-cooperative game is defined as $G_1 = \{\mathcal{K}, \{\mathcal{P}_i\}_{i \in \mathcal{K}}, \{u_i\}_{i \in \mathcal{K}}\}$, where:
• $\mathcal{K} = \{1, \ldots, K\}$ denotes the player set;
• $\{\mathcal{P}_1, \ldots, \mathcal{P}_K\}$ denotes the strategy set, where the strategy set of player $i$ is $\mathcal{P}_i = \{\mathbf{p}_i : p_i^n \ge 0\ \forall n,\ \sum_{n=1}^{N} p_i^n = P_i\}$;
• $\{u_1, \ldots, u_K\}$ is the set of utility (payoff) functions, where:
$$ u_i(\mathbf{p}_i, \mathbf{p}_{-i}) = \sum_{n=1}^{N} \log_2\left(1 + \frac{|h_{ii}^n|^2 p_i^n}{\sigma^2 + \sum_{j \neq i} |h_{ji}^n|^2 p_j^n}\right) \tag{5} $$
and $-i$ refers to the players other than $i$.

The optimization problem of BS $i$ in the non-cooperative sharing game is written as:
$$ \max_{\mathbf{p}_i} R_i = \max_{\mathbf{p}_i} \sum_{n=1}^{N} \log_2\left(1 + \frac{|h_{ii}^n|^2 p_i^n}{\sigma^2 + \sum_{j \neq i}^{K} |h_{ji}^n|^2 p_j^n}\right) \quad \text{s.t.} \quad \sum_{n=1}^{N} p_i^n \le P_i, \quad p_i^n \ge 0 \tag{6} $$

B. Hierarchical spectrum sharing game
A hierarchical (Stackelberg) game $G_2 = \{\mathcal{K}, \{\mathcal{P}_i\}_{i \in \mathcal{K}}, \{u_i\}_{i \in \mathcal{K}}\}$ is proposed to model the spectrum
sharing problem. Motivated by the fact that the interference from femto-BSs at the macro-users has to be minimized, we adopt the framework of hierarchical games, in which higher priority is given to the macro-BS. The spectrum sharing game is modeled as a Stackelberg game in which the macro-BS is the leader and the femto-BSs are the followers. The rationale is that macro-BSs, being the primary network, are deployed first, whereas (secondary) femto-BSs are deployed in a random and uncoordinated manner.
The Stackelberg Equilibrium (S.E.) [19] is the best-response outcome when a hierarchy of actions exists between players. It is obtained by backward induction [5], assuming that players can reliably forecast the behavior of the other players and believe that the other players can do the same.
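To make backward induction concrete, the sketch below solves a toy two-player leader-follower game with finite action sets by enumeration. For every leader action, the follower's best response is computed first. The leader then picks the action that is best given that anticipated response. The action grid, the two-carrier gains in the example and the helper names are hypothetical choices for illustration. In the paper, the followers' responses come from water-filling (8) rather than from enumeration.

```python
import math


def rate(g_direct, p_own, g_cross, p_other, sigma2):
    """Two-player form of (4): sum_n log2(1 + g_direct[n] p_own[n] / (sigma2 + g_cross[n] p_other[n]))."""
    return sum(math.log2(1 + g_direct[n] * p_own[n] / (sigma2 + g_cross[n] * p_other[n]))
               for n in range(len(p_own)))


def stackelberg_by_backward_induction(leader_actions, follower_actions, u_leader, u_follower):
    """Solve a finite leader-follower game by backward induction.

    Stage 2: for every leader action a, the follower plays a best response b(a).
    Stage 1: the leader picks the a maximizing u_leader(a, b(a)), anticipating that response.
    Returns (leader_action, follower_action, leader_payoff).
    """
    best = None
    for a in leader_actions:
        b = max(follower_actions, key=lambda f: u_follower(a, f))
        value = u_leader(a, b)
        if best is None or value > best[2]:
            best = (a, b, value)
    return best
```

For example, with both players splitting unit power over two carriers as `(k / 4, 1 - k / 4)`, the payoffs can be `rate([1.0, 0.2], a, [0.5, 0.5], b, 0.1)` for the leader and `rate([0.3, 1.0], b, [0.5, 0.5], a, 0.1)` for the follower.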
Assuming that MBS 1 acts as the leader and the FBSs (FBS 2, ..., FBS $K$) as the followers, the optimal strategies of the interferers (femto-BSs) are taken into account in the leader's optimization problem, which is written as:
$$ \max_{\mathbf{p}_1} \sum_{n=1}^{N} \log_2\left(1 + \frac{|h_{11}^n|^2 p_1^n}{\sigma^2 + \sum_{j \neq 1} |h_{j1}^n|^2 p_j^{n,SE}}\right) \quad \text{s.t.} \quad \sum_{n=1}^{N} p_1^n \le P_1, \quad \mathbf{p}_1 \ge 0 \tag{7} $$
where $p_j^{n,SE} = BR_j(p_1^{n,SE}, \ldots, p_{-j}^{n,SE})$ is a function of $\mathbf{p}_1$ (hence the objective function in (7) is non-convex), and $BR_j(\cdot)$ is the best-response function of player $j$.
The leader (MBS 1) maximizes its achievable rate while taking into account the strategies of the followers (i.e., femto-BSs), so the spectrum sharing game $G_2$ boils down to solving (7). First, the femto-BSs locally optimize their utility functions using the water-filling technique, whose solution is given by:
$$ p_i^n = \left(\frac{1}{\lambda_i} - \frac{\sigma^2 + \sum_{j \neq i} |h_{ji}^n|^2 p_j^n}{|h_{ii}^n|^2}\right)^+, \quad i = 2, \ldots, K, \quad n = 1, \ldots, N \tag{8} $$
where $(x)^+ = \max\{x, 0\}$ and $\lambda_i > 0$ is the Lagrange multiplier chosen to satisfy the power constraint of femto-BS $i$.
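Solution (8) leaves $\lambda_i$ implicit. Because the total allocated power decreases monotonically in $\lambda_i$, a simple bisection finds the multiplier that exactly exhausts the budget. The sketch below assumes that the "floors" $(\sigma^2 + \sum_{j \neq i} |h_{ji}^n|^2 p_j^n)/|h_{ii}^n|^2$ have already been computed and are positive. With the $\log_2$ utility, the exact KKT level is $1/(\lambda_i \ln 2)$. As in (8), the sketch folds that constant into $\lambda_i$. The function name is hypothetical.

```python
def femto_water_filling(floors, budget, iters=100):
    """Solve (8) for one femto-BS: p_n = (1/lam - floors[n])^+ with lam > 0 such that sum_n p_n = budget.

    floors[n] = (sigma^2 + sum_{j != i} |h_ji^n|^2 p_j^n) / |h_ii^n|^2, assumed precomputed and > 0.
    Returns (p, lam).
    """
    def total(lam):
        return sum(max(1.0 / lam - f, 0.0) for f in floors)

    lam_hi = 1.0 / min(floors)             # water level 1/lam at the lowest floor: no power used
    lam_lo = 1.0 / (min(floors) + budget)  # water level high enough to use at least the budget
    for _ in range(iters):
        lam = 0.5 * (lam_lo + lam_hi)
        if total(lam) > budget:
            lam_lo = lam  # too much power: the multiplier must grow
        else:
            lam_hi = lam
    return [max(1.0 / lam_hi - f, 0.0) for f in floors], lam_hi
```

For instance, with floors `[0.5, 1.0, 3.0]` and a budget of 1.5, the water level $1/\lambda_i$ settles at 1.5. This gives powers `[1.0, 0.5, 0.0]` and $\lambda_i = 2/3$: the carrier with the highest floor stays switched off.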
Finding the equilibrium strategy of the leader is not straightforward because its utility function in (7) is non-convex. Nevertheless, sub-optimal, low-complexity methods exist to solve the problem. To this end, and motivated by the work of [17], we use Lagrangian duality theory, in which the duality gap [18] provides a useful tool for solving non-convex optimization problems.


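The Lagrangian of (7) and the associated dual function are given by:
$$ g(\lambda) = \max_{\mathbf{p}_1} L(\mathbf{p}_1, \lambda) = \max_{\mathbf{p}_1} \sum_{n=1}^{N} \log_2\left(1 + \frac{|h_{11}^n|^2 p_1^n}{\sigma^2 + \sum_{j \neq 1} |h_{j1}^n|^2 p_j^n}\right) + \lambda\left(P_1 - \sum_{n=1}^{N} p_1^n\right) \tag{9} $$
where $\lambda$ is the Lagrange dual variable associated with the power constraint.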
Consequently, the Stackelberg problem is solved by locally optimizing the Lagrangian (9) via coordinate descent [18]. For each fixed $\lambda$, we find the optimal $p_1^1$ while keeping $p_1^2, \ldots, p_1^N$ fixed, then find the optimal $p_1^2$ while keeping the other $p_1^n$ ($n \neq 2$) fixed, and so on. This process converges in objective value because no iteration decreases the objective function, which is bounded above. Moreover, $\lambda$ is found using the sub-gradient method [17]. Finally, it is worth noting that the existence and uniqueness of the equilibria in both games were treated in [3], [25] and [23].
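The dual procedure above can be sketched end to end on a small instance. For a fixed $\lambda$, coordinate descent optimizes one leader carrier at a time. The followers' powers are replaced by their water-filling response (8) to the current $\mathbf{p}_1$. A projected sub-gradient step then updates $\lambda$, using $P_1 - \sum_n p_1^n$ as the sub-gradient of $g$. Several choices here are our own assumptions, not the method of [17]:

* a grid search per coordinate;
* a fixed number of follower sweeps to approximate $BR_j$;
* a diminishing step size;
* keeping the best power-feasible iterate as the returned solution.

The leader is index 0 and `h[n][j][i]` is the gain from transmitter `j` to receiver `i` on carrier `n`.

```python
import math


def water_fill(floors, budget, iters=60):
    """p_n = (mu - floors[n])^+ with sum_n p_n = budget, mu found by bisection."""
    lo, hi = min(floors), min(floors) + budget
    for _ in range(iters):
        mu = 0.5 * (lo + hi)
        if sum(max(mu - f, 0.0) for f in floors) > budget:
            hi = mu
        else:
            lo = mu
    return [max(lo - f, 0.0) for f in floors]


def followers_response(h, p1, P, sigma2, sweeps=10):
    """Approximate BR of the followers (indices 1..K-1) to the leader's fixed power vector p1.

    Followers water-fill as in (8), iterating a few sweeps among themselves; p[j][n] is BS j's power.
    """
    N, K = len(h), len(P)
    p = [list(p1)] + [[P[j] / N] * N for j in range(1, K)]
    for _ in range(sweeps):
        for j in range(1, K):
            floors = [(sigma2 + sum(p[k][n] * abs(h[n][k][j]) ** 2 for k in range(K) if k != j))
                      / abs(h[n][j][j]) ** 2 for n in range(N)]
            p[j] = water_fill(floors, P[j])
    return p


def leader_rate(h, p, sigma2):
    """Objective of (7) for the leader (index 0) given all powers p[j][n]."""
    return sum(math.log2(1 + abs(h[n][0][0]) ** 2 * p[0][n]
                         / (sigma2 + sum(abs(h[n][j][0]) ** 2 * p[j][n] for j in range(1, len(p)))))
               for n in range(len(h)))


def lagrangian(h, p1, lam, P, sigma2):
    """L(p1, lambda) of (9), with the followers' powers replaced by their response to p1."""
    return leader_rate(h, followers_response(h, p1, P, sigma2), sigma2) + lam * (P[0] - sum(p1))


def leader_dual(h, P, sigma2, grid=11, cd_passes=2, steps=20, step0=0.5):
    """Coordinate descent on L(., lambda) for fixed lambda, then a projected sub-gradient step on lambda.

    Returns ((best_feasible_p1, its_rate), final_lambda).
    """
    N = len(h)
    levels = [P[0] * k / (grid - 1) for k in range(grid)]  # candidate powers per carrier
    p1, lam = [P[0] / N] * N, 1.0
    best = (list(p1), leader_rate(h, followers_response(h, p1, P, sigma2), sigma2))
    for t in range(steps):
        for _ in range(cd_passes):
            for n in range(N):  # optimize carrier n with the other carriers held fixed
                p1[n] = max(levels, key=lambda x: lagrangian(h, p1[:n] + [x] + p1[n + 1:], lam, P, sigma2))
        if sum(p1) <= P[0] + 1e-9:  # keep the best power-feasible iterate (primal recovery)
            r = leader_rate(h, followers_response(h, p1, P, sigma2), sigma2)
            if r > best[1]:
                best = (list(p1), r)
        # Sub-gradient of the dual function g at lam is P1 - sum(p1); step, then project onto lam >= 0.
        lam = max(0.0, lam - step0 / math.sqrt(t + 1) * (P[0] - sum(p1)))
    return best, lam
```

The grid search makes each coordinate step robust to the non-convexity introduced by the followers' responses, at the cost of resolution. A finer grid or a 1-D local search would be natural refinements.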

IV. LEARNING MECHANISMS FOR MACRO-FEMTOCELL COEXISTENCE
In the previous section, we have shown that by adopting a hierarchical strategy, the leader and followers improve their respective achievable rates. In this section, we turn to the learning aspects of the game, which are paramount for achieving the expected gains (achievable throughput) with reasonable complexity in terms of channel state information. We also address the worst-case scenario in which no learning mechanism is used.
A. Case 1: public exchange of information
In the hierarchical problem formulation (7), the leader-follower optimization setting assumes that the leader knows all the channel state information of the followers. This assumption may not be feasible in practice, especially with the foreseen massive deployment of femtocells. To alleviate this, a learning framework [20], [21] is proposed in which the leader (i.e., the macro-BS) applies different strategic learning methods to evaluate its expected utility. Different information types drive the learning approach of the players, which in turn results in different expected utilities. For this purpose, both private and public information are investigated. The former considers the case where the aggregate interference coming from the followers is measured at the leader's receiver side. In the latter, explicit information feedback² about the followers' actions enables the leader to model the other users' strategies directly and more efficiently.
It is worth noting that learning in cognitive radio networks has been studied in [24]-[26]. In [24] and [26], the authors focus on resource competition in a spectrum auction system, where the channel allocation is determined by the spectrum regulator; in contrast, no regulator exists in this paper. The work in [26] discusses correlated equilibrium and achieves it by no-regret learning, i.e., by minimizing the gap between the current reward and the optimal reward. This approach requires mutual communication among the secondary users. In [24], the authors investigate a multi-agent Q-learning channel selection problem for multi-user cognitive radio systems in a two-player, two-channel case.
Let $A_f^{j,t}$ denote the action of follower $j \in \mathcal{F}$ at time $t$, defined as its transmit power over a given carrier ($\mathcal{F}$ being the set of followers). At each time slot $t$, the leader first observes the public information feedback $I_f^{j,t} = \{H_f^{j,t-1}, A_f^{j,t-1}\}$ sent by each follower, where $H_f^{j,t-1}$ is the channel set of follower $j$. The leader then calculates the propensity $r_f^{j,t}(A_f^{j,t} \mid A_{\text{leader}}^t)$ of follower $j$ at time $t$. This propensity is the number of times that follower $j$ takes action $A_f^{j,t}$ given that the leader took action $A_{\text{leader}}^t$. The leader then builds its own belief $S_f^{j,t}(A_f^{j,t} \mid A_{\text{leader}}^t)$ about the strategy of follower $j \in \mathcal{F}$, given as:
$$ S_f^{j,t}(A_f^{j,t} \mid A_{\text{leader}}^t) = \frac{r_f^{j,t}(A_f^{j,t} \mid A_{\text{leader}}^t)}{\sum_{A} r_f^{j,t}(A \mid A_{\text{leader}}^t)} \tag{10} $$
which is, in other words, the empirical frequency (or probability) that follower $j$ takes action $A_f^{j,t}$ given that the leader takes action $A_{\text{leader}}^t$. Furthermore, whenever the leader takes action $A_{\text{leader}}^t$, the propensity vector $r_f^{j,t}(A_f \mid A_{\text{leader}})$ is updated as follows:
$$ r_f^{j,t}(A_f^{j,t} \mid A_{\text{leader}}^t) = r_f^{j,t-1}(A_f^{j,t-1} \mid A_{\text{leader}}^{t-1}) + I(A_f^{j,t-1}) \tag{11} $$
where $I(A_f^{j,t-1})$ is an indicator vector such that:
$$ I(A = A_f^{j,t}) = \begin{cases} 1, & \text{if } A = A_f^{j,t} \\ 0, & \text{if } A \neq A_f^{j,t} \end{cases} $$
Finally, the leader maximizes its expected utility $\mathbb{E}_{S_f^{j,t}}\left[u_{\text{leader}}(A_{\text{leader}} \mid S_f^{j,t})\right]$, based on the estimated belief of the followers computed in (10).
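The belief update (10)-(11) and the leader's expected-utility maximization can be sketched for a single follower with finite action sets (e.g., discrete power levels). The class name and the `utility` callback are hypothetical. Every propensity starts at 1 so that (10) is defined before any feedback arrives; this is our assumption. With several followers, one propensity table per follower would be kept. The expectation would then also need an assumption such as independence between followers.

```python
class PublicInfoLeader:
    """Leader-side belief learning of (10)-(11) for a single follower (illustrative sketch)."""

    def __init__(self, leader_actions, follower_actions, utility):
        self.leader_actions = list(leader_actions)
        self.follower_actions = list(follower_actions)
        self.utility = utility  # utility(a_leader, a_follower) -> payoff of the leader
        # Propensities r(a_f | a_leader), started at 1 so that every belief is well defined.
        self.r = {a: {b: 1.0 for b in self.follower_actions} for a in self.leader_actions}

    def observe(self, a_leader, a_follower):
        """Update (11): add the indicator of the follower's fed-back action."""
        self.r[a_leader][a_follower] += 1.0

    def belief(self, a_leader):
        """Empirical frequency (10) of each follower action given the leader action."""
        total = sum(self.r[a_leader].values())
        return {b: c / total for b, c in self.r[a_leader].items()}

    def act(self):
        """Pick the leader action maximizing the expected utility under the current belief."""
        def expected(a):
            return sum(q * self.utility(a, b) for b, q in self.belief(a).items())
        return max(self.leader_actions, key=expected)
```

In a time-slotted loop, the leader would call `act()`, transmit, receive the follower's public feedback, and then call `observe()` with the action pair of that slot.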
² In a femtocell scenario, this information can be exchanged through the gateway, or possibly through the backhaul.
³ Note that in this case, all players are non-cooperative/myopic.

B. Case 2: no public exchange of information (also known as reinforcement learning)
Unlike Case 1, there is no explicit exchange of information between the leader and the followers. Nevertheless, a player can implicitly learn the behavior of the other players based on the SINR it perceives from the environment. In what follows, we borrow tools from machine (reinforcement) learning [8], in which players learn their strategies by interacting with the environment through trial and error until convergence. Player³ $i \in \{1, \ldots, K\}$ models its best-response strategy $S_i^t$ at time $t$ as:
$$ S_i^t(A_i^t) = \frac{r_i^t(A_i^t)}{\sum_{A \in \mathcal{A}_i} r_i^t(A)} \tag{12} $$
where $r_i^t(A_i^t)$ represents the tendency of player $i$ to choose action $A_i^t$ at time slot $t$. In addition, when action $A_i^{t-1}$ is taken in time slot $t-1$, player $i$ updates its tendency vector based on the experienced utility $u_i(A_i^{t-1})$ as follows:
$$ r_i^t = \beta\, r_i^{t-1} + (1 - \beta)\, u_i(A_i^{t-1})\, I(A_i^{t-1}) \tag{13} $$
where $\beta$ is the learning parameter.
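The reinforcement rule can be sketched as follows. Here (13) is read as exponential smoothing, with $\beta$ weighting the previous tendency and $(1-\beta)$ weighting the newly experienced utility of the chosen action. Actions are then drawn from the normalized tendencies (12). The class name, the positive initial tendency and the assumption of non-negative utilities (e.g., achievable rates) are ours, so that (12) remains a valid probability distribution.

```python
import random


class ReinforcementLearner:
    """Sketch of (12)-(13): smoothed tendencies r_i and mixed strategy S_i = r_i / sum(r_i)."""

    def __init__(self, actions, beta=0.9, init=1.0, seed=0):
        self.actions = list(actions)
        self.beta = beta
        self.r = {a: init for a in self.actions}  # positive start so that (12) is defined
        self.rng = random.Random(seed)

    def strategy(self):
        """Mixed strategy (12): tendencies normalized into probabilities."""
        total = sum(self.r.values())
        return {a: v / total for a, v in self.r.items()}

    def choose(self):
        """Sample an action from the current mixed strategy."""
        s = self.strategy()
        return self.rng.choices(self.actions, weights=[s[a] for a in self.actions])[0]

    def update(self, action, utility):
        """(13): r^t = beta * r^{t-1} + (1 - beta) * u(A^{t-1}) * I(A^{t-1}); utility assumed >= 0."""
        for a in self.actions:
            indicator = 1.0 if a == action else 0.0
            self.r[a] = self.beta * self.r[a] + (1 - self.beta) * utility * indicator
```

In each slot, a BS would call `choose()` to pick, for instance, a power level. It would then measure its achieved rate and feed it to `update()`. No information about the other players is required.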

V. SIMULATION RESULTS

[Fig. 3 plot: sum-rate of the network versus number of base stations, for N = 5, 10 and 15 carriers.]
Fig. 3. Total sum-rate of the network versus the number of base stations $K$, for different numbers of carriers $N$.

⁴ This is referred to as the Braess paradox [22], [23], which implies that increasing the strategy space of each player, i.e., the number of BSs each player can use, ends up degrading the global performance of the network.

In this section, numerical results are provided to validate our theoretical claims. We consider frequency-selective fading channels. The maximum power constraint of each player $i$ is assumed to be identical and normalized to $P_i = 1$.
Figure 3 depicts the average total sum-rate of the network for the non-cooperative sharing game $G_1$, obtained by solving (6), for a channel noise variance $\sigma^2 = 0.1$. A number of key observations can be made. The total sum-rate of the network increases with the number of base stations before leveling off. Moreover, for a given geographical area and an arbitrary number of carriers $N$, there is an optimal⁴ number of base stations $M^*$ that can be deployed in the network, beyond which the total sum-rate of the network starts decreasing because the network becomes interference-limited. This is essentially due to the selfish nature of the femto base stations.
Figure 4 illustrates the total sum-rate of the network for the non-cooperative, hierarchical and centralized approaches for $N = 5$ carriers. It is clearly seen that the hierarchical approach outperforms the selfish approach and bridges the gap between the non-cooperative and the fully centralized approaches. In other words, more femto-BSs can be deployed through the concept of hierarchy. Finally, as expected, the centralized approach outperforms both approaches.
Figure 5 plots the achievable rate of the leader in the hierarchical game with different learning strategies. $K = 3$ base stations and $N = 5$ carriers are considered. In the case of learning with public information exchange (Case 1), the leader builds an estimate of the other players' strategies, after which it maximizes its expected utility. In the case of learning with no information exchange (Case 2), player $i$ adopts a reinforcement learning strategy in which it implicitly learns the strategies of the other players by taking into account its past history and the payoff it perceives from the environment. Finally, as a benchmark, the no-learning case is presented, in which player $i$ behaves selfishly and does not learn the strategies of its potential interferers, hence yielding a poor outcome.

[Fig. 4 plot: sum-rate of the network versus number of base stations for the centralized, Stackelberg and selfish approaches.]

Fig. 4. Comparison of the total sum-rate for the non-cooperative, hierarchical and centralized approaches, showing that the hierarchical approach bridges the gap between the selfish and fully centralized approaches. Here, $N = 5$ carriers are considered.

VI. CONCLUSION
In this paper, we studied the problem of coexistence between a macro-cell and underlaid femto-cells sharing the same spectrum. The problem is investigated from a game theoretic perspective in which each base station is modeled as a player that decides, in a distributed way, how to allocate its total power across a set of carriers. Both non-cooperative and hierarchical spectrum sharing scenarios are investigated. It turns out that, for a given region and an arbitrary number of carriers, there exists an optimal number of base stations to be deployed in the network. In addition, it is shown that the hierarchical approach outperforms the non-cooperative approach, hence bridging the gap between the selfish and centralized approaches. The second contribution of the paper looks at machine learning with public information exchange among players. In this case, the leader computes an estimate of the strategies of the followers and thereby improves its payoff as well as the followers' payoffs.

[Fig. 5 plot: achievable rate for the leader versus time, with learning (case 1), with learning (case 2) and no learning.]
Fig. 5. Achievable rate of the leader with and without learning. By exchanging public information (case 1), the leader improves its payoff compared to the no-learning case as well as case 2 (reinforcement learning).
Future work will investigate the extension to multiple leaders and multiple followers sharing the same spectrum, as well as other applications of machine learning in femtocell networks.
REFERENCES
[1] V. Chandrasekhar, J. G. Andrews and A. Gatherer, "Femtocell Networks: A Survey," IEEE Communications Magazine, vol. 46, no. 9, pp. 59-67, September 2008.
[2] V. Chandrasekhar, T. Muharemovic, Z. Shen, J. G. Andrews and A. Gatherer, "Power Control in Two-Tier Femtocell Networks," IEEE Transactions on Wireless Communications, vol. 8, no. 8, Aug. 2009.
[3] G. He, S. Betz and M. Debbah, "Game-Theoretic Deployment Design of Small-Cell OFDM Networks," 3rd ICST/ACM International Workshop on Game Theory in Communication Networks, October 2009.
[4] Z. Bharucha, H. Haas, I. Cosovic and G. Auer, "Throughput Enhancement Through Femto-Cell Deployment," Technical Report, 2009.
[5] D. Fudenberg and J. Tirole, Game Theory, Cambridge: The MIT Press, 1991.
[6] T. Alpcan, T. Basar and S. Dey, "A Power Control Game Based on Outage Probabilities for Multicell Wireless Data Networks," IEEE Transactions on Wireless Communications, vol. 5, no. 4, April 2006.
[7] L. Qian, X. Li, J. Attia and Z. Gajic, "Power control for cognitive radio ad hoc networks," in IEEE Workshop on Local and Metropolitan Area Networks, June 2007.
[8] J. Watkins and P. Dayan, "Technical note: Q-learning," Machine Learning, vol. 8, pp. 279-292, 1992.
[9] N. Hoven and A. Sahai, "Power scaling for cognitive radio," in International Conference on Wireless Networks, Communications and Mobile Computing, June 2005.
[10] N. Hoven and A. Sahai, "Fundamental tradeoffs in robust spectrum sensing for opportunity frequency reuse," [online]. Available: www.eecs.berkley.edu/ smm/CognitiveTechReport06.pdf
[11] M. Bloem, T. Alpcan and T. Basar, "A Stackelberg game for power control and channel allocation in cognitive radio networks," The Second International Conference on Performance Evaluation Methodologies and Tools, Nantes, France, 2007.
[12] J. Papandriopoulos and J. S. Evans, "Low complexity distributed algorithms for spectrum balancing in multi-user DSL networks," IEEE International Conference on Communications, vol. 7, pp. 3270-3275, June 2006.
[13] L. Lai and H. El Gamal, "The water-filling game in fading multiple access channels," IEEE Transactions on Information Theory, vol. 54, no. 5, pp. 2110-2122, May 2008.
[14] B. Wang, Z. Han and K. J. Ray Liu, "Stackelberg game for distributed relay selection and power control for multiuser cooperative communication networks," to appear in IEEE Transactions on Mobile Computing, 2009.
[15] A. Ozgur, O. Leveque and D. Tse, "Hierarchical Cooperation Achieves Optimal Capacity Scaling in Ad Hoc Networks," IEEE Transactions on Information Theory, vol. 10, October 2007.
[16] W. Saad, Z. Han, M. Debbah and A. Hjorungnes, "A Distributed Merge and Split Algorithm for Fair Cooperation in Wireless Networks," in Proc. IEEE ICC, May 2008.
[17] W. Yu and R. Lui, "Dual methods for nonconvex spectrum optimization of multicarrier systems," IEEE Transactions on Communications, vol. 54, no. 7, 2006.
[18] S. Boyd and L. Vandenberghe, Convex Optimization, Cambridge University Press, 2004.
[19] V. H. Stackelberg, Marktform und Gleichgewicht, Oxford University Press, 1934.
[20] L. Busoniu, R. Babuska and B. D. Schutter, "A comprehensive survey of multiagent reinforcement learning," IEEE Transactions on Systems, Man and Cybernetics, Part C, vol. 38, no. 2, pp. 156-172, March 2008.
[21] J. Hu and M. P. Wellman, "Multiagent reinforcement learning: Theoretical framework and an algorithm," in Proc. 15th Int. Conf. Mach. Learn. (ICML), July 1998.
[22] D. Braess, "Über ein Paradoxon aus der Verkehrsplanung," Unternehmensforschung, 24(5):258-268, May 1969.
[23] S. Medina Perlaza, E. V. Belmega, S. Lasaulce and M. Debbah, "On the base station selection and base station sharing in self-configuring networks," Fourth International Conference on Performance Evaluation Methodologies and Tools, Pisa, Italy, Oct. 2009.
[24] H. Li, "Multi-agent Q-Learning of Channel Selection in Multi-user Cognitive Radio Systems: A Two by Two Case," IEEE Conference on Systems, Man and Cybernetics (SMC), 2009.
[25] M. Bennis, S. Lasaulce and M. Debbah, "Inter-operator spectrum sharing from a game theoretical perspective," EURASIP Journal on Advances in Signal Processing (JASP), Special issue on dynamic spectrum access for wireless networking, Aug. 2009, pp. 1-12, Article ID 295739, doi:10.1155/2009/29573.
[26] Z. Han, C. Pandana and K. Liu, "Distributive opportunistic spectrum access for cognitive radio using correlated equilibrium and no-regret learning," in Proc. IEEE Wireless Communications and Networking Conference (WCNC), 2007.
[27] W. Yu, W. Rhee, S. Boyd and J. Cioffi, "Iterative Water-filling for Gaussian Vector Multiple Access Channels," IEEE Transactions on Information Theory, vol. 50, no. 1, pp. 145-151, January 2004.
