Multitask Spectrum Sensing in Cognitive Radio Networks Via Spatiotemporal Data Mining

IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, VOL. 62, NO. 2, FEBRUARY 2013
Xin-Lin Huang, Member, IEEE, Gang Wang, Member, IEEE, and Fei Hu, Member, IEEE
Abstract: Recently, compressive sensing (CS) and spectrum
sensing have been two hot topics in the signal processing and
cognitive radio network (CRN) fields, respectively. Due to the
sampling rate limitation of the analog-to-digital converter in
spectrum-sensing circuits, some works have proposed integrating
these two techniques to achieve low-overhead spectrum sensing
in CRNs. These works aim to minimize spectrum reconstruction
errors based on linear regression methods, and the ℓ1-norm is typically used to make a tradeoff between spectrum sparseness and
reconstruction accuracy. However, since the interference range
of primary users is limited, multiple clusters in the CRN may
not share a common sparse spectrum, and thus, the ℓ1-norm
may not be appropriate to handle all clusters in CS inversion.
Hence, we propose a novel multitask spectrum-sensing method
based on spatiotemporal data mining methods. In each cluster,
we assume that the spectrum sensing is executed in a synchronized way. The cluster head (CH) manages the operations, and a
common sparseness hyperparameter is used to make a consensus
decision. Among multiple clusters, synchronized CS sampling is
not required in our scheme; instead, the Dirichlet process prior
is employed to make an automatic grouping of the spectrum-sensing results among different clusters with a common sparseness
hyperparameter shared inside each group. To exploit the time-domain relevance among consecutive CS observations, a hidden
Markov model is employed to describe the relationship between
the hidden subcarrier states and the consecutive CS observations,
and the Viterbi algorithm is used to make an accurate spectrum
decision for each secondary user. Simulation results show that our
proposed algorithm can successfully exploit the spatiotemporal
relationship to achieve higher spectrum-sensing performance in
terms of normalized mean square error, probability of correct
detection, and probability of false alarm, compared with a few
other related works.
Index Terms: Cognitive radio network (CRN), Dirichlet process (DP), hidden Markov model (HMM), spatiotemporal data
mining, spectrum sensing.
Manuscript received November 27, 2011; revised March 22, 2012, June 30,
2012, and September 6, 2012; accepted October 5, 2012. Date of publication
October 10, 2012; date of current version February 12, 2013. This work was
supported in part by the National Natural Science Foundation of China under
Grant No. 61201225 and in part by the Natural Science Foundation of Shanghai
under Grant No. 12ZR1450800. The review of this paper was coordinated by
Prof. B. Hamdaoui.
X.-L. Huang is with the Department of Information and Communication
Engineering, Tongji University, Shanghai 201804, China (e-mail: xlhuang@tongji.edu.cn).
G. Wang is with the Communication Research Center, Harbin Institute of
Technology, Harbin 150001, China (e-mail: gwang51@hit.edu.cn).
F. Hu is with the Department of Electrical and Computer Engineering, The
University of Alabama, Tuscaloosa, AL 35487 USA (e-mail: fei@eng.ua.edu).
Color versions of one or more of the figures in this paper are available online
at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TVT.2012.2223767
I. INTRODUCTION
TODAY, the spectrum assignment policy in wireless communications is regulated by governmental agencies. The
wide wireless spectrum is segmented and authorized to
license holders or services. With the dramatic increase of high-definition audio/video applications through wireless access,
hundreds of megahertz to many gigahertz of wireless bandwidth
are required, which causes scarcity of the limited wireless spectrum resource. On the other hand, according to a report from the
Federal Communications Commission (FCC) [1], the temporal
and geographical variations in the utilization of the licensed
spectrum are from 15% to 85%. This means that much of the
spectrum is not efficiently utilized. The increasing high-quality
service requirement, limited available spectrum, and inefficient
spectrum utilization necessitate a new communication pattern
to exploit the existing wireless spectrum opportunistically [2],
[3]. Dynamic spectrum access (DSA) has been proposed to
solve the spectrum inefficiency problems and is implemented
in cognitive radio networks (CRNs) [4]. In a CRN, through
the opportunistic use of free spectrum (also called spectrum
holes), a device can gain access to more wireless bandwidth,
which is the main goal of the FCC regulations [5].
Cognitive radio (CR) techniques provide the capability of
detecting spectrum holes and sharing the spectrum in an opportunistic manner. DSA techniques can select the best available
channel from the spectrum pool for CR devices to operate [4],
[6]. More specifically, CR enables secondary users (SUs) to
perform a series of operations as follows: 1) spectrum sensing
to predict what spectrum is available and recognize the presence
of the primary user (PU) when a PU reoccupies the licensed
channel; 2) spectrum management to select the best available
channel from the spectrum pool for special services; 3) spectrum sharing to coordinate access to all available channels with
other SUs; and 4) spectrum mobility to vacate the channel as
soon as possible when a PU is detected [4]. Spectrum sensing
is one of the most important components in the cognition cycle
(see Fig. 1).
In Fig. 1, the spectrum-sensing module helps the SUs to
recognize the radio environment, i.e., identifying the spectrum
occupancy states of both PUs and other SUs. The spectrum
information can be further used by spectrum analysis and spectrum decision modules to analyze the available channel quality
and then make a channel assignment decision, respectively [4].
Fig. 1. Basic cognition cycle [4], [7].
Recently, many signal processing techniques have been developed for spectrum sensing, and these can be classified as
either noncooperative detection or cooperative detection. In
noncooperative detection methods, different spectrum-sensing
methods can be chosen by individual SUs [8], such as matched
filter detection [9], energy detection [10], and cyclostationary
feature detection [11]. These noncooperative detection models
are designed for individual SUs to process their local spectrum
observations and thus are easy to implement. However, without
consideration of spatial diversity information [12], their performance is limited. Hence, many schemes [12]-[18] resort to
cooperative detection. It is more accurate than noncooperative
detection since the uncertainty in an individual SU's detection
can be significantly reduced [12]. In the cooperative detection
model, two typical data fusion methods are widely used today:
1) centralized and 2) decentralized fusion [13]-[18]. In the
centralized fusion model, all observations from different SUs
are collected by a fusion center through multihop transmissions
in CRN [13], [14]. The final result of centralized fusion can
be globally optimal due to the spatial information considered
in the decision process [12]. However, it also results in high
communication cost since it requires the transmission of local
information to a fusion center as well as the broadcast of decision information to each SU. Moreover, the fusion center needs
to have powerful computation resources to process the huge
amount of information quickly. Hence, decentralized fusion
methods are proposed in [15]-[18] to reduce power consumption and operation load in the fusion center. Through local
sensing and decision information exchange with neighboring
SUs, each SU's local computation can eventually converge
to a global decision. Much research work has been done on
cooperative decentralized fusion [19], [20].
In [21]-[23], cooperative spectrum-sensing methods are proposed for a simple two-SU network and a multi-SU network,
respectively, to reduce the detection time and improve the probability of correct detection in distributed networks. Reference
[21] also assumes that the positions of all PUs are known to
SUs. In [24] and [25], the Markov chain is used to describe the
channel state transition, and a decentralized strategy is proposed
for SUs to decide which channels to sense and access, to improve the network throughput. In [26], the distributed detection
theory is used to realize cooperative spectrum sensing in an
indoor environment to improve the radio awareness of CRNs.
In [27], a censoring method with quantization is proposed to
reduce the average number of sensing bits for low-overhead
information exchange. In [12], a decentralized fusion solution
to cooperative compressive sensing (CS) is proposed to obtain
global optimality and make a consensus decision. The ℓ1-norm
has been used widely and is verified to be one of the best inversion methods to make a tradeoff between signal sparseness and
reconstruction errors [28]. The orthogonal property between
the spectrum statistics of PUs and SUs is used to distinguish
between sparse common and sparse innovations in [12], where
sparse common and sparse innovations are defined as the global
spectrum occupation of PUs and the local interference from
neighboring SUs, respectively.
The works most related to our proposed multitask spectrum
sensing are discussed in [29] and [30]. These works are designed for signal reconstruction after sub-Nyquist sampling
of multitask signals (here, the local spectrum sensing in a
specific SU is called a task). Those schemes can reduce reconstruction errors through a centralized fusion. In [29], the
multitask CS (MT-CS) is investigated. It estimates a common
sparseness hyperparameter after collecting local observations
from all users. Based on the estimated common sparseness
hyperparameter and local observations, each user reconstructs
the original signals. In [30], all local observations from SUs
are assumed not to share the same sparseness hyperparameter.
Hence, the cooperative fusion is only executed in some specific
users (called one group) rather than the whole network. The
Dirichlet process (DP) prior is proposed to realize automatic
nonparametric grouping and inversion of observations (i.e.,
using CS signals to recover the original Nyquist-rate-sampled
signals) within each group [30]. In this paper, we will use the
multitask Bayesian CS (BCS) model [29] to solve the spectrum-sensing issues in each cluster. Meanwhile, a DP prior [30] will
be used to realize automatic nonparametric grouping for the
CS data collected from different clusters. We further integrate
the DP-based spectrum-sensing model with the hidden Markov
model (HMM) to exploit the temporal-dependent information.
Moreover, we also consider the effects of sub-Nyquist sampling
and the deep-fading characteristic of intersymbol interference
(ISI) channels in PUs and SUs.
The main contributions of this paper include the following
three aspects:
1) A cooperative and decentralized BCS inversion is
proposed as a spectrum information sharing mechanism through a common sparseness hyperparameter. A
hierarchical sparseness prior is employed for sharing
information based on all local CS observations. The
complex-valued observations and unknown ISI channels
are also considered in the spectrum sampling process.
2) We extend our work from the preceding sharing mechanism to a nonparametric grouping mechanism. Since the
CS observations from different clusters may not be appropriate for sharing due to the limited radio coverage range
of each PU, a DP-based hierarchical model is proposed to
realize simultaneous grouping and BCS inversion of the
CS observations.
3) After the spatial spectrum diversity information is exploited, the HMM is used in each cluster to exploit
the internal relationship among temporal-dependent CS
observations. The Viterbi algorithm is used to choose the
optimal hyperparameter, which is finally used to make a
local spectrum decision in each SU.
The rest of this paper is organized as follows. In Section II,
the challenges of spectrum sensing and the system model are
described. In Section III, we first consider the case of only
one cluster that implements CS sampling and inversion. We
assume that all member nodes (also called SUs in this paper)
in the same cluster can share spectrum information, and a
hierarchical sparseness prior is employed to derive the common
sparseness property of the spectrum. In Section IV, we extend
the one-cluster case to the whole CRN (i.e., multicluster case)
and use the DP-based hierarchical model and HMM to realize
spatiotemporal data mining. In Section V, simulation results are
provided to show the efficiency of our cooperative distributed
spectrum-sensing method in terms of normalized mean square
error (MSE), probability of correct detection, and probability of
false alarm. Section VI concludes this paper.
Fig. 2. Cooperative decentralized spectrum sensing for cluster-based CRN.
II. PROBLEM STATEMENT
Spectrum sensing is a critical task in a CRN that adopts DSA
to improve network spectrum utilization [31]. The spectrum
sensing in an SU aims to accurately identify the spectrum
occupancy status in both PUs and other SUs to facilitate the
utilization of idle spectrum holes while at the same time strictly
protect the PUs' transmissions and avoid harmful interference
among SUs. To reduce the impact of deeply fading ISI channels
and improve the spectrum detection performance, we adopt
collaborative spectrum sensing that exploits the spatial diversity
information among multiple SUs. It stands out as an effective
approach to alleviate the problem of detection failure as well as
the hidden terminal problem. Another advantage of cooperative
sensing is that the effective signal-to-noise ratio may increase
proportionally as the number of cooperating SUs increases [12].
This can reduce the spectrum-sensing cost of each individual
SU (i.e., the number of CS measurements).
To enable cooperative decentralized spectrum sensing in a
huge band CRN, several major challenges have to be addressed.
First, according to the Nyquist sampling theorem, spectrum
sensing over a wide frequency band requires a high spectrum
sampling rate, which is a bottleneck of the analog-to-digital
converter (ADC). Second, conventional cooperative spectrum-sensing algorithms require a fusion center to collect observations from all SUs and make centralized sensing decisions [32].
This centralized processing may incur high communication
costs and render the entire network vulnerable to node failure
[12]. Third, spatially distant SUs might not be ideally synchronized during sampling in the sensing stage, and the observations
from distant SUs might not be appropriate for sharing since
the transmission range of PUs may not cover all the clusters.
As a result, all SUs might not share one common sparseness
spectrum decision.
Existing works assume the following: 1) All CRs stay silent
such that only the PUs are emitting power during the SUs'
spectrum sensing [18], [32], and 2) the transmitting power of
PUs is high enough such that it can be heard by all SUs. Hence,
a common sparseness spectrum is shared among all SUs. The
first assumption imposes a stringent requirement on the synchronization among all SUs, which is difficult to implement in a
large-scale CRN. The second assumption imposes that the PUs
are very powerful or the whole CRN is deployed in a small
area, and no interference and deep fading exists in the radio
environment. Those conditions may not be realistic in many
CRN applications.
In this paper, we consider the cluster-based CRN shown
in Fig. 2. In Fig. 2, the SUs are deployed in a wide area,
and the clusters are formed based on some metrics, such as
location, mobility, etc. (please refer to our previous work [3]
on CRN clustering strategies). This paper assumes the same
clustering criterion as in [3]. Therefore, we can assume that
all cluster members share the same spectrum map due to their
close distances to each other. However, due to spectrum-sensing
noise and errors in each cluster member, there could be minor
discrepancies among their sensing results. Thus, an efficient
spectrum fusion algorithm is needed to reach a consensus in
terms of the entire cluster's spectrum patterns. In the spectrum-sensing stage, all SUs in the same cluster first individually
operate CS sampling in synchronization mode based on the
spectrum management commands sent from the CH. The CH
then collects the local result from each cluster member to make
a fusion result and then broadcasts the fused information to the
CHs in other clusters.
To realize our cooperative decentralized spatiotemporal data
mining, we assume the following: 1) Sampling synchronization
in the whole CRN is not required, and synchronized sampling
only occurs inside each cluster. When a cluster starts spectrum sensing, it may also receive the signal from neighboring
clusters. Such signal can be seen as the interference from the
viewpoint of spectrum sensing. Hence, we need to identify
which channels are occupied by the PUs or SUs from the neighboring clusters. 2) The transmission power of PUs may not be
high enough to cover the whole CRN area, and different clusters may have different spectrum occupancy status due to the
geographical-dependent PUs. Hence, the sparseness spectrum
is determined by the geographical-dependent PUs as well as the
distant SUs in other clusters. 3) The CS algorithm is adopted in
each SU during spectrum sensing since the wideband spectrum
sensing requires high sampling rate and high cost on the ADC
circuit according to the Nyquist sampling theorem.
III. MULTITASK BAYESIAN COMPRESSIVE SENSING MODELING WITH HIERARCHICAL PRIOR: ONE-CLUSTER CASE
For simplicity, we first consider the cooperative spectrum
sensing in one cluster only and then extend it to the multicluster
case in the next section. In each cluster, the CH periodically
broadcasts the spectrum-sensing commands, and all member
nodes execute signal sampling and exchange information with
their CH. In the one-cluster case, we assume that all member nodes share one common sparseness spectrum. In this
section, we propose using MT-CS modeling with hierarchical
prior to detect spectrum holes in a cooperative manner. Unlike
conventional cooperative spectrum-sensing schemes [12], [29],
[33]-[35], here we consider the following: 1) unknown channel
impulse response (CIR) between PUs and SUs; 2) complex-valued sampling signals (including horizontal component and
orthogonal component); and 3) automatic identification of PUs'
and SUs' spectrum occupancy states based on the spectrum
assignment records from its neighboring CHs.
We assume that there are $M$ subcarriers (also called channels) and $N$ SUs in the target cluster. Through Nyquist sampling, the received signals in SU $j$ can be represented as

$$r_j(n) = r_j^P(n) + r_j^C(n) + w_j(n), \quad n = 0, 1, \ldots, M-1 \tag{1}$$

where $r_j^P(n) = \sum_{i=1}^{I} h_{i,j}(n) \otimes x_i(n)$ and $r_j^C(n) = \sum_i g_{i,j}(n) \otimes y_i(n)$ correspond to the received signals from a total of $I$ PUs and the interference from neighboring clusters, respectively. $h_{i,j}(n)$ is the CIR between PU $i$ and SU $j$ ($j = 1, 2, \ldots, N$), and $(\otimes)$ denotes convolution. $g_{i,j}(n)$ is the CIR between a neighboring cluster $i$ and SU $j$. $x_i(n)$ and $y_i(n)$ correspond to the original transmitted signal from PU $i$ and neighboring cluster $i$, respectively. $w_j(n)$ represents the additive white Gaussian noise.
After the $M$-point discrete Fourier transform (DFT) of the observed signals, (1) can be further rewritten as [12]

$$R_j(k) = \sum_{i=1}^{I} H_{i,j}(k) X_i(k) + \sum_i G_{i,j}(k) Y_i(k) + W_j(k) \tag{2}$$

where $H_{i,j}(k)$, $X_i(k)$, $G_{i,j}(k)$, $Y_i(k)$, and $W_j(k)$ ($k = 0, 1, \ldots, M-1$) are the complex-valued frequency-domain discrete versions of $h_{i,j}(n)$, $x_i(n)$, $g_{i,j}(n)$, $y_i(n)$, and $w_j(n)$, respectively. Due to the sampling rate limitation of the ADC and the sparse nature of the received signals in (2), we use the CS technique in spectrum sampling as follows:

$$\Upsilon_j = \Phi_j F_M F_M^{-1} R_j^P + \Phi_j F_M F_M^{-1} R_j^C + \Phi_j F_M F_M^{-1} W_j \tag{3}$$

where $\Upsilon_j$ is an $m_j \times 1$ vector ($m_j \ll M$) of the CS observations, and $R_j^P = \sum_{i=1}^{I} H_{i,j}(k) X_i(k)$ is orthogonal to $R_j^C = \sum_i G_{i,j}(k) Y_i(k)$ (i.e., $(R_j^P)^T R_j^C = 0$). $\Phi_j F_M$ is the observation matrix, where $\Phi_j$ is an $m_j \times M$ matrix with elements constituted randomly [29], and $F_M$ is the $M \times M$ DFT matrix [12]. In (3), $\Upsilon_j$ is the time-domain sampling signal and can be further rewritten as

$$\Upsilon_j = \Phi_j R_j^P + \Phi_j R_j^C + \Phi_j W_j = \Phi_j \,\mathrm{Re}\{R_j^P + R_j^C + W_j\} + \sqrt{-1}\,\Phi_j \,\mathrm{Im}\{R_j^P + R_j^C + W_j\}. \tag{4}$$

Since the $\mathrm{Re}\{\cdot\}$ part is orthogonal to the $\mathrm{Im}\{\cdot\}$ part and both of them have the same structure [see (4)], we only show the analysis of the $\mathrm{Re}\{\cdot\}$ part here (to save space), and the $\mathrm{Im}\{\cdot\}$ part can be analyzed in the same manner. Any analysis result from $\mathrm{Re}\{\cdot\}$ can be symmetrically extended to the $\mathrm{Im}\{\cdot\}$ case eventually. Hence, we have

$$\upsilon_j = \Phi_j \theta_j + \varepsilon_j \tag{5}$$

where $\theta_j = \mathrm{Re}\{R_j^P + R_j^C\}$ is an $M \times 1$ vector that represents the spectrum occupation states, and $\varepsilon_j$ is an $m_j \times 1$ vector whose components are independently identically distributed (i.i.d.) Gaussian variables. We employ a hierarchical MT-CS model for our cooperative spectrum-sensing scheme based on (5). Since $\varepsilon_j$ consists of i.i.d. draws of a zero-mean Gaussian distribution with unknown precision $\alpha_0$, the likelihood function for the parameters $\theta_j$ and $\alpha_0$, based on the observations $\upsilon_j$, can be represented as

$$p\{\upsilon_j \mid \theta_j, \alpha_0\} = (2\pi/\alpha_0)^{-m_j/2} \exp\left\{-\frac{\alpha_0}{2} \|\upsilon_j - \Phi_j \theta_j\|_2^2\right\} \tag{6}$$

where $m_j$ is the number of measurements. It is much smaller than the Nyquist sampling rate $M$ in (1).
The parameters $\theta_j$ are assumed to be drawn from a product of zero-mean Gaussian distributions that are shared by the SUs in one cluster, and therefore, the $N$ tasks are statistically related to each other. Specifically, letting $\theta_{j,k}$ represent the $k$th element of the vector $\theta_j$, we have

$$p\{\theta_j \mid \alpha, \alpha_0\} = \prod_{k=1}^{M} \mathcal{N}\left(\theta_{j,k} \mid 0, \alpha_0^{-1} \alpha_k^{-1}\right) \tag{7}$$

where $\alpha = \{\alpha_1, \alpha_2, \ldots, \alpha_M\}$ is a hyperparameter. To promote sparseness over $\theta_j$, a Gamma prior can be placed on the hyperparameter $\alpha_0$, i.e.,

$$p\{\alpha_0 \mid a, b\} = \mathrm{Ga}(\alpha_0 \mid a, b) = \frac{b^a}{\Gamma(a)} \alpha_0^{a-1} \exp(-b\alpha_0). \tag{8}$$

Thus, the posterior probability of $\theta_j$ based on the observed signals $\upsilon_j$ and hyperparameter $\alpha$ can be represented as

$$p\{\theta_j \mid \upsilon_j, \alpha\} = \int p(\theta_j \mid \upsilon_j, \alpha, \alpha_0)\, p(\alpha_0 \mid a, b)\, d\alpha_0 = \int \frac{p(\upsilon_j \mid \theta_j, \alpha_0)\, p(\theta_j \mid \alpha, \alpha_0)}{\int p(\upsilon_j \mid \theta_j, \alpha_0)\, p(\theta_j \mid \alpha, \alpha_0)\, d\theta_j}\, p(\alpha_0 \mid a, b)\, d\alpha_0 = \frac{\Gamma(a + M/2) \left[1 + \frac{1}{2b} (\theta_j - \mu_j)^T \Sigma_j^{-1} (\theta_j - \mu_j)\right]^{-(a+M/2)}}{\Gamma(a)\, (2\pi b)^{M/2}\, |\Sigma_j|^{1/2}} \tag{9}$$
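As an illustration of the sampling model in (5) and the likelihood in (6), a minimal Python sketch is given below. The function names, the Gaussian choice for the random measurement matrix, the unit-amplitude occupied subcarriers, and all numeric values are assumptions made for this example only; the text states merely that the matrix elements are constituted randomly.

```python
import numpy as np

def simulate_cs_observation(M=64, m=24, n_occupied=5, noise_precision=100.0, seed=0):
    """Draw a sparse spectrum vector, a random measurement matrix, and the
    sub-Nyquist observation of (5): v = Phi @ theta + eps (real part only)."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(M)
    occupied = rng.choice(M, size=n_occupied, replace=False)
    # Assumption: occupied subcarriers have unit magnitude and random sign.
    theta[occupied] = rng.choice([-1.0, 1.0], size=n_occupied)
    # Assumption: i.i.d. Gaussian entries for the m x M measurement matrix.
    Phi = rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, M))
    # Noise with precision alpha0, i.e., variance 1 / alpha0.
    eps = rng.normal(0.0, 1.0 / np.sqrt(noise_precision), size=m)
    return theta, Phi, Phi @ theta + eps

def log_likelihood(v, Phi, theta, alpha0):
    """Logarithm of the Gaussian likelihood in (6)."""
    m = v.shape[0]
    resid = v - Phi @ theta
    return -0.5 * m * np.log(2.0 * np.pi / alpha0) - 0.5 * alpha0 * (resid @ resid)
```

With m much smaller than M, the true sparse vector should score a far higher log-likelihood than the all-zero vector, which is the property the inversion exploits.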
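where

$$\mu_j = \Sigma_j \Phi_j^T \upsilon_j \tag{10}$$

$$\Sigma_j = \left(\Phi_j^T \Phi_j + A\right)^{-1}. \tag{11}$$

In (11), $A$ is a diagonal matrix ($A = \mathrm{diag}(\alpha_1, \alpha_2, \ldots, \alpha_M)$). From (9)-(11), to obtain the posterior probability of $\theta_j$, we should first seek the point value of the hyperparameter $\alpha$ based on the $N$ observations.
The maximum likelihood (ML) function of the observations from $N$ SUs can be written as (see the Appendix for detailed deduction steps)

$$\mathcal{L}(\alpha) = \sum_{j=1}^{N} \log p(\upsilon_j \mid \alpha) = \sum_{j=1}^{N} \log \iint p(\upsilon_j \mid \theta_j, \alpha_0)\, p(\theta_j \mid \alpha, \alpha_0)\, p(\alpha_0 \mid a, b)\, d\theta_j\, d\alpha_0 = -\frac{1}{2} \sum_{j=1}^{N} \left[(m_j + 2a) \log\left(\upsilon_j^T B_j^{-1} \upsilon_j + 2b\right) + \log |B_j|\right] + \mathrm{Const}. \tag{12}$$

Here

$$B_j = E + \Phi_j A^{-1} \Phi_j^T \tag{13}$$

where $E$ is the identity matrix. According to [33], $B_j$ can be decomposed as

$$B_j = E + \Phi_j A^{-1} \Phi_j^T = E + \sum_{n \neq k} \alpha_n^{-1} \Phi_{j,n} \Phi_{j,n}^T + \alpha_k^{-1} \Phi_{j,k} \Phi_{j,k}^T = B_{j,-k} + \alpha_k^{-1} \Phi_{j,k} \Phi_{j,k}^T \tag{14}$$

where $B_{j,-k} = E + \sum_{n \neq k} \alpha_n^{-1} \Phi_{j,n} \Phi_{j,n}^T$ ($k = 1, 2, \ldots, M$). Hence, the determinant and inversion of matrix $B_j$ can be expressed as

$$|B_j| = |B_{j,-k}| \left(1 + \alpha_k^{-1} \Phi_{j,k}^T B_{j,-k}^{-1} \Phi_{j,k}\right) \tag{15}$$

$$B_j^{-1} = B_{j,-k}^{-1} - \frac{B_{j,-k}^{-1} \Phi_{j,k} \Phi_{j,k}^T B_{j,-k}^{-1}}{\alpha_k + \Phi_{j,k}^T B_{j,-k}^{-1} \Phi_{j,k}}. \tag{16}$$

Then, the contribution of the basis vector $\Phi_{j,k}$ in the likelihood function (12) can be separated from others, i.e.,

$$\mathcal{L}(\alpha) = -\frac{1}{2} \sum_{j=1}^{N} \left[(m_j + 2a) \log\left(\upsilon_j^T B_{j,-k}^{-1} \upsilon_j + 2b\right) + \log |B_{j,-k}|\right] + \mathrm{Const} - \frac{1}{2} \sum_{j=1}^{N} \left[\log\left(1 + \alpha_k^{-1} s_{j,k}\right) + (m_j + 2a) \log\left(1 - \frac{q_{j,k}^2 / e_{j,k}}{\alpha_k + s_{j,k}}\right)\right] = \mathcal{L}(\alpha_{-k}) + \ell(\alpha_k) \tag{17}$$

where $\alpha_{-k} = \{\alpha_1, \alpha_2, \ldots, \alpha_{k-1}, \alpha_{k+1}, \ldots, \alpha_M\}$. $s_{j,k}$, $q_{j,k}$, and $e_{j,k}$ are defined as

$$s_{j,k} = \Phi_{j,k}^T B_{j,-k}^{-1} \Phi_{j,k}, \qquad q_{j,k} = \Phi_{j,k}^T B_{j,-k}^{-1} \upsilon_j, \qquad e_{j,k} = \upsilon_j^T B_{j,-k}^{-1} \upsilon_j + 2b. \tag{18}$$

To update $\alpha_k$ in each iteration, we fix the other hyperparameters $\alpha_{-k}$ at their latest values, differentiate the likelihood function $\ell(\alpha_k)$ with respect to $\alpha_k$, and set the result to zero, i.e.,

$$\frac{\partial \ell(\alpha_k)}{\partial \alpha_k} = \sum_{j=1}^{N} \frac{s_{j,k}\left(s_{j,k} - q_{j,k}^2 / e_{j,k}\right) / \alpha_k - (m_j + 2a)\, q_{j,k}^2 / e_{j,k} + s_{j,k}}{2 (\alpha_k + s_{j,k}) \left(\alpha_k + s_{j,k} - q_{j,k}^2 / e_{j,k}\right)} = 0. \tag{19}$$

Since $\alpha_k$ is the precision of the Gaussian distribution, we have $\alpha_k > 0$. We assume $\alpha_k \ll s_{j,k}$ (this is an empirical result stated in [29]), and thus, $\alpha_k + s_{j,k} \approx s_{j,k}$ in (19). Then, we can derive the new $\alpha_k$ from (19) as

$$\alpha_k = \frac{N}{\sum_{j=1}^{N} \dfrac{(m_j + 2a)\, q_{j,k}^2 / e_{j,k} - s_{j,k}}{s_{j,k} \left(s_{j,k} - q_{j,k}^2 / e_{j,k}\right)}} \quad \text{if} \quad \sum_{j=1}^{N} \frac{(m_j + 2a)\, q_{j,k}^2 / e_{j,k} - s_{j,k}}{s_{j,k} \left(s_{j,k} - q_{j,k}^2 / e_{j,k}\right)} > 0 \tag{20}$$

else

$$\alpha_k = \infty. \tag{21}$$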
Hence, the SU calculates $\left((m_j + 2a)\, q_{j,k}^2 / e_{j,k} - s_{j,k}\right) / \left(s_{j,k} \left(s_{j,k} - q_{j,k}^2 / e_{j,k}\right)\right)$ in each iteration and reports this value to its CH. From (20) and (21), we can update the hyperparameter $\alpha_k$ ($k = 1, 2, \ldots, M$) after each iteration. After reaching the upper bound of iteration times, or if the increment value
of the likelihood in (17) is less than a threshold (which means
that we almost reach the maximum value of the likelihood),
the CH obtains the spectrum decision for its cluster. In (21),
$\alpha_k = \infty$ means $\theta_{j,k} = 0$ ($j = 1, 2, \ldots, N$), and the subcarrier
$k$ is available to SUs.
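The per-subcarrier update of (18), (20), and (21) can be sketched in Python as follows. This is only an illustrative sketch: the function names are hypothetical, the default values of a and b are arbitrary (with b > 0 assumed), the sweep starts from an empty model with every precision set to infinity (an assumption), and the likelihood-increment stopping rule described above is replaced by a fixed number of sweeps. All computation is done in one place here, whereas in the scheme each SU would compute its own term and the CH would sum them.

```python
import numpy as np

def update_alpha_k(Phis, vs, alpha, k, a, b):
    """One update of alpha_k from (18), (20), (21); np.inf prunes subcarrier k.

    Phis, vs: per-SU measurement matrices (m_j x M) and observation vectors.
    alpha:    current precisions; np.inf marks subcarriers already pruned.
    """
    keep = np.isfinite(alpha)
    keep[k] = False  # leave subcarrier k out, as in B_{j,-k} of (14)
    total = 0.0
    for Phi, v in zip(Phis, vs):
        m = Phi.shape[0]
        # B_{j,-k} = E + sum_{n != k} alpha_n^{-1} Phi_{j,n} Phi_{j,n}^T
        B_mk = np.eye(m) + (Phi[:, keep] / alpha[keep]) @ Phi[:, keep].T
        phi_k = Phi[:, k]
        Binv_phi = np.linalg.solve(B_mk, phi_k)
        Binv_v = np.linalg.solve(B_mk, v)
        s = phi_k @ Binv_phi          # s_{j,k} in (18)
        q = phi_k @ Binv_v            # q_{j,k} in (18)
        e = v @ Binv_v + 2.0 * b      # e_{j,k} in (18)
        c = q * q / e
        # Term each SU would report to its cluster head.
        total += ((m + 2.0 * a) * c - s) / (s * (s - c))
    return len(Phis) / total if total > 0 else np.inf

def multitask_alpha(Phis, vs, a=1.0, b=1.0, n_sweeps=5):
    """Cycle through all subcarriers for a fixed number of sweeps."""
    M = Phis[0].shape[1]
    alpha = np.full(M, np.inf)  # assumption: start from the empty model
    for _ in range(n_sweeps):
        for k in range(M):
            alpha[k] = update_alpha_k(Phis, vs, alpha, k, a, b)
    return alpha
```

Because $q_{j,k}^2 \le s_{j,k}(e_{j,k} - 2b)$ by the Cauchy-Schwarz inequality, the denominator $s_{j,k}(s_{j,k} - q_{j,k}^2/e_{j,k})$ stays positive whenever b > 0, so the division in the sketch is well defined.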
From the foregoing analysis, one can see that: 1) the member
nodes in one cluster seek a consensus spectrum map based on
the multitask BCS model, and 2) the information exchanged
among member nodes can be used to derive the shared hyperparameter $\alpha = \{\alpha_1, \alpha_2, \ldots, \alpha_M\}$. An advantage of our proposed
hierarchical prior [see (7) and (8)] is that it collects the spatial
contribution from all member nodes to derive the common
sparseness spectrum and thus removes the impact of ISI channel fading.
After several iterations, the result $\alpha = \{\alpha_1, \alpha_2, \ldots, \alpha_M\}$
will converge, and we can then make a binary spectrum decision
of $d_{PU}$ and $d_{SU}$ [to be discussed in (57) and (58)], which
represent the spectrum occupancy states of PUs and SUs.
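Once the shared hyperparameter has converged, the posterior mean of (10) and (11) gives a point estimate of the spectrum vector at each SU. The Python sketch below shows this step under stated assumptions: the function name is hypothetical, subcarriers with infinite precision are treated as exactly zero in line with (21), and the decision rules of (57) and (58) are not reproduced, so the returned mask only separates pruned subcarriers from retained ones.

```python
import numpy as np

def posterior_mean(Phi, v, alpha):
    """Posterior mean of (10)-(11), restricted to subcarriers with finite alpha_k.

    Returns the M-vector estimate and a boolean mask of retained subcarriers;
    pruned subcarriers (alpha_k = inf) are set to zero, i.e., treated as idle.
    """
    M = Phi.shape[1]
    active = np.isfinite(alpha)
    mu = np.zeros(M)
    if active.any():
        Pa = Phi[:, active]
        # Sigma = (Phi^T Phi + A)^{-1} on the retained subcarriers only.
        Sigma = np.linalg.inv(Pa.T @ Pa + np.diag(alpha[active]))
        mu[active] = Sigma @ Pa.T @ v
    return mu, active
```

Restricting the inverse to the retained columns avoids placing infinite entries on the diagonal of the matrix $A$ while giving the same result as the limit $\alpha_k \to \infty$.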
IV. DISTRIBUTED INFORMATION EXCHANGE AND SPATIOTEMPORAL DATA MINING: MULTICLUSTER CASE
In this section, we will extend the one-cluster spectrum-sensing case to a multicluster case, i.e., the whole CRN with
observations from different clusters. In Section III, one cluster is assumed to share one common sparseness spectrum.
However, the CRN may be deployed over a large-scale area,
and the sparseness spectrum decisions may vary in different
positions due to the geographical-dependent PUs and signal
attenuation along a path. Hence, different clusters may not be
statistically interrelated to each other, and the CS observations
from different clusters may not be appropriate for sharing. For
example, one cluster may be located near a high TV tower (base
station), which makes fewer IEEE 802.22 channels available for
SUs. In the multicluster case, we should design an efficient
algorithm that first groups the CS observations from different
clusters (multiple clusters' CS observations may belong to the
same group as long as they obey the same spectrum statistics)
and then uses the multitask BCS model (see Section III) in
each group to discover the common sparseness spectrum within
each group. For this purpose, we introduce a DP prior to the
hierarchical BCS model that has been discussed in Section III.
The DP prior [30] has shown a powerful capability of automatically classifying different samples into groups based on their
statistical patterns. In our application, the DP prior will be used
to realize both spectrum grouping and CS inversion.
A. DP
DP is a distribution over probability measures and has two parameters: 1) precision $\lambda$ and 2) base distribution $G_0$ [30]. In the multicluster spectrum-sensing case, different clusters may have different hyperparameters, that is, $\alpha_i = \{\alpha_{i1}, \alpha_{i2}, \ldots, \alpha_{iM}\}$, with the cluster ID $i = 1, 2, \ldots, C$ ($C$ is the total number of clusters in the CRN). We assume $\{\alpha_i, i = 1, \ldots, C\}$ is drawn identically from distribution $G$, which is a random draw from the DP, i.e.,

$$\alpha_i \mid G \sim G, \quad i = 1, \ldots, C \tag{22}$$

$$G \sim \mathrm{DP}(\lambda, G_0) \tag{23}$$

$$E(G) = G_0. \tag{24}$$

Equation (22) is the likelihood function for $G$, and the hyperparameter $\alpha_i$ has been derived in the multitask BCS model (see Section III). Equation (23) is the prior knowledge of $G$.
When we integrate out $G$ according to (22) and (23), $\alpha_i$ obeys the base distribution $G_0$. In our cluster-based CRN, when one cluster collects the hyperparameter information $\alpha^{-i} = \{\alpha_1, \alpha_2, \ldots, \alpha_{i-1}, \alpha_{i+1}, \ldots, \alpha_C\}$ from other clusters, the base distribution $G_0$ is updated, and we have [36]

$$p(\alpha_i \mid \alpha^{-i}, \lambda, G_0) = \frac{\lambda}{\lambda + C - 1} G_0 + \frac{1}{\lambda + C - 1} \sum_{k=1, k \neq i}^{C} \delta_{\alpha_k} \tag{25}$$

where $\delta_{\alpha_k}$ represents a mass point concentrated at $\alpha_k$ with probability $1/(\lambda + C - 1)$. $\{\alpha_k^*\}_{k=1}^{K}$ ($K \le C$) represents the set of distinct hyperparameters in $\{\alpha_k\}_{k=1}^{C}$. We assume that there are $n_k^{-i}$ clusters that choose $\alpha_k^*$ in $\{\alpha_k^*\}_{k=1}^{K}$. Then, (25) can be further written as

$$p(\alpha_i \mid \alpha^{-i}, \lambda, G_0) = \frac{\lambda}{\lambda + C - 1} G_0 + \frac{1}{\lambda + C - 1} \sum_{k=1}^{K} n_k^{-i} \delta_{\alpha_k^*}. \tag{26}$$

Equation (26) clearly shows the important sharing property of the DP distribution: a new sample $\alpha_i$ prefers to select a group $\alpha_k^*$ with a large population $n_k^{-i}$.
In (22), the distribution $G$ can be generated by the stick-breaking process, which introduces two independent random variables $\pi_k$ and $\alpha_k^*$ ($k = 1, 2, \ldots, \infty$), i.e.,

$$G = \sum_{k=1}^{\infty} w_k \delta_{\alpha_k^*} \tag{27}$$

where

$$w_k = \pi_k \prod_{n=1}^{k-1} (1 - \pi_n) \tag{28}$$

$$\pi_k \mid \lambda \sim \mathrm{Beta}(1, \lambda) \tag{29}$$

$$\alpha_k^* \mid G_0 \sim G_0. \tag{30}$$

In (27) and (28), $\pi_k$ and $\alpha_k^*$ are drawn i.i.d. from a Beta distribution [(29)] and the base distribution $G_0$ [(30)], respectively. To promote sparseness over $\theta_j$, we assume that $G_0$ is a product of Gamma distributions [29], i.e.,

$$G_0 \sim \prod_{k=1}^{M} \mathrm{Ga}(c, d). \tag{31}$$

In (27), we can see that the number of mass points is infinite. However, the total number of unique values of $\alpha_k$ is finite. Hence, we can use a finite approximation to represent the DP via a modified distribution $G$, i.e.,

$$G = \sum_{k=1}^{J} l_k \delta_{\alpha_k^*} \tag{32}$$

where $l_k$ represents the weight of mass point $\delta_{\alpha_k^*}$, and $\sum_{k=1}^{J} l_k = 1$. Hence, (22) can be rewritten as

$$p(\alpha_i \mid G) = p\left(\alpha_i \mid \{l_k\}_{k=1,J}, \{\alpha_k^*\}_{k=1,J}\right) = \sum_{k=1}^{J} l_k \delta_{\alpha_k^*} \tag{33}$$

where $J$ is the number of unique values of the hyperparameter; obviously, $J \le C$. Moreover, $\{l_1, l_2, \ldots, l_J\}$ obeys the Dirichlet distribution

$$\{l_1, l_2, \ldots, l_J\} \sim \mathrm{Dir}(\tau_1, \tau_2, \ldots, \tau_J). \tag{34}$$
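The stick-breaking construction of (27)-(29) and the grouping probabilities of (26) can be illustrated with the short Python sketch below. The function names are hypothetical, and forcing the last stick fraction to one is a common truncation device assumed here so that the finite set of weights sums to one; it is not a step stated in the text.

```python
import numpy as np

def stick_breaking_weights(precision, J, rng):
    """Truncated stick-breaking weights: each fraction is Beta(1, precision),
    and weight k is fraction_k * prod_{n<k} (1 - fraction_n), as in (28)-(29)."""
    frac = rng.beta(1.0, precision, size=J)
    frac[-1] = 1.0  # assumption: truncate so that the J weights sum to one
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - frac[:-1])))
    return frac * remaining

def group_probabilities(counts, precision):
    """Probabilities implied by (26) for a cluster choosing its hyperparameter.

    counts[k] is the number of other clusters already using distinct value k,
    so counts sums to C - 1. Returns (probability of a fresh draw from the
    base distribution, probabilities of joining each existing group)."""
    counts = np.asarray(counts, dtype=float)
    denom = precision + counts.sum()
    return precision / denom, counts / denom
```

For example, with precision 1 and two existing groups holding three clusters and one cluster, the probabilities are 0.2 for a new group and 0.6 and 0.2 for the existing ones, which shows the preference for populous groups noted after (26).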
B. Automatically Grouping and Distributed

Information Exchange
Based on the preceding DP, the hidden model shown in (5)
can be defined as

j |j , 0 N j j , 01 E , j = 1, 2, . . . , C
, k = 1, 2, . . . , M
j,k |zj ,k N 0, 01 z1
j ,k
0 Ga(a, b)
j |{lk }k=1,J , {k }k=1,J
J
lk k
k=1
zj Categorical(l1 , l2 , . . . , lJ )
{l1 , l2 , . . . , lJ } Dir(1 , 2 , . . . , J )
(35)
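where $z_j$ is an index variable that indicates the group to which cluster $j$ belongs. In the DP model, we are interested in $\{\alpha^*_k\}_{k=1,J}$ and $\{z_j\}_{j=1,C}$, which are the information required for the spectrum decision [see (57) and (58)]. The lower bound of the marginal log-likelihood function can be written as

$$\begin{aligned}
\mathcal{F}(q, \alpha^*) &= \int q(z, l)\left[\log p(\nu, z, l \mid \alpha^*, \tau) - \log q(z, l)\right] dz\, dl \\
&= \int q(l) \prod_{j=1}^{C} q(z_j) \Bigg[\log p(l \mid \tau) + \sum_{j=1}^{C} \left(\log p(z_j \mid l) + \log p(\nu_j \mid \alpha^*_{z_j})\right) \\
&\qquad\qquad - \log q(l) - \sum_{j=1}^{C} \log q(z_j)\Bigg] dz\, dl.
\end{aligned} \tag{36}$$

We can use variational Bayesian inference, i.e., a variational posterior distribution $q(\{z_j\}_{j=1,C}, \{l_k\}_{k=1,J}) = \prod_{j=1}^{C} q(z_j)\, q(\{l_k\}_{k=1,J})$, to approximate the true posterior $p(\{z_j\}_{j=1,C}, \{l_k\}_{k=1,J} \mid \{\nu_j\}_{j=1,C})$ [30]. In (36), the estimates of $q$ and $\alpha^*$ can be obtained by maximizing the lower bound $\mathcal{F}(q, \alpha^*)$ via the expectation-maximization (EM) algorithm as follows:
1) In the E-step, $q$ is estimated by maximizing $\mathcal{F}(q, \alpha^*)$ given $\alpha^* = \{\alpha^*_k\}_{k=1,J}$ at its latest estimated values. Specifically, $q(l)$ and $q(z)$ are updated separately by maximizing the lower bound in (36) given the other $q(\cdot)$ values and $\alpha^*$.
2) In the M-step, the values of $\alpha^*$ are estimated by maximizing (36) given the most current values of $\tau$, $q(l)$, and $q(z)$. Letting $\kappa_{j,k} = q(z_j = k)$, (36) then becomes

$$\mathcal{F}(\alpha^*) = \sum_{k=1}^{J} \mathcal{F}_k(\alpha^*_k) \tag{37}$$

$$\begin{aligned}
\mathcal{F}_k(\alpha^*_k) &= \sum_{j=1}^{C} \kappa_{j,k} \log p(\nu_j \mid \alpha^*_k) \\
&= \sum_{j=1}^{C} \kappa_{j,k} \log \iint p(\nu_j \mid \theta_j, \alpha_0)\, p(\theta_j \mid \alpha^*_k, \alpha_0)\, p(\alpha_0 \mid a, b)\, d\theta_j\, d\alpha_0 \\
&= -\frac{1}{2} \sum_{j=1}^{C} \kappa_{j,k} \left[(m_j + 2a) \log\left(\nu_j^T B_{j,k}^{-1} \nu_j + 2b\right) + \log |B_{j,k}|\right] + \mathrm{const}
\end{aligned} \tag{38}$$

where

$$B_{j,k} = E + \Phi_j A_k^{-1} \Phi_j^T \tag{39}$$

$$A_k = \mathrm{diag}\left(\alpha^*_{k,n}\right)_{n=1,M}. \tag{40}$$

From (37) and (38), we can see that the elements of $\alpha^* = \{\alpha^*_1, \alpha^*_2, \ldots, \alpha^*_J\}$ are independent of each other and can thus be obtained separately by maximizing (38) in the M-step. To obtain the optimal values $\alpha^* = \{\alpha^*_1, \alpha^*_2, \ldots, \alpha^*_J\}$, we decompose the matrix $B_{j,k}$ in the same way as in (14), i.e.,

$$\begin{aligned}
B_{j,k} &= E + \sum_{t=1, t \ne n}^{M} (\alpha^*_{k,t})^{-1} \Phi_{j,t} \Phi_{j,t}^T + (\alpha^*_{k,n})^{-1} \Phi_{j,n} \Phi_{j,n}^T \\
&= B_{j,k,-n} + (\alpha^*_{k,n})^{-1} \Phi_{j,n} \Phi_{j,n}^T
\end{aligned} \tag{41}$$

where $B_{j,k,-n}$ ($n = 1, 2, \ldots, M$) denotes the accumulated effects of $\{\alpha^*_{k,1}, \alpha^*_{k,2}, \ldots, \alpha^*_{k,n-1}, \alpha^*_{k,n+1}, \ldots, \alpha^*_{k,M}\}$. In (41), we separate the contribution of $\alpha^*_{k,n}$ from the other items. The matrix determinant and inverse in (38) can then be rewritten as

$$|B_{j,k}| = |B_{j,k,-n}| \left|1 + (\alpha^*_{k,n})^{-1} \Phi_{j,n}^T B_{j,k,-n}^{-1} \Phi_{j,n}\right| \tag{42}$$

$$B_{j,k}^{-1} = B_{j,k,-n}^{-1} - \frac{B_{j,k,-n}^{-1} \Phi_{j,n} \Phi_{j,n}^T B_{j,k,-n}^{-1}}{\alpha^*_{k,n} + \Phi_{j,n}^T B_{j,k,-n}^{-1} \Phi_{j,n}}. \tag{43}$$

Substituting (42) and (43) into (38), we can further solve (38) as

$$\begin{aligned}
\mathcal{F}_k(\alpha^*_k) &= -\frac{1}{2} \sum_{j=1}^{C} \kappa_{j,k} \left[(m_j + 2a) \log\left(\nu_j^T B_{j,k,-n}^{-1} \nu_j + 2b\right) + \log |B_{j,k,-n}|\right] + \mathrm{const} \\
&\quad - \frac{1}{2} \sum_{j=1}^{C} \kappa_{j,k} \left[\log\left(1 + (\alpha^*_{k,n})^{-1} s_{j,k,n}\right) + (m_j + 2a) \log\left(1 - \frac{q_{j,k,n}^2 / e_{j,k,n}}{\alpha^*_{k,n} + s_{j,k,n}}\right)\right] \\
&= \mathcal{F}_k(\alpha^*_{k,-n}) + \mathcal{F}_k(\alpha^*_{k,n})
\end{aligned} \tag{44}$$

where

$$s_{j,k,n} = \Phi_{j,n}^T B_{j,k,-n}^{-1} \Phi_{j,n}, \qquad q_{j,k,n} = \Phi_{j,n}^T B_{j,k,-n}^{-1} \nu_j, \qquad e_{j,k,n} = \nu_j^T B_{j,k,-n}^{-1} \nu_j + 2b. \tag{45}$$

Equation (44) indicates the dependence of $\mathcal{F}_k(\alpha^*_k)$ on the hyperparameter $\alpha^*_{k,n}$, which can be isolated from all the other parameters $\alpha^*_{k,-n}$. We assume that cluster $j$ has $N_j$ nodes, and the sub-Nyquist sampling rates of these $N_j$ nodes are $\{m_{j,1}, m_{j,2}, \ldots, m_{j,N_j}\}$. Hence, $\mathcal{F}_k(\alpha^*_{k,n})$ in (44) can be further rewritten as

$$\mathcal{F}_k(\alpha^*_{k,n}) = -\frac{1}{2} \sum_{j=1}^{C} \kappa_{j,k} \sum_{l=1}^{N_j} \left[\log\left(1 + (\alpha^*_{k,n})^{-1} s_{j,l,k,n}\right) + (m_{j,l} + 2a) \log\left(1 - \frac{q_{j,l,k,n}^2 / e_{j,l,k,n}}{\alpha^*_{k,n} + s_{j,l,k,n}}\right)\right]. \tag{46}$$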
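The step from (38) to (44) can be spelled out as a short worked derivation. The notation here is an assumption made for readability: $\nu_j$ stands for the observation vector of cluster $j$, $\Phi_{j,n}$ for the $n$th basis column, $\alpha^*_{k,n}$ for the $n$th hyperparameter of group $k$, and $B_{j,k,-n}$ for $B_{j,k}$ with the contribution of $\alpha^*_{k,n}$ removed. The constant $2b$ follows the definition of $e_{j,k,n}$ in (45).

```latex
\begin{align*}
\log |B_{j,k}|
  &= \log |B_{j,k,-n}| + \log\bigl(1 + (\alpha^*_{k,n})^{-1} s_{j,k,n}\bigr)
  && \text{from (42)} \\
\nu_j^T B_{j,k}^{-1} \nu_j + 2b
  &= \underbrace{\nu_j^T B_{j,k,-n}^{-1} \nu_j + 2b}_{e_{j,k,n}}
     - \frac{q_{j,k,n}^2}{\alpha^*_{k,n} + s_{j,k,n}}
  && \text{from (43)} \\
\log\bigl(\nu_j^T B_{j,k}^{-1} \nu_j + 2b\bigr)
  &= \log e_{j,k,n}
     + \log\Bigl(1 - \frac{q_{j,k,n}^2 / e_{j,k,n}}{\alpha^*_{k,n} + s_{j,k,n}}\Bigr)
\end{align*}
```

The first term of each right-hand side does not depend on $\alpha^*_{k,n}$, which is why the objective separates into a part that involves only the other hyperparameters and a part that involves only $\alpha^*_{k,n}$.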
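According to [30], $\kappa_{j,k}$ and $(\tau_1, \tau_2, \ldots, \tau_J)$ can be updated in the E-step for cooperative CS inversion. We extend [30] to our multicluster CRN application and consider the contributions of all member nodes in each cluster. We assume that all member nodes in cluster $j$ share the same membership $\kappa_{j,k}$ ($k = 1, 2, \ldots, J$). Then, we have [30]

$$\kappa_{j,k} = \frac{\prod_{l=1}^{N_j} e^{r_{j,l,k}}}{\sum_{m=1}^{J} \prod_{l=1}^{N_j} e^{r_{j,l,m}}} \tag{47}$$

$$\tau_k = \frac{1}{J} + \sum_{j=1}^{C} \kappa_{j,k} \tag{48}$$

where

$$r_{j,l,k} = \psi(\tau_k) - \psi\left(\sum_{m=1}^{J} \tau_m\right) - \frac{1}{2}\left[(m_{j,l} + 2a) \log\left(\nu_{j,l}^T B_{j,l,k}^{-1} \nu_{j,l} + 2b\right) + \log |B_{j,l,k}|\right] \tag{49}$$

$$\psi(x) = \frac{\partial \log \Gamma(x)}{\partial x}. \tag{50}$$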
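The membership update in (47) can be sketched as a numerically stable softmax over the per-group sums of $r_{j,l,k}$. This is a minimal illustration under two assumptions: that the per-node terms combine as a product of exponentials (equivalently, a sum of the $r$ values), and that the $r$ values have already been computed from (49). The function name `memberships` and the input layout are hypothetical.

```python
import math


def memberships(r):
    """Return kappa_{j,k}, k = 1..J, for one cluster j.

    r[l][k] holds r_{j,l,k} for member node l and group k. The product of
    exp(r_{j,l,k}) over nodes is evaluated as exp(sum of r), and the largest
    exponent is subtracted first (log-sum-exp shift) to avoid underflow.
    """
    num_groups = len(r[0])
    log_num = [sum(row[k] for row in r) for k in range(num_groups)]
    shift = max(log_num)
    unnorm = [math.exp(v - shift) for v in log_num]
    total = sum(unnorm)
    return [u / total for u in unnorm]


# Two member nodes, two candidate groups; the values are illustrative only.
kappa = memberships([[-1000.0, -1001.0], [-2000.0, -2001.0]])
```

Without the shift, `math.exp(-3000.0)` would underflow to zero and the normalization would divide by zero, which is why the sketch works in the log domain.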
Maximizing $\mathcal{F}_k(\alpha^*_{k,n})$ in (46) (i.e., solving $\partial \mathcal{F}_k(\alpha^*_{k,n}) / \partial \alpha^*_{k,n} = 0$) cannot be done directly in closed form because the denominator of each factor is a second-order polynomial of $\alpha^*_{k,n}$. Moreover, the entire equation is the sum of $\sum_{j=1}^{C} N_j$ factors. Hence, we will obtain a polynomial equation of order $2\left(\sum_{j=1}^{C} N_j - 1\right) + 1 = 2\sum_{j=1}^{C} N_j - 1$, which cannot be solved in closed form. As in (20) and (21), here we also assume that $\alpha^*_{k,n} \ll s_{j,l,k,n}$ (this is an empirical result stated in [29]). By maximizing $\mathcal{F}_k(\alpha^*_{k,n})$ in the M-step, we get

if
$$\sum_{j=1}^{C} \kappa_{j,k} \sum_{l=1}^{N_j} \frac{(m_{j,l} + 2a)\, q_{j,l,k,n}^2 / e_{j,l,k,n} - s_{j,l,k,n}}{s_{j,l,k,n}\left(s_{j,l,k,n} - q_{j,l,k,n}^2 / e_{j,l,k,n}\right)} > 0$$

$$\alpha^*_{k,n} \approx \frac{\sum_{j=1}^{C} N_j \kappa_{j,k}}{\sum_{j=1}^{C} \kappa_{j,k} \sum_{l=1}^{N_j} \dfrac{(m_{j,l} + 2a)\, q_{j,l,k,n}^2 / e_{j,l,k,n} - s_{j,l,k,n}}{s_{j,l,k,n}\left(s_{j,l,k,n} - q_{j,l,k,n}^2 / e_{j,l,k,n}\right)}} \tag{51}$$

else

$$\alpha^*_{k,n} = \infty. \tag{52}$$

Hence, a CH exchanges its fusion result $\kappa_{j,k} \sum_{l=1}^{N_j} \left((m_{j,l} + 2a)\, q_{j,l,k,n}^2 / e_{j,l,k,n} - s_{j,l,k,n}\right) / \left(s_{j,l,k,n}\left(s_{j,l,k,n} - q_{j,l,k,n}^2 / e_{j,l,k,n}\right)\right)$ with other CHs in each iteration. In (52), $\alpha^*_{k,n} = \infty$ means that channel $n$ is unoccupied by PUs and other SUs. The detailed steps of our proposed algorithm are described as follows.

Our proposed DP-based hierarchical BCS algorithm:
(1) Initialize $\tau_k$, $\alpha^*_{k,n}$, and the corresponding $\Phi_{j,n}$ ($k = 1, 2, \ldots, J$, $j = 1, 2, \ldots, C$, and $n = 1, 2, \ldots, M$).
(2) The member node $l$ in cluster $j$ updates $r_{j,l,k}$ according to (49) and (50), and calculates $\left((m_{j,l} + 2a)\, q_{j,l,k,n}^2 / e_{j,l,k,n} - s_{j,l,k,n}\right) / \left(s_{j,l,k,n}\left(s_{j,l,k,n} - q_{j,l,k,n}^2 / e_{j,l,k,n}\right)\right)$ based on its local observations. Those two values will be collected by the CH in cluster $j$.
(3) The CH in cluster $j$ updates the membership $\kappa_{j,k}$ and $\tau_k$ according to (47) and (48), respectively.
(4) For $k = 1, 2, \ldots, J$, the CH in cluster $j$ selects a candidate basis $\Phi_{j,l,n}$ and updates $\alpha^*_{k,n}$ according to (51) and (52). Here, we choose the element $\alpha^*_{k,n}$ with the maximal increment of $\mathcal{F}_k(\alpha^*_{k,n})$ [see (46)] in each iteration.
(5) The CH broadcasts the fusion result $\kappa_{j,k} \sum_{l=1}^{N_j} \left((m_{j,l} + 2a)\, q_{j,l,k,n}^2 / e_{j,l,k,n} - s_{j,l,k,n}\right) / \left(s_{j,l,k,n}\left(s_{j,l,k,n} - q_{j,l,k,n}^2 / e_{j,l,k,n}\right)\right)$ to its neighboring CHs.
(6) Check the algorithm terminating criterion, which could be a) an upper bound on the number of iterations or b) the increment of $\mathcal{F}(q, \alpha^*)$ in each iteration being less than a threshold. (Note: when $\mathcal{F}(q, \alpha^*)$ cannot be increased much, we have almost reached the maximum of the likelihood. Then we can stop the iterations since our goal is to seek the ML.) If either criterion is met, then stop; otherwise, go back to step (2).

Reference [37] pointed out that if the joint distribution of hidden variables belongs to a curved exponential family, then the EM algorithm can find a stationary value of the likelihood function. In our case, $p(z, l \mid \alpha^*, \tau) = p(z \mid l)\, p(l \mid \tau)$, where $p(z \mid l)$ follows a Categorical distribution and $p(l \mid \tau)$ follows a Dirichlet distribution. Since the Categorical distribution and the Dirichlet distribution both belong to the exponential family, and the Dirichlet distribution is the conjugate prior of the Categorical distribution, $p(z, l \mid \alpha^*, \tau)$ should belong to a curved exponential family. Hence, our proposed EM algorithm will finally converge to a stationary point.
After the marginal log-likelihood function [see (36)] converges to a stationary point, we obtain $\{\alpha^*_k, k = 1, 2, \ldots, J\}$ as well as the membership $\{\kappa_{j,k}, j = 1, 2, \ldots, C, k = 1, 2, \ldots, J\}$. In our proposed DP-based hierarchical BCS model, we update $\alpha^*_k$ by monotonically increasing the likelihood function $\mathcal{F}(\alpha^*)$ in each iteration until convergence is achieved.
In the foregoing discussions, we fully exploited the spatial relationship among the CS observations from all clusters to infer the spectrum map. To further increase the accuracy of the spectrum-sensing decision, we next employ the HMM to exploit the time-domain relevance of subcarrier states and select the most probable candidate $\alpha^*_k$ for the final spectrum decision.
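The closed-form M-step update in (51) and (52) can be sketched as follows for a single group $k$ and subcarrier $n$. This is only an illustration under stated assumptions: the quantities $s$, $q$, and $e$ are taken as already computed by each member node, and the function name `update_alpha` and the tuple layout are hypothetical.

```python
import math


def update_alpha(kappa_k, nodes, a):
    """Update one hyperparameter for group k and subcarrier n.

    kappa_k[j] -- membership of cluster j in group k
    nodes[j]   -- list of (m, s, q, e) tuples, one per member node of cluster j
    a          -- shape parameter of the Gamma prior on the noise precision
    Returns math.inf when the fused sum is not positive, as in (52).
    """
    numerator = 0.0
    denominator = 0.0
    for kappa, members in zip(kappa_k, nodes):
        numerator += len(members) * kappa          # N_j * kappa_{j,k}
        fused = 0.0
        for m, s, q, e in members:
            ratio = q * q / e
            fused += ((m + 2 * a) * ratio - s) / (s * (s - ratio))
        denominator += kappa * fused               # fusion result sent by the CH
    if denominator > 0:
        return numerator / denominator             # (51)
    return math.inf                                # (52): subcarrier pruned


# One cluster with one member node; the numbers are illustrative only.
alpha_active = update_alpha([1.0], [[(8, 2.0, 3.0, 10.0)]], a=1.0)
alpha_pruned = update_alpha([1.0], [[(8, 2.0, 0.1, 10.0)]], a=1.0)
```

The inner sum is exactly the per-cluster quantity that a CH would broadcast, so each CH only needs the fused scalars from its neighbors rather than their raw observations.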
C. HMM
In Fig. 3, the relationship between hidden subcarrier states and CS observations is plotted. Since the subcarrier states should be time relevant, only a small number of subcarriers change their binary states between two consecutive CS observations.
Fig. 3. Relationship between hidden subcarrier states and CS observations.
Here, we consider the time-domain relevance when assigning the final hyperparameter $\alpha^*_k$ to each cluster for the $t$th spectrum sensing. In Fig. 3, the previous states are considered in the HMM as well as the final spectrum decision. The probability that cluster $j$ selects $\alpha^*_k(t)$ as its hyperparameter can be calculated, and the hyperparameter that leads to the maximal probability will be selected as the final hyperparameter, i.e.,

$$V_{\alpha^*_k(1)}(1) = p\left(\nu_j(1) \mid \alpha^*_k(1)\right) p\left(\alpha^*_k(1)\right) \tag{53}$$

$$V_{\alpha^*_k(t)}(t) = p\left(\nu_j(t) \mid \alpha^*_k(t)\right) \max_{\alpha^*_z(t-1)} \left[p\left(\alpha^*_k(t) \mid \alpha^*_z(t-1)\right) V_{\alpha^*_z(t-1)}(t-1)\right] \tag{54}$$

$$\alpha^*(t) = \arg\max_{\alpha^*_z(t)} V_{\alpha^*_z(t)}(t). \tag{55}$$

Equations (53)–(55) are the standard Viterbi algorithm with recurrence calculations. The computation complexity of the Viterbi algorithm is $O(J^2)$. Here, $V_{\alpha^*_k(t)}(t)$ is the probability of the most probable hyperparameter sequence that is responsible for the first $t$ observations. Moreover, the corresponding $\alpha^*_k(t)$ is used as the final hyperparameter to make a decision on the subcarrier states.
In (53) and (54), $p(\nu_j(t) \mid \alpha^*_k(t))$ is equal to the membership $\kappa_{j,k}$ in (47). Moreover, the transition probability between two adjacent hidden hyperparameters is considered a first-order Markov model [38]. Note that we are only interested in the binary subcarrier state $d_{k,n}(t)$ that corresponds to two cases, i.e., $0 < \alpha^*_{k,n}(t) < \infty$ and $\alpha^*_{k,n}(t) = \infty$, instead of the exact value of $\alpha^*_{k,n}(t)$. Hence, the transition probability $p(\alpha^*_k(t) \mid \alpha^*_z(t-1))$ can be further rewritten as

$$p\left(\alpha^*_k(t) \mid \alpha^*_z(t-1)\right) \approx p\left\{d_k(t) \mid d_z(t-1)\right\}. \tag{56}$$

The transition probability of the binary subcarrier state is described as a Markov model in Fig. 4. Here, we assume that the current subcarrier state (the $t$th time) depends only on the last subcarrier state [i.e., the $(t-1)$th time]. The $n$th elements of $d_z(t-1)$ and $d_k(t)$, i.e., $d_{z,n}(t-1)$ and $d_{k,n}(t)$, each take only one state (1 or 0). In our first-order Markov model, we assume that the transition probabilities $p_{0,0}$, $p_{0,1}$, $p_{1,0}$, and $p_{1,1}$ are fixed [39]–[41] and can be detected by SUs.
After going through $\{\alpha^*_k(t)\}_{k=1,J}$ in (54), we can obtain a candidate $\alpha^*_k(t)$ with the maximal value as the final hyperparameter for cluster $j$. As we did in Section III, here we again only consider the real part $\mathrm{Re}\{\cdot\}$ in the mathematical analysis since the imaginary part $\mathrm{Im}\{\cdot\}$ can be analyzed in the same manner. Hence, we can get two hyperparameters $\alpha^*_k(t)$ and $\alpha^*_z(t)$ for the real and imaginary parts, respectively. The final binary subcarrier state is determined by a threshold and two hyperparameters, i.e.,

if $\left[\alpha^*_{k,n}(t)\right]^2 + \left[\alpha^*_{z,n}(t)\right]^2 < \text{threshold}$

$$d_n(t) = 1 \tag{57}$$

else

$$d_n(t) = 0 \tag{58}$$

where $n \in \{1, 2, \ldots, M\}$.
Fig. 4. First-order Markov model for each subcarrier state.
D. Identify Spectrum Decision
After we apply the previously discussed spatiotemporal data mining scheme, we further employ the spectrum assignment records in neighboring CHs to identify which subcarriers are occupied by PUs, and the remaining subcarriers are regarded as the interference from the SUs in other clusters. The subcarriers temporarily occupied by SUs in neighboring clusters are also spectrum opportunities that can be accessed through negotiation or competition schemes among CHs. The binary spectrum decision $d_n(t)$ obtained previously is a mixture of the PU and SU occupancy states in each subcarrier. Each CH should send out the binary spectrum decision $d_n(t)$ and its corresponding sensing time to its neighboring CHs.
When the neighboring CHs receive $d_n(t)$ and the corresponding sensing time, they revise the states of the subcarriers assigned for their data transmissions during the sensing time (i.e., change $d_n(t) = 1$ to $d_n(t) = 0$) and return the modified binary spectrum decision to the source CH. After exchanging the results with the neighboring CHs, a CH will obtain the final binary spectrum decision, which is determined only by the PUs' activities.
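The Viterbi recursion in (53)–(55), with the transition probability factored over binary subcarrier states as in (56), can be sketched in the log domain as follows. This is a minimal illustration, not the authors' implementation: the function names, the input layout, and the explicit backtracking step (the usual way to read out the whole sequence) are assumptions, and the emission terms would in practice come from the memberships in (47).

```python
import math


def log_transition(d_prev, d_next, p):
    """log p{d_k(t) | d_z(t-1)}: independent first-order Markov chain per subcarrier.

    p[a][b] is the probability that a subcarrier moves from state a to state b.
    """
    return sum(math.log(p[a][b]) for a, b in zip(d_prev, d_next))


def viterbi(log_emission, states, log_prior, p):
    """Return the most probable candidate index at each time step.

    log_emission[t][k] -- log p(observation at time t | candidate k)
    states[t][k]       -- binary subcarrier vector of candidate k at time t
    log_prior[k]       -- log prior of candidate k at the first time step
    """
    num = len(log_prior)
    v = [log_prior[k] + log_emission[0][k] for k in range(num)]       # (53)
    back = []
    for t in range(1, len(log_emission)):
        new_v, ptr = [], []
        for k in range(num):
            scores = [v[z] + log_transition(states[t - 1][z], states[t][k], p)
                      for z in range(num)]
            best = max(range(num), key=scores.__getitem__)
            ptr.append(best)
            new_v.append(log_emission[t][k] + scores[best])           # (54)
        v = new_v
        back.append(ptr)
    path = [max(range(num), key=v.__getitem__)]                       # (55)
    for ptr in reversed(back):                                        # backtracking
        path.append(ptr[path[-1]])
    return path[::-1]


# Two candidates over two subcarriers, three sensing periods (illustrative values).
p = [[0.7, 0.3], [0.3, 0.7]]
states = [[[0, 0], [1, 1]]] * 3
log_prior = [math.log(0.5), math.log(0.5)]
log_emission = [[math.log(0.9), math.log(0.1)],
                [math.log(0.4), math.log(0.6)],
                [math.log(0.9), math.log(0.1)]]
path = viterbi(log_emission, states, log_prior, p)
```

In this toy input the second observation alone slightly favors candidate 1, but the temporal model keeps candidate 0 throughout, which is the effect the HMM is meant to provide. Each time step costs on the order of $J^2$ transition evaluations.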
V. PERFORMANCE ANALYSIS
In this section, we will test the performance of our proposed
spatiotemporal data mining algorithm and compare it with
orthogonal matching pursuit (OMP) [35], single-task BCS
[34], decentralized fusion scheme [12], and MT-CS [29]. These
schemes were chosen for comparison because they have all
used CS algorithms to sample spectrum and then recover the
spectrum map. The OMP algorithm is a fast greedy strategy that iteratively selects the basis functions most aligned with the current residual, and its solution is based on the $\ell_1$-norm [28]. The BCS algorithm builds a hierarchical sparseness prior and uses the relevance vector machine for single-task BCS inversion. The OMP and BCS algorithms are executed in each individual SU, and the spatial diversity from the other SUs' CS observations is not considered. The decentralized fusion scheme [12] uses the $\ell_1$-norm to collect spatial diversity against wireless fading, and the Metropolis-Hastings weight set is adopted to enforce a consensus spectrum decision. The MT-CS algorithm is an extension of BCS and considers the case in which multiple users are detecting one common sparseness signal simultaneously. The MT-CS algorithm assumes that there is a common hyperparameter shared among all tasks and tries to discover this common hyperparameter based on multiple observations from different users. The decentralized fusion scheme, MT-CS, and our proposed spatiotemporal data mining algorithm all use multitask CS.
Fig. 5. Characteristics of the ISI channels.
In our simulation, we consider a CRN that consists of 20
clusters, and each cluster has five member nodes. There are
M = 512 subcarriers available. We assume that there are in total I = 38 PUs in the CRN, and each PU occupies one subcarrier. In these 20 clusters, each group of two neighboring clusters shares one common sparseness spectrum, and 20 PUs are assumed to exist in the area of those two clusters. Hence, a common sparseness spectrum exists among several clusters, but all clusters are unlikely to share one common sparseness spectrum due to different PU situations (see Section IV).
Since frequency-selective fading exists in ISI channels, we assume that each signal received by an SU experiences one of the following three channels (see list below). These three fixed
ISI channels are selected from the examples used in [42] and
[43]. The characteristics of these three channels are plotted in
Fig. 5. We can see the following:
1) Channel A: h = [0.407, 0.815, 0.407], which is a spectral-null channel;
2) Channel B: h = [0.8, 0.6], although it does not have
spectral nulls, its Fourier transformation values at some
frequencies are small;
3) Channel C: h = [0.0001 + 0.0001j, 0.0485 + 0.0194j,
0.0573+0.0253j, 0.0786 + 0.0282j, 0.0874 + 0.0447j,
0.9222 + 0.3031j, 0.1427 + 0.0349j, 0.0835 + 0.0157j,
0.0621 + 0.0078j, 0.0359 + 0.0049j, 0.0214 + 0.0019j],
which does not have spectral null or small Fourier
transformation values.
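The spectral-null behavior of Channels A and B can be checked directly from the tap values with a short sketch that evaluates the magnitude response on a frequency grid. The helper name `magnitude_response` and the 512-point grid (chosen here to match $M = 512$) are assumptions for illustration. Channel C is left out of the sketch.

```python
import cmath
import math


def magnitude_response(h, num_points=512):
    """|H(e^{jw})| of the FIR taps h at num_points equally spaced w in [0, 2*pi)."""
    mags = []
    for i in range(num_points):
        w = 2 * math.pi * i / num_points
        response = sum(tap * cmath.exp(-1j * w * n) for n, tap in enumerate(h))
        mags.append(abs(response))
    return mags


channel_a = [0.407, 0.815, 0.407]
channel_b = [0.8, 0.6]
min_a = min(magnitude_response(channel_a))   # about 0.001, at w = pi
min_b = min(magnitude_response(channel_b))   # about 0.2, at w = pi
```

Channel A's response nearly vanishes at $w = \pi$, whereas Channel B's smallest magnitude stays well above zero, consistent with the descriptions in the list.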
For comparison purposes (with BCS), we initialize $\alpha_0 = 10^2 / \mathrm{std}(\nu)^2$ [34]. As in [29], zero-mean Gaussian noise with a standard deviation of 0.05 is added to each of the measurements. We set $a = 10^2 / \mathrm{std}(\nu)^2$ and $b = 1$ [29] in MT-CS and our proposed spatiotemporal data mining algorithm such that the mean of the Gamma prior $\mathrm{Ga}(\alpha_0 \mid a, b)$ is aligned with the fixed value of $\alpha_0$. We assume that the hidden state transition model can be obtained by SUs. For simplicity, we set $p_{0,0} = p_{1,1} = 0.7$ and $p_{0,1} = p_{1,0} = 0.3$ in our simulation.
Fig. 6. Performance comparisons of the reconstructed signals in terms of normalized MSE.
A. Performance of Reconstruction Errors

We first study the performance of spectrum map reconstruction errors during the CS inversion process in our proposed spatiotemporal data mining algorithm. We use the normalized MSE of the reconstructed spectrum as in [29], [30], and [34], i.e.,

$$\left\| R^{P} - R^{P}_{\text{reconstructed}} \right\|_2 \Big/ \left\| R^{P} \right\|_2. \tag{59}$$
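The error metric in (59) can be computed with a few lines of code. This is a minimal sketch that assumes the spectra are given as equal-length sequences of real or complex values; the function name `normalized_mse` is hypothetical.

```python
import math


def normalized_mse(true_spectrum, reconstructed):
    """Ratio of the 2-norm of the reconstruction error to the 2-norm of the truth."""
    error = math.sqrt(sum(abs(r - r_hat) ** 2
                          for r, r_hat in zip(true_spectrum, reconstructed)))
    reference = math.sqrt(sum(abs(r) ** 2 for r in true_spectrum))
    return error / reference


nmse_example = normalized_mse([3.0, 4.0], [3.0, 3.5])   # error 0.5 over norm 5.0
```

A perfect reconstruction gives 0, and an all-zero reconstruction gives 1, which makes values across different numbers of measurements directly comparable.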
In our simulation, there are five SUs in each cluster. Each

SU collects 100 consecutive CS observations and analyzes
these observations using different CS inversion algorithms,
i.e., OMP, BCS, decentralized fusion scheme, MT-CS, and our
proposed spatiotemporal data mining. The normalized MSE of
the reconstructed signal is plotted in Fig. 6.
From Fig. 6, it can be seen that our proposed algorithm
and MT-CS both perform better than other schemes (OMP,
BCS, and decentralized fusion scheme). This is due to the
use of multiple observations from different SUs to discover
the hidden hyperparameter for spectrum modeling. However,
the observations from all SUs do not share the same sparseness spectrum. Our proposed algorithm uses the DP prior to
automatically group and discover the hidden hyperparameters
in each group simultaneously. Furthermore, the time-domain
dependency among the consecutive CS observations is also
exploited since the hidden subcarriers state does not change
dramatically. Hence, our proposed algorithm performs better
than the MT-CS algorithm.
Although the performance of the MT-CS algorithm is closer to that of our scheme than the other three schemes are, the reconstruction errors of our proposed algorithm are reduced by 15% to 22% (corresponding to 40 to 100 measurements, respectively) in terms of normalized MSE. Such an improvement (15% or 22%) is important since it means that we could achieve 15% or 22% higher spectrum map reconstruction accuracy in each SU. Note that the CS signal reconstruction accuracy has been a challenging issue due to the approximation nature of the $\ell_1$-normalization that is used in most CS reconstruction methods. From Fig. 6, one can see that our proposed algorithm has the smallest normalized MSE, which indicates that our scheme has the best spectrum-sensing performance among all the schemes. The improvement in spectrum sensing can be further seen in Fig. 7 (to be elaborated next), particularly when the probability of correct detection $P_{CD} = 1$ (which means no interference to the PUs).
Fig. 7. Comparisons of spectrum detection performance. Number of measurements: (a) 40. (b) 50. (c) 60. (d) 70. (e) 80. (f) 90. (g) 100.
Another important advantage of our scheme is that the HMM
is integrated into our proposed algorithm, which means that a
channel state prediction capability can be further exploited. For
example, we could use HMM to predict what channels will have
high quality, and send data only in those channels. When such
a prediction capability is equipped with the spectrum-sensing
algorithm, the spectrum-sensing overhead can be dramatically
reduced. Our next-step research will design a cross-layer CRN
protocol based on the HMM prediction results.
B. Performance of Spectrum Sensing

The main goal of spectrum sensing is to detect the characteristics of spectrum holes, and then these detection results can be
used by the spectrum decision module (see Fig. 1). Hence, here
we will present the spectrum hole detection performance for
our proposed algorithm [particularly (57) and (58)]. Since the key metrics in spectrum sensing are the probability of correct spectrum detection $P_{CD}$ and the probability of false alarm $P_{FA}$ [31], we will evaluate these two metrics by comparing the binary spectrum sparseness decision $\hat{d}_{PU}$ with the true subcarrier state $d_{PU}$ as follows [12]:

$$P_{CD} = E\left[\frac{d_{PU}^T \left(\hat{d}_{PU} = d_{PU}\right)}{\mathbf{1}^T d_{PU}}\right] \tag{60}$$

$$P_{FA} = E\left[\frac{\left(\mathbf{1} - d_{PU}\right)^T \left(\hat{d}_{PU} \ne d_{PU}\right)}{M - \mathbf{1}^T d_{PU}}\right]. \tag{61}$$
In CRN, we expect that the spectrum-sensing scheme can
provide a high probability of correct detection PCD and a low
probability of false alarm PFA . A high probability of correct
detection PCD can reduce the interference to the PUs [31],
whereas a low probability of false alarm PFA can provide more
bandwidth to the SUs. The spectrum-sensing performance of
our proposed algorithm is shown in Fig. 7.
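For a single sensing realization, the two metrics in (60) and (61) reduce to simple counts, which the following sketch illustrates. The expectation in the definitions would be approximated by averaging these values over many realizations; the function name `detection_rates` and the example vectors are assumptions. The sketch also assumes at least one occupied and one free subcarrier, so that neither denominator is zero.

```python
def detection_rates(d_true, d_hat):
    """Return (P_CD, P_FA) for one realization of binary subcarrier states.

    d_true[n] = 1 if subcarrier n is truly occupied by a PU, else 0.
    d_hat[n]  = the binary spectrum decision for subcarrier n.
    """
    num_subcarriers = len(d_true)
    occupied = sum(d_true)
    # Occupied subcarriers that were decided correctly.
    hits = sum(1 for t, h in zip(d_true, d_hat) if t == 1 and h == t)
    # Free subcarriers that were wrongly declared occupied.
    false_alarms = sum(1 for t, h in zip(d_true, d_hat) if t == 0 and h != t)
    return hits / occupied, false_alarms / (num_subcarriers - occupied)


p_cd, p_fa = detection_rates([1, 1, 0, 0, 0, 0], [1, 0, 1, 0, 0, 0])
```

Here one of the two occupied subcarriers is detected and one of the four free subcarriers is falsely flagged.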
For comparison purposes, we also plot the spectrum hole
detection performance of OMP, BCS, decentralized fusion
scheme, and the MT-CS algorithm in Fig. 7. The number of
measurements for each observation varies from 40 to 100. In
Fig. 7, it can be seen that with the increase of the number of
measurements, the performance of the OMP and BCS algorithms improves gradually. The more measurements acquired,
the more spectrum information obtained, and thus the better
spectrum hole detection performance we can achieve. Note that the local observation is the only information exploited in the OMP and BCS algorithms [see Fig. 7(a)–(g)]. However, increasing the number of measurements alone cannot
dramatically improve the performance of our spatiotemporal
data mining scheme, and the same conclusion holds for both
decentralized fusion scheme and MT-CS algorithm, both of
which can collect spatial diversity information from other SUs.
Fig. 7 shows that under the same PCD and PFA requirements,
our proposed algorithm can further reduce the sampling rate
requirement and thus has a lower sampling cost.
Fig. 8. Tradeoff between the number of member nodes and the number of
measurements.
In Fig. 7, one can also see the following: 1) When the number
of measurements increases, a higher probability of correct
detection can be achieved since a richer spectrum information
can be exploited in each SU. Meanwhile, the spectrum-sensing
performance curves start to deviate from the X-axis more obviously (see Fig. 7) since more spectrum information exchange
among SUs also helps to reduce the probability of false alarm.
Under such a case, Fig. 7 clearly shows that our scheme
has the lowest probability of false alarm among all schemes.
2) In CRN, since interference to PUs is not allowed, one may be
only interested in the PCD = 1 case. For such a case (i.e., when
PCD = 1), Fig. 7 shows that the probability of false alarm in
our proposed algorithm is reduced by 70% compared with the
MT-CS algorithm. This is because more spectrum holes can be
accurately detected via our proposed algorithm.
C. Tradeoff Between the Number of Measurements and the
Number of Tasks
Since our proposed spatiotemporal data mining is a multitask
spectrum-sensing algorithm, we can strike a good balance
between the number of member nodes in each cluster and
the number of spectrum measurements. In Fig. 8, we plot the
normalized MSE performance for three different cases (i.e., 1,
3, and 5 member nodes in each cluster, respectively). From
Fig. 8, it can be seen that if the number of tasks is reduced
(i.e., a smaller number of member nodes in each cluster),
we can increase the number of measurements to collect more
information for the CS inversion process.
When the required normalized MSE values are 0.12, 0.13,
0.14, and 0.15, respectively, the number of measurements for
cluster size = 5 can be reduced by 7.4%, 7.7%, 9.4%, and
3.9%, respectively, compared with the case of cluster size = 3.
Hence, under the same performance requirements (here we use
normalized MSE), the CRN with more nodes in each cluster
can reduce its sampling rate and thus lower the sampling cost.
VI. CONCLUSION
In this paper, we proposed spatiotemporal data mining
schemes for low-cost spectrum sensing in CRNs. First, we
employed the DP-based MT-CS method to group the observations from different clusters that may not share one common
sparseness spectrum. Meanwhile, the BCS inversion was used
to infer the hyperparameter for each group as well as the

emission probability for each hidden subcarriers state. In addition to the DP-based spatial spectrum-sensing model, we also
used the HMM to further exploit the time-domain dependency
among consecutive observations. The Viterbi algorithm was
used to deduce the hidden hyperparameter to make a correct
spectrum decision. Finally, the spectrum assignment records in
the neighboring CHs were used to identify which subcarriers
are occupied by the PUs, and the others can be regarded as the
interference from the SUs in other clusters. Our simulation results illustrated the efficiency of our proposed spectrum-sensing
algorithm, which utilizes the spatial and temporal data mining
method to discover the hidden subcarriers state. Our results
also show the following: 1) Our proposed algorithm produces
the smallest normalized reconstruction MSE compared with the
other four CS-based algorithms. 2) It has the best spectrum
hole detection performance in terms of the two key metrics, i.e.,
the probability of correct detection and the probability of false
alarm. The results illustrate that spatiotemporal data mining
can effectively collect the spatial diversity information from
different SUs and reflect the time-domain dependency among
consecutive spectrum observations.
Our future work will focus on cross-layer CRN protocol design to maintain a stable spectrum-sensing performance and thus provide a more stable quality of service (such as delay, throughput, etc.) for SUs' traffic.
APPENDIX
DETAILED STEPS FOR (12)
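In (12), the ML function can be rewritten as

$$\begin{aligned}
\mathcal{L}(\alpha) &= \sum_{j=1}^{N} \log p(\nu_j \mid \alpha) \\
&= \sum_{j=1}^{N} \log \iint p(\nu_j \mid \theta_j, \alpha_0)\, p(\theta_j \mid \alpha, \alpha_0)\, p(\alpha_0 \mid a, b)\, d\theta_j\, d\alpha_0 \\
&= \sum_{j=1}^{N} \log \int p(\alpha_0 \mid a, b) \left[\int p(\nu_j \mid \theta_j, \alpha_0)\, p(\theta_j \mid \alpha, \alpha_0)\, d\theta_j\right] d\alpha_0
\end{aligned} \tag{62}$$

where

$$\begin{aligned}
p(\nu_j \mid \alpha, \alpha_0) &= \int p(\nu_j \mid \theta_j, \alpha_0)\, p(\theta_j \mid \alpha, \alpha_0)\, d\theta_j \\
&= \int \left(\frac{2\pi}{\alpha_0}\right)^{-\frac{m_j}{2}} \exp\left\{-\frac{\alpha_0}{2} \left\|\nu_j - \Phi_j \theta_j\right\|_2^2\right\} \prod_{k=1}^{M} \left(\frac{2\pi}{\alpha_0 \alpha_k}\right)^{-\frac{1}{2}} \exp\left\{-\frac{\alpha_0 \alpha_k}{2} \left\|\theta_{j,k}\right\|_2^2\right\} d\theta_j \\
&= \left(\frac{2\pi}{\alpha_0}\right)^{-\frac{m_j}{2}} \prod_{k=1}^{M} \left(\frac{2\pi}{\alpha_0 \alpha_k}\right)^{-\frac{1}{2}} \int \exp\left\{-\frac{\alpha_0}{2} (\nu_j - \Phi_j \theta_j)^T (\nu_j - \Phi_j \theta_j)\right\} \exp\left\{-\frac{\alpha_0}{2} \theta_j^T A \theta_j\right\} d\theta_j \\
&= \left(\frac{2\pi}{\alpha_0}\right)^{-\frac{m_j}{2}} \prod_{k=1}^{M} \left(\frac{2\pi}{\alpha_0 \alpha_k}\right)^{-\frac{1}{2}} \int \exp\left\{-\frac{\alpha_0}{2} \left[\nu_j^T \nu_j - \nu_j^T \Phi_j \theta_j - \theta_j^T \Phi_j^T \nu_j + \theta_j^T \Phi_j^T \Phi_j \theta_j + \theta_j^T A \theta_j\right]\right\} d\theta_j \\
&= \left(\frac{2\pi}{\alpha_0}\right)^{-\frac{m_j}{2}} \prod_{k=1}^{M} \left(\frac{2\pi}{\alpha_0 \alpha_k}\right)^{-\frac{1}{2}} \int \exp\Big\{-\frac{\alpha_0}{2} \nu_j^T \nu_j + \frac{\alpha_0}{2} \nu_j^T \Phi_j \left(\Phi_j^T \Phi_j + A\right)^{-1} \Phi_j^T \nu_j \\
&\qquad - \frac{\alpha_0}{2} \left[\theta_j - \left(\Phi_j^T \Phi_j + A\right)^{-1} \Phi_j^T \nu_j\right]^T \left(\Phi_j^T \Phi_j + A\right) \left[\theta_j - \left(\Phi_j^T \Phi_j + A\right)^{-1} \Phi_j^T \nu_j\right]\Big\} d\theta_j \\
&= \left(\frac{2\pi}{\alpha_0}\right)^{-\frac{m_j}{2}} \prod_{k=1}^{M} \left(\frac{2\pi}{\alpha_0 \alpha_k}\right)^{-\frac{1}{2}} \int \exp\Big\{-\frac{\alpha_0}{2} \left[\theta_j - \left(\Phi_j^T \Phi_j + A\right)^{-1} \Phi_j^T \nu_j\right]^T \left(\Phi_j^T \Phi_j + A\right) \\
&\qquad \times \left[\theta_j - \left(\Phi_j^T \Phi_j + A\right)^{-1} \Phi_j^T \nu_j\right]\Big\} d\theta_j \cdot \exp\left\{-\frac{\alpha_0}{2} \nu_j^T \left[E - \Phi_j \left(\Phi_j^T \Phi_j + A\right)^{-1} \Phi_j^T\right] \nu_j\right\}.
\end{aligned} \tag{63}$$

From (63), one can see that $p(\nu_j \mid \alpha, \alpha_0)$ follows a zero-mean Gaussian distribution, with covariance matrix $\{\alpha_0 [E - \Phi_j (\Phi_j^T \Phi_j + A)^{-1} \Phi_j^T]\}^{-1} = (1/\alpha_0)(E + \Phi_j A^{-1} \Phi_j^T)$. Hence, (62) can be further rewritten as

$$\begin{aligned}
\mathcal{L}(\alpha) &= \sum_{j=1}^{N} \log \int p(\alpha_0 \mid a, b) \left(\frac{2\pi}{\alpha_0}\right)^{-\frac{m_j}{2}} \left|E + \Phi_j A^{-1} \Phi_j^T\right|^{-\frac{1}{2}} \exp\left\{-\frac{\alpha_0}{2} \nu_j^T \left(E + \Phi_j A^{-1} \Phi_j^T\right)^{-1} \nu_j\right\} d\alpha_0 \\
&= \sum_{j=1}^{N} \log \left\{(2\pi)^{-\frac{m_j}{2}} \left|E + \Phi_j A^{-1} \Phi_j^T\right|^{-\frac{1}{2}} \int p(\alpha_0 \mid a, b)\, \alpha_0^{\frac{m_j}{2}} \exp\left\{-\frac{\alpha_0}{2} \nu_j^T \left(E + \Phi_j A^{-1} \Phi_j^T\right)^{-1} \nu_j\right\} d\alpha_0\right\} \\
&= \sum_{j=1}^{N} \log \left\{(2\pi)^{-\frac{m_j}{2}} \left|E + \Phi_j A^{-1} \Phi_j^T\right|^{-\frac{1}{2}} \int \frac{b^a}{\Gamma(a)}\, \alpha_0^{a - 1 + \frac{m_j}{2}} \exp\left\{-\alpha_0 \left[b + \frac{1}{2} \nu_j^T \left(E + \Phi_j A^{-1} \Phi_j^T\right)^{-1} \nu_j\right]\right\} d\alpha_0\right\} \\
&= \sum_{j=1}^{N} \log \left\{(2\pi)^{-\frac{m_j}{2}} \left|E + \Phi_j A^{-1} \Phi_j^T\right|^{-\frac{1}{2}} \frac{b^a}{\Gamma(a)}\, \Gamma\left(a + \frac{m_j}{2}\right) \left[b + \frac{1}{2} \nu_j^T \left(E + \Phi_j A^{-1} \Phi_j^T\right)^{-1} \nu_j\right]^{-\left(a + \frac{m_j}{2}\right)}\right\} \\
&= \sum_{j=1}^{N} \Big\{-\frac{m_j}{2} \log(2\pi) - \frac{1}{2} \log\left|E + \Phi_j A^{-1} \Phi_j^T\right| + \log\frac{b^a}{\Gamma(a)} + \log \Gamma\left(a + \frac{m_j}{2}\right) \\
&\qquad - \left(a + \frac{m_j}{2}\right) \log\left[b + \frac{1}{2} \nu_j^T \left(E + \Phi_j A^{-1} \Phi_j^T\right)^{-1} \nu_j\right]\Big\} \\
&= -\frac{1}{2} \sum_{j=1}^{N} \left[(m_j + 2a) \log\left(\nu_j^T B_j^{-1} \nu_j + 2b\right) + \log\left|E + \Phi_j A^{-1} \Phi_j^T\right|\right] \\
&\qquad + \sum_{j=1}^{N} \left\{-\frac{m_j}{2} \log(2\pi) + \log\frac{b^a}{\Gamma(a)} + \log \Gamma\left(a + \frac{m_j}{2}\right) + \left(a + \frac{m_j}{2}\right) \log 2\right\} \\
&= -\frac{1}{2} \sum_{j=1}^{N} \left[(m_j + 2a) \log\left(\nu_j^T B_j^{-1} \nu_j + 2b\right) + \log |B_j|\right] + \mathrm{Const}
\end{aligned} \tag{64}$$

where $B_j = E + \Phi_j A^{-1} \Phi_j^T$, and $E$ is the identity matrix. We can also see the existence of a constant in the foregoing result, that is, $\mathrm{Const} = \sum_{j=1}^{N} \{-(m_j/2) \log(2\pi) + \log(b^a / \Gamma(a)) + \log \Gamma(a + (m_j/2)) + (a + (m_j/2)) \log 2\}$.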
REFERENCES
[1] "Spectrum policy task force report," FCC, Washington, DC, Rep. ET Docket 02-135, Nov. 2002.
[2] H. Khalife, S. Ahuja, N. Malouch, and M. Krunz, "Probabilistic path selection in opportunistic cognitive radio networks," in Proc. IEEE GLOBECOM, Dec. 2008, pp. 1–5.
[3] X.-L. Huang, G. Wang, F. Hu, and S. Kumar, "Stability-capacity-adaptive routing for high-mobility multihop cognitive radio networks," IEEE Trans. Veh. Technol., vol. 60, no. 6, pp. 2714–2729, Jul. 2011.
[4] I. F. Akyildiz, W. Lee, M. C. Vuran, and S. Mohanty, "Next generation/dynamic spectrum access/cognitive radio wireless networks: A survey," Comput. Netw., vol. 50, no. 13, pp. 2127–2159, Sep. 2006.
[5] K. R. Chowdhury and M. D. Felice, "SEARCH: A routing protocol for mobile cognitive radio ad-hoc networks," Comput. Commun., vol. 32, no. 18, pp. 1983–1997, Dec. 2009.
[6] G. Zhu, I. F. Akyildiz, and G. Kuo, "STOD-RP: A spectrum-tree based on-demand routing protocol for multi-hop cognitive radio networks," in Proc. IEEE GLOBECOM, 2008, pp. 1–5.
[7] H. Kushwaha, Y. Xing, R. Chandramouli, and H. Heffes, "Reliable multimedia transmission over cognitive radio networks using fountain codes," Proc. IEEE, vol. 96, no. 1, pp. 155–165, Jan. 2008.
[8] T. Yücek and H. Arslan, "A survey of spectrum sensing algorithms for cognitive radio applications," IEEE Commun. Surveys Tuts., vol. 11, no. 1, pp. 116–130, 1st Quart., 2009.
[9] R. Tandra and A. Sahai, "Fundamental limits on detection in low SNR under noise uncertainty," in Proc. IEEE WNCMC, 2005, vol. 1, pp. 464–469.
[10] S. Shankar, C. Cordeiro, and K. Challapali, "Spectrum agile radios: Utilization and sensing architectures," in Proc. IEEE DySPAN, 2005, pp. 160–169.
[11] M. Ghozzi, F. Marx, M. Dohler, and J. Palicot, "Cyclostationarility-based test for detection of vacant frequency bands," in Proc. IEEE CROWNCOM, 2006, pp. 1–5.
[12] F. Zeng, C. Li, and Z. Tian, "Distributed compressive spectrum sensing in cooperative multihop cognitive networks," IEEE J. Sel. Topics Signal Process., vol. 5, no. 1, pp. 37–48, Feb. 2011.
[13] M. R. Duarte, M. B. Wakin, D. Baron, and R. G. Baraniuk, "Universal distributed sensing via random projections," in Proc. IEEE IPSN, 2006, pp. 177–185.
[14] Y. Wang, A. Pandharipande, Y. Polo, and G. Leus, "Distributed compressive wide-band spectrum sensing," in Proc. IEEE ITA, 2009, pp. 178–183.
[15] M. E. Yildiz, T. C. Aysal, and K. E. Barner, "In-network cooperative spectrum sensing," in Proc. EUSIPCO, 2009, pp. 1–5.
[16] Z. Li, F. R. Yu, and M. Huang, "A distributed consensus-based cooperative spectrum sensing scheme in cognitive radios," IEEE Trans. Veh. Technol., vol. 59, no. 1, pp. 383–393, Jan. 2010.
[17] J. A. Bazerque and G. B. Giannakis, "Distributed spectrum sensing for cognitive radio networks by exploiting sparsity," IEEE Trans. Signal Process., vol. 58, no. 3, pp. 1847–1862, Mar. 2010.
[18] Z. Tian, "Compressed wideband sensing in cooperative cognitive radio networks," in Proc. IEEE GLOBECOM, 2008, pp. 1–5.
[19] L. Xiao, S. P. Boyd, and S.-J. Kim, "Distributed average consensus with least-mean-square deviation," J. Parallel Distrib. Comput., vol. 67, no. 1, pp. 33–46, Jan. 2007.
[20] I. D. Schizas, A. Ribeiro, and G. B. Giannakis, "Consensus in Ad Hoc WSNs with noisy links-Part I: Distributed estimation of deterministic signals," IEEE Trans. Signal Process., vol. 56, no. 1, pp. 350–364, Jan. 2008.
[21] G. Ganesan and Y. Li, "Cooperative spectrum sensing in cognitive radio networks," in Proc. IEEE DySPAN, 2005, pp. 137–143.
[22] G. Ganesan and Y. Li, "Cooperative spectrum sensing in cognitive radio, Part II: Multiuser networks," IEEE Trans. Wireless Commun., vol. 6, no. 6, pp. 2214–2222, Jun. 2007.
[23] C. Sun, W. Zhang, and K. B. Letaief, "Cluster-based cooperative spectrum sensing in cognitive radio systems," in Proc. IEEE ICC, 2007, pp. 2511–2515.
[24] Q. Zhao, L. Tong, and A. Swami, "Decentralized cognitive MAC for dynamic spectrum access," in Proc. IEEE DySPAN, 2005, pp. 224–232.
[25] Q. Zhao, L. Tong, A. Swami, and Y. Chen, "Decentralized cognitive MAC for opportunistic spectrum access in ad hoc networks: A POMDP framework," IEEE J. Sel. Areas Commun., vol. 25, no. 3, pp. 589–600, Apr. 2007.
[26] M. Gandetto and C. Regazzoni, "Spectrum sensing: A distributed approach for cognitive terminals," IEEE J. Sel. Areas Commun., vol. 25, no. 3, pp. 546–557, Apr. 2007.
[27] C. Sun, W. Zhang, and K. B. Letaief, "Cooperative spectrum sensing for cognitive radios under bandwidth constraints," in Proc. IEEE WCNC, 2007, pp. 1–5.
[28] E. J. Candès, J. Romberg, and T. Tao, "Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information," IEEE Trans. Inf. Theory, vol. 52, no. 2, pp. 489–509, Feb. 2006.
[29] S. Ji, D. Dunson, and L. Carin, "Multitask compressive sensing," IEEE Trans. Signal Process., vol. 57, no. 1, pp. 92–106, Jan. 2009.
[30] Y. Qi, D. Liu, D. Dunson, and L. Carin, "Multi-task compressive sensing with Dirichlet process priors," in Proc. ACM ICML, 2008, pp. 768–775.
[31] X.-L. Huang, G. Wang, F. Hu, and S. Kumar, "The impact of spectrum sensing frequency and packet-loading scheme on multimedia transmission over cognitive radio networks," IEEE Trans. Multimedia, vol. 13, no. 4, pp. 748–761, Aug. 2011.
[32] Z. Quan, S. Cui, and A. H. Sayed, "Optimal linear cooperation for spectrum sensing in cognitive radio networks," IEEE J. Sel. Topics Signal Process., vol. 2, no. 1, pp. 28–40, Feb. 2008.
[33] M. E. Tipping and A. Faul, "Fast marginal likelihood maximisation for sparse Bayesian models," in Proc. AISTATS, 2003, pp. 3–5.
[34] S. Ji, Y. Xue, and L. Carin, "Bayesian compressive sensing," IEEE Trans. Signal Process., vol. 56, no. 6, pp. 2346–2356, Jun. 2008.
[35] J. A. Tropp and A. C. Gilbert, "Signal recovery from random measurements via orthogonal matching pursuit," IEEE Trans. Inf. Theory, vol. 53, no. 12, pp. 4655–4666, Dec. 2007.
[36] Y. W. Teh, M. I. Jordan, M. J. Beal, and D. M. Blei, "Hierarchical Dirichlet processes," J. Amer. Stat. Assoc., vol. 101, no. 476, pp. 1566–1581, Dec. 2006.
[37] C. F. Jeff Wu, "On the convergence properties of the EM algorithm," Ann. Stat., vol. 11, no. 1, pp. 95–103, Mar. 1983.
[38] H. Ishikawa, "Transformation of general binary MRF minimization to the first-order case," IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 6, pp. 1234–1249, Jun. 2011.
[39] J. Borges and M. Levene, "Evaluating variable-length Markov chain models for analysis of user web navigation sessions," IEEE Trans. Knowl. Data Eng., vol. 19, no. 4, pp. 441–452, Apr. 2007.
[40] J. Wang, F. Wang, C. Zhang, H. C. Shen, and L. Quan, "Linear neighborhood propagation and its applications," IEEE Trans. Pattern Anal. Mach. Intell., vol. 31, no. 9, pp. 1600–1615, Sep. 2009.
[41] H.-M. Lu, D. Zeng, and H. Chen, "Prospective infectious disease outbreak detection using Markov switching models," IEEE Trans. Knowl. Data Eng., vol. 22, no. 4, pp. 565–577, Apr. 2010.
[42] X.-L. Huang, G. Wang, and F. Hu, "Minimal Euclidean distance-inspired optimal and suboptimal modulation schemes for vector OFDM system," Int. J. Commun. Syst., vol. 24, no. 5, pp. 553–567, May 2011.
[43] X. G. Xia, "Precoded and vector OFDM robust to channel spectral nulls and with reduced cyclic prefix length in single transmit antenna systems," IEEE Trans. Commun., vol. 49, no. 8, pp. 1363–1374, Aug. 2001.
Xin-Lin Huang (S'09–M'12) received the M.E. and
Ph.D. degrees in communication engineering from
Harbin Institute of Technology, Harbin, China, in
2008 and 2011, respectively.
He is an Associate Professor with the Department
of Information and Communication Engineering,
Tongji University, Shanghai, China. He has published over 25 research papers and has two patents.
His research focuses on joint source-channel coding,
OFDM technology, cognitive radio networks, and
machine learning.
Dr. Huang was the recipient of the Chinese Government Award for Outstanding
Ph.D. Students in 2010. From August 2010 to September 2011, he was
supported by the China Scholarship Council to do research in the Department
of Electrical and Computer Engineering, University of Alabama, as a Visiting
Scholar. He is a Paper Reviewer for IEEE TRANSACTIONS ON WIRELESS
COMMUNICATIONS, IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY,
IEEE COMMUNICATIONS LETTERS, Wireless Personal Communications, and
the International Journal of Communication Systems.
Gang Wang (M'11) received the B.E., M.E., and
Ph.D. degrees in communication engineering from
Harbin Institute of Technology, Harbin, China, in
1984, 1987, and 2007, respectively.
He is a Professor with the Communication
Research Center, Harbin Institute of Technology,
Harbin, China. He is the Chairman of the Department
of Communication Engineering. He has published
over 60 research papers and four books. His general
interests include ad hoc networks, wireless communications, and artificial intelligence.
Dr. Wang was the recipient of the National Grade II Prize of Science and
Technology Progress and the National Grade III Prize of Science and Technology
Progress.
Fei Hu (M'12) received the Ph.D. degree in signal processing from Tongji University, Shanghai,
China, in 1999, and the Ph.D. degree in electrical
and computer engineering from Clarkson University,
Potsdam, NY, in 2002.
He is currently an Associate Professor with the
Department of Electrical and Computer Engineering,
University of Alabama, Tuscaloosa, AL. He has published over 170 journal/conference papers and book
chapters. His research has been supported by U.S.
NSF, Cisco, Sprint, and other sources. His research
expertise is in cognitive radio networks and security.
