
Sparse Channel Estimation for Massive MIMO
using Quasi-orthogonal Pilots
Ivan Tinjaca
Department of Electrical & Computer Engineering

McGill University
Montreal, Canada
December 2015
A thesis submitted to McGill University in partial fulfilment of the requirements of the

degree of Master of Engineering.
2015
© Ivan Tinjaca
To my parents and my family
Abstract
Massive MIMO has recently been considered an emerging technology for fifth generation
wireless communication. The use of a large number of antennas provides many more degrees
of freedom than current fourth generation standards, which use basic MIMO. 4G systems are
rapidly approaching their performance limit and will soon be inadequate to meet the
increasing wireless demand. Massive MIMO brings the possibility of establishing
communications with more speed, capacity and reliability. However, as in basic MIMO,
knowledge of the channel directly affects its performance. With a higher number of
antennas, not only do the channel estimation techniques become more complex, but
traditional training methods also face a limit on the number of pilots that can be used.
The pilot contamination effect has been intensively studied recently, and several methods
have been proposed to alleviate the interference caused by pilot reuse.
In this thesis, the estimation process is carried out using non-orthogonal sequences with
low correlation and is compared to traditional estimation methods. The sparse
characteristics of the channel are also studied when using 2D antenna arrays, leading to
the application of low-complexity Matching Pursuit algorithms. Motivated by the results, a
method is proposed that uses the non-orthogonal sequences and incorporates a 2D angular
transformation of the channel to exploit its sparse characteristics. Finally, it is
observed that a considerable reduction in the channel estimation error is obtained when
using non-orthogonal sequences together with Matching Pursuit algorithms.
Sommaire
The emergence of fifth-generation (5G) wireless communication systems reflects the growing
use of massive MIMO systems. Compared with the fourth-generation (4G) standard, which uses
MIMO, systems with a large number of antennas offer more degrees of freedom. However, 4G
systems are rapidly reaching their performance limit, and the constant increase in users
will make them obsolete in the coming years. Massive MIMO systems make it possible to have
links with more speed, capacity and reliability. Nevertheless, as in the MIMO case,
knowledge of the channel directly affects performance. With a larger number of antennas,
not only are the channel estimation techniques more complex, but traditional training
methods also face a limit on the number of available pilots. This problem, known as pilot
contamination, has attracted special interest among researchers, who have proposed several
methods to reduce the interference caused by pilot reuse.
In this thesis, the estimation process is carried out using non-orthogonal sequences with
low correlation, and they are compared with traditional estimation methods. In addition,
the sparse characteristics of the channel are studied when a 2D antenna array and
low-complexity matching pursuit algorithms are used. Subsequently, a method is proposed
that uses the non-orthogonal sequences and the 2D angular transformation in order to
benefit from the sparse characteristics of the channel. Finally, a significant reduction
in the channel estimation error is observed when non-orthogonal sequences are combined
with matching pursuit algorithms.
Acknowledgements
I have to start by thanking my supervisor Ioannis Psaromiligkos, the professors and in
general McGill University for having given me the opportunity to be part of such a won-
derful program and for allowing me to happily finish my journey here. Also to my lab
companions François, Ardavan, Ahmad, Jad, Dinos, Mahmoud, Kiki, friends at McGill,
Michelle, Lysie, Ryan, Jiaying, Bofan, Janice, Audrey, Raquel, Cassandra, Fiona, Shuang,
Alex, Mary, Jeremy, Joey and all those who were kind enough to exchange some words
with me from time to time and offer me their humble friendship.
Also, it would be very inconsiderate of me not to thank all of those who have somehow
influenced my life until today. Not only is this work the product of all the energy they
invested in me when they were around, but I would also not be writing these words without
any of them. However, it is a very long list and I cannot mention them all in one page; my
only desire, then, is to also thank those whose names do not appear here.
And last but not least, my family, Pedro, Maria, Gloria, Alejandro, Daniel and Dario,
who always encouraged me to be someone better, well, I keep trying...
Gracias Totales!
Contents
1 Introduction
1.1 4G and current MIMO challenges
1.2 Massive MIMO
1.2.1 Channel Estimation in Massive MIMO
1.3 Thesis organization and contributions
2 Background
2.1 Channel Estimation under Pilot Contamination
2.1.1 System Model
2.1.2 Pilot Contamination
2.2 Sparsity
2.2.1 Sparsity in Massive MIMO
2.2.2 Compressive Sensing
2.2.3 Matching Pursuit Algorithms
2.3 Matching Pursuit with unknown sparsity level
2.3.1 Channel estimation with unknown sparsity level
2.3.2 Adaptive Matching Pursuit
3 Sparse channel estimation using quasi-orthogonal pilots
3.1 Quasi-Orthogonal Pilots
3.2 Basic System Model
3.3 Modified System Model
3.4 Proposed Extended Algorithm
4 Simulation results
4.1 Quasi-Orthogonal Pilots
4.2 Sparsity
4.3 Matching Pursuit with unknown sparsity level
5 Conclusions and Future Research
5.1 Summary and Conclusions
5.2 Future Work
List of Figures
2.1 Multicell environment
2.2 Angular component ULA
3.1 Design procedure for optimal pilots set [1]
3.2 Angular components URA
4.1 Pilot reuse pattern
4.2 Pilots performance with cell size
4.3 Pilots performance vs delay
4.4 Matching Pursuit algorithms comparison
4.5 Pilot reuse pattern
4.6 Performance of Subspace Pursuit and Least Squares
4.7 Adaptive Subspace Pursuit performance (same sparsity)
4.8 Adaptive Subspace Pursuit performance (random sparsity)
4.9 Adaptive Subspace Pursuit performance (as a function of users)
Mathematical Notation
(·)† Moore-Penrose pseudo-inverse
(·)^H Hermitian transpose
tr{·} Trace
E{·} Expectation
‖·‖_0 Count of the number of non-zero elements
‖·‖_F Frobenius norm
‖·‖_∞ Infinity norm
Element by element multiplication of matrices
⊗ Kronecker product
vec(·) Vectorization of a matrix into a column vector
List of Acronyms
ASP Adaptive Subspace Pursuit
BS Base Station
CDMA Code Division Multiple Access
CIR Channel Impulse Response
CFR Channel Frequency Response
DFT Discrete Fourier Transform
FDD Frequency Division Duplexing
LSE Least Square estimators
LTE-A Long Term Evolution-Advanced
MIMO Multiple Input Multiple Output
MMSE Minimum Mean Square Error estimators
MP Matching Pursuit
OFDM Orthogonal Frequency Division Multiplexing
OMP Orthogonal Matching Pursuit
RIP Restricted Isometry Property
SP Subspace Pursuit
TDD Time Division Duplexing
ULA Uniform Linear Array
URA Uniform Rectangular Array
Chapter 1
Introduction
Wireless communication technology has evolved considerably over the past years, mainly
influenced by the new internet services and applications that are developed every day.
Clear evidence of the advancement of wireless technology is the constant increase in the
number of users who enjoy the benefits of being connected and who use mobile devices as
essential tools to meet their social and business needs. Since the second generation (2G)
technology for mobile communications was launched in 1991, the focus of mobility has
clearly shifted from voice calls among a few privileged users to the massive distribution
of multimedia content at high speed. Driven by new subscribers who crave the internet
services available on their smartphones and laptops, mobile broadband is estimated to have
grown by 92 percent per year since 2006.
Due to this frenetic escalation, the wireless networks that provide the data and calling
service to mobile users face the challenge of supplying more data at a faster speed to
more users. To this end, wireless network design has recently turned its focus to Multiple
User Multiple Input Multiple Output (MIMO) technology to attain reliable transmission of
data at higher rates [2]. MIMO technology employs multiple antennas to transmit multiple
streams of information to several users within an area (cell). The objects and the shape
of the surroundings diffract, reflect and scatter the signals transmitted in these wireless
communication systems. As a consequence, the signals travel through different paths and
different versions of the same signal arrive at the receiver at different times. This multipath
propagation provides additional degrees of freedom in the system that can benefit the rate
(spatial multiplexing gain) and the reliability (diversity gain) of the communication. As
a drawback, all the multipath signal components that arrive with different phases change
the strength of the received signal resulting in a deterioration of the communication (fast
fading). Moreover, large objects can decrease the average power of the received signal,
most commonly known as shadowing effect (slow fading).
Nonetheless, the benefits that can be obtained with MIMO require knowledge of the
radio channel at both the receiver and the transmitter sides of the communication link [3]. In
this fashion, spatial multiplexing and diversity provide higher quality and higher data rates
by taking advantage of multipath propagation and random fading in the channel [4]. One
example of this technology can be seen in the 4G cellular wireless networks launched in
2012 with the Long Term Evolution-Advanced (LTE-A) standard. LTE-A uses Orthogonal
Frequency Division Multiplexing (OFDM) in combination with MIMO technology with the
specification to use a maximum of eight antennas for reception and transmission (8x8) [5].
1.1 4G and current MIMO challenges
Years after the initial deployment of 4G networks for mobile phone users, many challenges
regarding bandwidth, energy consumption, capacity and quality of service have arisen. The
main resource, the currently available radio frequency (RF) spectrum, which consists of the
band of frequencies ranging from hundreds of megahertz to gigahertz, has been completely
depleted, making it almost impossible for service providers to exploit more bandwidth.
The initial objective of energy-efficient communications has not yet been fulfilled, which
has become a concern for the environment. Current cellular networks are unable to provide
higher mobility, and they do not meet the diverse quality of service (QoS) requirements for
specific scenarios since they are reaching the theoretical limits of performance on data rate.
All these issues put additional pressure on the technologies to be employed in the next
generation of mobile systems. Not only must the current difficulties be addressed but,
given the forecast demand for data, cellular networks in the following ten years should
also be able to provide many times the capacity, spectral efficiency, energy efficiency and
data rate of today’s networks [2].
A radical change in the cellular architecture design must be implemented in order to
tackle all these challenges. Recent research has shown that a large increase in the number
of antennas used in MIMO potentially gives more degrees of freedom and better performance
with regard to data rate and link reliability. This trending technology, called Massive
MIMO, is expected to be used in 5G networks. The promising Massive MIMO might increase the
total infrastructure costs but might also supply the improvement required to meet the
needs of service providers and end users [5][6].
1.2 Massive MIMO
The use of a larger number of antennas, tens or hundreds, maximizes the benefits of MIMO.
Since the capacity $C$ increases linearly with the minimum of the number of transmit
antennas $n_t$ and the number of receive antennas $n_r$, more information can be exchanged
given the many more degrees of freedom that are obtained with Massive MIMO [7][8]:
$$C = \min(n_t, n_r)\,\log(1 + \mathrm{SNR}) \tag{1.1}$$
Having a large number of antennas at the base station also makes the channel responses of
the terminals almost orthogonal, which improves the performance of simple detectors like
maximal ratio combining (matched filtering) relative to more complex precoding and
detection strategies like Zero Forcing [6]. Although the cost of implementing an
infrastructure with a large number of antennas might be considerable, the requirements and
cost of the hardware are reduced because the effects of imperfections in components are
averaged out and the large number of components can operate at a lower power than in
systems with a small number of antennas [6].
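The near-orthogonality of the channel responses can be checked numerically. The following Python sketch assumes i.i.d. Rayleigh fading, which is an assumption made here for illustration and not a channel model taken from this thesis. It measures the average normalized correlation between the channel vectors of two users as the number of base station antennas grows. The function name `mean_abs_correlation` and all parameter values are hypothetical.

```python
import numpy as np

def mean_abs_correlation(num_antennas, num_trials=2000, seed=0):
    """Average |h1^H h2| / (||h1|| ||h2||) for two i.i.d. Rayleigh channel vectors."""
    rng = np.random.default_rng(seed)
    shape = (num_trials, num_antennas)
    # Assumed model: independent complex Gaussian entries for both users
    h1 = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
    h2 = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
    inner = np.abs(np.sum(np.conj(h1) * h2, axis=1))
    norms = np.linalg.norm(h1, axis=1) * np.linalg.norm(h2, axis=1)
    return float(np.mean(inner / norms))

if __name__ == "__main__":
    for m in (4, 16, 64, 256):
        print(m, round(mean_abs_correlation(m), 3))
```

Under this assumed model the correlation shrinks as antennas are added, which is why a matched filter loses little against Zero Forcing when the array is large.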
Higher power efficiency is also achieved because it is possible to concentrate power into
smaller areas, meaning that the total average power needed to serve a group of users using
multiple antennas is less than the power required to serve the same area with a single
antenna [5]. The total RF output power necessary with Massive MIMO is two orders
of magnitude lower than the one used in current systems [6]. With this power efficient
radiation, the effects of uncorrelated thermal noise and fading are reduced, opening the
possibility of creating low-latency wireless links. In other technologies, fast fading affects
the strength of the received signal and, in some cases when the SNR is low (fading dip),
it is necessary to wait for the channel to change before data can be transmitted again.
Massive MIMO overcomes this limitation on the latency over the air interface, leaving the
internal interference as the main consideration in the design [2][6].
It is important to mention that massive MIMO comes with a new set of challenges. First,
the requirements in terms of space will condition the implementation. For instance,
the number of antennas to be used would require adjustments in the design and location
of base stations; hundreds of antennas would require special configurations like rectangular
or cylindrical arrays. Furthermore, the number of channel responses to be processed is
also proportional to the number of antennas, leading to an exponential increase in the
complexity of the optimal algorithms used to detect those signals. The baseband design
is more complex, demanding that the algorithms be optimized and simplified. Moreover,
the complexity of the hardware design and the energy consumed in the baseband circuitry
increase with all the RF chains needed. While the consumption of energy in the RF
sections decreases, the opposite is expected for the baseband. Finally, the
models currently used to characterize channels and hardware should be updated to reflect
the challenges introduced by Massive MIMO [8].
1.2.1 Channel Estimation in Massive MIMO
Similarly to MIMO, the knowledge of the channel plays an important role in the perfor-
mance of the Massive MIMO technology. Both the transmitter and the receiver must be
able to identify the channel in order to exploit all the benefits of Massive MIMO. Since
usually there is no previous knowledge of the channel, the design of wireless communica-
tion systems must include channel estimation algorithms. The estimation techniques can
be divided into two groups: training based and blind. Training based approaches require
the transmission of training signals (pilots) that are known to the receiver. With the use
of these known pilots it is possible to extract the channel information at the receiver side.
Depending on the dynamic nature of the channel, this method has the disadvantage of
consuming resources and bandwidth every time the channel information is required. Blind
estimation algorithms profit from the statistics of data-carrying signals, but they require
high complexity signal and data processing and also increase the possibility of propagating
errors in the system. These systems perform the detection of the transmitted data symbols
at the same time that the channel is estimated. Because of the complexity that can be
generated in Massive MIMO, training based estimation methods are preferred. These,
however, have an additional downside since the training signals need to be reused. As will
be explained later, this effect, called Pilot Contamination, limits the performance of
Massive MIMO systems [3][9].
1.3 Thesis organization and contributions
The purpose of this thesis is to propose an estimation method in which the effect of the
pilot contamination can be attenuated by taking advantage of the characteristics of the
channel. The main focus in each of the sections of this thesis is dedicated to the reduction
of the error in estimation of the channel in different scenarios where the pilot contamination
effect is present. Chapter 2 provides background on training based channel estimation and
the pilot contamination problem, and also reviews related works on estimation in sparse
environments. Chapter 3 proposes a model and a method for channel estimation in Massive
MIMO under pilot contamination that take advantage of the sparse characteristics of the
channel and the use of non-orthogonal pilots. The specific case of
Uniform Rectangular Arrays is incorporated in the model. The corresponding simulations
are shown in Chapter 4. Finally in Chapter 5, the conclusions and suggested directions for
future research are summarized.
Chapter 2
Background
2.1 Channel Estimation under Pilot Contamination
As mentioned before, all the benefits of Massive MIMO rely on knowledge of the radio
channel. The estimation of the channels in Massive MIMO is limited by the effect of pilot
contamination: while in traditional MIMO it was possible to employ enough orthogonal
pilots for most users in all cells, in Massive MIMO the time-frequency resources consumed
by pilots must increase with the number of antennas; hence, the pilots need to be reused
in other cells, introducing additional interference [6][8]. In addition, the estimation of
the channel is more challenging in the downlink than in the uplink. A massive number of
antennas at the base station creates a large number of signals arriving at the devices,
which have fewer antennas. This makes the pilot overhead excessively large for the user to
obtain the channel information. Although the majority of current cellular systems employ
FDD, many publications focus on TDD systems, where taking advantage of channel reciprocity
removes the problem of estimating the channel in the downlink [10].
2.1.1 System Model
A simple model of a MIMO system consists of an array of $n_t$ transmitter antennas and
$n_r$ receiver antennas, so that at any time the $n_r \times 1$ received signal vector $y$
can be described as the product of the $n_t \times 1$ transmitted signal vector $x$ and
all the different multipath channels, affected by additive noise [8]:
$$y = Hx + w \tag{2.1}$$
where $H$ is the $n_r \times n_t$ complex channel propagation matrix and $w$ is the
$n_r \times 1$ vector of received noise. The objective of the channel estimation algorithm
is to identify $H$ using the received signal $y$, given that either $x$ is a set of known
training symbols (training based algorithms) or $x$ corresponds to data symbols whose
statistics are known (blind algorithms).
Modern implementations in wireless systems commonly use training-based estimation
given that the processing complexity is lower. Although the use of the channel is more
efficient with blind methods, the use of training symbols simplifies the design of the receiver.
The frequent use of pilots increases the performance and robustness of the system. In fact,
according to [11] and [12], with a reasonable number of pilots it is possible to achieve better
channel estimation results compared to blind estimators.
To estimate the channel matrix $H$, the transmitter sends $n_t$ training sequences of
length $T$, with $T \geq n_t$. By stacking, as columns, the received vectors of expression
(2.1) for each pilot symbol, the following received signal is obtained [3]:
$$Y = HX + W \tag{2.2}$$
where $Y = [y_1, \ldots, y_T]$ is the $n_r \times T$ matrix of signals received at each
antenna while the pilots of length $T$ are transmitted, $X = [x_1, \ldots, x_T]$ is the
$n_t \times T$ matrix of transmitted pilots and $W = [w_1, \ldots, w_T]$ is the
$n_r \times T$ matrix of additive noise. The channel estimation in training-based systems
can be seen as the following optimization problem:
$$\hat{H} = \arg\min_{H} \|Y - HX\|_F^2 \tag{2.3}$$
Equation (2.3) can be solved using the linear Least Squares approach, which minimizes the
distance between $Y$ and $HX$ by finding the orthogonal projection of $Y$ onto the
subspace spanned by the rows of $X$ [13]:
$$\hat{H}_{LS} = Y X^{\dagger} \tag{2.4}$$
leading to the estimation error (2.5) [9]:
$$E\{\|\hat{H}_{LS} - H\|_F^2\} = \sigma_n^2\, n_r\, \mathrm{tr}\{(XX^H)^{-1}\} \tag{2.5}$$
where $\sigma_n^2$ is the received noise power.
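Equations (2.4) and (2.5) can be illustrated with a short simulation. The Python sketch below is a minimal example, not the simulation setup used in this thesis: it assumes i.i.d. complex Gaussian channels and noise, and it takes the pilots to be the first $n_t$ rows of a unitary DFT matrix so that $XX^H = I$. The function names and the default dimensions are hypothetical.

```python
import numpy as np

def ls_channel_estimate(Y, X):
    """Least Squares estimate H_hat = Y X^dagger, as in equation (2.4)."""
    return Y @ np.linalg.pinv(X)

def theoretical_ls_error(X, n_r, noise_power):
    """Right-hand side of equation (2.5): sigma_n^2 * n_r * tr{(X X^H)^-1}."""
    gram = X @ X.conj().T
    return float(noise_power * n_r * np.real(np.trace(np.linalg.inv(gram))))

def simulate_ls_error(n_t=4, n_r=8, T=16, noise_power=0.1, trials=2000, seed=0):
    """Return (empirical, theoretical) squared Frobenius error of the LS estimate."""
    rng = np.random.default_rng(seed)
    # Assumed pilots: n_t rows of the unitary T x T DFT matrix, so X X^H = I
    X = np.fft.fft(np.eye(T), norm="ortho")[:n_t, :]
    total = 0.0
    for _ in range(trials):
        # Assumed channel and noise statistics: i.i.d. complex Gaussian entries
        H = (rng.standard_normal((n_r, n_t))
             + 1j * rng.standard_normal((n_r, n_t))) / np.sqrt(2)
        W = np.sqrt(noise_power / 2) * (rng.standard_normal((n_r, T))
                                        + 1j * rng.standard_normal((n_r, T)))
        Y = H @ X + W                      # equation (2.2)
        total += np.linalg.norm(ls_channel_estimate(Y, X) - H, "fro") ** 2
    return total / trials, theoretical_ls_error(X, n_r, noise_power)

if __name__ == "__main__":
    empirical, theoretical = simulate_ls_error()
    print(empirical, theoretical)
```

With orthonormal pilots the trace in (2.5) equals $n_t$, so the predicted error is simply $\sigma_n^2 n_r n_t$, and the empirical average should approach that value as the number of trials grows.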

2.1.2 Pilot Contamination
In a massive MIMO system, the use of pilots represents a bigger challenge than in
basic MIMO. While multipath and fading are present in both MIMO and Massive MIMO,
in the latter case the large number of antennas imposes a stronger limitation on the
design of training sequences. Due to multipath, the received signals are spread in time,
and fast fading created by mobility and other factors affects the channel response at
different frequencies. For that reason, the channel can only remain approximately constant
for a limited period of time and for some consecutive subcarriers. In other words, the
total length of the pilot sequence is limited by the time in which the channel is
approximately constant. So, in order to have a set of orthogonal pilots to estimate the
channel properly, the pilot length is limited by the coherence time divided by the channel
delay spread [6].
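The bound on the pilot length can be made concrete with a small calculation. The Python sketch below uses purely illustrative values that are assumptions and do not come from this thesis: a coherence time of 1 ms and a delay spread of 5 µs. The function name `max_orthogonal_pilot_length` is hypothetical.

```python
def max_orthogonal_pilot_length(coherence_time_us: int, delay_spread_us: int) -> int:
    """Upper bound on the pilot length: coherence time divided by delay spread.

    Both arguments are given in microseconds, as integers, to avoid rounding issues.
    """
    if delay_spread_us <= 0:
        raise ValueError("delay spread must be positive")
    return coherence_time_us // delay_spread_us

# Assumed, illustrative values: 1 ms coherence time and 5 us delay spread
pilot_budget = max_orthogonal_pilot_length(1000, 5)
```

Under these assumed values at most 200 orthogonal pilots are available, which shows why a network with many cells and users cannot give every user its own sequence.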
[Figure: L cells (1, 2, ..., i, ..., j, ..., L), each with one base station with nt antennas and K users]
Figure 2.1: Multicell environment
Besides, given the limited range of frequencies used in wireless systems, the same band
of frequencies is reused in several cells. This implies that the pilots need to be reused in
other cells in the system. As a consequence, the channel estimate for a particular device can
be affected by other devices in different cells that share the same frequencies and the same
pilot sequence. The negative effect of the reuse of these pilots is called pilot contamination.
The interference generated in this case was not a specific problem in MIMO, since the pilot
contamination is masked by the noise and the multiuser interference when the number of
antennas is small [14]. However, the effect increases with the number of antennas,
making pilot contamination a larger source of interference when the number of antennas
is high. This effect is called the massive MIMO effect in [14]. Recent studies agree that
the effect of pilot contamination is a considerable source of interference in massive MIMO
that limits the performance of the system [15][16].
Consider a multicell massive MIMO system like the one depicted in Figure 2.1, with $L$
cells. The $i$-th cell consists of $K$ single-antenna users, which together transmit the
$K \times T$ signal matrix $X_i$, and one base station with a massive number of antennas
$n_r \gg 1$. The received signal at the $j$-th cell can be expressed as:
$$Y_j = \sum_{i=1}^{L} \beta_i H_i X_i + W_j \tag{2.6}$$
where $\beta_i$ is a scalar representing the path-loss and shadow fading coefficient, also
called cross gain, between the cell of interest $j$ and the $i$-th cell, and $H_i$ is the
$n_r \times K$ complex channel propagation matrix.
Without loss of generality, assume that all cells have the same number of users $K$, that
the $L$ cells share the same band of frequencies and that the users in each cell use the
same set of orthogonal pilots, so that $X_i = X_j$ for all $i \neq j$. The fading
coefficient $\beta_j$ is normalized to 1 in the $j$-th cell. By reorganizing expression
(2.6), it can be rewritten to show the signal of interest in the following form:
$$Y_j = H_j X_j + \sum_{\substack{i=1 \\ i \neq j}}^{L} \beta_i H_i X_j + W_j \tag{2.7}$$
Compared to the model in (2.2), a second term appears in equation (2.7), which accounts
for the signals that are received in the $j$-th cell and that come from the contiguous
cells $i$, $i \neq j$. This additional term appears in the multi-cell scenario due to the
assumption that users share the same frequencies and the same pilots in different cells.
In order to estimate the channel in this case, the previous LS solution of (2.3) can be
applied, resulting in [16]:
$$\hat{H}_j^{LS} = Y_j X_j^{\dagger} \tag{2.8}$$
$$\hat{H}_j^{LS} = H_j + \underbrace{\sum_{\substack{i=1 \\ i \neq j}}^{L} \beta_i H_i}_{\text{pilot contamination}} + W_j X_j^{\dagger} \tag{2.9}$$
In this last expression (2.9), the effect of the pilot contamination can be observed in the
second term. Since the noise $W_j$ is uncorrelated with the propagation matrices $H_i$ and
the cross gains $\beta_i$, the noise effect of the third term becomes smaller than that of
the second term as the number of base station antennas becomes larger [14]. The following
estimation error is then deduced [16]:
$$E\{\|\hat{H}_{LS} - H\|_F^2\} = \frac{1}{T} \sum_{\substack{i=1 \\ i \neq j}}^{L} \beta_i + \sigma_n^2\, n_r\, \mathrm{tr}\{(XX^H)^{-1}\} \tag{2.10}$$
From equation (2.10), it can be inferred that the effect of pilot contamination depends on
the cross gains between cells. In fact, as the values of the cross gains become similar to
those of the direct gains within the cell, the pilot contamination becomes more significant [16].
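The structure of (2.9) can be verified numerically. The Python sketch below is a minimal illustration under stated assumptions: i.i.d. complex Gaussian channels, all cells sharing the same orthonormal pilots (rows of a unitary DFT matrix), arbitrary cross gains of 0.3, and no noise, so that only the contamination term remains. The helper names and all numerical values are hypothetical.

```python
import numpy as np

def complex_gaussian(rng, shape):
    """i.i.d. complex Gaussian entries with unit variance (assumed channel model)."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def contaminated_ls_estimate(H_cells, betas, X, W):
    """Form Y_j = sum_i beta_i H_i X + W (shared pilots) and return Y_j X^dagger."""
    Y = sum(b * H for b, H in zip(betas, H_cells)) @ X + W
    return Y @ np.linalg.pinv(X)

rng = np.random.default_rng(0)
n_r, K, T, L = 64, 4, 8, 3
# Assumed pilots: K orthonormal rows of the T x T unitary DFT matrix, reused in every cell
X = np.fft.fft(np.eye(T), norm="ortho")[:K, :]
H_cells = [complex_gaussian(rng, (n_r, K)) for _ in range(L)]
# Index 0 plays the role of cell j (gain normalized to 1); cross gains are assumed values
betas = [1.0, 0.3, 0.3]
# Noiseless case: isolates the pilot contamination term of equation (2.9)
W = np.zeros((n_r, T), dtype=complex)

H_hat = contaminated_ls_estimate(H_cells, betas, X, W)
contamination = sum(b * H for b, H in zip(betas[1:], H_cells[1:]))
residual = np.linalg.norm(H_hat - H_cells[0], "fro")
```

Even without noise, `H_hat` differs from the true channel by exactly the weighted sum of the interfering channels, so increasing the pilot power or averaging over more observations does not remove this error.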
Some approaches have been suggested in order to mitigate pilot contamination in massive
MIMO, e.g., employing a smaller reuse factor and coordinating the use of the pilots in the
network, or using blind estimation techniques and new precoding techniques designed
specifically for the reduction of pilot contamination [6]. In [16], the effect of pilot
contamination is studied in Least Square estimators (LSE) and Minimum Mean Square Error
estimators (MMSE). While MMSE are more robust than LSE, the prior knowledge of the
second order statistics of the channel required by the MMSE can represent a limitation
for these estimators.
Other approaches include algorithms to design pilots, as in [17], where the concept of
Kalman filtering is applied in order to find optimal pilots that subsequently improve the
channel estimation technique. In [18], the estimation processes of the downlink and the
uplink are combined using feedback, whereas [19] applies scheduling and time shifting of
pilots among cells. However, the latter is limited by the number of users, since the larger
the number of users, the higher the number of antennas needed to appreciate the
improvements of this method. The use of specific precoding and spatial beam-forming to
reduce pilot contamination is also explored in [20] and [21]. While these studies are
promising, a practical and optimal solution still needs to be developed.
2.2 Sparsity
In several studies mentioned in [3], it has been identified that scenarios with a large
number of antennas or large bandwidth show sparse multipath structures linked to similar
scatterers. For that reason, performing training based estimation with traditional methods
like LSE might be wasteful compared to compressed channel sensing methods, which achieve
the same level of error by taking advantage of sparsity while using fewer resources
(bandwidth, energy and latency) [3].
2.2.1 Sparsity in Massive MIMO
Studies and experimental results in Massive MIMO show that certain channel representations
are sparse. A large number of antennas increases the degrees of freedom in Massive
MIMO systems; nevertheless, in scattering environments, some of the channel coefficients
have very small magnitudes, below the noise level [3]. In other words, a considerable number
of the coefficients that describe the multipath behaviour of the channels in Massive MIMO
have a magnitude equal to zero or are small enough to be neglected. Recent works have
shown that channels are approximately sparse in the frequency domain, and it has been
observed that some transformations into other domains also present sparsity [22].
In a general model it is possible to express the channel as:
$$H = A H_s B \tag{2.11}$$
where $A$ and $B$ correspond to unitary matrices that perform a linear transformation of
the matrix $H$ with the objective of creating the sparse matrix $H_s$. By exploiting the
sparse structure of $H_s$, it is possible to estimate $H$ using compressive sensing.
Several works agree on the use of a transformation of the channel $H$ to exploit its
sparsity properties, either in the time domain [23][24], in the frequency domain, where the
channel vectors obtained after the transformation show sparse characteristics at different
frequencies [25], or in the virtual angular domain, where the channel vectors obtained after
the transformation are also approximately sparse in different directions [26]. It has also
been observed that in some cases channels and channel vectors obtained after the
transformation share a similar support (sparsity pattern). Even if the support is similar,
the fading can be different in each multipath channel coefficient [23]; antenna elements in
the same area will share the same scatterers and will receive similar echoes of the
original signals but with different magnitudes.
Time domain
The multipath characteristics of a wireless channel can make its representation in time
sparse. In fact, the majority of wireless channels can be represented as discrete multipath
channels with large delay spread [23]. The sparsity in time comes as a consequence of
the limited number of scatterers that are distributed in space. The scatterers produce a
restricted number of significant paths through which the transmitted signals can travel to
the receiver. The sparse vector $h_s \in \mathbb{C}^{T \times 1}$ can model the channel impulse response (CIR)
in the time domain between a single-antenna user and one antenna at the base station side.
The equivalent representation in the frequency domain can be obtained by taking the Fourier
transform, i.e.,
$$h = A_F h_s \qquad (2.12)$$
where $A_F$ is the T × T unitary Discrete Fourier Transform (DFT) matrix whose (k, l)-th
element is given by:
$$a_{k,l} = \frac{1}{\sqrt{T}}\, e^{-j \frac{2\pi}{T} k l} \qquad (2.13)$$
By modifying the frequency-domain representation proposed in [23] and [24], we obtain
$$y = X h + w \qquad (2.14)$$
where $y \in \mathbb{C}^{n_c \times 1}$ is the signal received at the antenna on $n_c$ sub-carriers, $X \in \mathbb{C}^{n_c \times n_c}$ is a
diagonal matrix containing the training pilots used and $h \in \mathbb{C}^{n_c \times 1}$ is the frequency-domain
representation of the CIR. The sparsity observed in wideband systems can then be obtained
by applying the inverse DFT to h:
$$h_s = A_F^H h \qquad (2.15)$$
In this fashion, $h_s$ is the sparse time-domain representation of h. Substituting (2.12) into (2.14) gives
$$y = X A_F h_s + w \qquad (2.16)$$
As a result, the DFT makes it possible to include in the model the discrete-time channel
vector $h_s \in \mathbb{C}^{T \times 1}$, which exhibits the sparse characteristics of the channel.
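The relation between (2.12), (2.14) and (2.15) can be illustrated with a short numerical sketch. It is not taken from [23] or [24]: the length T = 16 (taken equal to $n_c$), the tap positions and the ±1 pilots are illustrative assumptions, and the noise is omitted.

```python
import numpy as np

def unitary_dft(T):
    """T x T unitary DFT matrix with entries exp(-j*2*pi*k*l/T)/sqrt(T), as in (2.13)."""
    k = np.arange(T)
    return np.exp(-2j * np.pi * np.outer(k, k) / T) / np.sqrt(T)

T = 16                                   # illustrative length (assumption), taken equal to n_c
A_F = unitary_dft(T)

# Hypothetical sparse CIR with three significant taps.
h_s = np.zeros(T, dtype=complex)
h_s[[0, 3, 7]] = [1.0, 0.5j, -0.25]

h = A_F @ h_s                            # frequency-domain channel, (2.12)
x = np.where(np.arange(T) % 2 == 0, 1.0, -1.0)   # hypothetical +/-1 pilots
X = np.diag(x)
y = X @ h                                # noiseless instance of (2.14)

# Undo the pilots (X is its own inverse for +/-1 entries), then apply the inverse DFT (2.15).
h_s_rec = A_F.conj().T @ (x * y)
print(np.flatnonzero(np.abs(h_s_rec) > 1e-9))    # indices of the recovered taps
```

In this toy case the frequency-domain vector h is dense, while its inverse DFT has only three significant entries, which is the structure that (2.16) exposes to a sparse estimator.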
Frequency domain
With a large number of antennas, it is possible to observe an approximately
sparse behaviour in the frequency representation of channels. In fact, the channel frequency
response (CFR) can be assumed to be sparse. As mentioned in the previous section, the CIR
is sparse due to the limited number of scatterers, and the transformation to the frequency domain
approximately preserves the sparse properties in some basis [27]. For instance, consider a multi-carrier
transmission system, specifically OFDM, such as the following:
$$y = H x + w \qquad (2.17)$$
where x is the vector of length T containing the transmitted OFDM symbols, H is the
channel mixing matrix of dimensions $n_c \times T$, y is the vector of received signals of length
$n_c$ (sub-carriers) and w is the additive noise vector of length $n_c$. Most of the elements of
the matrix H have a small magnitude; most of the energy is concentrated on the main
diagonal and decreases as the elements lie farther from it [27]. It is possible
to write the sparse representation of H from expression (2.17) as:
$$y = (x^T \otimes I_T)\,\mathrm{vec}(H) + w \qquad (2.18)$$
In this last equation, $I_T$ is the identity matrix of dimensions T × T and vec(H) is the
vectorized version of the channel H.
Angular domain
In a similar way, assuming there is a limited number of scatterers, the channel H from
(2.2) can be obtained by grouping all vectors oriented towards each scatterer into a finite number
of directions. In the specific case of uniform linear arrays (ULA), the directions of these
vectors in the spatial signal space can be represented in a single dimension, a virtual angle
that depends on the spacing and number of antennas in the array. Figure 2.2 shows
a representation of the virtual angle at the BS for ULAs in which the signals arrive from
fixed directions in the angular domain. These directions can be expressed as the following
inverse Fourier decomposition of the channel [28]:
$$H = A H_s^w B^T \qquad (2.19)$$
where $H_s^w$ contains the complex gain of each path between the directions of transmission and
reception, and $A \in \mathbb{C}^{n_r \times n_r}$ and $B \in \mathbb{C}^{n_t \times n_t}$ are unitary angular-domain transformation matrices
corresponding to the transformation at the receiver and the transmitter side, respectively. The matrices
A and B are constructed by stacking the vectors a and b, respectively, each representing one
virtual angle direction:
$$a(\varphi_p) = \frac{1}{\sqrt{n_r}} \left[\, 1 \;\; e^{-j\varphi_p} \;\; \dots \;\; e^{-j(n_r-1)\varphi_p} \,\right]^T \qquad (2.20)$$
$$\varphi_p = \frac{2\pi p}{n_r}, \quad 0 \le p \le n_r - 1 \qquad (2.21)$$
$$b(\varphi_q) = \frac{1}{\sqrt{n_t}} \left[\, 1 \;\; e^{-j\varphi_q} \;\; \dots \;\; e^{-j(n_t-1)\varphi_q} \,\right]^T \qquad (2.22)$$
$$\varphi_q = \frac{2\pi q}{n_t}, \quad 0 \le q \le n_t - 1 \qquad (2.23)$$
$$A = [\, a(\varphi_0) \;\; a(\varphi_1) \;\; \dots \;\; a(\varphi_{n_r-1}) \,] \qquad (2.24)$$
$$B = [\, b(\varphi_0) \;\; b(\varphi_1) \;\; \dots \;\; b(\varphi_{n_t-1}) \,] \qquad (2.25)$$
Experiments show that the row vectors of the angular-domain channel matrix $H_s^w$ are
k-sparse vectors ($0 < k \ll n_t$), i.e., the gains along some of the directions of propagation
are very small or close to zero [29].
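As a minimal sketch of the decomposition (2.19) to (2.25), the following example builds A and B for small arrays and checks that a channel with two paths is dense in the antenna domain but sparse in the angular domain. The array sizes, path gains and the assumption that the paths lie exactly on the virtual-angle grid are illustrative choices, not values from [28] or [29].

```python
import numpy as np

def steering_matrix(n):
    """n x n unitary matrix whose p-th column is a(phi_p) with phi_p = 2*pi*p/n, as in (2.20)-(2.25)."""
    idx = np.arange(n)
    return np.exp(-1j * 2 * np.pi * np.outer(idx, idx) / n) / np.sqrt(n)

n_r, n_t = 8, 4                       # illustrative array sizes (assumption)
A, B = steering_matrix(n_r), steering_matrix(n_t)

# Hypothetical angular-domain channel with two paths on the virtual-angle grid.
H_w = np.zeros((n_r, n_t), dtype=complex)
H_w[1, 0] = 1.0 + 0.5j
H_w[5, 2] = -0.7j

H = A @ H_w @ B.T                     # antenna-domain channel, (2.19)
H_w_rec = A.conj().T @ H @ B.conj()   # inverse transform, valid because A and B are unitary

# Number of non-negligible entries in the antenna domain and in the angular domain.
print(np.count_nonzero(np.abs(H) > 1e-9), np.count_nonzero(np.abs(H_w_rec) > 1e-9))
```

Paths that fall between grid angles would leak into neighbouring entries, so in practice the angular representation is only approximately sparse.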
Figure 2.2: Angular component ULA (labels: Scatterer, Base Station, User)
Mixed representations
The work in [3] presents a mixed model that combines the angular-domain transformation
and the DFT. This model is used in the case of frequency-selective channels to extract a
representation of the channel $H_s^w$ with sparse rows:
$$Y = \sqrt{\xi / n_t}\; H_s^w X_F + W \qquad (2.26)$$
where $Y \in \mathbb{C}^{n_r \times T}$ has been pre-multiplied by the angular-domain transformation matrix
$A \in \mathbb{C}^{n_r \times n_r}$, $X_F \in \mathbb{C}^{n_t \times T}$ corresponds to the DFT of $B^H X$ and $W \in \mathbb{C}^{n_r \times T}$ is the noise.
2.2.2 Compressive Sensing
Compressive sensing relies on the fact that it is possible to reconstruct a signal z with
sparse properties from a linear measurement process like the one in expression
(2.28) through convex programming, even when the number of measurement vectors is less
than the length of z [30]. The sparse signal $z \in \mathbb{C}^{n \times 1}$ can be represented as a k-sparse
vector, meaning that it contains at most k non-zero elements in unknown positions, while the
remaining elements are zero ($\|z\|_{\ell_0} \le k \ll n$). The support of z, supp(z), is the set
of indices of the elements of z whose values are not zero. The size of this
set is less than or equal to k:
$$\mathrm{supp}(z) = \{\, i : z_i \neq 0 \,\} \qquad (2.27)$$
A linear measurement process that performs m < n inner products between the vector
z and the measurement vectors $\{\Phi_i\}_{i=1}^{m}$ that compose the rows of the matrix Φ can be
expressed as:
$$y = \Phi z + w \qquad (2.28)$$
where Φ is an m × n full-rank matrix that represents the measurement process, $y \in \mathbb{C}^{m \times 1}$ is the
vector that collects the observations and $w \in \mathbb{C}^{m \times 1}$ is a noise vector. The model
in expression (2.28) cannot be solved for z using conventional linear algebra,
since the system is under-determined (m < n). However, if the matrix Φ
is chosen in such a way that the dimension reduction from $z \in \mathbb{C}^{n \times 1}$ to $y \in \mathbb{C}^{m \times 1}$ does not
alter the information of the k-sparse signal, it is possible to estimate z in expression (2.28)
by using $\ell_0$ or $\ell_1$ optimization strategies with a unique sparse solution [30].
The first case, $\ell_0$ minimization, as described in the next expression, can be used
to recover z [31]:
$$\hat{z} = \arg\min \|z\|_{\ell_0} \quad \text{subject to} \quad \|y - \Phi z\|_{\ell_2} \le \epsilon \qquad (2.29)$$
where ε depends on the variance of the noise vector w and the $\ell_0$ norm counts the non-zero
elements of z. Expression (2.29) is a hard combinatorial problem and requires an
exhaustive search for the solution through all possible k-sparse combinations of length n of
the vector z.
The second case, $\ell_1$ convex optimization (2.30), can also recover the k-sparse signal with
high probability through linear programming.
$$\hat{z} = \arg\min \|z\|_{\ell_1} \quad \text{subject to} \quad \|y - \Phi z\|_{\ell_2} \le \epsilon \qquad (2.30)$$
In a broader approach, a variable $x \in \mathbb{C}^n$ may not be sparse in the standard basis
but only after a transformation into a different basis [30]. In this latter case z in (2.28)
can be replaced by z = Ψx, where Ψ is an n × n matrix that sparsifies x, corresponding to
the general model below:
$$y = \Phi \Psi x + w \qquad (2.31)$$
The last consideration in order to recover x is to design a measurement matrix Φ that
allows the reconstruction of the vector x of length n from m measurements. A sufficient
condition to solve this problem is that the matrix Φ satisfies the following
inequality:
$$(1 - \delta_s)\|\Psi x\|_2^2 \le \|\Phi \Psi x\|_2^2 \le (1 + \delta_s)\|\Psi x\|_2^2 \qquad (2.32)$$
for all s-sparse vectors¹ Ψx, with the parameter $\delta_s \in (0, 1)$ for each integer s = 1, 2, .... This
property, called the restricted isometry property (RIP), guarantees the recovery with
high probability of the k-sparse vector Ψx even in the presence of noise, as proven in
[31]. The direct construction of the matrix Φ demands the verification of expression (2.32)
for each of the possible combinations of length n and sparsity level k of the vector Ψx.
Nevertheless, random matrices satisfy the RIP with high probability [30].
2.2.3 Matching Pursuit Algorithms
The reconstruction of the sparse vector Ψx can be achieved using optimization techniques,
either by finding a sparse vector that matches the m projections or by using linear programming.
Nevertheless, the complexity of these $\ell_0$ and $\ell_1$ optimization methods is often too high
for practical implementation. Another set of methods, Matching Pursuit (MP) algorithms, based
on greedy techniques, can recover the sparse vector Ψx more efficiently, with an
accuracy of the same order as that of the linear programming optimization methods
[32]. Greedy techniques build the solution of a problem through a progression of steps,
choosing the locally optimal solution at each step; in many cases this strategy
achieves the globally optimal solution [33]. Similarly, the basic idea behind the MP algorithms
relies on finding the support of the sparse vectors in a sequential manner.
The Subspace Pursuit (SP) in Algorithm 1 [32] differs in the way the support
set is obtained from the Orthogonal Matching Pursuit (OMP) family of methods, such as
OMP [34], regularized OMP [35] and Stagewise OMP [36]. While OMP algorithms select
indices that represent a good partial support estimate and progressively add the remaining
support estimates while keeping the initial ones, the SP algorithm improves the support set in
each iteration. The reconstruction done by OMP can be unreliable if, at any iteration,
an index that is not part of the globally optimal solution is added to the support set. Since
the index is retained, the remaining steps cannot reach the globally optimal solution.
¹ A vector is said to be s-sparse if it has at most s nonzero entries [31].
The SP method overcomes this limitation of OMP, since indices selected
at any iteration can later be removed if they prove unreliable. The SP algorithm
initially selects the k columns of the measurement matrix Φ that best correlate with the
observation vector y; the residual is calculated by removing from y its orthogonal projection onto
the subspace spanned by the selected columns of Φ. Then, at each iteration, the k
columns of the measurement matrix Φ that best correlate with the residual are added to
the support set, the support set is refined by keeping the k indices that correspond to
the largest-magnitude elements of the estimated x, and the residual is updated for the new
support set. The SP algorithm stops when the residual cannot be reduced further
[32].
The SP algorithm outperforms the mentioned OMP, regularized OMP and Stagewise
OMP algorithms in terms of complexity and efficiency; however, for all these algorithms to
perform properly, knowledge of the sparsity level k is essential. This is a
challenge since this value is usually unknown. Basic Matching Pursuit, as introduced in [37],
finds the sparse representation of the vector Ψx through a series of sequential approximations.
Starting with a residual equal to the observation vector y, at each iteration the algorithm
selects the column of the measurement matrix Φ that has the largest correlation with
the residual and then updates the residual by subtracting from it the contribution of the
selected column, weighted by its correlation with the current residual. The search stops when k
different columns of the matrix Φ have been chosen. The k indices that correspond to the selected
columns form the support set. The value of Ψx is calculated by applying the pseudo-inverse
of the selected columns of the matrix Φ to y. This basic method exhibits slow
convergence; however, Orthogonal Matching Pursuit overcomes this difficulty by updating
the residual in a different way. Removing from the previous residual its orthogonal projection
onto the subspace spanned by the selected columns of Φ ensures that the same column is
not selected twice, increasing the convergence speed of the algorithm [34].
Algorithm 1 Subspace Pursuit Algorithm
Input: Sparsity level k, measurement matrix Φ, observation vector y
Output: Estimated signal $\hat{x}$
1: $T_0$ = {k indices corresponding to the largest magnitude entries in the vector $\Phi^* y$}
2: $r_0 = (I - \Phi_{T_0} \Phi_{T_0}^{\dagger})\, y$    ▷ Residual
3: The matrix $\Phi_{T_0}$ consists of the columns of Φ with indices $T_0$
4: l = 0
5: repeat
6:   l = l + 1
7:   $T_l = T_{l-1} \cup$ {k indices corresponding to the largest magnitude elements of $\Phi^* r_{l-1}$}
8:   $x_p = \Phi_{T_l}^{\dagger} y$
9:   The matrix $\Phi_{T_l}$ consists of the columns of Φ with indices $T_l$
10:  $T_l$ = {k indices corresponding to the largest magnitude elements of $x_p$}
11:  $r_l = (I - \Phi_{T_l} \Phi_{T_l}^{\dagger})\, y$    ▷ Residual update
12: until $\|r_l\|_2 > \|r_{l-1}\|_2$
13: $T_l = T_{l-1}$
14: $\hat{x}_{\{1,\dots,N\} - T_l} = 0$
15: $\hat{x}_{T_l} = \Phi_{T_l}^{\dagger} y$
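The steps of Algorithm 1 can be sketched in Python as follows. This is an illustrative sketch rather than the reference implementation of [32]: the function name, the iteration cap `max_iter`, the use of a least-squares solver in place of an explicit pseudo-inverse, and the stopping test with "≥" (so that a zero residual also terminates) are assumptions, as are the problem sizes in the usage lines.

```python
import numpy as np

def subspace_pursuit(Phi, y, k, max_iter=50):
    """Sketch of Algorithm 1: returns the estimate x_hat and its support (sorted indices)."""
    n = Phi.shape[1]

    def top_k(v, size):
        # Indices of the `size` largest-magnitude entries of v.
        return np.argsort(np.abs(v))[-size:]

    def ls_fit(support):
        # Least-squares coefficients on `support` and the resulting residual.
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        return coef, y - Phi[:, support] @ coef

    support = top_k(Phi.conj().T @ y, k)                 # step 1
    coef, r = ls_fit(support)                            # step 2
    for _ in range(max_iter):
        candidates = np.union1d(support, top_k(Phi.conj().T @ r, k))   # step 7
        x_p, _ = ls_fit(candidates)                      # step 8
        new_support = candidates[top_k(x_p, k)]          # step 10
        new_coef, new_r = ls_fit(new_support)            # step 11
        # Assumption: stop on ">=" (not ">") so that a zero residual also terminates.
        if np.linalg.norm(new_r) >= np.linalg.norm(r):   # steps 12-13
            break
        support, coef, r = new_support, new_coef, new_r
    x_hat = np.zeros(n, dtype=coef.dtype)                # step 14
    x_hat[support] = coef                                # step 15
    return x_hat, np.sort(support)

rng = np.random.default_rng(0)
m, n, k = 64, 128, 4                                     # illustrative sizes (assumption)
Phi = rng.standard_normal((m, n)) / np.sqrt(m)           # random Gaussian measurement matrix
x_true = np.zeros(n)
x_true[[5, 40, 77, 120]] = [1.5, -2.0, -1.5, 2.5]
x_hat, support = subspace_pursuit(Phi, Phi @ x_true, k)
print(support, np.linalg.norm(x_hat - x_true))
```

Unlike OMP, the candidate set in each pass contains up to 2k indices and is pruned back to k, so an index chosen early can still be discarded later.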
2.3 Matching Pursuit with unknown sparsity level
In previous sections the sparse characteristics of the channels in Massive MIMO were mentioned;
however, the level of sparsity is usually unknown a priori. While MP algorithms
such as SP and OMP are very effective in the estimation of sparse signals, their effectiveness
relies on knowledge of the sparsity level k. For a more realistic setting, certain
modifications have been made to the matching pursuit algorithms in order to overcome
this limitation.
2.3.1 Channel estimation with unknown sparsity level
An iterative method is proposed in [23], where the Support Agnostic Bayesian Matching
Pursuit algorithm used to estimate the channel is fed with an initial value of sparsity given by:
$$k = \left| \left\{ j \in N : |\Phi_j^T y| \ge \frac{\|\Phi^T y\|_{\ell_\infty}}{2} \right\} \right| \qquad (2.33)$$
where $\Phi_j$ is the j-th column of the measurement matrix and y is the vector of observations.
In expression (2.33), the level of sparsity k corresponds to the number of columns in the
measurement matrix whose correlation with y is greater than or equal to half of the maximum
of all the correlations. This initial value of k does not need to be accurate: the algorithm
estimates the channel and then, based on that estimate, updates the value of k in order
to reduce the estimation error according to the a posteriori probabilities of the estimated
support set found after the initial estimation. The procedure is repeated until the value of
k converges.
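The initial estimate in (2.33) amounts to counting the correlations that reach at least half of the largest one. The sketch below illustrates only this counting step, not the Bayesian procedure of [23]; the function name and the toy matrix are hypothetical, and the conjugate transpose is used so that the same code also covers complex data.

```python
import numpy as np

def initial_sparsity(Phi, y):
    """Count columns of Phi whose correlation with y is at least half the maximum, as in (2.33)."""
    corr = np.abs(Phi.conj().T @ y)
    return int(np.count_nonzero(corr >= corr.max() / 2))

# Hypothetical toy example: orthonormal columns, so the correlations equal the coefficients.
Phi = np.eye(6)
y = Phi @ np.array([0.0, 2.0, 0.1, -1.2, 0.0, 0.9])
print(initial_sparsity(Phi, y))
```

Here the correlations are 0, 2, 0.1, 1.2, 0 and 0.9, so only the two entries that reach the half-maximum level of 1.0 are counted.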
The work presented in [38] identifies the sparsity level based on past channel estimates.
Using initially an iterative channel estimation method based on zero forcing, the level of
sparsity is obtained by comparing each of the elements of the estimated channel vector
$\hat{h}$ with the following threshold:
$$|\hat{h}_j| \ge \frac{\alpha \sigma}{\sqrt{m}} \qquad (2.34)$$
where $\hat{h}_j$ is the j-th element of $\hat{h}$, m is the number of observations in model (2.31), σ
is the noise standard deviation and α is a factor that scales it. With α = 3,
the noise stays below the threshold with a probability of 99.73%. With a small additional
compensation of the sparsity value based on the probability of successful detection of each
channel element $h_j$, the sparsity level k is then provided to an MP algorithm to estimate the
current channel.
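The 99.73% figure quoted for α = 3 is the usual three-sigma value of a Gaussian distribution. A quick check, assuming zero-mean real Gaussian noise (an assumption of this sketch; the exact noise model of [38] is not restated here):

```python
import math

def prob_noise_below(alpha):
    """P(|n| < alpha * sigma) for zero-mean real Gaussian noise n with standard deviation sigma."""
    return math.erf(alpha / math.sqrt(2))

for alpha in (1, 2, 3):
    print(alpha, round(100 * prob_noise_below(alpha), 2))
```

A larger α lowers the chance that a pure-noise element is counted as a channel tap, at the cost of missing weak taps.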
Other approaches like the ones shown in [39] and [40] use varying-step line search
approaches to find the value of k by modifying MP algorithms. While all these works show
promising results, the one that seems to be more appropriate to the current thesis will be
discussed in the next section.
2.3.2 Adaptive Matching Pursuit
The results shown in [41] provide a possible solution to the case of Matching Pursuit with no
knowledge of the sparsity level. Using a similar approach to Subspace Pursuit, it improves
the support set in each iteration but instead of selecting a specific sparsity level k, the
sparsity level is also estimated.
In [36], after evaluating the model from (2.31) in the absence of
noise and for large m and n, it was observed that the elements of the correlation
$\Phi^T y$ consist of two components: the signal Ψx plus an additive
noise-like term. It is proven that this noise has the following standard deviation:
$$\sigma_{\Psi x} = \frac{\|\Psi x\|_2}{\sqrt{m}} \qquad (2.35)$$
This result holds for matrices Φ taken from a uniform spherical ensemble, or
in other words, matrices whose columns are independent and identically distributed
points on the unit sphere in an n-dimensional space. However, the result also applies
to matrices whose elements are $\pm 1/\sqrt{n}$ with signs chosen randomly (random signs
ensemble) and to matrices constructed from the rows of Hadamard matrices
(partial Hadamard ensemble) [36].
So, by comparing the magnitude of the correlation $\Phi^T y$ against a threshold proportional
to the formal noise level of the signal Ψx, it is possible to separate the largest components
of the vector Ψx. Furthermore, since Ψx is not known in practice and Φ has the
restricted isometry property (2.32), the value of $\sigma_{\Psi x}$ can be approximated by $\sigma_{\Psi x} \approx \|y\|_2 / \sqrt{m}$.
Hence, the support set representing the largest components of the vector Ψx can be extracted
with the following expression:
$$\omega = \left\{ j : |\Phi_j^T y| > \tau \frac{\|y\|_2}{\sqrt{m}} \right\}, \qquad (2.36)$$
where τ is the threshold parameter and y, Ψx are taken from the model (2.31) in the
absence of noise. Based on this result, the level of sparsity can be estimated. Experimental
results have shown that the range 2.5 ≤ τ ≤ 3 allows the support set to be recovered with
high accuracy. Algorithm 2 [41] consists of two nested loops. The outer loop
estimates the level of sparsity k, which is calculated in every iteration according
to equation (2.36) using the residual. After that, the inner loop adds the appropriate
column indices to the support set, following the same methodology as SP, until the minimal
residual is found. The procedure is repeated until there is no significant variation in the
supports found.
Algorithm 2 Adaptive Sparse Matching Pursuit Algorithm
Input: Measurement matrix Φ, observations y
Output: Estimated signal $\hat{x}$
1: r = y    ▷ Residual
2: $\hat{x} = 0$
3: while halting condition is false do
4:   $v = \Phi^T r$    ▷ "Signal Proxy"
5:   $\omega = \{ j : |v(j)| > \tau \|r\|_2 / \sqrt{m} \}$    ▷ Thresholding
6:   l = 1    ▷ Inner counter
7:   $r_0 = r$, $x_0 = \hat{x}$    ▷ Message-passing
8:   $K = \| \omega \cup \mathrm{supp}(\hat{x}) \|_0$    ▷ Sparsity estimation
9:   repeat
10:    $u = \Phi^T r_{l-1}$
11:    γ = supp(u)    ▷ Identify support set
12:    $S = \gamma \cup \mathrm{supp}(\hat{x}_{l-1})$    ▷ Merge support sets
13:    $b = \arg\min_{x' : \mathrm{supp}(x') = S} \|\Phi x' - y\|_2$    ▷ LS estimation
14:    $x_l = b$
15:    $r_l = y - \Phi x_l$    ▷ Residual update
16:    if $\|r_l\|_2 \ge \|r_{l-1}\|_2$ then $r = r_{l-1}$, $\hat{x} = x_{l-1}$, go to step 1
17:    l = l + 1
18:  until Maximum iterations
19: end while
20: Return $\hat{x}$
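The thresholding of expression (2.36), which is also step 5 of Algorithm 2, can be sketched as below. Only this single step is shown, not the full algorithm of [41]; the function name, the matrix sizes, the $1/\sqrt{m}$ column scaling of the random-sign matrix and the coefficient values are assumptions made for illustration.

```python
import numpy as np

def threshold_support(Phi, r, tau=3.0):
    """Indices j with |Phi_j^T r| > tau * ||r||_2 / sqrt(m), as in (2.36)."""
    m = Phi.shape[0]
    proxy = np.abs(Phi.conj().T @ r)
    return np.flatnonzero(proxy > tau * np.linalg.norm(r) / np.sqrt(m))

rng = np.random.default_rng(1)
m, n = 256, 512                                  # illustrative sizes (assumption)
signs = rng.choice([-1.0, 1.0], size=(m, n))
Phi = signs / np.sqrt(m)                         # random signs, scaled to unit-norm columns
x = np.zeros(n)
x[[10, 200, 400]] = [4.0, -5.0, 6.0]             # hypothetical sparse signal
omega = threshold_support(Phi, Phi @ x)
print(omega, len(omega))                         # detected indices and estimated sparsity
```

With τ = 3 the three true indices exceed the threshold by a wide margin, but a few spurious indices may also pass it, which is one reason the estimate is refined iteratively rather than used directly.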
Chapter 3
Sparse channel estimation using quasi-orthogonal pilots
In this chapter, a method to estimate a Massive MIMO channel in the presence of pilot
contamination is proposed. The multicell model is modified to include the concepts
of quasi-orthogonal pilots and angular-space sparsity, as presented in the subsequent
sections. The basic system model is described as a starting point; the
modifications to the model are then explained, ending with the proposed algorithm.
3.1 Quasi-Orthogonal Pilots
Having orthogonal pilots for all users in a cellular network is not a feasible option in
Massive MIMO for two main reasons: first, the length of the pilot sequences is limited by
the time during which the channel is approximately stable and second, as the number of users
and antennas increases, the pilot length increases as well; hence, the resources consumed
to transmit and process the pilots for all combinations of users and antennas would make
the communication very inefficient. A similar problem has been investigated in the past
regarding the generation of spreading codes for CDMA. CDMA systems need
codes that allow many users to access a shared frequency and time communication
channel simultaneously. Sometimes these systems are overloaded, namely, when the number of users
K is greater than the length of the code sequences T. In that case the spreading codes
cannot be orthogonal due to the limitation of resources. However, in order to maintain
an acceptably reliable data rate for all the users, it is desirable that these codes have low
cross-correlation.
Figure 3.1: Design procedure for optimal pilots set [1]

Welch studied the lower bounds on the cross-correlation of signals and found the lower
bound for the Total Square Correlation (TSC) of signature sets, also known as the “Welch
bound” [42]. Given a set of signatures $S = \{s_1, s_2, \dots, s_K\}$, $s_i \in \mathbb{C}^T$, the TSC is defined
as the sum of the squares of all inner products between signatures and is given by:
$$\mathrm{TSC}(S) \equiv \sum_{i=1}^{K} \sum_{j=1}^{K} |s_i^H s_j|^2 \qquad (3.1)$$
where K is the total number of signatures and T is their length. The lower
bound of expression (3.1) is $K^2/T$, as proved by Welch [42]. When the set of
signatures achieves this bound, the interference among the users employing the
signatures is the minimum possible. In this fashion the autocorrelation of the signatures is
maximized and the cross-correlation minimized. This is an important characteristic for the
code design problem where orthogonality among the codes is not possible [42]. The Welch
bound can always be attained when using complex/real signatures. However, this is not always the
case for binary antipodal signatures or spreading codes in CDMA.
In [1], an algorithm to create binary signature sets that meet the Welch bound is used
for the design of spreading codes in CDMA. Assuming a set of signatures, the design
of the sequences is based on the calculation of the cross-correlation properties between
the signatures of the set. The procedure uses the rows of a Hadamard matrix of size
$N \ge 4\lfloor(\min\{K, T\} + 1)/4\rfloor$ to construct the K signatures. Hadamard matrices are square
matrices with binary elements (±1) whose rows are orthogonal. The size of the Hadamard
matrix is extended if necessary by adding new rows that are not orthogonal but have a TSC
equal to the Welch bound with respect to the other rows. The use of this technique provides
an optimal set of signatures for overloaded CDMA systems [1].
This CDMA scenario is similar to the problem addressed in this thesis, cellular systems
with pilot contamination. A Massive MIMO system can be modelled as an overloaded
CDMA system: the length of the training sequences is limited, and it is desirable to keep
this length as short as possible so that the estimation process remains efficient. In effect, since
the length of the sequences T is smaller than the total number of users in Massive MIMO,
it is impossible to provide orthogonal pilots to all users. The design of non-orthogonal
sequences that reach the Welch bound in Massive MIMO gives an optimal solution to the
pilot contamination problem, since the procedure reduces the interference among sequences
by generating sequences with low TSC.
Figure 3.1 [1] shows the algorithm used to produce the quasi-orthogonal binary signatures.
The same algorithm can be applied to the current problem of designing quasi-orthogonal
pilots for Massive MIMO with K users and T symbols.
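The TSC in (3.1) and the Welch bound $K^2/T$ can be checked numerically on a small overloaded set. The sketch below does not reproduce the full design procedure of [1] shown in Figure 3.1; it only truncates a 4 × 4 Sylvester-type Hadamard matrix to T = 3 rows, an illustrative special case, and normalizes the resulting K = 4 signatures to unit norm (the normalization under which the bound takes the form $K^2/T$).

```python
import numpy as np

def total_squared_correlation(S):
    """TSC of the signature set formed by the columns of S (T x K), as in (3.1)."""
    gram = S.conj().T @ S                 # K x K matrix of inner products s_i^H s_j
    return float(np.sum(np.abs(gram) ** 2))

# Sylvester-type 4 x 4 Hadamard matrix (rows are mutually orthogonal).
H2 = np.array([[1, 1], [1, -1]])
H4 = np.kron(H2, H2)

T, K = 3, 4                               # overloaded toy case: K users, length-T signatures
S = H4[:T, :] / np.sqrt(T)                # keep T rows; each column is a unit-norm signature
print(total_squared_correlation(S), K**2 / T)
```

In this toy set every pair of distinct signatures has inner product ±1/3, so the four signatures cannot be orthogonal, yet their TSC equals the bound 16/3.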
3.2 Basic System Model
The following multi-cell model, based on Figure 2.1, includes the effect of the pilot
contamination caused by K users in each of the L adjacent cells and the additive noise of the
system. The cross-gain between cells is expressed by the added parameter β, which represents
the large-scale fading between each user and the base station of interest. The gain of the
transmitter is also included through g. This gain is the inverse of the mean value of the cross-gain
$\beta_j$ in the main cell:
$$Y_j = g H_j \beta_j X_j + g \sum_{i=1,\, i \ne j}^{L} H_i \beta_i X_i + s W_j \qquad (3.2)$$
where:
$Y_j$: $n_r \times T$ matrix, signal received at base station j with $n_r$ receiver antennas during
T symbols.
$H_i$: $n_r \times K$ matrix, complex circular Gaussian channel between $n_r$ receiver antennas
and K single-antenna users. Elements $h_{imk}$ have unit variance and zero mean.
$\beta_i$: K × K diagonal matrix with elements $\beta_{ik}$, large-scale fading (log-normal gain) for each
user depending on the distance from the base station:
$$\beta_{ik} = \frac{z_{ik}}{r_{ik}^{\gamma}} \qquad (3.3)$$
$z_{ik}$: log-normal random variable, represents the large-scale fading for user k in cell i.
$r_{ik}$: uniformly distributed random variable, represents the distance between user k and base
station i.
γ: decay exponent.
$X_i$: K × T matrix, set of training sequences (pilots) for K single-antenna users with
duration T symbols. Each symbol takes the value $x_{ikt} = \pm 1/\sqrt{T}$.
$W_j$: $n_r \times T$ matrix, additive Gaussian noise at base station j for the $n_r$ receiver
antennas during T symbols. Elements $w_{jmt}$ have unit variance and zero mean.
g: gain at the transmitter, used to maintain a normalized value. Without any loss of
generality, it is chosen to be the inverse of the mean value of one of the elements of the
diagonal of the $\beta_i$ matrix.
s: constant that scales the noise according to the signal-to-noise ratio, $s = 1/\sqrt{SNR}$.
The effect of pilot contamination, as mentioned in section 2.1.2, is caused by the use
of the same pilots in the adjacent cells, that is, $X_i = X_j$ for all $i \ne j$. For the
model (3.2), the result is the following expression for the Least Squares estimator
of $H_j \beta_j$:
$$\hat{H}_j^{LS} \hat{\beta}_j^{LS} = g H_j \beta_j + \underbrace{g \sum_{i=1,\, i \ne j}^{L} H_i \beta_i}_{\text{pilot contamination}} + s W_j X_j^{\dagger} \qquad (3.4)$$
However, the training matrix $X_i$ can be used with a different arrangement of quasi-orthogonal
pilots and a different reuse factor. With a reuse factor f − 1 < L, there will be f − 1 cells
using the same pilots $X_j$ as the cell of interest, while the remaining cells use a different
set of pilots $X_i$. With this modification, and assuming that the cells reusing the pilots $X_j$
are those with indices $1 \le i \le f - 1$, the received signal at the base station in the cell of interest can be
written as:
$$Y_j = g H_j \beta_j X_j + \underbrace{g \sum_{i=1,\, i \ne j}^{f-1} H_i \beta_i X_j}_{\text{cells reusing pilots}} + \underbrace{g \sum_{i=f,\, i \ne j}^{L} H_i \beta_i X_i}_{\text{different set of pilots}} + s W_j \qquad (3.5)$$
After changing the reuse of the pilots from L cells to f − 1 cells, a
smaller number of users interfere as a consequence of pilot contamination. In this
case, the expression for the Least Squares estimator of $H_j \beta_j$ is:
$$\hat{H}_j^{LS} \hat{\beta}_j^{LS} = g H_j \beta_j + \underbrace{g \sum_{i=1,\, i \ne j}^{f-1} H_i \beta_i}_{\text{pilot contamination}} + \underbrace{g \sum_{i=f,\, i \ne j}^{L} H_i \beta_i X_i X_j^{\dagger}}_{\text{quasi-orthogonal effect}} + s W_j X_j^{\dagger} \qquad (3.6)$$
It can be seen from the last two expressions that the distribution of quasi-orthogonal
pilots reduces the effect of pilot contamination. In exchange, the effect of the
quasi-orthogonal pilots in the adjacent cells appears; it depends entirely on the
product $X_i X_j^{\dagger}$, which is a function of the distance among the pilot sequences (Welch
bound).
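The difference between the contamination term and the $X_i X_j^{\dagger}$ term can be seen in a noiseless toy example with one interfering cell. The sketch below makes several simplifying assumptions that are not part of the thesis model: g = 1, β = I, s = 0, and a second pilot set that is exactly orthogonal to $X_j$ (the limiting case of a quasi-orthogonal set, for which $X_i X_j^{\dagger}$ would be small but non-zero).

```python
import numpy as np

rng = np.random.default_rng(0)
n_r, K, T = 8, 2, 4                              # illustrative sizes (assumption)

def random_channel():
    """Hypothetical n_r x K channel with i.i.d. complex Gaussian entries."""
    return (rng.standard_normal((n_r, K)) + 1j * rng.standard_normal((n_r, K))) / np.sqrt(2)

# Rows of a 4 x 4 Hadamard matrix scaled by 1/sqrt(T) serve as orthonormal pilots.
H2 = np.array([[1, 1], [1, -1]])
pilots = np.kron(H2, H2) / np.sqrt(T)
X_j, X_other = pilots[:K], pilots[K:2 * K]       # two disjoint pilot sets, K x T each

H_j, H_i = random_channel(), random_channel()    # desired and interfering channels
X_j_pinv = np.linalg.pinv(X_j)                   # T x K, so that X_j @ X_j_pinv = I

# Noiseless case with g = 1 and beta = I, chosen to isolate the pilot effect.
est_reuse = (H_j @ X_j + H_i @ X_j) @ X_j_pinv       # neighbour reuses X_j
est_other = (H_j @ X_j + H_i @ X_other) @ X_j_pinv   # neighbour uses different pilots

print(np.linalg.norm(est_reuse - H_j), np.linalg.norm(est_other - H_j))
```

When the neighbouring cell reuses $X_j$, the LS estimate equals $H_j + H_i$, whereas with the second pilot set the interfering channel is removed by the product $X_i X_j^{\dagger}$.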
3.3 Modified System Model
The model in (3.2) can be further modified to exploit the sparse characteristics of the
channel after an angular transformation. In the previous chapter, the
sparsity characteristics in the angular domain were studied for ULA antennas. However, given the
nature of Massive MIMO, the implementation of this type of array is challenging because
of the physical space required to place $n_r$ antennas along a single axis. The model can also be
extended to other types of arrays such as the uniform rectangular array (URA). URA antennas
require a more compact space for their implementation since the antennas are
arranged in two dimensions, which makes them more practical for Massive MIMO: the $n_r$
antennas can be placed in an area with smaller dimensions than the length required by a
ULA.
Moving from the ULA to the URA adds a second physical dimension, which implies that the
virtual angles can be represented in two dimensions, as shown in Figure 3.2. In the same way
as a two-dimensional Fourier transform, the p × q rectangular antenna array components
can be transformed into an elevation θ and an azimuthal φ component. So, given a channel
$H \in \mathbb{C}^{n_r \times n_t}$ whose receiver antennas form a rectangular array with $n_r = p \times q$, each of
the columns $h_i$ can be related to its 2D angular transformation on the receiving
side in the following way:
$$\bar{h}_i = A_\theta^H \bar{h}_i^w A_\phi^* \qquad (3.7)$$
where $\bar{h}_i$ is a p × q matrix created from the vector $h_i$, $\bar{h}_i^w$ is the p × q angular transformation
matrix and $A_\theta \in \mathbb{C}^{p \times p}$ and $A_\phi \in \mathbb{C}^{q \times q}$ are similar to a two-dimensional Fourier transform,
formed by stacking the vectors $a_\theta$ and $a_\phi$, respectively:
$$a_\theta(\varphi_r) = \frac{1}{\sqrt{p}} \left[\, 1 \;\; e^{-j\varphi_r} \;\; \dots \;\; e^{-j(p-1)\varphi_r} \,\right]^T \qquad (3.8)$$
$$\varphi_r = \frac{2\pi r}{n_r}, \quad 0 \le r \le p - 1 \qquad (3.9)$$
$$a_\phi(\varphi_s) = \frac{1}{\sqrt{q}} \left[\, 1 \;\; e^{-j\varphi_s} \;\; \dots \;\; e^{-j(q-1)\varphi_s} \,\right]^T \qquad (3.10)$$
$$\varphi_s = \frac{2\pi s}{n_t}, \quad 0 \le s \le q - 1 \qquad (3.11)$$
$$A_\theta = [\, a_\theta(\varphi_0) \;\; a_\theta(\varphi_1) \;\; \dots \;\; a_\theta(\varphi_{p-1}) \,] \qquad (3.12)$$
$$A_\phi = [\, a_\phi(\varphi_0) \;\; a_\phi(\varphi_1) \;\; \dots \;\; a_\phi(\varphi_{q-1}) \,] \qquad (3.13)$$
The vectorization of expression (3.7) leads to the original form of the vector $h_i = \mathrm{vec}(\bar{h}_i)$:
$$\mathrm{vec}(\bar{h}_i) = (A_\theta^H \otimes A_\phi^H)\,\mathrm{vec}(\bar{h}_i^w) \qquad (3.14)$$
Figure 3.2: Angular components URA (labels: Base Station, User)
Joining the column vectors $h_i$ and $h_i^w = \mathrm{vec}(\bar{h}_i^w)$ to reconstruct the channel matrices H and
$H^w$ respectively, the total channel transformation can be written in the following form:
$$H = (A_\theta^H \otimes A_\phi^H) H^w \qquad (3.15)$$
Likewise,
$$H^w = (A_\theta \otimes A_\phi) H \qquad (3.16)$$
where $H^w \in \mathbb{C}^{n_r \times n_t}$ is the two-dimensional angular transformation of H on the receiver
side when the receiver antenna is distributed as a uniform rectangular array. By using
(3.15), the channel $H_j$ can be replaced by $(A_\theta^H \otimes A_\phi^H) H_j^w$, that is, by the previously mentioned
transformation and its two-dimensional angular representation. The model can be rewritten as
follows:
$$Y_j = g (A_\theta^H \otimes A_\phi^H) H_j^w \beta_j X_j + g \sum_{i=1,\, i \ne j}^{L} H_i \beta_i X_i + s W_j \qquad (3.17)$$
Performing the inverse angular transformation and the vectorization of expression
(3.17), the total model can be converted into a sparse problem of the form shown in (2.28):
$$\underbrace{\mathrm{vec}\{(A_\theta \otimes A_\phi) Y_j\}}_{\text{observation}} = \underbrace{(X_j^T \otimes I)}_{\text{measurement}}\; \underbrace{\mathrm{vec}\{g H_j^w \beta_j\}}_{\text{sparse vector}} + \underbrace{\mathrm{vec}\Big\{ g \sum_{i=1,\, i \ne j}^{L} H_i^w \beta_i X_i + s (A_\theta \otimes A_\phi) W_j \Big\}}_{\text{noise term}} \qquad (3.18)$$
where $H_i^w$ is the two-dimensional angular transformation of the channel matrix and
$I \in \mathbb{C}^{n_r \times n_r}$ is the identity matrix. The solution to the optimization problem proposed in (2.29)
can be applied here to find the estimate of the sparse channel $\hat{H}_j^w$. Then, the desired
channel can be obtained by using again the two-dimensional angular channel transformation,
$$\hat{H}_j = (A_\theta^H \otimes A_\phi^H) \hat{H}_j^w \qquad (3.19)$$
The measurement matrix $(X_j^T \otimes I)$ must satisfy the restricted isometry property in order to
obtain a solution to the system described in (3.18) by using compressive sensing. Since the
restricted isometry property is preserved by the Kronecker product with the
identity matrix I [43], it is only necessary to verify the condition for $X_j$. The
set of training pilots $X_j$ is generated either from random orthogonal pilots or from Hadamard
quasi-orthogonal sequences. These two constructions satisfy the requirements of measurement
matrices for compressive sensing according to [44].
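The link between the matrix form (3.7), its vectorized Kronecker form (3.14) and the forward transform (3.16) can be verified numerically for a single channel column. The sketch below assumes a small p × q array, angular grids $2\pi r/p$ and $2\pi s/q$ so that both transformation matrices are unitary, and a row-major vectorization (NumPy's default), under which the Kronecker ordering $A_\theta^H \otimes A_\phi^H$ holds; these are assumptions of the example, and the function name is hypothetical.

```python
import numpy as np

def dft_matrix(n):
    """n x n unitary DFT-type matrix; column r is a(2*pi*r/n)."""
    idx = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(idx, idx) / n) / np.sqrt(n)

p, q = 4, 3                                  # illustrative URA dimensions (assumption), n_r = p*q
A_theta, A_phi = dft_matrix(p), dft_matrix(q)

# Hypothetical angular-domain column with two active (elevation, azimuth) directions.
h_w_mat = np.zeros((p, q), dtype=complex)
h_w_mat[1, 2] = 1.0
h_w_mat[3, 0] = 0.5j

h_mat = A_theta.conj().T @ h_w_mat @ A_phi.conj()          # matrix form, (3.7)

# Row-major vectorization (NumPy's default reshape order) matches the Kronecker form (3.14).
K_syn = np.kron(A_theta.conj().T, A_phi.conj().T)
h_vec = K_syn @ h_w_mat.reshape(-1)

# Forward transform (3.16) recovers the sparse angular vector.
h_w_rec = np.kron(A_theta, A_phi) @ h_vec
print(np.allclose(h_vec, h_mat.reshape(-1)), np.count_nonzero(np.abs(h_w_rec) > 1e-9))
```

With a column-major vectorization the two Kronecker factors would appear in the opposite order, so the convention should be fixed before building the measurement model.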
3.4 Proposed Extended Algorithm
The estimation of the channel Hj in the model (3.2) can be performed after the transfor-
mation shown in (3.18) which produces Hjw , a sparse representation of the channel.
Algorithm 3 Channel Estimation for Massive MIMO
Input: Pilot sequence $X_j$, received signal $Y_j$
Output: Estimated channel $\hat{H}_j$
1: $r_0 = \mathrm{vec}\{(A_\theta \otimes A_\phi) Y_j\}$    ▷ Residual
2: l = 1, f = 0
3: $\hat{H}_j^w = \emptyset$    ▷ Estimated 2D angular channel
4: while $K_f \ne K_{f-1}$ do
5:   $v = (X_j^T \otimes I)^T r_{l-1}$
6:   $\omega = \{ j : |v(j)| > \tau \|r_{l-1}\|_2 / \sqrt{n_r T} \}$    ▷ Channel taps larger than threshold
7:   $K_f = \| \omega \cup \mathrm{supp}(\hat{H}_j^w) \|_0$    ▷ Sparsity level estimation
8:   repeat
9:     l = l + 1
10:    $v = (X_j^T \otimes I)^T r_l$
11:    $\omega = \{ j : |v(j)| > \tau \|r_l\|_2 / \sqrt{n_r T} \}$
12:    $S = \omega \cup \mathrm{supp}(\hat{H}_j^w)$    ▷ Support update
13:    $\hat{H}_l^w = (X_j^T \otimes I)_S^{\dagger}\, \mathrm{vec}\{(A_\theta \otimes A_\phi) Y_j\}$
14:    $r_l = \mathrm{vec}\{(A_\theta \otimes A_\phi) Y_j\} - (X_j^T \otimes I) \hat{H}_l^w$    ▷ Residual update
15:  until $\|r_l\|_2 > \|r_{l-1}\|_2$
16:  $r_0 = r_{l-1}$
17:  $\hat{H}_0^w = \hat{H}_{l-1}^w$
18:  l = 1, f = f + 1
19: end while
20: $\hat{H}_j = (A_\theta^H \otimes A_\phi^H) \hat{H}_0^w$    ▷ 2D angular transformation
With MP the solution can be computed. However, the exact support of $H_j^w$ is unknown. The
proposed algorithm therefore uses the Adaptive MP mentioned in the previous chapter, in which
the sparsity level K is also estimated. The procedure is summarized in Algorithm 3. It consists
of two loops: the outer loop iterates until the estimated sparsity level K stops changing, and
the inner loop refines the support set until the minimum residual is found. Both loops estimate
the support based on expression (2.36). The algorithm takes as inputs the pilot sequence $X_j$
and the received signal $Y_j$. $Y_j$ is transformed according to expression (3.15) in order to
work in the angular domain, and $X_j$ is expanded to $(X_j^T \otimes I)$ to match the vectorized
model. The final output $\hat{H}_j$ is obtained by converting the estimated 2D angular-domain
channel back from the angular domain into a matrix.
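The threshold-based support selection can be sketched in NumPy as follows. This is a simplified single-loop variant written for a generic model y = Φz + noise, not the exact two-loop procedure of Algorithm 3. It assumes unit-norm columns in Φ, uses the conjugate transpose for the correlation step because the data are complex, and takes a hypothetical default τ = 3. The function name and parameters are illustrative only.

```python
import numpy as np

def adaptive_threshold_pursuit(Phi, y, tau=3.0, max_iter=20):
    """Grow the support with a residual-scaled threshold and re-fit by least squares.

    Phi : (m, n) measurement matrix, assumed to have unit-norm columns
    y   : (m,) observation vector
    tau : threshold factor (hypothetical default, not a value from the thesis)
    """
    m, n = Phi.shape
    support = np.zeros(n, dtype=bool)
    z = np.zeros(n, dtype=complex)
    r = y.astype(complex)
    for _ in range(max_iter):
        v = Phi.conj().T @ r                          # correlate columns with the residual
        thresh = tau * np.linalg.norm(r) / np.sqrt(m)
        candidate = support | (np.abs(v) > thresh)    # support update
        if candidate.sum() == support.sum():
            break                                     # estimated sparsity level stopped changing
        if candidate.sum() > m:
            break                                     # least squares would be underdetermined
        coef, *_ = np.linalg.lstsq(Phi[:, candidate], y, rcond=None)
        z_new = np.zeros(n, dtype=complex)
        z_new[candidate] = coef
        r_new = y - Phi @ z_new                       # residual update
        if np.linalg.norm(r_new) >= np.linalg.norm(r):
            break                                     # residual no longer decreasing
        support, z, r = candidate, z_new, r_new
    return z, support
```

Scaling the threshold by the residual norm means that, once the strong components have been fitted, the remaining correlations are compared against an estimate of the noise level, so no sparsity level has to be supplied in advance.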
Chapter 4
Simulation results
This chapter first evaluates the performance of a traditional LS algorithm for channel
estimation in a multi-cell environment, comparing the use of the same group of orthogonal
pilots in all cells against quasi-orthogonal pilots, for different cell sizes and for different
pilot delays. The LS estimator is then compared against an MP algorithm that exploits the
sparse characteristics of the channel, for both orthogonal and quasi-orthogonal pilots at
different levels of SNR. Finally, the ASP algorithm is added to the simulations and its
behaviour for different numbers of users is observed.
4.1 Quasi-Orthogonal Pilots
To analyse the performance of the quasi-orthogonal pilots produced by the design algorithm,
a cellular network with pilot contamination is simulated based on expression (3.2).
Parameter                            Same Pilots    Quasi-Orthogonal
Receiver antennas (n_r) per cell     256            256
Users (K) per cell                   128            128
Number of cells (L)                  7              7
Pilot length (T)                     256            256
Reuse factor                         1              3
Nominal cell size r_c                250 m          250 m
Decay exponent (γ)                   3.8            3.8
Log-normal parameters (µ, σ)         0, 8 dB        0, 8 dB

Table 4.1: Parameters of the simulation 1
Figure 4.1: Pilot reuse pattern. Left panel: Same Pilots (all seven cells use pilot group 1).
Right panel: Quasi-Orthogonal Pilots (cells assigned pilot groups 1, 2 and 3).
The performance of quasi-orthogonal pilots with a reuse factor of 3 is compared to that of
orthogonal pilots with a reuse factor of 1 in a hexagonal network of 7 cells, each with the
same number of users K = 128 and the same number of receiving antennas $n_r$ = 256. The cell
radius $r_c$ is the same for all cells, the decay exponent is γ = 3.8, and the parameters of
the slow fading (log-normal distribution) are µ = 0 and σ = 8 dB. With a pilot length T = 256,
the reuse factor of 3 gives a design specification of 3 × K = 384 quasi-orthogonal pilots of
length T. These pilots are divided into 3 groups and distributed according to the pattern in
Figure 4.1.
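To give a sense of scale for this design specification, the Welch bound on the maximum cross-correlation between unit-norm sequences can be evaluated for 384 pilots of length 256. The sketch below uses the standard form of the bound for the maximum correlation magnitude; the function name is hypothetical, and the thesis design procedure itself works with the total squared correlation rather than with this quantity.

```python
import math

def welch_bound(num_seq, length):
    """Lower bound on the largest cross-correlation magnitude among
    num_seq unit-norm sequences of the given length."""
    if num_seq <= length:
        return 0.0  # an orthogonal set exists, so the bound is trivial
    return math.sqrt((num_seq - length) / (length * (num_seq - 1)))

bound_quasi = welch_bound(384, 256)   # 3 x K = 384 pilots of length T = 256
bound_orth = welch_bound(128, 256)    # K = 128 pilots fit in an orthogonal set
print(f"Welch bound for 384 pilots of length 256: {bound_quasi:.4f}")
```

For 384 sequences of length 256 the bound is about 0.036, so even an ideal quasi-orthogonal set of this size cannot have all pairwise normalized correlations below that value.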
Table 4.1 summarizes the parameters of the simulation; these are used in all the simulations
unless otherwise mentioned. The Monte Carlo simulations are run using the Least Squares
estimator for the multicell case as described in (3.4) and (3.6). The random channels $H_i$,
the cross-gains $\beta_i$ and the noise $W_j$ are redrawn in each run.
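The effect that these simulations measure can be reproduced in a noise-free toy example. The sketch below applies the Least Squares estimate (received block times the pseudo-inverse of the pilots) to two cells and reports the normalized error used throughout this chapter. The dimensions, the DFT-row pilots and the scalar cross-gain `beta` are assumptions chosen for illustration and do not correspond to the thesis parameters.

```python
import numpy as np

def ls_estimate(Y, X):
    """Least Squares channel estimate: received block times pseudo-inverse of the pilots."""
    return Y @ np.linalg.pinv(X)

def nmse(H, H_hat):
    """Normalized squared Frobenius error between the channel and its estimate."""
    return np.linalg.norm(H - H_hat, "fro") ** 2 / np.linalg.norm(H, "fro") ** 2

rng = np.random.default_rng(2)
n_r, K, T = 8, 4, 16                       # toy sizes: antennas, users, pilot length
F = np.fft.fft(np.eye(T))                  # rows of a DFT matrix are mutually orthogonal
X_own, X_other = F[:K], F[K:2 * K]         # two disjoint orthogonal pilot groups

def crandn(rows, cols):
    return (rng.standard_normal((rows, cols)) + 1j * rng.standard_normal((rows, cols))) / np.sqrt(2)

H_j, H_i = crandn(n_r, K), crandn(n_r, K)  # channel of interest and interfering channel
beta = 0.3                                 # hypothetical scalar cross-gain

# Neighbour reuses the same pilots: its channel leaks into the estimate
H_same = ls_estimate(H_j @ X_own + beta * H_i @ X_own, X_own)
# Neighbour uses an orthogonal pilot group: the interference is removed
H_diff = ls_estimate(H_j @ X_own + beta * H_i @ X_other, X_own)

print("NMSE, same pilots:      ", nmse(H_j, H_same))
print("NMSE, orthogonal groups:", nmse(H_j, H_diff))
```

With identical pilots the estimate equals the desired channel plus the scaled interfering channel, which is the pilot contamination term; with an orthogonal group the interfering term is projected out.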
Figure 4.2: Pilots performance with cell size. Panels: (a) 20 dB SNR, (b) 0 dB SNR, both with
pilot length = 256 and users = 7 × 128. Horizontal axis: cell size; vertical axis: normalized
estimation error $\|H - \hat{H}\|_F^2 / \|H\|_F^2$ (log scale). Curves: Same Pilots LS,
Quasi-Orthogonal Pilots LS.
The first scenario, in Figure 4.2, shows the variation of the normalized estimation error
($\|H - \hat{H}\|_F^2 / \|H\|_F^2$) with the cell size ($r_c$) when the users transmit
synchronized pilots, for two values of SNR and averaging over 1e6 realizations. When the SNR
is 20 dB, the estimation error decreases consistently as the cell size grows, confirming
previous results that link the effect of pilot contamination directly to the cross-gain among
cells or, in this case, to the cell size. The estimator using quasi-orthogonal pilots performs
better for all cell sizes. However, when the SNR is 0 dB there is no difference, since the
effect of the pilot contamination is masked by the noise, which is greater.
Figure 4.3: Pilots performance vs delay. Panels: SNR = 20 dB and SNR = 0 dB, both with pilot
length = 256 and users = 7 × 128. Horizontal axis: symbols delay; vertical axis: normalized
estimation error $\|H - \hat{H}\|_F^2 / \|H\|_F^2$ (log scale). Curves: Same Pilots LS,
Quasi-Orthogonal Pilots LS.

In the second scenario, the estimation error is evaluated when there is a delay in the
transmission of the pilots in the interfering cells, for two different values of SNR. Keeping
the cell of interest as the reference, the pilots transmitted in the neighbouring cells are
delayed. The delay is the same for all users in the neighbouring cells, and random data is
added in the delayed space. The delays are changed in each of the 1e4 realizations.
Figure 4.3 shows that, with quasi-orthogonal pilots, the estimation error increases as the
delay grows until it matches the performance of the orthogonal pilots. As mentioned in [45],
synchronized transmission and reception is the worst-case scenario for orthogonal pilots,
which contrasts with the result obtained for quasi-orthogonal pilots, where the delay
deteriorates the performance. The interference created by the delayed quasi-orthogonal pilots
becomes similar to that of the orthogonal ones because the quasi-orthogonal pilots in the
neighbouring cells become similar to those of the cell of interest, resembling the original
problem, that is, the use of the same pilots in all the cells.
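One possible reading of the delay model described above is sketched below: within the observation window of the cell of interest, a neighbouring pilot block delayed by d symbols appears as d symbols of unrelated data followed by the first T − d pilot symbols. The function name, the QPSK filler and the toy ±1 pilots are assumptions of this sketch and are not specified in the thesis.

```python
import numpy as np

def delay_pilots(X, d, rng):
    """Return the K x T block seen when the pilots X start d symbols late.

    The first d positions are filled with random unit-modulus QPSK symbols
    (an assumed stand-in for the 'random data' in the delayed space).
    """
    K, T = X.shape
    phases = rng.integers(0, 4, size=(K, d))
    data = np.exp(1j * (np.pi / 4 + np.pi / 2 * phases))   # random QPSK filler
    return np.hstack([data, X[:, :T - d]])

rng = np.random.default_rng(3)
X = np.sign(rng.standard_normal((4, 16)))   # toy +/-1 pilots: 4 users, length 16
X_delayed = delay_pilots(X, 5, rng)         # neighbouring cell delayed by 5 symbols
```

The delayed block keeps the pilot length, so it can be substituted directly for the interfering pilots in the received-signal model.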
It can be inferred that the use of quasi-orthogonal pilots reduces the error in the estimation
of the channel. However, synchronization among the cells is required to achieve the maximum
gain. This is a challenge in the design of wireless networks, since previous technologies like
GSM and current ones like LTE do not include the synchronization of pilots among cells, given
the complexity of coordinating a large number of cells and their users. While the worst case
for quasi-orthogonal pilots matches the result obtained using the same set of pilots in each
cell, a gain of almost one order of magnitude is seen when the pilot symbols are perfectly
synchronized among all cells.
4.2 Sparsity
Figure 4.4: Matching Pursuit algorithms comparison. Panels: (a) Different support sets,
(b) Same support sets, both with pilot length = 256 and users = 7 × 128. Horizontal axis: SNR;
vertical axis: normalized estimation error $\|H - \hat{H}\|_F^2 / \|H\|_F^2$ (log scale).
Curves: Same pilots SP, Quasi orthogonal SP, Same pilots MP, Quasi orthogonal MP.
To take advantage of the sparse characteristics of the channel as shown in (3.18), an MP
algorithm is used to estimate the channel. An initial simulation compares the estimation error
of MP and SP. The scenario consists of 7 cells, each with K = 128 users and a base station
with $n_r$ = 256 receiving antennas in a URA pattern of dimensions p × q = 16 × 16. The
sparsity level of each virtual angular channel is K = 64; in other words, each column of the
256 × 128 matrix $H_i^w$ has 64 non-zero elements in random positions and the remaining
elements are zero. The cell size is $r_c$ = 250 m and the parameters of the cross-gains
$\beta_i$ remain the same as in the previous section. Table 4.2 summarizes the main changes in
the simulation scenario with respect to the previous section.
Parameter             Same Pilots    Quasi-Orthogonal
Sparsity level (K)    64             64
Cell size r_c         250 m          250 m
Reuse factor          2              3

Table 4.2: Parameters of the simulation 2
Figure 4.5: Pilot reuse pattern. Left panel: Same Pilots (cells assigned pilot groups 1 and 2).
Right panel: Quasi-Orthogonal Pilots (cells assigned pilot groups 1, 2 and 3).
The two algorithms are evaluated with synchronized orthogonal pilots with a reuse factor of 2
and synchronized quasi-orthogonal pilots with a reuse factor of 3. The reuse factor for the
orthogonal pilots is increased with respect to the previous simulation, since the pilot length
(T = 256) allows two sets of orthogonal pilots. Because the benefit of using quasi-orthogonal
pilots was established in the previous section, a more challenging scenario is studied here.
As required, the sparsity level K is fed into the algorithms.
The results in Figure 4.4 show a similar estimation error for the MP and SP algorithms.
An interesting result is observed in Figure 4.4b, where the estimation error is smaller than
in Figure 4.4a. In the first case the K non-zero values take random positions in each column,
so the columns of the matrix $H_i^w$ have different support sets. In the second case the K
non-zero values are placed randomly in the first column of $H_i^w$ and the same positions are
kept for the remaining columns, so all the columns of $H_i^w$ have identical support sets
(common supports). The channels, the cross-gains and the noise are changed in each of the 1e3
realizations. A better estimation result is therefore observed when the supports among columns
are similar, which is closer to a realistic scenario: because of the shared scatterers, the
antennas in the same area receive similar echoes of the original signals from similar
directions, so after the angular transformation the channel matrix has columns with common
supports.
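The two support configurations compared in Figure 4.4 can be generated along the following lines. This is only a sketch: the function name, the complex Gaussian values for the non-zero entries and the toy dimensions are assumptions, since the thesis does not state how the non-zero values are drawn.

```python
import numpy as np

def sparse_angular_channel(n_r, n_users, k, common_support, rng):
    """Draw an n_r x n_users angular-domain channel with k non-zeros per column.

    common_support=True reuses one random support for every column;
    otherwise each column gets its own random support.
    """
    Hw = np.zeros((n_r, n_users), dtype=complex)
    shared = rng.choice(n_r, size=k, replace=False)
    for u in range(n_users):
        idx = shared if common_support else rng.choice(n_r, size=k, replace=False)
        # Non-zero entries: unit-variance complex Gaussian values (an assumption)
        Hw[idx, u] = (rng.standard_normal(k) + 1j * rng.standard_normal(k)) / np.sqrt(2)
    return Hw

rng = np.random.default_rng(4)
H_common = sparse_angular_channel(16, 6, 4, True, rng)     # identical supports
H_different = sparse_angular_channel(16, 6, 4, False, rng) # independent supports
```

In the thesis scenario the corresponding call would use 256 rows, 128 columns and k = 64.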
Figure 4.6: Performance of Subspace Pursuit and Least Squares. Panels: (a) Sparsity level
(K = 64), (b) Sparsity level (K = 32). Horizontal axis: SNR; vertical axis: normalized
estimation error $\|H - \hat{H}\|_F^2 / \|H\|_F^2$ (log scale).
In Figure 4.6, the same scenario is repeated with different random support sets in each column
of $H_i^w$. The estimation error of SP is compared to the result obtained using LS. For
different levels of SNR the two estimation methods are evaluated in three cases: a single cell
with no interference from other cells, synchronized orthogonal pilots with a reuse factor of
2, and synchronized quasi-orthogonal pilots with a reuse factor of 3. As in the previous
scenarios, the channels, the cross-gains and the noise are changed in each of the 1e3
realizations. An improvement of SP over LS is observed in the three cases. The
quasi-orthogonal pilots also continue to perform better than the orthogonal ones. However, as
the SNR increases, the performance of the sparse estimation is degraded by the interference
from the other cells to the point of giving a result similar to that of the Least Squares
estimator. Figure 4.6b shows a similar behaviour for a lower sparsity level, but a larger
improvement is achieved when the number of significant signal directions is reduced to K = 32.
Figure 4.7: Adaptive Subspace Pursuit performance (same sparsity). Panels: (a) Synchronized
cells, (b) Non-synchronized cells. Horizontal axis: SNR; vertical axis: normalized estimation
error $\|H - \hat{H}\|_F^2 / \|H\|_F^2$ (log scale).
By taking advantage of the sparse properties of the channel, compressive sensing gives better
channel estimation results than LS. The MP algorithms give good results in general. However,
their performance depends on knowledge of the sparsity level K, which represents a challenge
since this value is usually unknown.
4.3 Matching Pursuit with unknown sparsity level

Figure 4.8: Adaptive Subspace Pursuit performance (random sparsity). Panels: (a) ASP vs LS,
(b) ASP, SP, LS. Horizontal axis: SNR; vertical axis: normalized estimation error
$\|H - \hat{H}\|_F^2 / \|H\|_F^2$ (log scale).
Using the same 7-cell scenario from the previous section with sparsity level K = 64 and
applying Algorithm 3 from the last chapter, the simulation is performed again in two
situations: the first with synchronized pilots in all cells and the second with
non-synchronized pilots. In the second situation the cell of interest is used as the reference
and a random delay is introduced in the pilots of all the other cells. The delay is the same
for all users within a cell but varies from cell to cell, and random data is added in the
delayed space. Following the same procedure described before, the delays, the channels, the
cross-gains and the noise are changed in each of the 1e3 realizations. The adaptive sparse
pursuit (ASP) algorithm is compared against SP and the LS estimator in Figure 4.7 for both
situations.
In both cases a reduction in the channel estimation error is evident. The improvement of ASP
over LS is expected, since SP already exhibited a better performance in the previous section.
The unexpected superior performance of ASP compared to SP can be explained by the fact that
ASP searches for the best-fitting elements of the support set, which might not account for all
of its elements. SP selects all the elements in the support set specified by the sparsity
level K, regardless of whether some of those indices correspond to channel components below
the noise level, hence introducing error into the estimate. ASP, on the other hand, might
select fewer elements than specified by K, but it guarantees that the selected channel
components are above the interference coming from the adjacent cells.
Figure 4.9: Adaptive Subspace Pursuit performance (as a function of users). Panels: Users vs
Estimation error (Pilot Length = 256, SNR = 20) and Users vs Estimation error (Pilot
Length = 256, SNR = 5). Horizontal axis: users; vertical axis: normalized estimation error
$\|H - \hat{H}\|_F^2 / \|H\|_F^2$ (log scale). Curves: Same pilots LS, Same pilots ASP,
Quasi orthogonal ASP, One cell ASP.

Regarding the influence of the synchronization of the cells, the result agrees with the
findings of the first simulations. The lack of synchronization degrades the performance of the
quasi-orthogonal pilots, bringing it closer to that of the orthogonal ones. However, on
average the quasi-orthogonal pilots still perform better, because the delays are random and
are not always in the worst case for the quasi-orthogonal pilots.
In Figure 4.8, the improved performance of the ASP algorithm can also be observed in a similar
scenario, using random values of the sparsity level chosen uniformly (20 ≤ K ≤ 100) in each of
the 1e3 realizations and channels with common supports. Looking in detail at the comparison
with SP in Figure 4.8b, the limitation of the three estimation methods is reached at low SNR,
where the interference among cells affects the estimation.
The behaviour when the number of users changes can also be observed. In Figure 4.9 the number
of users in each cell is varied for two values of SNR, 20 dB and 5 dB respectively. In both
cases the overall performance of the ASP algorithm is better than that of LS. Nevertheless,
there is a point beyond which the quasi-orthogonal pilots start performing worse than the
orthogonal ones. The distance among the pilots in the quasi-orthogonal case becomes smaller as
the number of users increases.
Chapter 5
Conclusions and Future Research
5.1 Summary and Conclusions
In this thesis, the problem of channel estimation in Massive MIMO was investigated. The
channel is usually not known a priori, yet knowledge of it is fundamental for the performance
of the wireless communication link. Previous approaches rely on the transmission of pilots
that need to be reused in adjacent cells, creating the problem of pilot contamination. The
work presented here focuses on an alternative method that reduces the pilot contamination
effect by taking advantage of the spreading codes used for CDMA and the sparse characteristics
of the channel. The proposed algorithm shows a performance gain compared to traditional
approaches.
The relaxation of the orthogonality condition for the training pilots used to identify the
channel is initially explored by using quasi-orthogonal pilots. Although not orthogonal, these
codes are constructed with the minimum possible correlation by approaching the Welch Bound. As
a result, the effect of pilot contamination is considerably reduced with the use of
synchronized quasi-orthogonal pilots. In the worst-case scenario for quasi-orthogonal pilots,
when the training sequences are not synchronized, their performance matches that of the
orthogonal sequences under the same conditions.
Approaching the problem from a different angle, the sparse characteristics of Massive MIMO
channels are examined. With the angular transformation of the channel, the virtual directions
of the multipath signals can be extracted, especially since the large number of antennas can
provide the necessary resolution for the virtual angles of the signals' directions. A specific
case with rectangular antenna arrays at the receiver was simulated to estimate the angular
sparse channel, obtaining a reduced channel estimation error.
To solve the sparse problem, an adaptive matching pursuit algorithm was employed, given the
lack of knowledge of the sparsity level of the channel. The adaptive algorithm first estimates
the sparsity level and then estimates the supports iteratively until the minimum residual
estimation error is obtained. Due to interference and noise, the adaptive algorithm fails to
estimate all the valid supports of the channel; nevertheless, the channel estimation error is
reduced. This result is obtained because the adaptive algorithm selects only the supports that
are above the noise level: it incorporates into the estimate only the values that have been
modified relatively little by noise, and removes those that are below the noise level and can
therefore only degrade the total estimate.
Finally, it can be concluded that by using quasi-orthogonal pilots and exploiting the sparsity
of the channel, the performance of the estimation process can be improved. Given the
relatively low complexity of adaptive matching pursuit, the low resource consumption that
quasi-orthogonal pilots demand and the gains obtained by using compressive sensing, this is a
good alternative method for channel estimation in Massive MIMO.
5.2 Future Work
It is difficult to admit that, after all the effort and time devoted to the development of
this work, there are still so many unanswered questions. Fortunately, the current research
opens the path to many other possible pursuits of knowledge. For the benefit of future
investigators, here are some of the possible directions in which the present work might be
continued. The results presented showed an interesting improvement after the relaxation of the
orthogonality of the codes. However, the conditions under which the codes are distributed are
not considered: a large set of quasi-orthogonal codes is created and distributed randomly. A
smart distribution of the training pilots among the users, depending on their physical
location and the sparse characteristics of the channel, leaves room for further improvement.
On the other hand, the angular transformation is performed here only for uniform rectangular
arrays of antennas. This work can be extended to other antenna topologies such as cylindrical
arrays. A deeper study of which type of topology benefits most from the sparse characteristics
of the channel can also bring other advantages. In fact, the sparse characteristics of the
channel were not fully exploited. It is known that multipath signals from users share the same
scatterers, creating sparse representations on the receiver side with common supports. This
knowledge can further improve the estimation of the channel.
Bibliography
[1] G. N. Karystinos and D. A. Pados, “New bounds on the total squared correlation and optimum design of DS-CDMA binary signature sets,” IEEE Transactions on Communications, vol. 51, no. 1, pp. 48–51, 2003.
[2] C.-X. Wang, F. Haider, X. Gao, X.-H. You, Y. Yang, D. Yuan, H. M. Aggoune, H. Haas, S. Fletcher, and E. Hepsaydir, “Cellular architecture and key technologies for 5G wireless communication networks,” IEEE Communications Magazine, vol. 52, no. 2, pp. 122–130, 2014.
[3] W. U. Bajwa, J. Haupt, A. M. Sayeed, and R. Nowak, “Compressed channel sensing: A new approach to estimating sparse multipath channels,” Proceedings of the IEEE, vol. 98, no. 6, pp. 1058–1076, 2010.
[4] D. Gesbert, M. Shafi, D.-s. Shiu, P. J. Smith, and A. Naguib, “From theory to practice: an overview of MIMO space-time coded wireless systems,” IEEE Journal on Selected Areas in Communications, vol. 21, no. 3, pp. 281–302, 2003.
[5] J. Kimery and I. Wong, “Prototyping massive MIMO,” Microwave Journal, vol. 57, no. 1, pp. 92–+, 2014.
[6] E. Larsson, O. Edfors, F. Tufvesson, and T. Marzetta, “Massive MIMO for next generation wireless systems,” IEEE Communications Magazine, vol. 52, no. 2, pp. 186–195, 2014.
[7] X. Su, J. Zeng, L.-P. Rong, and Y.-J. Kuang, “Investigation on key technologies in large-scale MIMO,” Journal of Computer Science and Technology, vol. 28, no. 3, pp. 412–419, 2013.
[8] F. Rusek, D. Persson, B. K. Lau, E. G. Larsson, T. L. Marzetta, O. Edfors, and F. Tufvesson, “Scaling up MIMO: Opportunities and challenges with very large arrays,” IEEE Signal Processing Magazine, vol. 30, no. 1, pp. 40–60, 2013.
[9] M. Biguesh and A. B. Gershman, “Training-based MIMO channel estimation: a study of estimator tradeoffs and optimal training signals,” IEEE Transactions on Signal Processing, vol. 54, no. 3, pp. 884–893, 2006.
[10] X. Rao, V. K. Lau, and X. Kong, “CSIT estimation and feedback for FDD multi-user massive MIMO systems,” in 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3157–3161, IEEE, 2014.
[11] D. Boss, T. Petermann, and K.-D. Kammeyer, “Impact of blind versus non-blind channel estimation on the BER performance of GSM receivers,” in Proceedings of the IEEE Signal Processing Workshop on Higher-Order Statistics, pp. 62–66, IEEE, 1997.
[12] Z. Yi and Y.-M. Cai, “Cramer-Rao bound for blind, semi-blind and non-blind channel estimation in OFDM systems,” in IEEE International Symposium on Communications and Information Technology (ISCIT 2005), vol. 1, pp. 523–526, IEEE, 2005.
[13] S. M. Kay, Fundamentals of statistical signal processing. PTR Prentice-Hall, Englewood Cliffs, NJ, 1993.
[14] J. Hoydis, S. Ten Brink, and M. Debbah, “Massive MIMO in the UL/DL of cellular networks: How many antennas do we need?,” IEEE Journal on Selected Areas in Communications, vol. 31, no. 2, pp. 160–171, 2013.
[15] J. Hoydis, S. Ten Brink, and M. Debbah, “Massive MIMO: How many antennas do we need?,” in 2011 49th Annual Allerton Conference on Communication, Control, and Computing, pp. 545–550, IEEE, 2011.
[16] P. Xu, J. Wang, and J. Wang, “Effect of pilot contamination on channel estimation in massive MIMO systems,” in 2013 International Conference on Wireless Communications & Signal Processing (WCSP), pp. 1–6, IEEE, 2013.
[17] S. Noh, M. D. Zoltowski, Y. Sung, and D. J. Love, “Training signal design for channel estimation in massive MIMO systems,” in 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6499–6503, IEEE, 2014.
[18] J. Zhang, B. Zhang, S. Chen, X. Mu, M. El-Hajjar, and L. Hanzo, “Pilot contamination elimination for large-scale multiple-antenna aided OFDM systems,” IEEE Journal of Selected Topics in Signal Processing, vol. 8, no. 5, pp. 759–772, 2014.
[19] W. A. Mahyiddin, P. A. Martin, and P. J. Smith, “Pilot contamination reduction using time-shifted pilots in finite massive MIMO systems,” in 2014 IEEE 80th Vehicular Technology Conference (VTC Fall), pp. 1–5, IEEE, 2014.
[20] A. Ashikhmin and T. Marzetta, “Pilot contamination precoding in multi-cell large scale antenna systems,” in 2012 IEEE International Symposium on Information Theory Proceedings (ISIT), pp. 1137–1141, IEEE, 2012.
[21] H. Wang, Z. Pan, J. Ni, and I. Chih-Lin, “A spatial domain based method against pilot contamination for multi-cell massive MIMO systems,” in 2013 8th International ICST Conference on Communications and Networking in China (CHINACOM), pp. 218–222, IEEE, 2013.
[22] G. Wunder, H. Boche, T. Strohmer, and P. Jung, “Sparse signal processing concepts for efficient 5G system design,” IEEE Access, vol. 3, pp. 195–208, 2015.
[23] M. Masood, L. H. Afify, and T. Y. Al-Naffouri, “Efficient coordinated recovery of sparse channels in massive MIMO,” IEEE Transactions on Signal Processing, vol. 63, no. 1, pp. 104–118, 2014.
[24] Z. Chen and C. Yang, “Pilot decontamination in massive MIMO systems: Exploiting channel sparsity with pilot assignment,” in 2014 IEEE Global Conference on Signal and Information Processing (GlobalSIP), pp. 637–641, IEEE, 2014.
[25] C.-K. Wen, S. Jin, K.-K. Wong, J.-C. Chen, and P. Ting, “Channel estimation for massive MIMO using Gaussian-mixture Bayesian learning,” IEEE Transactions on Wireless Communications, vol. 14, no. 3, pp. 1356–1368, 2014.
[26] D. C. Araujo, A. L. de Almeida, J. Axnas, and J. Mota, “Channel estimation for millimeter-wave very-large MIMO systems,” in 2013 Proceedings of the 22nd European Signal Processing Conference (EUSIPCO), pp. 81–85, IEEE, 2014.
[27] C. R. Berger, Z. Wang, J. Huang, and S. Zhou, “Application of compressive sensing to sparse channel estimation,” IEEE Communications Magazine, vol. 48, no. 11, pp. 164–174, 2010.
[28] B. Clerckx and C. Oestges, MIMO wireless networks: Channels, techniques and standards for multi-antenna, multi-user and multi-cell systems. Academic Press, 2013.
[29] X. Rao and V. K. Lau, “Distributed compressive CSIT estimation and feedback for FDD multi-user massive MIMO systems,” IEEE Transactions on Signal Processing, vol. 62, no. 12, pp. 3261–3271, 2014.
[30] R. G. Baraniuk, “Compressive sensing,” IEEE Signal Processing Magazine, vol. 24, no. 4, 2007.
[31] E. J. Candès, “The restricted isometry property and its implications for compressed sensing,” Comptes Rendus Mathematique, vol. 346, no. 9, pp. 589–592, 2008.
[32] W. Dai and O. Milenkovic, “Subspace pursuit for compressive sensing signal reconstruction,” IEEE Transactions on Information Theory, vol. 55, no. 5, pp. 2230–2249, 2009.
[33] T. H. Cormen, Introduction to algorithms. MIT Press, 2009.
[34] T. T. Cai and L. Wang, “Orthogonal matching pursuit for sparse signal recovery with noise,” IEEE Transactions on Information Theory, vol. 57, no. 7, pp. 4680–4688, 2011.
[35] D. Needell and R. Vershynin, “Signal recovery from incomplete and inaccurate measurements via regularized orthogonal matching pursuit,” IEEE Journal of Selected Topics in Signal Processing, vol. 4, no. 2, pp. 310–316, 2010.
[36] D. L. Donoho, Y. Tsaig, I. Drori, and J.-L. Starck, “Sparse solution of underdetermined systems of linear equations by stagewise orthogonal matching pursuit,” IEEE Transactions on Information Theory, vol. 58, no. 2, pp. 1094–1121, 2012.
[37] S. G. Mallat and Z. Zhang, “Matching pursuits with time-frequency dictionaries,” IEEE Transactions on Signal Processing, vol. 41, no. 12, pp. 3397–3415, 1993.
[38] X. Zhu, J. Wang, L. Dai, and Z. Wang, “Sparsity-aware adaptive channel estimation based on SNR detection,” IEEE Transactions on Broadcasting, vol. 61, no. 1, pp. 119–126, 2015.
[39] H. Weiqiang, Z. Jianlin, L. Zhiqiang, and D. Xuejie, “Sparsity and step-size adaptive regularized matching pursuit algorithm for compressed sensing,” in 2014 IEEE 7th Joint International Information Technology and Artificial Intelligence Conference (ITAIC), pp. 536–540, IEEE, 2014.
[40] G. Sun, Y. Zhou, Z. Wang, W. Dang, and Z. Li, “Sparsity adaptive compressive sampling matching pursuit algorithm based on compressive sensing,” Journal of Computational Information Systems, vol. 7, no. 4, pp. 2883–2890, 2012.
[41] H. Wu and S. Wang, “Adaptive sparsity matching pursuit algorithm for sparse reconstruction,” IEEE Signal Processing Letters, vol. 19, no. 8, pp. 471–474, 2012.
[42] L. Welch, “Lower bounds on the maximum cross correlation of signals (corresp.),” IEEE Transactions on Information Theory, pp. 397–399, 1974.
[43] M. F. Duarte and R. G. Baraniuk, “Kronecker product matrices for compressive sensing,” in 2010 IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP), pp. 3650–3653, IEEE, 2010.
[44] C. Zhuoran, Z. Honglin, J. Min, W. Gang, and S. Jingshi, “An improved Hadamard measurement matrix based on Walsh code for compressive sensing,” in 2013 9th International Conference on Information, Communications and Signal Processing (ICICS), pp. 1–4, IEEE, 2013.
[45] T. L. Marzetta, “Noncooperative cellular wireless with unlimited numbers of base station antennas,” IEEE Transactions on Wireless Communications, vol. 9, no. 11, pp. 3590–3600, 2010.


zb = arg min kzk`0 subject to ky − Φzk`2 ≤ (2.29)

zb = arg min kzk`1 subject to ky − Φzk`2 ≤ (2.30)