Comparison of Parallel vs. Successive Interference Cancellation for NOMA

Comparison of Parallel and Successive Interference
Cancellation for Non-Orthogonal Multiple Access

Talgat Manglayev Refik Caglar Kizilirmak Yau Hee Kho
Department of Electrical Department of Electrical School of Engineering
and Computer Engineering and Computer Engineering and Computer Science
Nazabayev University Nazabayev University Victoria University of Wellington
Astana Z05H0P9, Kazakhstan Astana Z05H0P9, Kazakhstan Wellington 6140, New Zealand
talgat.manglayev@nu.edu.kz refik.kizilirmak@nu.edu.kz yhkho@ieee.org
Abstract—Non-orthogonal multiple access (NOMA) is a challenging decoding problem in NOMA networks, it has
promising method for the fifth generation (5G) cellular networks some drawbacks. Firstly, it carries dependency on correct
as it provides improved spectral efficiency by multiplexing users decoding during each iteration, i.e., an error made in one
in power domain. One key challenge for the receivers in NOMA
networks is to distinguish the individual signals that use the iteration may propagate the other ones. Secondly, user signals
same band at the same time. Currently, the two widely dis- have to wait to be decoded until the decoding of signals
cussed decoding schemes are successive interference cancellation of previous users are complete. Another drawback of SIC is
(SIC) and parallel interference cancellation (PIC). Both schemes that for correct decoding, power allocation between the user
suppress the multi-user interference by subtracting the decoded signals should be accurately defined. Even though SIC with
signals from the received signal based on different algorithms, i.e.,
SIC decodes iteratively and PIC decodes collectively. This paper NOMA is being promised, it remains impractical for massive
compares the computation time of SIC and PIC schemes at the deployment.
base station and demonstrates multi-thread implementation of In contrast, when PIC is employed, the receiver decodes
PIC. each user signal at the same time in parallel. Its plain architec-
Index Terms—5G, NOMA, SIC, PIC, Parallel programming, ture and ease of implementation overcome the aforementioned
GPU, CUDA
deficiencies for SIC. In PIC, removing multi-user interference
and decoding the signal are conducted in parallel [7] [8] [9].
I. I NTRODUCTION
When we compare SIC and PIC schemes, it is clear that
Developments in information technologies and growth of decoding of each user signal in SIC is mutually dependant,
mobile devices put new requirements on wireless commu- while for PIC, decoding for each user is independent of each
nications. Major requirements put upon 5G are caused by other and cancellation tasks may be distributed to multi-core
popularity of internet of things and mobile internet. Modern processors. Multi-core processors run programs concurrently
consumers of wireless communication services such as vir- and independently of each other. One task may be split into
tual reality, augmented reality and e-health services demand sub-tasks equal to number of cores and final results from each
enhanced broadband, massive machine-type communications, core are gathered. Results of sub-tasks must be independent
low latency and spectrum efficiency [1] [2] [3]. System re- of each other so that they can be run separately.
sources are used to reach high spectrum efficiency with several If number of users in a NOMA system grows, heavy
different multiple access schemes. These are multiple access interference cancellations become time consuming. Such com-
schemes implemented in different standards; time division plex computations may be optimized by running parallel sub
multiple access (TDMA), frequency division multiple access tasks on graphical processing unit (GPU), rather than central
(FDMA), code division multiple access (CDMA) implemented processing unit (CPU). GPU is widely used to solve various
in the third generation (3G) and orthogonal frequency division general purpose problems as it dedicates a larger fraction of
multiple access (OFDMA) used in the latest fourth generation hardware resources to computation than CPU does [10].
(4G) networks. In this work, our aim is to compare computation time of
In NOMA, which is a candidate multiple access scheme, SIC and PIC, and show that structure of PIC enables faster
signals are multiplexed in power domain where individual execution than SIC. In reaching our aim, we also demonstrate
signals are distinguished by their power levels. The receiver multi-threaded implementation of PIC. Multi-threading allows
decodes the signals by either successive interference cancel- us to use hardware more efficiently and run the work even
lation (SIC) or parallel interference cancellation (PIC) [4] [5] faster. Multi-threading comparison is done on CPU with Java
[6]. programming language and on GPU with CUDA architecture.
When SIC is employed, the receiver decodes user signals The rest of the paper is organized as follows. In Section II
iteratively by subtracting each decoded signal from the re- we introduce SIC and PIC decoders, then in Section III we
ceived signal. Although, SIC has a potential to solve the present our numerical results and finally we conclude our work
978-1-5386-5928-1/18/$31.00 2018
c IEEE 74
in Section IV.
II. S YSTEM M ODEL
In this paper, we consider uplink transmission for a single
cell NOMA network. In the network, all users transmit signals
at the same time and in the same band simultaneously via
different paths. For K user equipments (UEs), the received
Fig. 1. Block diagram of a typical SIC scheme.
signal by the base station (BS) can be written as
K

y(t) = gk ( ak PT xk (t)) + n(t) (1) K
k=1 xk (t) = y(t) − k=1 sî ,
k=i (2)
where gk is the channel attenuation coefficient between UE i = 1, 2, ..., k − 1, k + 1, ..., K.
k and the BS, xk (t) is the modulated waveform transmitted
from UE k, ak is power coefficient for UE k assigned by the In (2), decoded messages for K UEs (except the desired kth
BS, PT is the overall power for signals of all UEs and n(t) UE) are summed and subtracted from the received signal y(t).
is the additive white Gaussian noise (AWGN) at the BS with Then the result is decoded and message for UE k is obtained.
zero mean and density N0 (W/Hz). As can be seen, this summing operations and then subtractions
may take place in parallel [16].
A. Power Allocation
In SIC, order of decoding depends on the power allocation III. N UMERICAL R ESULTS
between each UEs. In other words, the user signal with the In this section, we compare the computation time of PIC
highest received power is decoded first. In [11], spectral and SIC schemes for different number of UEs in a single cell
efficiency of the network is maximized by optimizing the NOMA network. We considered single carrier transmission
power allocation among UEs in NOMA uplink channels. In with QPSK and 16-QAM and maximum likelihood decoding
this work, we also optimize the power allocation coefficients so that higher modulation order requires higher decoding time.
αk ’s in decoding the received signals with SIC. The same Maximum available power for UEs PT = 23 dBm. Frequency
methodology described in [12] is followed. In contrast, in PIC, is 1 GHz and users are distributed randomly in the coverage
we assume that power is equally distributed among UEs and area of the cell. Channel gains are obtained with Okumura-
αk ’s are set equal to each other. 1 Hata propagation model. The noise density N0 is taken as
10−17 W/Hz. Calculations were done on a machine with
B. Successive Interference Cancellation
following hardware parameters: CPU Intel(R) Xeon(R) CPU
Fig. 1 shows the block diagram of a SIC receiver. In SIC, E5620 @ 2.40GHz, 5GB RAM Memory 1333 MHz DDR3,
BS has the knowledge of power allocation coefficients (αk ’s) NVIDIA TITAN Xp with 3840 CUDA Cores and Boost Clock
and channel gains (gk ’s) for each UE. The first decoded signal of 1582 MHz. Simple tic toc MATLAB functions were used
by the BS would be the signal for UE 1 and the remaining part for results in Fig. 3. Version 8 of Java programming language
is considered as multi-user interference. BS then subtracts the for running PIC via streams in parallel threads on multi cores
decoded signal from the received signal and the signals of the in Fig. 4. Java allows running parallel tasks using threads,
rest UEs are decoded in the same manner. Iterations continue which may be assigned per processor [17]. Version 8 of Java
until signals of all UEs are decoded. Order of decoding is introduced streams and parallel streams, which enable working
defined by the power allocation among UEs [13] [14] [15]. with collections of different object types and manipulate them
Power allocation coefficients are linearly dependent on the in parallel [18]. Finally, CUDA tool to measure elapsed time
distance from UEs to BS. The data rate for closer UEs and of events running on GPU device in Fig. 5.
with lower transmit power is less than for farther but with
higher transmit power [12].
C. Parallel Interference Cancellation
Fig. 2 shows the block diagram of a PIC receiver. PIC
decodes for each UE from the received composite signal at
the same time. In order to cancel the multi-user interference,
BS then subtracts each estimated message from the received
signal as
1 Power allocation for PIC-NOMA can also be optimized. However, it
should be noted that this work compares the computation time of SIC and
PIC schemes where the power allocation coefficients do not have any impact.
Still, we mention power allocation here since it is one of the most significant
factors in NOMA networks. Fig. 2. Block diagram of a typical PIC scheme.
Second International Conference on Computing and Network Communications (CoCoNet’18) 75

Fig. 3 shows the computation times of PIC and SIC as
the number of UEs increase. Dependency of decoding among 250
UEs makes SIC execution time even longer as the number SIC-QPSK
PIC-QPSK
of UEs grows, whereas PIC scheme decoding time changes SIC-QAM-16
200
PIC-QAM-16
marginally. PIC execution time is twice faster than SIC ex-
ecution time for 1000 UEs and it reaches approximately 5
fold difference for 2000 UEs. Execution time of modulation 150
Time (ms)
schemes QPSK and QAM-16 do not differ much.
100
12
SIC-QPSK
PIC-QPSK 50
10 SIC-QAM-16
PIC-QAM-16
8 0
500 1000 1500 2000 2500
Time (ms)
Number of UEs
6
Fig. 4. Computation time of SIC and PIC on multiple threads for QPSK and
16-QAM modulation schemes.
4
2 10
9
PIC-QPSK
0 SIC-QPSK
0 200 400 600 800 1000 1200 1400 1600 1800 2000 8
PIC-16-QAM
Number of UEs SIC-16-QAM
7
Fig. 3. Computation time of SIC and PIC for QPSK and 16-QAM modulation 6
Time (ms)
schemes.
5
PIC is found suitable to be implemented on parallel threads 4

as discussed above. In Fig. 4, we implement PIC via Java
3
8 using parallel streams on multi-cores and compare it with
SIC. As a result, execution time for both modulation schemes 2
is approximately constant and keep around 100 ms. As can be 1
seen, even if it took less time for SIC implementation until
0
500 UEs, execution time of SIC scheme is as twice as PIC 0 500 1000 1500 2000 2500 3000
when it reaches 2500 UEs. It may be predicted that number of Number of UEs
UEs does not affect execution time for PIC on parallel threads.
Fig. 5 shows SIC and PIC computation times on CUDA Fig. 5. Computation time of SIC and PIC on GPU run with CUDA
architecture for QPSK and 16-QAM modulation schemes.
architecture with C++ programming language. Linear curves
show that SIC execution time grows faster than PIC. Mod-
ulation schemes QPSK and 16-QAM do not differ much in increase. We also evaluated PIC execution time on multi-core
implementation time, only with slight faster performance of processor using multi-threading and results reveal that PIC
QPSK. For approximately 2500 UEs execution time reached outperforms SIC. Then we implemented multi-thread approach
9 ms for SIC against only 2 ms for PIC. on GPU device with CUDA architecture, which allows initi-
IV. C ONCLUSION ating GPU threads equal to number of UEs simultaneously.
PIC outperformed SIC and time difference increases with the
In this paper, we compared the computation time of SIC number of UEs. Breaking the task into parallel streams allowed
and PIC decoding schemes in the uplink of NOMA based us to use processor more efficiently.
cellular networks. Both schemes decode the received signal
by cancelling out interference in power domain. Decoding R EFERENCES
approaches resulted in different computation performances.
SIC execution time depends on the number of UEs in the [1] Z. K. Xiang Wei, 5G Mobile Communications. Springer, 2017.
network and increases gradually with the number of UEs. PIC [2] R. Vannithamby and S. Talwar, Towards 5G: Applications, Requirements
and Candidate Technologies. John Wiley & Sons, 2017.
has advantage of decoding the signal in parallel, therefore [3] B. M. J. Mavromoustakis Constandinos X., Mastorakis George, Internet
execution time grows moderately as the number of UEs of Things (IoT) in 5G Mobile Technologies. Springer, 2016.
76 Second International Conference on Computing and Network Communications (CoCoNet’18)

[4] Y. Saito, A. Benjebbour, Y. Kishiyama, and T. Nakamura, “System-level
performance evaluation of downlink non-orthogonal multiple access
(NOMA),” in Personal Indoor and Mobile Radio Communications
(PIMRC), 2013 IEEE 24th International Symposium on, pp. 611–615,
IEEE, 2013.
[5] L. Dai, B. Wang, Y. Yuan, S. Han, I. Chih-Lin, and Z. Wang, “Non-
orthogonal multiple access for 5g: solutions, challenges, opportunities,
and future research trends,” IEEE Communications Magazine, vol. 53,
no. 9, pp. 74–81, 2015.
[6] A. Benjebbour, Y. Saito, Y. Kishiyama, A. Li, A. Harada, and T. Naka-
mura, “Concept and practical considerations of non-orthogonal multiple
access (NOMA) for future radio access,” in Intelligent Signal Processing
and Communications Systems (ISPACS), 2013 International Symposium
on, pp. 770–774, IEEE, 2013.
[7] H. Yan and S. Roy, “Parallel interference cancellation for uplink mul-
tirate overlay cdma channels,” IEEE transactions on communications,
vol. 53, no. 1, pp. 152–161, 2005.
[8] M. Zhang, S. Ahmed, and S. Kim, “Iterative mmse-based soft mimo
detection with parallel interference cancellation,” IET Communications,
vol. 11, no. 11, pp. 1775–1781, 2017.
[9] N. Benvenuto and P. Bisaglia, “Parallel and successive interference
cancellation for mc-cdma and their near-far resistance,” in Vehicular
Technology Conference, 2003. VTC 2003-Fall. 2003 IEEE 58th, vol. 2,
pp. 1045–1049, IEEE, 2003.
[10] T. M. Aamodt, W. W. L. Fung, and T. G. Rogers, “General-purpose
graphics processor architectures,” Synthesis Lectures on Computer Ar-
chitecture, vol. 13, no. 2, pp. 1–140, 2018.
[11] M. Al-Imari, P. Xiao, M. A. Imran, and R. Tafazolli, “Uplink non-
orthogonal multiple access for 5g wireless networks,” in Wireless
Communications Systems (ISWCS), 2014 11th International Symposium
on, pp. 781–785, IEEE, 2014.
[12] T. Manglayev, R. C. Kizilirmak, and Y. H. Kho, “Optimum power
allocation for non-orthogonal multiple access (NOMA),” in Application
of Information and Communication Technologies (AICT), 2016 IEEE
10th International Conference on, pp. 1–4, IEEE, 2016.
[13] D. Tse and P. Viswanath, Fundamentals of wireless communication.
Cambridge university press, 2005.
[14] J. G. Andrews and T. H. Meng, “Optimum power control for succes-
sive interference cancellation with imperfect channel estimation,” IEEE
Transactions on Wireless Communications, vol. 2, no. 2, pp. 375–383,
2003.
[15] T. Manglayev, R. C. Kizilirmak, Y. H. Kho, N. Bazhayev, and I. Lebedev,
“NOMA with imperfect SIC implementation,” in Smart Technologies,
IEEE EUROCON 2017-17th International Conference on, pp. 22–25,
IEEE, 2017.
[16] A. Anwar, B.-C. Seet, and X. J. Li, “PIC-based receiver structure for
5G downlink NOMA,” in Information, Communications and Signal
Processing (ICICS), 2015 10th International Conference on, pp. 1–5,
IEEE, 2015.
[17] P. Watcharawitch and S. Moore, “Jma: the java-multithreading ar-
chitecture for embedded processors,” in Computer Design: VLSI in
Computers and Processors, 2002. Proceedings. 2002 IEEE International
Conference on, pp. 527–529, IEEE, 2002.
[18] Y. Chan, A. Wellings, I. Gray, and N. Audsley, “A distributed stream
library for java 8,” IEEE Transactions on Big Data, vol. 3, no. 3,
pp. 262–275, 2017.
Second International Conference on Computing and Network Communications (CoCoNet’18) 77

Comparison of Parallel vs. Successive Interference Cancellation for NOMA

Загружено:

Сведения о документе

Исходное описание:

Оригинальное название

Авторское право

Доступные форматы

Поделиться этим документом

Поделиться или встроить документ

Параметры публикации

Этот документ был вам полезен?

Это неприемлемый материал?

Авторское право:

Доступные форматы

Comparison of Parallel vs. Successive Interference Cancellation for NOMA

Загружено:

Авторское право:

Доступные форматы

Comparison of Parallel and Successive Interference

Cancellation for Non-Orthogonal Multiple Access

Second International Conference on Computing and Network Communications (CoCoNet’18) 75

PIC is found suitable to be implemented on parallel threads 4

76 Second International Conference on Computing and Network Communications (CoCoNet’18)

Second International Conference on Computing and Network Communications (CoCoNet’18) 77

Вам также может понравиться