Академический Документы
Профессиональный Документы
Культура Документы
Abstract—Non-orthogonal multiple access (NOMA) is a challenging decoding problem in NOMA networks, it has
promising method for the fifth generation (5G) cellular networks some drawbacks. Firstly, it carries dependency on correct
as it provides improved spectral efficiency by multiplexing users decoding during each iteration, i.e., an error made in one
in power domain. One key challenge for the receivers in NOMA
networks is to distinguish the individual signals that use the iteration may propagate the other ones. Secondly, user signals
same band at the same time. Currently, the two widely dis- have to wait to be decoded until the decoding of signals
cussed decoding schemes are successive interference cancellation of previous users are complete. Another drawback of SIC is
(SIC) and parallel interference cancellation (PIC). Both schemes that for correct decoding, power allocation between the user
suppress the multi-user interference by subtracting the decoded signals should be accurately defined. Even though SIC with
signals from the received signal based on different algorithms, i.e.,
SIC decodes iteratively and PIC decodes collectively. This paper NOMA is being promised, it remains impractical for massive
compares the computation time of SIC and PIC schemes at the deployment.
base station and demonstrates multi-thread implementation of In contrast, when PIC is employed, the receiver decodes
PIC. each user signal at the same time in parallel. Its plain architec-
Index Terms—5G, NOMA, SIC, PIC, Parallel programming, ture and ease of implementation overcome the aforementioned
GPU, CUDA
deficiencies for SIC. In PIC, removing multi-user interference
and decoding the signal are conducted in parallel [7] [8] [9].
I. I NTRODUCTION
When we compare SIC and PIC schemes, it is clear that
Developments in information technologies and growth of decoding of each user signal in SIC is mutually dependant,
mobile devices put new requirements on wireless commu- while for PIC, decoding for each user is independent of each
nications. Major requirements put upon 5G are caused by other and cancellation tasks may be distributed to multi-core
popularity of internet of things and mobile internet. Modern processors. Multi-core processors run programs concurrently
consumers of wireless communication services such as vir- and independently of each other. One task may be split into
tual reality, augmented reality and e-health services demand sub-tasks equal to number of cores and final results from each
enhanced broadband, massive machine-type communications, core are gathered. Results of sub-tasks must be independent
low latency and spectrum efficiency [1] [2] [3]. System re- of each other so that they can be run separately.
sources are used to reach high spectrum efficiency with several If number of users in a NOMA system grows, heavy
different multiple access schemes. These are multiple access interference cancellations become time consuming. Such com-
schemes implemented in different standards; time division plex computations may be optimized by running parallel sub
multiple access (TDMA), frequency division multiple access tasks on graphical processing unit (GPU), rather than central
(FDMA), code division multiple access (CDMA) implemented processing unit (CPU). GPU is widely used to solve various
in the third generation (3G) and orthogonal frequency division general purpose problems as it dedicates a larger fraction of
multiple access (OFDMA) used in the latest fourth generation hardware resources to computation than CPU does [10].
(4G) networks. In this work, our aim is to compare computation time of
In NOMA, which is a candidate multiple access scheme, SIC and PIC, and show that structure of PIC enables faster
signals are multiplexed in power domain where individual execution than SIC. In reaching our aim, we also demonstrate
signals are distinguished by their power levels. The receiver multi-threaded implementation of PIC. Multi-threading allows
decodes the signals by either successive interference cancel- us to use hardware more efficiently and run the work even
lation (SIC) or parallel interference cancellation (PIC) [4] [5] faster. Multi-threading comparison is done on CPU with Java
[6]. programming language and on GPU with CUDA architecture.
When SIC is employed, the receiver decodes user signals The rest of the paper is organized as follows. In Section II
iteratively by subtracting each decoded signal from the re- we introduce SIC and PIC decoders, then in Section III we
ceived signal. Although, SIC has a potential to solve the present our numerical results and finally we conclude our work
978-1-5386-5928-1/18/$31.00 2018
c IEEE 74
in Section IV.
II. S YSTEM M ODEL
In this paper, we consider uplink transmission for a single
cell NOMA network. In the network, all users transmit signals
at the same time and in the same band simultaneously via
different paths. For K user equipments (UEs), the received
Fig. 1. Block diagram of a typical SIC scheme.
signal by the base station (BS) can be written as
K
y(t) = gk ( ak PT xk (t)) + n(t) (1) K
k=1 xk (t) = y(t) − k=1 sˆi ,
k=i (2)
where gk is the channel attenuation coefficient between UE i = 1, 2, ..., k − 1, k + 1, ..., K.
k and the BS, xk (t) is the modulated waveform transmitted
from UE k, ak is power coefficient for UE k assigned by the In (2), decoded messages for K UEs (except the desired kth
BS, PT is the overall power for signals of all UEs and n(t) UE) are summed and subtracted from the received signal y(t).
is the additive white Gaussian noise (AWGN) at the BS with Then the result is decoded and message for UE k is obtained.
zero mean and density N0 (W/Hz). As can be seen, this summing operations and then subtractions
may take place in parallel [16].
A. Power Allocation
In SIC, order of decoding depends on the power allocation III. N UMERICAL R ESULTS
between each UEs. In other words, the user signal with the In this section, we compare the computation time of PIC
highest received power is decoded first. In [11], spectral and SIC schemes for different number of UEs in a single cell
efficiency of the network is maximized by optimizing the NOMA network. We considered single carrier transmission
power allocation among UEs in NOMA uplink channels. In with QPSK and 16-QAM and maximum likelihood decoding
this work, we also optimize the power allocation coefficients so that higher modulation order requires higher decoding time.
αk ’s in decoding the received signals with SIC. The same Maximum available power for UEs PT = 23 dBm. Frequency
methodology described in [12] is followed. In contrast, in PIC, is 1 GHz and users are distributed randomly in the coverage
we assume that power is equally distributed among UEs and area of the cell. Channel gains are obtained with Okumura-
αk ’s are set equal to each other. 1 Hata propagation model. The noise density N0 is taken as
10−17 W/Hz. Calculations were done on a machine with
B. Successive Interference Cancellation
following hardware parameters: CPU Intel(R) Xeon(R) CPU
Fig. 1 shows the block diagram of a SIC receiver. In SIC, E5620 @ 2.40GHz, 5GB RAM Memory 1333 MHz DDR3,
BS has the knowledge of power allocation coefficients (αk ’s) NVIDIA TITAN Xp with 3840 CUDA Cores and Boost Clock
and channel gains (gk ’s) for each UE. The first decoded signal of 1582 MHz. Simple tic toc MATLAB functions were used
by the BS would be the signal for UE 1 and the remaining part for results in Fig. 3. Version 8 of Java programming language
is considered as multi-user interference. BS then subtracts the for running PIC via streams in parallel threads on multi cores
decoded signal from the received signal and the signals of the in Fig. 4. Java allows running parallel tasks using threads,
rest UEs are decoded in the same manner. Iterations continue which may be assigned per processor [17]. Version 8 of Java
until signals of all UEs are decoded. Order of decoding is introduced streams and parallel streams, which enable working
defined by the power allocation among UEs [13] [14] [15]. with collections of different object types and manipulate them
Power allocation coefficients are linearly dependent on the in parallel [18]. Finally, CUDA tool to measure elapsed time
distance from UEs to BS. The data rate for closer UEs and of events running on GPU device in Fig. 5.
with lower transmit power is less than for farther but with
higher transmit power [12].
C. Parallel Interference Cancellation
Fig. 2 shows the block diagram of a PIC receiver. PIC
decodes for each UE from the received composite signal at
the same time. In order to cancel the multi-user interference,
BS then subtracts each estimated message from the received
signal as
1 Power allocation for PIC-NOMA can also be optimized. However, it
should be noted that this work compares the computation time of SIC and
PIC schemes where the power allocation coefficients do not have any impact.
Still, we mention power allocation here since it is one of the most significant
factors in NOMA networks. Fig. 2. Block diagram of a typical PIC scheme.
UEs makes SIC execution time even longer as the number SIC-QPSK
PIC-QPSK
of UEs grows, whereas PIC scheme decoding time changes SIC-QAM-16
200
PIC-QAM-16
marginally. PIC execution time is twice faster than SIC ex-
ecution time for 1000 UEs and it reaches approximately 5
fold difference for 2000 UEs. Execution time of modulation 150
Time (ms)
schemes QPSK and QAM-16 do not differ much.
100
12
SIC-QPSK
PIC-QPSK 50
10 SIC-QAM-16
PIC-QAM-16
8 0
500 1000 1500 2000 2500
Time (ms)
Number of UEs
6
Fig. 4. Computation time of SIC and PIC on multiple threads for QPSK and
16-QAM modulation schemes.
4
2 10
9
PIC-QPSK
0 SIC-QPSK
0 200 400 600 800 1000 1200 1400 1600 1800 2000 8
PIC-16-QAM
Number of UEs SIC-16-QAM
7
Fig. 3. Computation time of SIC and PIC for QPSK and 16-QAM modulation 6
Time (ms)
schemes.
5