Date: 04-01-2008
Version: v0.50
Status: Preliminary
Type: CTO Office Technical Note
ID: CTO-TN-0009
Table of Contents
Version History
1 Introduction to picoArray
2 Introduction to PRACH
3 Generation of RACH Preamble
4 Detection of RACH Preamble
  4.1 RACH Preamble Receiver structure
  4.2 Complexity Analysis
  4.3 Consideration of one antenna
  4.4 Implementation on PC20x
    4.4.1 Detection Method
    4.4.2 Throughput calculations
    4.4.3 Resource Estimation
References
5 Glossary
Version History
Version   Date          Author(s)   Description
v0.10     25-Oct-2007   Yu Huai     Created
v0.20     03-Jan-2008   Yu Huai     Re-write
v0.30     10-Jan-2008   Yu Huai
v0.40     20-Jan-2008   Yu Huai     Re-construct after general review
v0.50     01-Apr-2008   Yu Huai
Page 2 of 12
1 Introduction to picoArray
The picoArray is a multi-processor IC which integrates hundreds of processing elements into a single array.
The individual elements have been optimized for signal processing and wireless algorithm computation and
control. The result is a general purpose wireless communications processor, capable of executing all
contemporary wireless standards, which combines the computational density of a dedicated ASIC with the
programmability of a traditional high-end Digital Signal Processor (DSP). Details of picoArray can be found in
[3]. For the time being there are two main picoArray DSP products: PC102 and PC20x (PC202, PC203 and
PC205).
The PC102[6] contains four different types of array elements (AEs) which are detailed in Table 1, three of
which are programmable and the fourth is a configurable hardware accelerator unit. Minor differences exist
between the three programmable AE types (STAN2, MEM2 and CTRL2). These differences include the size
of instruction/data memory, additional processing units and instructions supported (e.g. multiply-accumulate,
multiply). Each AE can issue a long instruction word (LIW) of up to 64 bits into up to 3 execution units in a
single cycle (at 160 MHz). Each AE communicates with other AEs within the array over a bus, to which it is
connected by several ports.
In addition to the STAN2, MEM2 and CTRL2 AE types specified in Table 1, software for the PC102 can also
be targeted at the ANY2 AE type implying that: (1) the function does not use any AE-specific instructions and
(2) the code and data memory requirements can be met by all AE types.
Software, written in C or ASM, is targeted at an AE type depending on the processing units used and
memory required.
Table 1: Array element (AE) types of the PC102
AE Type   Description                                               Number   Memory (bytes)
STAN2     Standard.                                                 240      768
MEM2      Memory. An AE having a multiply unit and additional       64       8,704
          memory. Memory division between code and data is
          configurable.
FAU       Configurable hardware accelerator unit.                   14       n/a
CTRL2     Control. An AE type with a multiply unit and larger       4        65,536
          amounts of data and instruction memory, optimized for
          the implementation of base station control
          functionality. Memory division between code and data
          is configurable.
Totals per PC102 device:                                            322      1,003,520
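The device totals can be cross-checked against the per-type figures. The following is a minimal sketch, assuming the per-AE memory sizes and counts quoted for the PC102; it derives the CTRL2 count from the device totals rather than taking it as given, and all variable and function names are illustrative only.

```python
# Cross-check of the PC102 totals (322 AEs, 1,003,520 bytes).
# Assumption: the memory column is "bytes per AE" and FAUs contribute no memory ("n/a").
TOTAL_AES = 322
TOTAL_MEMORY_BYTES = 1_003_520
CTRL2_BYTES_PER_AE = 65_536

KNOWN_TYPES = {
    "STAN2": {"count": 240, "bytes_per_ae": 768},
    "MEM2": {"count": 64, "bytes_per_ae": 8_704},
    "FAU": {"count": 14, "bytes_per_ae": 0},
}


def infer_ctrl2_count():
    """CTRL2 count implied by the total number of AEs."""
    return TOTAL_AES - sum(t["count"] for t in KNOWN_TYPES.values())


def total_memory(ctrl2_count):
    """Total memory in bytes for a given CTRL2 count."""
    known = sum(t["count"] * t["bytes_per_ae"] for t in KNOWN_TYPES.values())
    return known + ctrl2_count * CTRL2_BYTES_PER_AE


ctrl2 = infer_ctrl2_count()
print("CTRL2 AEs:", ctrl2)
print("memory matches total:", total_memory(ctrl2) == TOTAL_MEMORY_BYTES)
```

Both totals agree only for a single CTRL2 count, which is a useful sanity check when the table is copied between documents.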
The picoBus is the name given to the switching fabric running vertically and horizontally between the
processing elements in the array. AEs are assigned 32-bit slots on the picoBus at compile time thereby
removing the need for arbitration and making performance completely deterministic. Each AE communicates
over the picoBus via its ports. These are defined using picoVHDL. Each AE has a number of ports which can
be configured to be read (incoming) or write (outgoing). Data sent between AEs is:
1. Written to a write port FIFO by the sending AE.
2. Sent over the picoBus on the next available slot.
3. Read from the read port FIFO by the receiving AE.
By default, communication between AEs is data blocking. On an attempt to read data from the picoBus, an
AE will block until data becomes available in the read port FIFO. Similarly, when attempting to write data to
the picoBus, the sending AE will block if its write port FIFO is full. A full write port FIFO implies that the
receiving AE's read port is not taking data (i.e., is full itself).
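The blocking behaviour can be illustrated with a small host-side model. This is only a sketch in Python, not picoArray code: a single bounded queue stands in for the write port FIFO, the picoBus slot and the read port FIFO together, and the FIFO depth of 2 is a hypothetical value.

```python
import queue
import threading


def run_transfer(values, fifo_depth=2):
    """Move `values` from a sender thread to a receiver thread through a bounded FIFO."""
    fifo = queue.Queue(maxsize=fifo_depth)  # stand-in for the port FIFOs (depth is assumed)
    received = []

    def sender():
        for v in values:
            fifo.put(v)  # blocks while the FIFO is full, like a blocking write

    def receiver():
        for _ in values:
            received.append(fifo.get())  # blocks while the FIFO is empty, like a blocking read

    threads = [threading.Thread(target=sender), threading.Thread(target=receiver)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received


print(run_transfer(list(range(10))))
```

Because both ends block rather than drop data, the receiver always sees every value in the order it was sent, regardless of how the two threads are scheduled.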
Bandwidth on the picoBus between communicating AEs is assigned via @-rates. A signal is assigned an
@-rate, which is a positive integer power of 2, e.g., @8, @16. The @-rate is defined in the port declarations in
both the sending and receiving AEs. This @-rate is relative to the system clock (160 MHz for the PC102 and
PC20x) and indicates how often data may be sent. For example, @8 means that a 32-bit data value can be
sent every 8 cycles (of the 160 MHz bus). The receiving AE must therefore issue a read (against the
associated port) once every 8 cycles in order to prevent the sending AE from blocking.
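The bandwidth implied by an @-rate follows directly from the clock and the slot width. A minimal sketch, assuming one 32-bit word per slot and reading "positive integer power of 2" as 2, 4, 8, and so on; the function names are illustrative and not part of any picoChip tool.

```python
CLOCK_HZ = 160e6  # picoBus clock stated in the text
WORD_BITS = 32    # width of one picoBus slot


def is_valid_at_rate(rate):
    """True if rate is 2**k for a positive integer k (assumed reading of the rule)."""
    return isinstance(rate, int) and rate >= 2 and (rate & (rate - 1)) == 0


def at_rate_bandwidth_bps(rate):
    """Peak bandwidth in bit/s of a signal that may send one word every `rate` cycles."""
    if not is_valid_at_rate(rate):
        raise ValueError(f"@{rate} is not a power-of-2 @-rate")
    return CLOCK_HZ / rate * WORD_BITS


for rate in (8, 16):
    print(f"@{rate}: {at_rate_bandwidth_bps(rate) / 1e6:.0f} Mbit/s")
```

Doubling the @-rate halves the available bandwidth, so a signal should be given the largest @-rate that still covers its data rate.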
PC20x and PC102 are similar except that they have different numbers of AEs and different accelerators.
Table 2 gives a brief overview of the PC202, PC203 and PC205 [7].
Table 2: Brief overview of PC202, PC203 and PC205
AE Type   Number of AEs   Memory (bytes)
STAN      196             768
MEM       50              8,704
CTRL      2               65,536
Total*    248             716,800
FAU: FFT/IFFT, Viterbi, Turbo decoder, Reed-Solomon decoder, Cryptography accelerator
PC202 & PC205 only: ARM9 host & peripherals
* FAU AEs not included
2 Introduction to PRACH
This document discusses the structure, complexity and resource estimation of the LTE random access preamble
correlator implemented on a picoArray.
The main purpose of the random access procedure is to obtain uplink time synchronization and to obtain
access to the network.
The physical layer random access preamble, illustrated in Figure 1, consists of a cyclic prefix of length T_CP
and a sequence part of length T_PRE.
Figure 1: Random access preamble format: CP (length T_CP) followed by the sequence (length T_PRE)
Preamble format                    T_CP        T_SEQ
0                                  3168 Ts     24576 Ts
1                                  21024 Ts    24576 Ts
2                                  6240 Ts     2 x 24576 Ts
3                                  21024 Ts    2 x 24576 Ts
4 (frame structure type 2 only)    448 Ts      4096 Ts
The details of the parameters and configuration of the RACH preamble can be found in [1]. In this document, we
focus on transmission bandwidths of 5 MHz, 10 MHz and 20 MHz.
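3 Generation of RACH Preamble
The u-th root Zadoff-Chu sequence is defined by
\[ x_u(n) = e^{-j \frac{\pi u n (n+1)}{N_{ZC}}}, \quad 0 \le n \le N_{ZC} - 1 \]
where N_ZC is the length of the Zadoff-Chu sequence. From the u-th root Zadoff-Chu sequence, random
access preambles with zero correlation zones of length N_CS - 1 are defined by cyclic shifts according to
\[ x_{u,v}(n) = x_u\big((n + C_v) \bmod N_{ZC}\big) \]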
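The root sequence and its cyclic shifts can be sketched in a few lines. This is an illustrative example using only the Python standard library; the root index u = 129 and the cyclic shift C_v = 13 are arbitrary example values, not taken from this document, and the helper names are hypothetical.

```python
import cmath


def zc_root(u, n_zc):
    """u-th root Zadoff-Chu sequence x_u(n) = exp(-j*pi*u*n*(n+1)/N_ZC)."""
    # The phase index is reduced modulo 2*N_ZC to keep the argument small and accurate.
    return [
        cmath.exp(-1j * cmath.pi * ((u * n * (n + 1)) % (2 * n_zc)) / n_zc)
        for n in range(n_zc)
    ]


def cyclic_shift(seq, c_v):
    """x_{u,v}(n) = x_u((n + C_v) mod N_ZC)."""
    n_zc = len(seq)
    return [seq[(n + c_v) % n_zc] for n in range(n_zc)]


def periodic_correlation(a, b, lag):
    """Sum over k of a[(k + lag) mod N] * conj(b[k])."""
    n = len(a)
    return sum(a[(k + lag) % n] * b[k].conjugate() for k in range(n))


N_ZC = 839                      # sequence length for FDD, as stated in the text
root = zc_root(129, N_ZC)       # example root index (assumption)
shifted = cyclic_shift(root, 13)  # example cyclic shift (assumption)

print("zero-lag magnitude:", round(abs(periodic_correlation(root, root, 0)), 6))
print("lag-5 magnitude:   ", round(abs(periodic_correlation(root, root, 5)), 6))
print("shifted vs root:   ", round(abs(periodic_correlation(shifted, root, 0)), 6))
```

The sequence has constant amplitude and its periodic autocorrelation vanishes at every non-zero lag, which is why different cyclic shifts of one root can serve as distinct signatures.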
Figure: RACH preamble structure (CP, ZC sequence, guard interval). Signatures 1, 2, 3 are generated by cyclic shift of the ZC sequence.
The frequency domain scheme used to generate the RACH preamble is as follows.
1. The ZC sequence is generated in the time domain.
2. An N_ZC-point DFT is used for time to frequency domain conversion, where N_ZC is the ZC sequence length,
a prime number; for FDD it is 839.
3. The output of the N_ZC-point DFT is mapped to the assigned sub-carriers.
4. An N_IDFT-point IDFT is used for frequency to time domain conversion. An IDFT is used instead of an IFFT
since, in order to maintain the same system sampling rate, the number of samples after the IDFT is not 2^N.
5. If the preamble time duration is larger than 0.8 ms, the output of the IDFT is repeated.
6. CP insertion.
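The six steps above can be sketched end to end with toy sizes. This is a hedged illustration only: a naive O(N^2) DFT replaces any optimized transform, the sizes (N_ZC = 13, N_IDFT = 48), the first sub-carrier index and the CP length are hypothetical values chosen to keep the example fast, and mapping onto contiguous sub-carriers is an assumption.

```python
import cmath


def dft(x, inverse=False):
    """Naive DFT (or IDFT with 1/N scaling); fine for toy sizes."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[k] * cmath.exp(sign * 2j * cmath.pi * ((k * m) % n) / n) for k in range(n))
        for m in range(n)
    ]
    return [v / n for v in out] if inverse else out


def zc_root(u, n_zc):
    """u-th root Zadoff-Chu sequence."""
    return [
        cmath.exp(-1j * cmath.pi * ((u * n * (n + 1)) % (2 * n_zc)) / n_zc)
        for n in range(n_zc)
    ]


def make_preamble(u, n_zc, n_idft, first_subcarrier, cp_len, repeats=1):
    x = zc_root(u, n_zc)                         # step 1: ZC sequence in the time domain
    x_f = dft(x)                                 # step 2: N_ZC-point DFT
    grid = [0j] * n_idft                         # step 3: map to the assigned sub-carriers
    for k, value in enumerate(x_f):
        grid[(first_subcarrier + k) % n_idft] = value
    segment = dft(grid, inverse=True)            # step 4: N_IDFT-point IDFT
    body = segment * repeats                     # step 5: repeat for long preambles
    cp = body[-cp_len:] if cp_len > 0 else []    # step 6: CP is a copy of the tail
    return cp + body


preamble = make_preamble(u=1, n_zc=13, n_idft=48, first_subcarrier=4, cp_len=6, repeats=2)
print(len(preamble))
```

With the real parameters the IDFT size is much larger than N_ZC, so only a small part of the frequency grid is occupied by the preamble.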
Figure: Frequency domain RACH preamble generation: DFT (size N_ZC), sub-carrier mapping, IDFT (size N_IDFT, preamble segment), repeat, CP insertion.
Then IFFT engines transform the cross correlation from frequency domain to time domain. The energy
detection block estimates noise power, sets the detection threshold, and makes a decision.
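The correlation and energy detection steps can be illustrated with a toy model. This is a sketch under several assumptions and not the implementation described in this note: it works directly at the sequence length (no filtering, CP removal or sub-carrier de-mapping), uses a naive DFT, a toy length of 139 with root index 3, a transmitted cyclic shift of 20, a noise level of 0.5 per component and a threshold factor of 10, all of which are hypothetical.

```python
import cmath
import random


def dft(x, inverse=False):
    """Naive DFT (or IDFT with 1/N scaling); fine for toy sizes."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[k] * cmath.exp(sign * 2j * cmath.pi * ((k * m) % n) / n) for k in range(n))
        for m in range(n)
    ]
    return [v / n for v in out] if inverse else out


def zc_root(u, n_zc):
    """u-th root Zadoff-Chu sequence."""
    return [
        cmath.exp(-1j * cmath.pi * ((u * n * (n + 1)) % (2 * n_zc)) / n_zc)
        for n in range(n_zc)
    ]


def detect(received, root, threshold_factor=10.0):
    """Return (detected, cyclic_shift) from a frequency-domain cross correlation."""
    rx_f = dft(received)
    root_f = dft(root)
    cross = [r * z.conjugate() for r, z in zip(rx_f, root_f)]   # multiply by the signature
    profile = [abs(v) ** 2 for v in dft(cross, inverse=True)]   # energy per correlation lag
    peak = max(range(len(profile)), key=profile.__getitem__)
    noise = (sum(profile) - profile[peak]) / (len(profile) - 1)  # noise power estimate
    detected = profile[peak] > threshold_factor * noise          # threshold decision
    return detected, (-peak) % len(profile)                      # lag -C_v maps back to C_v


N_ZC, U, SHIFT = 139, 3, 20   # toy values (assumptions)
rng = random.Random(0)
root = zc_root(U, N_ZC)
clean = [root[(n + SHIFT) % N_ZC] for n in range(N_ZC)]
noisy = [s + complex(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for s in clean]
print(detect(noisy, root))
```

The position of the correlation peak identifies the cyclic shift, so one frequency-domain multiplication and one inverse transform cover all signatures derived from the same root sequence.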
Figure: RACH preamble receiver (blocks: FFT, RACH sequence, Signature0 to Signature3 in the frequency domain, IFFT).
Transmission bandwidth       1.25MHz   2.5MHz    5MHz      10MHz       15MHz         20MHz
FFT size                     128       256       512       1024        1536          2048
Sampling rate                1.92MHz   3.84MHz   7.68MHz   15.36MHz    23.04MHz      30.72MHz
Samples in 0.8 ms preamble   128*12    256*12    512*12    1024*12     1536*12       2048*12
                             =1536     =1536*2   =2048*3   =2048*2*3   =1536*2*2*3   =2048*2*2*3
RACH receiver DFT size       1536      1536      2048      2048        1536          2048
Polyphase filter             1         2         3         3*2         3*2*2         3*2*2
decimation rate
From Figure 4, the number of complex multiplications, N_CML, is calculated as follows.
Here
6 long blocks per UL sub-frame, 2 sub-frames per 1 ms TTI, and QPSK modulation result in 288
channel bits per TTI.
Excessive use of repetition after channel coding is to be avoided, as this consumes valuable
time/frequency resources in the cell and suggests that an alternative multiple access scheme or
resource block size should be considered.
Assuming time-separated transmission of RACH and UL-SCH, the transmit power available at the UE for
non-sync RACH and for UL-SCH is the same.
For balanced transmit power conditions between RACH and PUSCH, and similar noise-limited situations, the
required RACH preamble length is obtained as [2],
\[ T_p = \frac{E_p / N_0}{R_{PUSCH} \, E_b / N_0} \]
where
channel for a probability of false alarm of 1% or 0.1% respectively and for a probability of missed detection of
1e-2 (see e.g. [2], [3]).
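The expression for the preamble length follows from the equal transmit power assumption. A short derivation, writing P for the common transmit power (a symbol introduced here for illustration) and taking R_PUSCH to be the PUSCH data rate:

```latex
E_p = P \, T_p, \qquad E_b = \frac{P}{R_{PUSCH}}
\quad\Longrightarrow\quad
T_p = \frac{E_p}{P} = \frac{E_p}{R_{PUSCH} \, E_b}
    = \frac{E_p / N_0}{R_{PUSCH} \, E_b / N_0}
```

Dividing numerator and denominator by N_0 expresses the result in terms of the two required signal to noise ratios, which is the form used in the text.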
Comparing various results for the performance of the SC-FDMA uplink (e.g. [4]), it would appear that it is
reasonable to expect 20% BLER (a probable HARQ operating point for 1st time transmissions) at around 1 dB
E_s/N_0 per receive antenna in a TU channel using a 1/3 rate turbo code with realistic channel estimation.
antennas).
Given
this results in 18.8 dB, which is larger than the required 18 dB, so one antenna can be used for the RACH
receiver in 2x2 MIMO systems.
For lower data rates, a longer RACH preamble sequence is used by the RACH detector, which is also
covered by the above discussion.
Slot rates by transmission bandwidth (row labels are missing; rows are listed in their original order)
Transmission bandwidth   1.28MHz   2.56MHz   5MHz      10MHz      15MHz      20MHz
Sampling rate            1.92MHz   3.84MHz   7.68MHz   15.36MHz   23.04MHz   30.72MHz
Row 1                    @83.33    @41.67    @20.83    @10.42     @6.94      @5.20
Row 2                    @83.33    @41.67    @20.83    @10.42     @6.94      @5.20
Row 3                    @83.33    @41.67    @20.83    @10.42     @6.94      @5.20
Row 4                    @83.33    @83.33    @62.50    @62.50     @83.33     @62.50
Row 5                    @83.33    @83.33    @62.50    @62.50     @83.33     @62.50
Row 6                    @94.07    @94.07    @70.50    @70.50     @94.07     @70.50
Row 7                    @94.07    @94.07    @70.50    @70.50     @94.07     @70.50
Row 8                    @94.07    @94.07    @70.50    @70.50     @94.07     @70.50
Row 9                    @94.07    @94.07    @70.50    @70.50     @70.50     @70.50
Row 10                   @172.0    @172.0    @172.0    @172.0     @172.0     @172.0
Row 11                   @172.0    @172.0    @172.0    @172.0     @172.0     @172.0
Row 12                   @172.0    @172.0    @172.0    @172.0     @172.0     @172.0
Row 13                   @172.0    @172.0    @172.0    @172.0     @172.0     @172.0
Row 14                   @172.0    @172.0    @172.0    @172.0     @172.0     @172.0
Note 1: The number of cycles per input sample is 160/(system sampling rate in MHz), where 160 MHz is the
core clock rate of the picoArray.
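The rule in Note 1 is easy to tabulate. A minimal sketch, assuming the six system sampling rates listed for the transmission bandwidths; the names are illustrative.

```python
CORE_CLOCK_MHZ = 160.0
SAMPLING_RATES_MHZ = [1.92, 3.84, 7.68, 15.36, 23.04, 30.72]


def cycles_per_sample(sampling_rate_mhz):
    """Core clock cycles available per input sample."""
    return CORE_CLOCK_MHZ / sampling_rate_mhz


for rate in SAMPLING_RATES_MHZ:
    print(f"{rate:6.2f} MHz -> @{cycles_per_sample(rate):.2f}")
```

The printed values are rounded to two decimals, so the last digit may differ slightly from a table that truncates instead of rounding.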
Note 2: Polyphase filters are used for filtering and decimation. The result of decimation has a constant length
of 2048 points; the decimation rate differs for different transmission bandwidths. For 1.28 MHz, there are no
polyphase filters.
Note 3: Due to the decimation operation, the input rate of the FFT engine is the same as for 2.56 MHz. The CP
ratio is 3168/24576, so the throughput of the FFT engine is 2.56*24576/(24576+3168) = 2.268 MHz, and the
number of cycles per sample is 160/2.268, approximately 70.6 cycles (entered as @70.50 in the table). The top
slot rate is the input data rate and the bottom slot rate is the output data rate.
Note 4: The number of used data sub-carriers of the RACH preamble is 839. After the 2048-point FFT and the
extraction of these sub-carriers from the frequency domain, the throughput at the input of the block that
multiplies by each signature is 2.268*839/2048 = 0.929 MHz, so the number of cycles per sample is 160/0.929,
approximately 172 cycles. The latency of the RACH receiver is computed as follows: the latency of the
2048-point FFT is about 140 us, the latency of the 1024-point IFFT is about 70 us, and the latency of the other
blocks is about 300 us. For 64 preamble sequences generated by cyclic shifts of the same ZC sequence, the
latency is 140 us + 70 us + 300 us = ~510 us. In the worst case, where each of the 64 preambles needs its own
IFFT, the latency is 140 us + 64*(70 us) + 300 us = ~5 ms.
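The arithmetic in Notes 3 and 4 can be reproduced as follows. This is a sketch using only the figures quoted in the notes (2.56 MHz decimated rate, CP of 3168 Ts, sequence of 24576 Ts, 839 used sub-carriers, 2048-point FFT, and the three latency estimates); the function names and default arguments are illustrative, and unrounded results may differ in the last digits from the rounded figures in the notes.

```python
CORE_CLOCK_MHZ = 160.0


def fft_throughput_mhz(decimated_rate_mhz=2.56, t_seq=24576, t_cp=3168):
    """Average sample rate into the FFT once the CP has been removed."""
    return decimated_rate_mhz * t_seq / (t_seq + t_cp)


def correlator_throughput_mhz(fft_rate_mhz, used_subcarriers=839, fft_size=2048):
    """Average sample rate after keeping only the used sub-carriers."""
    return fft_rate_mhz * used_subcarriers / fft_size


def latency_us(n_ifft, fft_us=140.0, ifft_us=70.0, other_us=300.0):
    """Receiver latency when n_ifft inverse transforms run one after another."""
    return fft_us + n_ifft * ifft_us + other_us


fft_rate = fft_throughput_mhz()
corr_rate = correlator_throughput_mhz(fft_rate)
print(f"FFT input:        {fft_rate:.3f} MHz, {CORE_CLOCK_MHZ / fft_rate:.2f} cycles/sample")
print(f"correlator input: {corr_rate:.3f} MHz, {CORE_CLOCK_MHZ / corr_rate:.2f} cycles/sample")
print(f"latency, one IFFT: {latency_us(1):.0f} us")
print(f"latency, 64 IFFTs: {latency_us(64):.0f} us")
```

The latency grows linearly with the number of inverse transforms, which is why preambles derived from a single root sequence are much cheaper to detect than 64 unrelated sequences.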
Sub-carrier
de-mapping
Zadoff-Chu
mother
sequence
Cross correlator
839 point IDFT
Energy
Detection
1.28MHz
2.56MHz
5MHz
10MHz
15MHz
20MHz
1 Mems
1 Mems
1 Mems
1 Mems
1 Mems
1 Mems
6 STANs
5 STANs
6 STANs
7 STANs
7 STANs
1 STANs
1 STANs
2 ANYs
3 MEMs
512-point
accelerator
1 STANs
1 STANs
1 STANs
2 ANYs
3 MEMs
512-point
accelerator
1 STANs
1 STANs
1 STANs
2 ANYs
3 MEMs
1024-point
accelerator
1 STANs
1 STANs
1 STANs
2 ANYs
3 MEMs
1024-point
accelerator
1 STANs
1 STANs
1 STANs
2 ANYs
3 MEMs
512-point
accelerator
1 STANs
1 STANs
1 STANs
2 ANYs
3 MEMs
1024-point
accelerator
1 STANs
1 MEMs
1 STANs
1 MEMs
1 STANs
1 MEMs
1 STANs
1 MEMs
1 STANs
1 MEMs
1 STANs
1 MEMs
1 STANs
1 STANs
1024-point
accelerator
1 MEMs
1 STANs
1024-point
accelerator
1 MEMs
1 STANs
1024-point
accelerator
1 MEMs
1 STANs
1024-point
accelerator
1 MEMs
1 STANs
1024-point
accelerator
1 MEMs
1 STANs
1024-point
accelerator
1 MEMs
Note 1: The linear filter resources are based on the WiMAX PC8530 decimation filter implementation.
Note 2: For the 2048-point FFT there is no direct accelerator; 3 MEMs, 1 STAN, 2 ANYs and the 1024-point
HW accelerator are needed. For the 1536-point FFT, 3 MEMs, 1 STAN, 2 ANYs and the 512-point HW
accelerator are needed.
References
1.
2. R1-060998, E-UTRA Random Access Preamble Design, Ericsson, RAN WG1 #44bis, Athens, Greece,
27-31 March 2006
3. R1-062306, RACH Sequence Extension Methods for Large Cell deployment, LGE, RAN WG1 #46,
Tallinn, Estonia, 28 August - 1 September 2006
4. R1-051073, Performance Comparison of Distributed FDMA and Localised FDMA with Frequency
Hopping for EUTRA Uplink, NEC Group, RAN WG1 #42bis, San Diego, USA, 10-14 October 2005
5. Digital Signal Processing: Principles, Algorithms, and Applications, Fourth Edition, John G. Proakis
6. PC102 datasheet
7. PC20x datasheet
5 Glossary
CP           Cyclic prefix
DFT/IDFT     Discrete Fourier transform/Inverse discrete Fourier transform
Es           Energy per symbol
Eb           Energy per bit
FFT/IFFT     Fast Fourier transform/Inverse Fast Fourier transform
LTE          Long Term Evolution
N0           Noise power spectral density
RACH         Random Access channel
OFDM         Orthogonal Frequency Division Multiplexing
picoArray    picoChip Designs Limited proprietary array processing architecture
PUCCH        Physical Uplink Control channel
PUSCH        Physical Uplink Shared channel
QPSK         Quadrature Phase Shift Keying
TU channel   Typical Urban channel
Ts           Basic time unit, Ts = 1/(15000 x 2048) seconds
ZC           Zadoff-Chu