Capacity Monitoring Guide
Issue 03
Date 2012-11-07
and other Huawei trademarks are trademarks of Huawei Technologies Co., Ltd.
All other trademarks and trade names mentioned in this document are the property of their respective
holders.
Notice
The purchased products, services and features are stipulated by the contract made between Huawei and
the customer. All or part of the products, services and features described in this document may not be
within the purchase scope or the usage scope. Unless otherwise specified in the contract, all statements,
information, and recommendations in this document are provided "AS IS" without warranties, guarantees or
representations of any kind, either express or implied.
The information in this document is subject to change without notice. Every effort has been made in the
preparation of this document to ensure accuracy of the contents, but all statements, information, and
recommendations in this document do not constitute a warranty of any kind, express or implied.
Website: http://www.huawei.com
Email: support@huawei.com
Purpose
Traffic on a mobile telecommunications network, especially a new network, increases by the
day. To support the increasing traffic, more and more resources are required, such as signaling
processing resources, transmission resources, and air interface resources.
If any type of network resource is insufficient, user experience is affected (for example, the
call drop rate increases). This means that real-time resource monitoring, timely resource
bottleneck detection, and proper network expansion are critical to good user experience on a
mobile telecommunications network. This document describes how to monitor usage of
various network resources, locate network resource bottlenecks, and perform network
expansion in a timely manner.
Guidelines provided in this document are applicable to BSC6900 and BTS3900 series base
stations, but can only be used as references for RNCs and NodeBs of earlier versions.
Audience
This document is intended for network maintenance personnel.
Organization
This document consists of the following chapters.
Chapter: Description
1 Network Resource Monitoring Methods: Describes basic concepts associated with network
resources, including definitions and monitoring activities.
2 Network Resource Counters: Describes various network resources.
3 HSPA Related Resources: Describes how to monitor network resources when HSPA is enabled.
4 Diagnosis of Problems Related to Network Resources: Provides fault analysis and locating
methods that experienced WCDMA network maintenance personnel can use to handle network
congestion or overload events efficiently.
5 Counter Definitions: Lists all performance counters mentioned in the other chapters. These
counters help in monitoring network resources and designing resource analysis instruments.
Change History
Changes between document issues are cumulative. Therefore, the latest document issue
contains all changes made in previous issues.
03 (2012-11-07)
This is the third commercial release of RAN 14.0.
Compared with issue 02 (2012-06-30), this issue incorporates the following changes:
Updated section 2.12 CNBAP Load of the NodeB Main Processing and Transmission
Unit (WMPT/UMPT).
02 (2012-06-30)
This is the second commercial release of RAN 14.0.
Compared with issue 01 (2012-04-30), this issue incorporates the following changes:
Updated the formula for calculating CE usage and replaced the NodeB counter with the RNC
counter.
Added the MPU part.
Adjusted the SPU, DPU, and interface board thresholds.
Adjusted the document structure.
01 (2012-04-30)
This is the first commercial release of RAN 14.0.
Compared with issue Draft A (2012-02-15), this issue optimizes the description.
Draft A (2012-02-15)
This is the draft for RAN14.0.
There are two methods of monitoring system resources and detecting resource bottlenecks:
Prediction-based monitoring: This is a proactive approach wherein various network resources
are monitored simultaneously.
You can monitor usage of a network resource (for example, the downlink transmit power of a
cell), predict the resource usage trend and impacts, and determine whether to perform network
expansion after comparing the detected resource usage with a preset upper threshold. After
detecting that usage of a resource is higher than its upper threshold for a long time (for
example, a cell remains overloaded during busy hours for several consecutive days), you can
split the cell or add carriers for network expansion. This approach, which applies to daily
resource monitoring, is easy to implement and can be used to determine high-load cells and
RNCs. This chapter describes the procedure for monitoring network resources.
NOTE
For details on network resources, see chapter 2 "Network Resource Counters." For details on
HSPA-associated resources, see chapter 3 "HSPA Related Resources."
NOTE
In addition to the preceding two methods, other methods may also be used by network maintenance
engineers for system problem analysis.
For a newly constructed network, you can monitor only one resource. Once detecting that this
resource exceeds its upper threshold, check whether other resources exceed their upper
thresholds.
If yes, the cell or NodeB is overloaded. Perform network expansion.
If no, the cell or NodeB is not necessarily overloaded. In this case, network expansion is
not mandatory and the problem can be solved by other adjustments or optimizations.
For example, the CE usage is more than 70% but the usages of other resources such as RTWP,
TCP, and OVSF codes are within their allowed ranges. In this case, CE resources are
insufficient but the cell is not overloaded. To solve the problem in this example, configure
licenses allowing more CEs or add baseband processing boards, instead of performing
network expansion immediately.
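The cross-check described above can be sketched in a few lines of Python. This is a minimal illustration, not a Huawei tool: the function name `assess_load`, the resource labels, and every threshold except the 70% CE threshold are assumptions made for the example.

```python
def assess_load(usage, threshold):
    """Compare each resource usage with its upper threshold.

    usage and threshold are dicts keyed by resource name.
    Returns a (verdict, exceeded_resources) pair.
    """
    exceeded = sorted(name for name, value in usage.items()
                      if value > threshold[name])
    if not exceeded:
        verdict = "normal"
    elif len(exceeded) == 1:
        # Only the monitored resource is short: optimize or add that resource.
        verdict = "single resource insufficient"
    else:
        # Other resources are also above their thresholds: expand the network.
        verdict = "overloaded"
    return verdict, exceeded


# Hypothetical thresholds; only the 70% CE value comes from this document.
thresholds = {"CE": 0.70, "TCP": 0.90, "OVSF": 0.70}
cell = {"CE": 0.72, "TCP": 0.55, "OVSF": 0.40}
print(assess_load(cell, thresholds))
```

With the sample values, only CE usage exceeds its threshold, so the sketch reports a CE shortage rather than an overloaded cell.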
As shown in Figure 1-2, an SPU is overloaded if its CPU usage is 50% to 60%, regardless of
other resource usages.
This flowchart is applicable to most resource monitoring scenarios, except when the system
overload is caused by an unexpected event rather than a service increase. Unexpected events
are not considered in this flowchart.
Causes for unexpected events can be located based on their association with various resource
bottlenecks. For details on how to locate a resource-related problem, see chapter 4 "Diagnosis
of Problems Related to Network Resources."
Various counters are defined to represent the resource usage or load of a UTRAN system. In
addition, upper thresholds for these counters are predefined.
Identifying the busy hour is a key to accurate counter analysis. There are various methods of
identifying the busy hour. The simplest one is to take the hour when the most resources are
consumed as the busy hour.
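A minimal sketch of this simplest method follows. It assumes the counter has already been exported as (hour, usage) pairs; the function name `busy_hour` and the sample data are hypothetical.

```python
from collections import defaultdict


def busy_hour(samples):
    """Return the hour of day (0-23) with the highest summed usage.

    samples: iterable of (hour, usage) pairs, for example hourly mean
    values of one counter collected over one or more days.
    """
    totals = defaultdict(float)
    for hour, usage in samples:
        totals[hour] += usage
    # Pick the hour whose summed usage is the largest.
    return max(totals, key=totals.get)


hourly_cpu = [(9, 31.0), (10, 38.5), (20, 47.2), (21, 44.9)]
print(busy_hour(hourly_cpu))  # 20
```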
The mean SPU resource usage (SPU CPU load) is indicated by the counter
VS.XPU.CPULOAD.MEAN expressed in percentage.
It is recommended:
If the SPU CPU usage is over 50% in the busy hour for three consecutive days in one
week, add SPUs as required.
If the SPU CPU usage is over 60% in the busy hour for three consecutive days in one
week, take emergency expansion measures.
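The two recommendations above can be checked automatically. The sketch below assumes one busy-hour VS.XPU.CPULOAD.MEAN value per day for one week; the function names are illustrative only.

```python
def longest_run_above(values, threshold):
    """Length of the longest run of consecutive values above threshold."""
    longest = current = 0
    for value in values:
        current = current + 1 if value > threshold else 0
        longest = max(longest, current)
    return longest


def spu_action(busy_hour_cpu, days=3):
    """Map one week of busy-hour SPU CPU usage (%) to a recommendation."""
    if longest_run_above(busy_hour_cpu, 60.0) >= days:
        return "emergency expansion"
    if longest_run_above(busy_hour_cpu, 50.0) >= days:
        return "add SPUs"
    return "no action"


week = [45, 52, 55, 58, 49, 40, 41]
print(spu_action(week))  # add SPUs
```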
Figure 2-3 Relationship between RTWP, noise increase, and uplink load
Generally, the uplink load threshold is 75% and the corresponding RTWP is smaller than -100
dBm. The corresponding equivalent number of users (ENU) ratio should be smaller than 75%
if the power-based admission decision is based on algorithm 2 (the algorithm for the ENU).
If the RTWP value is larger than -100 dBm, the cell is overloaded in the uplink direction.
Generally, if a cell is overloaded or the RTWP value is too large, the cell coverage decreases,
live service quality declines, or new service requests are rejected.
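The link between uplink load and RTWP can be checked numerically with the standard WCDMA relation noise rise (dB) = -10 x log10(1 - load). The sketch below is only an illustration: the background noise of -106 dBm is an assumed typical value, and the function names are hypothetical.

```python
import math


def noise_rise_db(load):
    """Noise rise in dB for an uplink load factor in [0, 1)."""
    if not 0.0 <= load < 1.0:
        raise ValueError("load must be in [0, 1)")
    return -10.0 * math.log10(1.0 - load)


def rtwp_dbm(load, background_dbm=-106.0):
    """RTWP in dBm for a given load, assuming the stated background noise."""
    return background_dbm + noise_rise_db(load)


print(round(noise_rise_db(0.75), 2))  # 6.02
print(round(rtwp_dbm(0.75), 2))       # -99.98
```

Under these assumptions, a 75% load corresponds to a noise rise of about 6 dB, that is, an RTWP of about -100 dBm.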
Huawei RNCs support the following RTWP and ENU counters:
VS.MeanRTWP: mean RTWP in a cell (unit: dBm)
VS.MinRTWP: minimum RTWP in a cell (unit: dBm)
VS.RAC.UL.EqvUserNum: uplink mean ENU on all dedicated channels in a cell
UlTotalEqUserNum: maximum ENU that is configured by the ADD UCELLCAC
command.
UL ENU Ratio = VS.RAC.UL.EqvUserNum/UlTotalEqUserNum
In some areas, the background noise increases to more than -106 dBm due to other
interference or hardware faults (for example, poor quality of antennas or feeder connectors).
In this case, the VS.MinRTWP counter value (RTWP when the cell carries no traffic) is
considered the background noise.
If the VS.MinRTWP value is larger than -100 dBm or smaller than -110 dBm in the idle hour
for three consecutive days in one week, there are hardware faults or external interference.
Locate and rectify the faults.
Normally, VS.MeanRTWP is used as the cell capacity indicator. If the VS.MeanRTWP value
is higher than -100 dBm (corresponding to a 6 dB noise increase or 75% load) or the uplink
ENU ratio is higher than 75% in the busy hour for two or three days in one week, the cell is
regarded as heavily loaded.
When the cell is heavily loaded, perform capacity expansion operations such as adding a
carrier or increasing the UlTotalEqUserNum values.
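A minimal sketch of the heavy-load check follows. It assumes one busy-hour sample per day, an RTWP limit of -100 dBm, and a trigger of two days per week (the lower end of "two or three days"); the function names are hypothetical.

```python
def ul_enu_ratio(eqv_user_num, ul_total_eq_user_num):
    """UL ENU Ratio = VS.RAC.UL.EqvUserNum / UlTotalEqUserNum."""
    return eqv_user_num / ul_total_eq_user_num


def is_heavily_loaded(busy_hour_days, rtwp_limit_dbm=-100.0,
                      enu_limit=0.75, min_days=2):
    """busy_hour_days: list of (VS.MeanRTWP in dBm, UL ENU ratio) per day."""
    hits = sum(1 for rtwp, enu in busy_hour_days
               if rtwp > rtwp_limit_dbm or enu > enu_limit)
    return hits >= min_days


week = [(-101.5, 0.60), (-99.2, 0.70), (-102.0, 0.78), (-103.0, 0.50)]
print(is_heavily_loaded(week))  # True
```

In the sample week, one day exceeds the RTWP limit and another exceeds the ENU ratio limit, so the cell is flagged.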
2.7 CE Usage
CE resources are baseband resources in a NodeB. One CE is the resources consumed by a
12.2 kbit/s voice call. If a new call arrives but there are not enough CEs (not enough baseband
processing resources), the call will be blocked.
CE resources are managed and shared at the NodeB level (note that 850 MHz and 1900 MHz
cells cannot share CEs with each other, because the cells belong to different license groups).
The total available CE resources are limited by both the installed hardware and the configured
software licenses. If the hardware resources in the current installation are sufficient and the
CEs are only limited by licenses, then the corrective action is to modify the license file to
expand the cell capacity.
The usage metric can also be used to monitor CE resources. Once the CE usage is consistently
higher than the threshold 70%, the NodeB is overloaded, with respect to CE usage. CE
expansion is required.
Since separate baseband processing units are used in the uplink and downlink, CE
management is also separate for the uplink and downlink. CE usage for the uplink and
downlink is defined as:
NodeB_UL_CE_MEAN_RATIO = UL Mean CE Used Number / UL NodeB CE Cfg Number
Where,
If VS.NodeB.ULCreditUsed.Mean > 0, the CE Overbooking feature is available and
UL Mean CE Used Number = VS.NodeB.ULCreditUsed.Mean/2
Otherwise,
UL Mean CE Used Number = Sum_AllCells_of_NodeB(VS.LC.ULCreditUsed.Mean/2)
VS.LC.ULCreditUsed.Mean counts the UL credit usage of a cell. The value is divided by 2
because the number of uplink credits is twice the number of uplink CEs, whereas the number
of downlink credits is equal to the number of downlink CEs.
UL NodeB CE Cfg Number = MIN(NodeB License UL CE Number, NodeB Physical UL CE
Capacity)
NodeB_DL_CE_MEAN_RATIO = DL Mean CE Used Number / DL CE Cfg Number
Where,
DL Mean CE Used Number = Sum_AllCells_of_NodeB(VS.LC.DLCreditUsed.Mean)
VS.LC.DLCreditUsed.Mean counts the DL credit usage of a cell.
DL CE Cfg Number = MIN(NodeB License DL CE Number, NodeB Physical DL CE
Capacity)
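The CE usage formulas above can be sketched as follows. The function and parameter names are illustrative, and the counter values are assumed to have been read from the performance data beforehand.

```python
def ul_ce_mean_ratio(nodeb_ul_credit_used, cell_ul_credit_used,
                     license_ul_ce, physical_ul_ce):
    """NodeB_UL_CE_MEAN_RATIO.

    nodeb_ul_credit_used: VS.NodeB.ULCreditUsed.Mean
    cell_ul_credit_used:  VS.LC.ULCreditUsed.Mean of each cell of the NodeB
    """
    if nodeb_ul_credit_used > 0:
        # NodeB-level counter is reported: two uplink credits per CE.
        used = nodeb_ul_credit_used / 2
    else:
        used = sum(credit / 2 for credit in cell_ul_credit_used)
    return used / min(license_ul_ce, physical_ul_ce)


def dl_ce_mean_ratio(cell_dl_credit_used, license_dl_ce, physical_dl_ce):
    """NodeB_DL_CE_MEAN_RATIO: one downlink credit per CE."""
    return sum(cell_dl_credit_used) / min(license_dl_ce, physical_dl_ce)


print(ul_ce_mean_ratio(0, [60, 80, 100], 128, 256))  # 0.9375
print(dl_ce_mean_ratio([30, 40, 20], 128, 96))       # 0.9375
```

Both sample results are above the 70% threshold, so this hypothetical NodeB would need CE expansion.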
There is one RACH channel in a cell. When both signaling and traffic exist, the RACH utility
ratio can be calculated as follows:
where
VS.RadioLink.Recv.Mean: indicates the average number of wireless connection
receptions per second. It is a NodeB counter.
VS.DedicMeaRpt.MEAN: indicates the average number of dedicated measurement
reports per second. It is also a NodeB counter.
SP: indicates the measurement period. It is expressed in minutes.
CNBAP Capacity of NodeB: depends on the configurations of main processing and
transmission units, baseband processing boards, and extended transmission boards.
Note: Generally, VS.DedicMeaRpt.MEAN can be ignored. In the second formula,
VS.DedicMeaRpt.MEAN/12 is used for equivalent conversion.
The soft handover factor is a cell-level counter.
Soft handover factor =
((<VS.SHO.AS.1RL> + <VS.SHO.AS.2RL> + <VS.SHO.AS.3RL> + <VS.SHO.AS.4RL> +
<VS.SHO.AS.5RL> + <VS.SHO.AS.6RL>)/(<VS.SHO.AS.1RL> + <VS.SHO.AS.2RL>/2 +
<VS.SHO.AS.3RL>/3 + <VS.SHO.AS.4RL>/4 + <VS.SHO.AS.5RL>/5 +
<VS.SHO.AS.6RL>/6)) - 1
VS.SHO.AS.1RL: Mean Number of UEs with One Radio Link for Cell; RNC counter
VS.SHO.AS.2RL: Mean Number of UEs with Two Radio Links for Cell; RNC counter
VS.SHO.AS.3RL: Mean Number of UEs with Three Radio Links for Cell; RNC counter
VS.SHO.AS.4RL: Mean Number of UEs with four Radio Links for Cell; RNC counter
VS.SHO.AS.5RL: Mean Number of UEs with five Radio Links for Cell; RNC counter
VS.SHO.AS.6RL: Mean Number of UEs with six Radio Links for Cell; RNC counter
The NodeB soft handover factor equals the average soft handover factor of all cells under the
NodeB.
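The soft handover factor can be computed as sketched below. The sketch assumes the formula is the ratio of radio links to UEs minus 1, and the function names are hypothetical.

```python
from statistics import mean


def soft_handover_factor(ues_by_rl_count):
    """Cell-level soft handover factor.

    ues_by_rl_count: [VS.SHO.AS.1RL, ..., VS.SHO.AS.6RL], that is, the
    mean number of UEs with 1 to 6 radio links for the cell.
    """
    links = sum(ues_by_rl_count)
    # A UE with i radio links is counted as 1/i of a UE in this cell.
    ues = sum(n / i for i, n in enumerate(ues_by_rl_count, start=1))
    if ues == 0:
        raise ValueError("no UEs in the cell")
    return links / ues - 1


def nodeb_soft_handover_factor(cells):
    """Average of the cell-level factors of all cells under the NodeB."""
    return mean(soft_handover_factor(cell) for cell in cells)


print(soft_handover_factor([6, 4, 0, 0, 0, 0]))  # 0.25
```

A factor of 0 means that no UE in the cell is in soft handover.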
If the CNBAP Load Ratio is higher than 60% in the busy hour for three consecutive days in
one week, the main processing and transmission unit is becoming overloaded. If this happens,
add a baseband processing board or an extended transmission board, or split the NodeB.
High Speed Packet Access (HSPA) includes High Speed Downlink Packet Access (HSDPA)
and High Speed Uplink Packet Access (HSUPA). HSDPA and HSUPA functionalities are part
of the WCDMA standard. HSPA uses technologies such as fast scheduling, adaptive
modulation, and hybrid automatic repeat request (HARQ) to transport data at high speed.
HSPA carries PS data. As conversational services are prioritized over PS data, HSPA uses
system resources only after conversational services are served. This chapter looks into how to
efficiently use the system resources by means of HSPA without changing the existing pattern
for resource allocation.
3.1 HSDPA
3.1.1 Power Resources
Figure 3-1 illustrates how the downlink transmit power of a cell is allocated. The dashed line
indicates the total downlink transmit power of a cell.
Power for CCH: This portion of power is allocated to common transport channels (CCHs) of
the cell such as the broadcast channel, pilot channel, and paging channel.
Power margin: This portion of power is not allocated. The power margin is reserved to ensure
that the system can remain stable even if the UE position or environment changes.
Power for DPCH: This portion of power is allocated to real-time services (voice and video
calls) and PS R99 services, and varies with the number and locations of users. RNCs and UEs
can adjust power for DPCH based on the power control algorithm.
Power for HSPA: This portion of power is allocated to HSDPA and is calculated as follows:
HSDPA user power = Maximum cell transmit power - (Power for CCH + Power margin +
Power for DPCH)
HSPA power schedulers are designed primarily to make the most of available power.
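The HSDPA power budget can be illustrated with a short calculation. The subtraction only makes sense in linear units (watts), so the sketch includes a dBm-to-watt helper. The function names, the sample powers, and the clamping to zero are assumptions made for the example.

```python
def dbm_to_watts(dbm):
    """Convert a power level from dBm to watts."""
    return 10 ** ((dbm - 30.0) / 10.0)


def hsdpa_power_w(max_cell_power_w, cch_w, margin_w, dpch_w):
    """Power left for HSDPA after CCH, margin, and DPCH are served (watts)."""
    remaining = max_cell_power_w - (cch_w + margin_w + dpch_w)
    # Assumption: no power is left for HSDPA if the other parts use it all.
    return max(0.0, remaining)


# Hypothetical 20 W cell: 4 W for CCH, 2 W margin, 6 W for DPCH.
print(hsdpa_power_w(20.0, 4.0, 2.0, 6.0))  # 8.0
```

As DPCH power grows with the number of R99 users, the power left for HSDPA shrinks, which is why downlink power overload affects HSDPA first.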
In an HSDPA-enabled cell, TCP is still monitored to see if the system is overloaded in the
downlink. TCP thresholds for this cell are the same as those for a cell without HSDPA. With
HSDPA, downlink power overload affects HSDPA performance before it affects
conversational services.
3.2 HSUPA
3.2.1 CE Resources
HSUPA channels are dedicated channels, and resource consumption of HSUPA services is
measured by CE. UL CEs are shared between R99 services and HSUPA services.
HSUPA improves user experience and uplink throughput, but also consumes more uplink CE
overhead for hybrid automatic repeat requests (HARQ) and soft handovers. This means that
uplink CE resources may become a system bottleneck. Therefore, uplink CE usage needs to
be monitored when HSUPA is enabled.
Huawei NodeBs support dynamic HSUPA CE management.
3.2.2 RTWP
Similar to HSDPA, which is designed to make the most of the downlink power, HSUPA is
designed to make the most of uplink capacity margin. HSUPA is always authorized to send
data until the RTWP rises by 6 dB above the background noise.
HSUPA provision increases uplink data throughput but also consumes a large amount of
uplink RTWP, which is monitored in the same way regardless of whether HSUPA is
provisioned. If RTWP overload occurs, rates of HSUPA services must be lowered to ensure
QoS of conversational services.
The preceding chapters describe the basic methods of monitoring network resources. These
methods can be used to resolve most problems caused by high resource usage. In certain
scenarios, further analysis is required to determine whether high resource usage is caused by a
traffic increase or other exceptions.
This chapter describes how to diagnose problems related to network resources. This chapter is
intended for experts who have a deep understanding of WCDMA networks.
Figure 4-1 Call flowchart where possible block and failure points are marked
The call flow, which uses a mobile-terminated call as an example, is described as follows:
Step 1 The CN sends a paging message to the RNC.
Step 2 Upon receipt of the paging message, the RNC broadcasts the message on a PCH. If the PCH
is congested, the RNC may drop the message. See block point #1.
Step 3 The UE may fail to receive the paging message or fail to access the network. See
failure point #2.
Step 4 After receiving the paging message, the UE sends an RRC connection request to the RNC.
Step 5 If the RNC is congested when receiving the RRC connection request, the RNC may drop the
request. See block point #3.
Step 6 If the RNC receives the RRC connection request and does not drop it, the RNC determines
whether to accept or reject the request. The request may be rejected due to insufficient
resources. See block point #4.
Step 7 If the RNC accepts the request, the RNC instructs the UE to set up an RRC connection. The
RRC connection setup may fail because the UE does not receive the instruction, or because
the UE receives the message but finds the configuration information to be incorrect. See
failure points #5 and #6.
Step 8 After the RRC connection is set up, the UE sends NAS messages to negotiate with the CN
about service setup. If the CN determines to set up a service, the CN sends an RAB
assignment request to the RNC.
Step 9 The RNC accepts or rejects the RAB assignment request based on the resource usage on the
RAN side. See block point #7.
Step 10 If the RNC accepts the RAB assignment request, the RNC initiates an RB setup process.
During the process, the RNC sets up transmission resources over the Iub interface by setting
up a radio link (RL) to the NodeB, and sets up channel resources over the Uu interface by
sending an RB setup message to the UE. A failure may occur in the RL or RB setup process.
See failure points #8 and #9.
The following is the formula for calculating the call congestion ratio:
VS.RAB.Block.Rate = Total number of congestions due to the preceding
causes/VS.RAB.AttEstab.Cell
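The congestion ratio can be computed as sketched below. The cause labels and counts are hypothetical placeholders for the per-cause congestion counters; only VS.RAB.AttEstab.Cell is taken from the formula above.

```python
def rab_block_rate(congestion_counts, rab_attempts):
    """Call congestion ratio: congestions over all causes / RAB attempts.

    congestion_counts: dict of congestion cause -> number of congestions
    rab_attempts:      VS.RAB.AttEstab.Cell
    """
    if rab_attempts <= 0:
        raise ValueError("rab_attempts must be positive")
    return sum(congestion_counts.values()) / rab_attempts


# Hypothetical per-cause counts for one cell and one measurement period.
counts = {"power": 3, "ce": 5, "iub": 2}
print(rab_block_rate(counts, 400))  # 0.025
```

Keeping the per-cause counts separate also shows which resource contributes most to the congestion.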
Table 4-1 provides solutions to signaling storms. These solutions attempt to reduce signaling
loads so that the network capacity does not need to be expanded immediately.
SCRI without values indicating causes; iPhone (R6): Enable the enhanced fast dormancy (EFD)
function for RNCs and add international mobile equipment identities (IMEIs) of terminals to
the whitelist.
R8 terminals with SCRI carrying values indicating causes; iPhone4 (after R6): Enable the R8
FD function for RNCs and add terminal IMEIs to the whitelist.
Generally, an abnormal KPI initiates a troubleshooting process. Determining the top N cells
that may have problems facilitates follow-up troubleshooting.
It is recommended to analyze accessibility KPIs to identify the system bottleneck that causes
access congestion.
NOTE
CE usage in Table 4-2 assumes that the signaling radio bearer (SRB) over HSUPA feature is enabled. If
the SRB is carried on an R99 DCH independently, an extra CE is consumed by the SRB. Therefore, add
one CE to the number listed in Table 4-2.
HSDPA services do not consume downlink R99 CEs. HSUPA services and R99 services share
uplink CEs.
CE congestion or routine CE usage monitoring may trigger CE resource analysis.
If the CE resource usage is higher than a preset threshold for a period of time or CE
congestion occurs, the CE resources are insufficient and must be increased to ensure system
stability.
Cells belonging to the same NodeB share CEs and CE resources consumed by a NodeB must
be manually calculated.
Check whether CE resource congestion occurs in a resource group or an entire site. If CE
resource congestion occurs in a resource group, reallocate CEs between resource groups. If
CE resource congestion occurs in an entire site, perform site capacity expansion and
reconfigure CEs as required.
If insufficient Iub bandwidth causes congestion, check the Iub bandwidth usage.
If the Iub bandwidth usage remains higher than 80% for a certain period, it can be determined
that the Iub bandwidth is insufficient.
If no more Iub resources are available or the issue is not urgent, decrease PS activity factors
so the system admits more users. The activity factor, which is the ratio of actual bandwidth
occupied by a user to the allocated bandwidth, is used to estimate the real bandwidth needed
in admission. The activity factor can be set on a per-NodeB basis. The default activity factor
is 70% for voice services and 40% for PS BE services.
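The effect of the activity factor on admission can be illustrated with a simplified calculation. The sketch assumes that admission reserves the allocated bandwidth multiplied by the activity factor for each user and ignores transport overhead; the function names and the link and bearer rates are hypothetical.

```python
def admission_bandwidth_kbps(allocated_kbps, activity_factor):
    """Bandwidth reserved at admission for one user."""
    return allocated_kbps * activity_factor


def max_admitted_users(link_kbps, allocated_kbps, activity_factor):
    """Number of identical users that fit on an Iub link of link_kbps."""
    per_user = admission_bandwidth_kbps(allocated_kbps, activity_factor)
    return int(link_kbps // per_user)


# Hypothetical 2000 kbit/s link carrying 384 kbit/s PS BE bearers.
print(max_admitted_users(2000, 384, 0.4))  # 13
print(max_admitted_users(2000, 384, 0.2))  # 26
```

Halving the activity factor doubles the number of admitted users in this example, at the cost of a higher risk of congestion when the users are more active than assumed.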
Adding carriers is the most efficient solution to insufficient uplink power. If no more carriers
are available, add more sites or tilt down antennas to split cells.
If the SPU CPU usage is higher than 50%, advise customers to add SPU boards. If SPU CPU
usage is higher than 60%, add SPU boards immediately.
Check whether SPU subsystem loads are balanced. If they are unbalanced, adjust load sharing
thresholds so that subsystems share loads evenly.
In addition, identify root causes for the high CPU usage.
If signaling storms occur, check whether system configurations are correct or the transmission
link is interrupted. If high traffic causes the high CPU usage, add SPU boards to expand
capacity.
Figure 4-9 Process for analyzing DPU DSP and interface board CPU usage
If the DPU DSP or interface board CPU usage is higher than 60%, add DPU boards or
interface boards.
Add hardware for capacity expansion if traffic increase or unbalanced transmission
causes the high loads.
5 Counter Definitions
OVSF Usage
Counter
OVSF usage
Counter: VS.RAB.SFOccupy
Formula: VS.RAB.SFOccupy
OVSF usability ratio
Counter: VS.RAB.SFOccupy.Ratio
Formula: VS.RAB.SFOccupy/256
DCH OVSF ratio
Counter: DCH_OVSF_Utilization
Formula:
[(<VS.SingleRAB.SF4> + <VS.MultRAB.SF4>) x 64 +
(<VS.SingleRAB.SF8> + <VS.MultRAB.SF8>) x 32 +
(<VS.SingleRAB.SF16> + <VS.MultRAB.SF16>) x 16 +
(<VS.SingleRAB.SF32> + <VS.MultRAB.SF32>) x 8 +
(<VS.SingleRAB.SF64> + <VS.MultRAB.SF64>) x 4 +
(<VS.SingleRAB.SF128> + <VS.MultRAB.SF128>) x 2 +
(<VS.SingleRAB.SF256> + <VS.MultRAB.SF256>)]/256
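The DCH OVSF ratio weights each spreading factor by the number of SF256-equivalent codes it occupies (256/SF). A minimal sketch follows; the function name and the dict-based input layout are assumptions.

```python
# SF256-equivalent codes occupied by one code of each spreading factor.
SF_WEIGHTS = {4: 64, 8: 32, 16: 16, 32: 8, 64: 4, 128: 2, 256: 1}


def dch_ovsf_utilization(single_rab, mult_rab):
    """DCH_OVSF_Utilization.

    single_rab: dict SF -> VS.SingleRAB.SF<n> value
    mult_rab:   dict SF -> VS.MultRAB.SF<n> value
    Spreading factors missing from a dict are treated as 0.
    """
    occupied = sum(weight * (single_rab.get(sf, 0) + mult_rab.get(sf, 0))
                   for sf, weight in SF_WEIGHTS.items())
    return occupied / 256


single = {128: 10, 8: 1}
multi = {64: 2}
print(dch_ovsf_utilization(single, multi))  # 0.234375
```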
CPU Usage
Counter
SPU usage VS.XPU.CPULOAD.MEAN VS.XPU.CPULOAD.MEAN
MPU usage VS.XPU.CPULOAD.MEAN VS.XPU.CPULOAD.MEAN
VS.INT.TRANSLOAD.RATIO.MEAN VS.INT.TRANSLOAD.RATIO.MEAN
NodeB CPU usage VS.BRD.CPULOAD.MEAN VS.BRD.CPULOAD.MEAN
Credit Usage
Counter
UL_CE_MEAN_RATIO
Counters: VS.NodeB.ULCreditUsed.Mean, VS.LC.ULCreditUsed.Mean
Formula:
if VS.NodeB.ULCreditUsed.Mean > 0
(VS.NodeB.ULCreditUsed.Mean/2) / MIN(NodeB License UL CE Number, NodeB Physical UL CE
Capacity)
else
Sum_AllCells_of_NodeB(VS.LC.ULCreditUsed.Mean/2) / MIN(NodeB License UL CE Number,
NodeB Physical UL CE Capacity)
DL_CE_MEAN_REMAIN
Counter: VS.LC.DLCreditUsed.Mean
Formula:
Sum_AllCells_of_NodeB(VS.LC.DLCreditUsed.Mean) / MIN(NodeB License DL CE Number,
NodeB Physical DL CE Capacity)