
Voice-Enhancement Devices and GSM Network Quality


Fredrik Pettersson
Product Marketing Manager
Ditech Networks

Mobile phone customers rank call quality and reliability as the most important criteria when
choosing a provider. Global System for Mobile Communications (GSM) carriers must do what
they can to preserve acceptable call quality, often under difficult circumstances. For example,
handset capabilities vary, network conditions vary, and the codecs used to process voice traffic
can also vary.

Under these conditions, carriers should learn all they can about the factors that impact GSM
network quality and how to best manage it. This article will look at how voice quality is measured
in GSM networks, the impact of traditional voice quality–enhancement solutions on GSM
networks, and how a new type of device—a voice-enhancement device (VED)—significantly
improves voice quality by delivering a more consistent and tighter voice quality distribution.

The article also addresses VEDs’ ability to enable a higher use of low-bit-rate speech codecs
(employed by GSM half-rate [HR] and adaptive multi-rate [AMR] calls for cost-effective voice
capacity during peak traffic conditions) while maintaining acceptable quality.

Measuring Voice Quality in GSM Networks

Current methods for measuring voice quality in GSM networks primarily use drive test tools
based on the perceptual evaluation of speech quality (PESQ) algorithm [1], which is an objective
method for assessing end-to-end speech quality developed by the International
Telecommunication Union (ITU). Carriers typically use PESQ with clean speech files to measure
the radio quality of their networks and to quantify how impairments related to codec type and
frame loss (caused by poor coverage, handovers, and interference) impact listening quality.
Another common method is based on Ericsson’s speech quality index (SQI), which estimates how
codec type and radio link parameters such as bit-error rate (BER), frame-erasure rate (FER),
discontinuous transmission (DTX), and handover rates affect voice quality [2].

Limitations of Traditional Objective Voice-Quality Measurement Methods


Essentially, PESQ and SQI measure the impact of radio frequency (RF)–related impairments on
listening quality. Thus, they are not able to capture and quantify how other important voice-
quality impairments present in live calls—background noise, acoustic echo, and mismatched
speech levels—affect customer experience.

High levels of background noise are common in mobile calls placed in busy streets; crowded
places; and inside cars, buses, and trains. Acoustic echo is a nonlinear type of echo often
generated by low-end handsets due to insufficient acoustic isolation between the speaker and the
microphone. Acoustic echo can also be generated by headsets and Bluetooth devices, which are
becoming increasingly common due to new laws that require motorists to use hands-free devices
when talking on their phones while driving.

Moreover, PESQ is an intrusive method, which means that it can only measure the quality of a
limited number of test calls. It cannot measure the quality of actual live calls or the quality of
calls placed in locations different from those covered by the standardized drive test routes. In
addition, PESQ and SQI can only estimate listening quality; they cannot measure conversational
quality, which is greatly affected by acoustic echo and transmission delay.

Subjective Voice-Quality Test Methods


Subjective methods use human listeners to rate voice quality, so they can detect and evaluate
all aspects of it. The ITU has developed a number of recommendations for subjective
testing, including the ITU Telecommunication Standardization Sector (ITU–T) P.800 [3], which
defines the mean opinion score (MOS) as one important metric for subjective determination of
transmission quality. The test subjects are asked to listen to prepared speech files and then
determine the quality of the speech according to the following scale:

• Excellent 5
• Good 4
• Fair 3
• Poor 2
• Bad 1

The MOS is then calculated as an average of all participants’ scores.
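The averaging step can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function name and the listener panel are hypothetical, and a real P.800 test involves controlled listening conditions that this sketch does not model.

```python
from statistics import mean

# Labels and scores of the five-point listening-quality scale.
SCALE = {"Excellent": 5, "Good": 4, "Fair": 3, "Poor": 2, "Bad": 1}

def mean_opinion_score(ratings):
    """Average the numeric scores behind the listeners' category ratings."""
    if not ratings:
        raise ValueError("at least one rating is required")
    return mean(SCALE[label] for label in ratings)

# Hypothetical panel of five listeners rating one speech file.
panel = ["Good", "Fair", "Good", "Excellent", "Poor"]
print(mean_opinion_score(panel))  # (4 + 3 + 4 + 5 + 2) / 5 = 3.6
```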

The ITU–T P.835 recommendation [4] is particularly suitable for subjectively evaluating speech
communications systems that include noise suppression algorithms. The scoring method is similar
to ITU–T P.800 with the addition that participants are asked to do three separate evaluations of
the speech signal, the background signal, and the overall signal.

A key limitation of these subjective test methods is that they are expensive, because they require
a lot of testing time and a large number of participants. This means that carriers cannot employ
these subjective methods for large-scale network testing and continuous monitoring of voice
quality.

The ITU–T G.107 E-Model


The ITU–T G.107 e-model [5] is an objective method that addresses many of the limitations of
the traditional voice–quality measurement methods. The e-model is a comprehensive method that
takes into account not only codec impairments and the impact of frame loss, but also
environmental noise, mismatched speech levels, echo, and delay to determine perceived voice
quality. Further, the e-model is a non-intrusive method that allows cost-effective, large-scale
monitoring of all live calls passing through a network.

The e-model estimates overall voice quality using a metric called the transmission rating (R)
factor. The R factor is a number from 0 to 100, where 100 represents an ideal call with no
impairments. Connections with an R factor of less than 50 are considered to be of poor quality
with nearly all users dissatisfied, and are therefore not recommended by the ITU.

The European Telecommunications Standards Institute (ETSI) originally specified the e-model, and
it was later adopted by the ITU, which created the G.107 recommendation.

The e-model assumes that all impairments are additive, and calculates the R factor according to
the following formula:

R = Ro – Is – Id – Ie + A

where Ro is the basic signal-to-noise ratio (SNR), and Is is the sum of impairments that occur
simultaneously with the speech signal, including loudness, codec quantization distortion, and
non-optimum sidetone. Id captures impairments related to echo and delay, and Ie (the “equipment
impairment factor”) represents impairments due to low-bit-rate codecs and packet loss. Finally,
the advantage factor, A, reflects the user’s expectation of quality when making a phone call.
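Because the model is additive, the calculation is simple to express in code. The sketch below shows the formula only. The input values are hypothetical, and the clamp to the 0 to 100 range is an assumption made here to match the R scale described above. It does not include the G.107 procedures for deriving Ro, Is, Id, and Ie from network parameters.

```python
def r_factor(ro, i_s, i_d, i_e, a=0.0):
    """Transmission rating R = Ro - Is - Id - Ie + A.

    Clamping the result to 0..100 is an assumption of this sketch,
    reflecting the R scale described in the text.
    """
    r = ro - i_s - i_d - i_e + a
    return max(0.0, min(100.0, r))

# Hypothetical impairment values, chosen only to show the arithmetic.
r = r_factor(ro=95.0, i_s=1.0, i_d=4.0, i_e=10.0, a=0.0)
print(r)  # 95 - 1 - 4 - 10 + 0 = 80.0
```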

Table 1 shows maximum clean speech R values for common codecs used in wireless networks
[6]:

Codec      Bit rate     Maximum R value for clean speech
G.711      64 kbps      94
GSM–EFR    12.2 kbps    89
EVRC       8.55 kbps    88
GSM–FR     13 kbps      74
GSM–HR     5.6 kbps     71

Table 1

The e-model also provides a method for converting R values into MOS, according to Figure 1,
provided in G.107 recommendation [5].

Figure 1: MOS as Function of Rating Factor R

For example, R = 50 translates to about 2.6 points on the MOS scale.
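The curve in Figure 1 is usually quoted as a closed-form expression from Annex B of G.107. The sketch below uses that commonly cited form. The coefficients are reproduced here as an assumption and should be checked against the recommendation itself before being relied on.

```python
def r_to_mos(r):
    """Map an R factor to an estimated MOS (commonly cited G.107 form)."""
    if r <= 0:
        return 1.0   # floor of the estimated MOS scale
    if r >= 100:
        return 4.5   # ceiling of the estimated MOS scale
    return 1 + 0.035 * r + r * (r - 60) * (100 - r) * 7e-6

print(round(r_to_mos(50), 3))  # about 2.575, i.e. roughly 2.6 MOS
```

Under this mapping, R = 50 gives approximately 2.6, in line with the example above.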

Figure 2 shows an example of how codec impairments, frame loss, background noise, acoustic
echo, and mismatched speech levels impact the delivered voice quality according to the ITU–T
G.107 model.

Figure 2: Voice-Quality Impairments and Actual Delivered Quality (Based on the ITU–T G.107
E-Model)

The R value of a perfect EFR call is 89, but due to the different impairments, the actual delivered
quality is close to 55 on the R scale in this particular example.

Figure 2 also illustrates the discrepancy between the voice quality that is measured by a PESQ–
based drive test tool using clean speech files and the actual quality delivered to the subscriber. A
clean-speech EFR call at 1 percent FER would be of good quality, but due to the common
existence of moderate background noise, a relatively weak acoustic echo, and a speech level that
is somewhat too low in this case, the actual quality is reduced to R = 55, which is quite close to
the R = 50 limit for an acceptable call. The e-model is able to measure and quantify these
additional impairments that tend to have a significant impact on user experience.

Finally, the e-model has been adopted by the voice over Internet protocol (VoIP) industry, where
carriers frequently use it to monitor voice quality. In Japan, it is even employed to regulate voice
quality according to the following classes:

• R > 80, and end-to-end delay < 150 ms for PSTN–equivalent and emergency calls
• R > 50, and end-to-end delay < 400 ms for plain IP telephony
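A monitoring system could apply the two classes above as a simple threshold check. The following sketch is illustrative only: the function name, the returned labels, and the fallback label for calls that meet neither class are hypothetical and are not taken from any regulation.

```python
def service_class(r, delay_ms):
    """Classify a call by R factor and end-to-end delay (in milliseconds)."""
    if r > 80 and delay_ms < 150:
        return "PSTN-equivalent"
    if r > 50 and delay_ms < 400:
        return "plain IP telephony"
    return "below both classes"  # hypothetical label for everything else

print(service_class(85, 100))  # PSTN-equivalent
print(service_class(85, 200))  # plain IP telephony (delay too high for the top class)
print(service_class(45, 100))  # below both classes
```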

The ITU–T G.107 method is now also being integrated into some of the most advanced VEDs for
non-intrusive monitoring of live wireless calls.

Traditional Voice-Enhancement Solutions

For many years, GSM carriers have used advanced drive test tools and methods to measure and
mitigate radio network–related voice-quality impairments. Other important factors impacting
voice quality in GSM networks are the so-called “external impairments”—background noise and
nonlinear acoustic echo. As we have seen, these external impairments are not being measured by
traditional drive test tools. The impacts of background noise and nonlinear acoustic echo are
therefore poorly understood in the GSM industry, and previous solutions addressing these
impairments have not been optimal.

There are several typical approaches to voice enhancement in GSM networks. Some GSM
handsets have basic noise suppression and acoustic echo cancellation built in. However, the
performance of these handset-based solutions varies considerably depending on handset vendor
and model, so the GSM carrier cannot optimize its delivered quality and capacity in the way it
could with network-based solutions that can process traffic from all handsets in the network. In
addition, handsets have limitations in processing power and battery life that make it difficult and
expensive to implement the most advanced noise-suppression and echo-cancellation algorithms.

During the standardization of the latest codec in the GSM family, AMR, 3GPP identified the
impact of background noise on voice quality, especially the limitations of the lowest AMR codec
rates in noisy conditions. 3GPP issued a set of minimum performance requirements on handset-
based noise suppression in an attempt to address the problem [7].

These requirements are not mandatory, however, and apparently not stringent enough. Large-
scale measurements on live GSM–AMR calls in a North American network show that
considerable amounts of noise remain in the calls, with significant impacts on voice quality and
user experience. Based on these measurements, about 18 percent of live AMR calls have
background noise impairments exceeding 24 points on the R scale, equivalent to degradation on
the order of 1.0 MOS points (Figure 3).

Figure 3: Background Noise Impairments Measured in Millions of Live Calls in North American
GSM and CDMA Networks (100 Percent AMR Calls in GSM Networks)

Another approach is to build noise reduction into the codec itself. Unlike GSM, the CDMA
standard has mandatory noise reduction integrated into the enhanced variable rate codec (EVRC).
The effects of this built-in noise reduction can also be seen in Figure 3, where CDMA calls
perform better than GSM ones for small and moderate impairments of less than 24 points.

However, neither CDMA nor GSM noise suppression seems to be very effective for heavily
impaired calls (where the impairment exceeds 24 points on the R scale), as there is no significant
difference between GSM and CDMA calls. This could be explained by the inherent processing
power limitations of handset-based noise suppression discussed earlier.

Finally, carriers can always improve the quality of the radio network itself by deploying more
base station sites and more transceivers. However, this proposition is very costly and does not
address the external impairments that are the root cause of the problem.

Voice-Enhancement Devices

VEDs are a newer approach to reducing voice impairments in GSM networks. VEDs are
network-based functions that typically include adaptive noise cancellation (ANC), acoustic echo
cancellation (AEC), and automatic level control (ALC). A VED is typically deployed in the
mobile switching center (MSC).

VEDs process PCM–encoded speech and are typically installed on the standard A interface
between the transcoder and the MSC. This allows the VED to process all call types, including
mobile-to-mobile and mobile-to–PSTN calls, for all codec types such as EFR, AMR, and HR.

Basic VEDs typically remove noise and acoustic echo impairments in the uplink, while more
advanced ones can remove impairments on the downlink as well. The ability to process downlink
speech allows a VED to enhance off-net calls that originate in other mobile carriers’ networks.

More advanced VEDs have additional functions such as enhanced voice intelligibility (EVI),
which processes and enhances the downlink speech path to additionally improve voice clarity for
callers in noisy environments.

Impact of VED Deployment

VEDs add a layer of voice-quality processing to mobile calls, which enhances the
customer experience. VEDs also enable GSM carriers to better control the delivered quality
distribution and allow them to offer a more consistent voice service based on their minimum
quality requirements. When a GSM network uses a VED, the noise impairments are significantly
reduced, as shown in Figure 4.

Figure 4: GSM Network Noise Impairment Reduction with a VED (North American Network with
100 Percent AMR)

In addition, advanced VEDs are effective at removing very large impairments exceeding 25
points on the R-factor scale. These impairments correspond to a quality degradation of 1.0 MOS
points or more.

The delivered quality varies widely without a VED due to the impact of external impairments,
which tend to generate a large tail of poor-quality calls. These calls are the ones most likely to
drive churn and lost revenue, since they fall below the minimum allowable quality level (Figure
5).

Figure 5: Impact of Voice Quality on Churn and Lost Revenue

A VED removes these external impairments and reduces the percentage of calls that fall below
the minimum quality limit. Thus, the VED helps a carrier to control the percentage of
unacceptable calls in its network and provides greater flexibility and better quality margins.

VEDs also improve voice quality in crowded and busy urban locations by removing noise and
echo impairments and adjusting speech levels for a comfortable listening experience. Subjective
testing by Dynastat, a leading listening test lab in the United States, indicates that the voice-
quality improvement from a VED is on the order of 0.4 to 0.5 MOS points for low-bit-rate codecs
such as GSM–HR in noisy conditions (Figure 6).

Figure 6: Impact of VED on MOS Scores

Figure 7 shows the results of ITU–T G.107 measurements performed on millions of live calls in a
GSM network in southern Europe. About 40 percent of the HR calls fall below 44 points on the R
scale, a figure that corresponds to about 2.2 points on the MOS scale. After VED processing, less
than 10 percent of calls have poor quality.
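The statistic reported here, the share of calls whose R value falls below a threshold, is easy to compute from per-call measurements. The sketch below assumes a plain list of R values per call. The sample data is invented for illustration and does not reproduce the measurements behind Figure 7.

```python
def share_below(r_values, threshold=44.0):
    """Return the fraction of calls whose R value is below the threshold."""
    if not r_values:
        raise ValueError("at least one call is required")
    return sum(1 for r in r_values if r < threshold) / len(r_values)

# Invented per-call R values for the same ten calls, without and with processing.
without_ved = [38, 42, 47, 52, 61, 40, 58, 66, 43, 70]
with_ved = [46, 49, 53, 57, 65, 43, 62, 69, 50, 73]

print(share_below(without_ved))  # 4 of 10 calls below 44 -> 0.4
print(share_below(with_ved))     # 1 of 10 calls below 44 -> 0.1
```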

Figure 7: Impact of VED on HR Calls

Thus, one important benefit of the VED is that it improves HR and AMR call quality, thereby
enabling carriers to increase their use of HR and the lowest AMR codec rates in AMR–HR and
AMR–FR modes to gain more voice capacity during busy hours, while at the same time limiting
the percentage of calls with unacceptable quality.

Conclusion

With a better understanding of the factors impacting voice quality in their networks, GSM
operators are better equipped to maintain acceptable quality levels for their subscribers.
Traditional quality management methods address part of the problem, but the key issues are noise
and nonlinear acoustic echo.

By deploying VEDs, carriers can address voice-quality problems such as noise, echo, and level
impairments. VEDs deliver better and more consistent voice quality for all calls, regardless of
handset model or codec. VEDs combined with G.107 monitoring capabilities also allow carriers
to aggressively employ HR and the lowest AMR codec rates to increase call capacity while
maintaining acceptable voice-quality levels and controlling the percentage of calls with poor
quality. Finally, VEDs enable implementation of a better test methodology that uses the ITU–T
G.107 e-model and provide the ability to monitor the customer experience for all live calls on the
network.

Voice quality will always be an important selection criterion for GSM subscribers, and by
deploying VEDs, carriers can ensure a quality calling experience.

References

[1] ITU–T Recommendation P.862 (02/2001), Perceptual Evaluation of Speech Quality
(PESQ): An objective method for end-to-end speech quality assessment of narrow-band
telephone networks and speech codecs.
[2] Speech Quality Measurement with SQI, Ericsson Technical Paper, Revision B, August 9,
2006.
[3] ITU–T Recommendation P.800 (08/96), Methods for subjective determination of
transmission quality.
[4] ITU–T Recommendation P.835 (11/2003), Subjective test methodology for evaluating
speech communications systems that include noise suppression algorithm.
[5] ITU–T Recommendation G.107 (03/2005), The E-model, a computational model for use
in transmission planning.
[6] ITU–T Recommendation G.113 (02/2001), Transmission Impairments due to Speech
Processing.
[7] 3GPP TS 26.077: Minimum Performance Requirements for Noise Suppresser;
Application to the Adaptive Multi-Rate (AMR) Speech Encoder.
