
3 Understanding the Need for PESQ to POLQA Transition

For more than a decade, the current P.862/P.862.1 (PESQ) standard has
proven to be a well-performing and stable solution for voice quality
evaluation on 2G/3G circuit-switched networks and, to a certain extent,
on packet-switched networks. In addition, results obtained during the
POLQA evaluation and validation process showed that, in a large number of
NB voice test scenarios, PESQ performs statistically on par with
POLQA [2]. However, as described above, the complexity of new network
scenarios might cause new and unexpected degradation effects. The P.863
(POLQA) standard has been designed to handle disruptive effects caused
by these new multicomponent distortions. Details on POLQA operability
requirements, test and application scenarios, and limitations are presented
in a separate paper [3].
Therefore, the transition from PESQ to POLQA should be based on a good
understanding of both technical and business aspects. To gain this
understanding, it is necessary to take a look at the various applications for
which POLQA has been designed. In the following paragraphs, we discuss
these application-related aspects.

3.1 The Two MOS Scales: Narrowband and Super Wideband


POLQA has two operational modes: NB and SWB.
In NB mode, the received (and potentially degraded) speech signal is
compared with the NB (300 to 3400 Hz) original. Thus, normal telephone
band limitations are not considered to be severe degradations. NB mode
aims to maintain backward compatibility to P.862.1 (PESQ) [1]. The
listening quality is modeled as perceived by a human listener using a
loosely coupled IRS type handset at one ear (monotic presentation). For a
large number of NB scenarios, PESQ and POLQA NB mode showed statistically
equal performance.

Ascom (2011) Document: NT11-22759, Rev. 1.0

In SWB mode, the received (and potentially degraded or band limited)
speech signal is compared with an SWB reference. The listening quality is
modeled as perceived by a human listener using a diffuse-field equalized
headphone with diotic presentation (same signal in both ears). Therefore,
NB or WB limitations are considered to be degradations and are scored
accordingly on a single SWB MOS scale. NB quality scores are therefore
likely to be compressed at the lower end of the MOS scale, which could
impair POLQA SWB performance in predicting NB scenarios. Another reason
why NB scenarios are predicted less accurately in SWB mode than in NB
mode is that POLQA uses different optimizations for each mode: in NB
mode, the optimization is NB focused. Therefore, as long as user
applications are still mainly NB scenarios, which is the case in the
majority of today's live wireless networks, and no comparison with
higher definition bandwidths (WB or SWB) is required, the transition to
POLQA is not urgently needed.
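
The mode choice described above can be summarized as a small rule. The following Python sketch is only an illustration of that rule, not part of P.863 or of any POLQA implementation; the names `Bandwidth` and `choose_polqa_mode` are hypothetical.

```python
from enum import Enum


class Bandwidth(Enum):
    NB = "narrowband"
    WB = "wideband"
    SWB = "super-wideband"


def choose_polqa_mode(scenario_bandwidths, compare_across_bandwidths=False):
    """Pick the POLQA operational mode for a set of test scenarios.

    Assumption (paraphrasing section 3.1): NB mode is preferred only when
    every scenario is NB and no comparison against WB/SWB is required.
    In all other cases the single SWB MOS scale is needed.
    """
    bandwidths = set(scenario_bandwidths)
    if not bandwidths:
        raise ValueError("at least one scenario bandwidth is required")
    only_nb = bandwidths == {Bandwidth.NB}
    if only_nb and not compare_across_bandwidths:
        return "NB"
    return "SWB"
```

For a purely NB test campaign the sketch returns NB mode, which is also the case in which, according to the text, PESQ and POLQA showed statistically equal performance for a large number of scenarios.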

3.2 The Impact of Various Codecs

POLQA has been evaluated and validated for a large range of multiband
codecs, both standardized (e.g., AMR-WB, EVRC-WB, iLBC, AMB+, AAC,
G.711.x, G.723, and G.729.x) and commercial (Skype/SILK), which are used
or are planned to be used for the voice service supported by different
network solutions (e.g., wireless: GSM/WCDMA/LTE, WiMAX fixed, and
Bluetooth). In addition, CS, VoIP, and VoIP over IMS scenarios have been
either simulated or collected live.
A large number of 3G networks still use only the traditional standardized
NB codecs (e.g., AMR and EVRC). In addition, the 4G/LTE voice service
solutions (CSFB or VoIP over IMS) are, or could be, launched generally
with NB codecs (AMR), most likely in the second half of 2012. However,
operators choosing an over-the-top (OTT) voice solution (e.g., Skype and
Viber) will mainly have commercial codecs involved (e.g., Skype/SILK).
Therefore, it could be concluded that if the user application does not yet
involve any new commercial codec or any high definition (HD) codec,
whether commercial or standardized, then the transition to POLQA is not a
must in the very short term.

3.3 Network-Centric Types of Speech Degradation

The P.863 standard has been designed to cope with a large range of
degradations that have emerged at both acoustical and electrical interfaces
on a variety of networks (e.g., wireless or fixed, CS or PS/VoIP, or
Bluetooth) and under different noise conditions 3.

Some of these types of degradations are not yet present in today's
networks (e.g., wireless SWB, VoIP over IMS, and VoIP time scaling). In
these particular cases, only simulated conditions have been used, and for
some scenarios these cannot emulate real network behavior because the
real network conditions are not yet exactly known. An important example
is the time scaling (or speech frequency re-sampling) generated by the
jitter buffer adaptation, which compensates for variable delays on the
voice connection. This condition is typical for VoIP over IMS, which
today can be evaluated only by using simulations and running time scaling
algorithms [4], [5]. The challenge arises because there is currently no
live experience from which time scaling characteristics can be defined
(e.g., time scaling length or ratio, its time distribution, and the
frequency within the speech).
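
To make these characteristics concrete (segment length, scaling ratio, and position within the speech), the following is a minimal Python sketch that stretches or compresses selected segments of a sample sequence by plain linear-interpolation resampling. It is an illustration under stated assumptions only: the function names and the `(start, length, ratio)` event layout are hypothetical, and the sketch does not reproduce any real jitter buffer or any of the time scaling algorithms cited above. Plain resampling also shifts the pitch of the affected segment, which practical implementations may avoid.

```python
import math


def resample_linear(segment, ratio):
    """Resample `segment` to round(len(segment) * ratio) samples.

    ratio > 1 stretches the segment, ratio < 1 compresses it.
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    if not segment:
        return []
    n_out = max(1, round(len(segment) * ratio))
    if n_out == 1 or len(segment) == 1:
        return [segment[0]] * n_out
    last = len(segment) - 1
    step = last / (n_out - 1)
    out = []
    for i in range(n_out):
        pos = min(i * step, last)       # guard against rounding overshoot
        lo = int(math.floor(pos))
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append(segment[lo] * (1.0 - frac) + segment[hi] * frac)
    return out


def apply_time_scaling(samples, events):
    """Apply time scaling events to a list of samples.

    `events` is an iterable of (start_index, length, ratio) tuples
    (hypothetical layout) that must not overlap. Samples outside the
    events are copied unchanged.
    """
    out = []
    cursor = 0
    for start, length, ratio in sorted(events):
        if start < cursor:
            raise ValueError("time scaling events must not overlap")
        out.extend(samples[cursor:start])
        out.extend(resample_linear(samples[start:start + length], ratio))
        cursor = start + length
    out.extend(samples[cursor:])
    return out
```

For example, applying one event `(20, 40, 1.5)` to a 100-sample signal lengthens the 40-sample segment to 60 samples, so the output has 120 samples. Varying the number, length, and ratio of such events is one simple way to explore the parameter space that the text describes as not yet defined by live experience.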
The various types of time scaling (e.g., lengths, distribution, and
frequency), along with other aspects (e.g., speech content and
speaker/gender dependency, time shifting of reference samples, and
score variation between reference and encoded speech), are under
investigation during the POLQA characterization phase. The results are
expected to be finalized and published in the forthcoming ITU-T POLQA
Application Guide (estimated for the second half of 2012).
Therefore, users handling applications as described in the previous
paragraphs might consider waiting until there is an application guide that
answers how all of these cases should be properly and accurately
addressed.

3.4 Devices/Phones

Generally, there are two main categories of degradations related to
devices: time variant linear distortions (e.g., spectral shaping
implemented in some of the older phones); and non-linear distortions
produced by the microphone/transducer or by reverberations caused by
hands-free set-ups at acoustical interfaces. POLQA has been designed and
proven to accurately predict the speech quality affected by these
distortions.
However, if the user is concerned about NB applications that do not involve
evaluations at an acoustical interface that is specific to device testing, then
PESQ could safely serve this need.
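
The criteria discussed in sections 3.1 to 3.4 can be collected into a simple checklist. The following Python sketch is one possible reading of those criteria, not a method defined by this paper or by ITU-T; all names (`UseCase`, `Assessment`, `assess_transition`) and the wording of the returned strings are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class UseCase:
    # Each flag paraphrases one criterion from sections 3.1 to 3.4.
    needs_wb_or_swb_comparison: bool = False         # 3.1
    uses_hd_or_new_commercial_codec: bool = False    # 3.2
    has_emerging_network_degradations: bool = False  # 3.3
    tests_acoustical_interface: bool = False         # 3.4


@dataclass
class Assessment:
    reasons: list = field(default_factory=list)
    caveats: list = field(default_factory=list)

    @property
    def pesq_sufficient_short_term(self):
        # Assumption: PESQ is treated as adequate only if no criterion applies.
        return not self.reasons


def assess_transition(use_case):
    """Collect reasons for moving to POLQA and caveats noted in the text."""
    result = Assessment()
    if use_case.needs_wb_or_swb_comparison:
        result.reasons.append("WB/SWB quality must be scored or compared (3.1)")
    if use_case.uses_hd_or_new_commercial_codec:
        result.reasons.append("HD or new commercial codecs are involved (3.2)")
    if use_case.has_emerging_network_degradations:
        result.reasons.append("emerging network degradations are in scope (3.3)")
        result.caveats.append(
            "consider waiting for the ITU-T POLQA Application Guide (3.3)"
        )
    if use_case.tests_acoustical_interface:
        result.reasons.append("device testing at an acoustical interface (3.4)")
    return result
```

A use case with all flags left at `False` (NB only, traditional codecs, electrical interface) yields no reasons, which corresponds to the situations in which the text says PESQ can still serve the need.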
