Benchmarking of VoLTE Services: A First Field Experience

A Rohde & Schwarz Company
Benchmarking of VoLTE Services
A first field experience
March 2015
SwissQual AG
Allmendweg 8 CH-4528 Zuchwil Switzerland
t +41 32 686 65 65 f +41 32 686 65 66 e info@swissqual.com
www.swissqual.com
Part Number: 12-070-200912-4

SwissQual has made every effort to ensure that any instructions contained in the document are adequate and free
of errors and omissions. SwissQual will, if necessary, explain issues which may not be covered by the documents.
SwissQual’s liability for any errors in the documents is limited to the correction of errors and the aforementioned advisory
services.
Copyright 2000 - 2015 SwissQual AG. All rights reserved.
No part of this publication may be copied, distributed, transmitted, transcribed, stored in a retrieval system, or translated
into any human or computer language without the prior written permission of SwissQual AG.
Confidential materials.
All information in this document is regarded as commercially valuable, protected and privileged intellectual property, and is
provided under the terms of existing Non-Disclosure Agreements or as commercial-in-confidence material.
When you refer to a SwissQual technology or product, you must acknowledge the respective text or logo trademark
somewhere in your text.
SwissQual®, Seven.Five®, SQuad®, QualiPoc®, NetQual®, VQuad®, Diversity® as well as the following logos are
registered trademarks of SwissQual AG.
Diversity Explorer™, Diversity Ranger™, Diversity Unattended™, NiNA+™, NiNA™, NQAgent™, NQComm™, NQDI™,
NQTM™, NQView™, NQWeb™, QPControl™, QPView™, QualiPoc Freerider™, QualiPoc iQ™, QualiPoc Mobile™,
QualiPoc Static™, QualiWatch-M™, QualiWatch-S™, SystemInspector™, TestManager™, VMon™, VQuad-HD™ are
trademarks of SwissQual AG.
SwissQual acknowledges the following trademarks for company names and products:
Adobe®, Adobe Acrobat®, and Adobe Postscript® are trademarks of Adobe Systems Incorporated.
Apple is a trademark of Apple Computer, Inc.
DIMENSION®, LATITUDE®, and OPTIPLEX® are registered trademarks of Dell Inc.
ELEKTROBIT® is a registered trademark of Elektrobit Group Plc.
Google® is a registered trademark of Google Inc.
Intel®, Intel Itanium®, Intel Pentium®, and Intel Xeon™ are trademarks or registered trademarks of Intel Corporation.
INTERNET EXPLORER®, SMARTPHONE®, TABLET® are registered trademarks of Microsoft Corporation.
Java™ is a U.S. trademark of Sun Microsystems, Inc.
Linux® is a registered trademark of Linus Torvalds.
Microsoft®, Microsoft Windows®, Microsoft Windows NT®, and Windows Vista® are either registered trademarks or
trademarks of Microsoft Corporation in the United States and/or other countries.
NOKIA® is a registered trademark of Nokia Corporation.
Oracle® is a registered US trademark of Oracle Corporation, Redwood City, California.
SAMSUNG® is a registered trademark of Samsung Corporation.
SIERRA WIRELESS® is a registered trademark of Sierra Wireless, Inc.
TRIMBLE® is a registered trademark of Trimble Navigation Limited.
U-BLOX® is a registered trademark of u-blox Holding AG.
UNIX® is a registered trademark of The Open Group.


© 2000 - 2015 SwissQual AG
Contents
Benchmarking of VoLTE Services: A first field experience
1 VoLTE as a new voice service
2 VoLTE - Technically very simple
3 Testing and Benchmarking VoLTE
VoLTE Call Setup
VoLTE Audio Transmission
Audio quality in VoLTE
Audio quality and transcoding
Audio delay and conversation quality
Absolute mean audio delay
Variable audio delay
Measuring and evaluation of variable audio delay
4 Conclusion
CONFIDENTIAL MATERIALS ii

Figures
Figure 1: Basic flow in a VoLTE to VoLTE connection
Figure 2: Example of SIP message flow in a VoLTE to VoLTE call
Figure 3: Call Set-Up times for mobile to PSTN connections
Figure 4: Call Set-Up times for mobile to mobile connections
Figure 5: Voice codec information in VoLTE RTP header
Figure 6: IP Throughput in a VoLTE call
Figure 7: Examples for listening quality for VoLTE in comparison to 3G calls
Figure 8: Examples of quality distribution for VoLTE and 3G calls
Figure 9: Examples of audio delay in mobile to PSTN calls
Figure 10: Examples of audio delay in VoLTE and 3G mobile to mobile calls
Figure 11: Basic principle of a jitter buffer to compensate packet delay jitter
Figure 12: Example of an aligned pair of reference and degraded signal
Figure 13: Example of variable delay in a VoLTE live test sample
Figure 14: Example of variable delay in a 3G live test sample
Figure 15: Occurrence of delay changes for VoLTE and 3G live network samples
Figure 16: P.863 MOS-LQO statistics in relation to delay changes in the test sample


Foreword
In this white paper we take an in-depth look at the performance of recently launched commercial VoLTE
networks by evaluating speech quality. The information presented is based on real-world data collected from
an end-to-end perspective, simulating the end-user experience. The data used was collected between
June 2014 and January 2015 in the networks of three different VoLTE operators in the U.S. This white paper
does not cover core network architecture and call handling within the network. Instead, it covers the attributes
of a VoLTE network and how they impact voice quality, and it describes the test methodology in detail.
This document will provide first results and methodologies of testing and benchmarking VoLTE. It treats
VoLTE from an end-to-end measurement perspective. The focus of the document revolves around the typical
measurements made in all voice telecommunications networks.
1 VoLTE as a new voice service

LTE networks first launched commercially at the end of 2009. Since that time, there has been steady
acceleration in the deployment of LTE networks globally. It is fair to say that the technology has proven very
successful in the real world, meeting mobile users' ever-increasing demand for data. LTE networks offer
well-documented bandwidth advantages over the legacy 3G networks and their derivative data services
(UMTS, HSPA, EVDO), even though the bandwidth of these 3G-based networks has also improved
significantly. In principle, LTE is more a concept than a fixed technology. It opens the door to a step-wise
evolution in capacity and speed. LTE is well on its way to becoming the basic transmission technology for all
networks around the world, providing an excellent way forward for the GSM- and CDMA-based networks
currently deployed.
Since LTE is a data-only technology (no circuit-switched voice capability), voice telephony has to date been
handled by circuit-switched fallback to either a CDMA- or GSM-based voice service. This adds complexity to
the device. For instance, in GSM-based networks the device must leave LTE when a voice call is established:
the voice call requires a circuit-switched (2G/3G) connection, and a parallel data connection cannot stay in
LTE, so it must fall back to UMTS or HSPA too. In CDMA networks, the device needs a CDMA-based voice
call and a second radio chipset for LTE if simultaneous voice and data are to be supported. For these
reasons, along with the inherent spectral efficiency of LTE, voice calls over LTE are a natural next step. A
dedicated voice call service in LTE, known as Voice over LTE (VoLTE), is a high priority for many LTE
operators globally. By late 2014, several LTE operators had launched VoLTE services, albeit in a conservative
manner. Although VoLTE should not be confused with VoIP (Voice over Internet Protocol), the industry is keen
for the public to associate the two. VoLTE signalling, call setup and voice transmission closely resemble those
of common SIP-based VoIP services. The most important difference is the deep integration of VoLTE as a
service into the mobile core network.
Typically, VoIP services are installed over the top (OTT) of a data connection, usually as an application on a
smartphone. Such an application runs as a normal data service, uses the default data bearer and has to share
it with other active services. Usually, it is not integrated into the phone's call client, so features such as call
waiting, three-way calling and call forwarding are not available. In addition, if you are on an OTT VoIP client,
there is a good chance your call will drop if a "real" voice call comes in.
Network operators stand to benefit substantially from deploying VoLTE services. Primarily, operators will gain
spectrum by moving voice calls to VoLTE: the more voice traffic is moved to LTE, the more 3G spectrum
operators can re-farm for LTE. The end game will be to replace all 2G/3G networks with a single LTE
network. Subscribers stand to benefit as well. Carriers will be able to offer HD Voice, video chat, fast call
setup times and better battery life, to name just a few. For all its promise, network operators will need to
proceed with caution. Nothing would be more catastrophic than poor performance on voice calls. Even though
most subscribers are gobbling up data at unprecedented rates, their voice experience will define the overall
quality of a carrier.
Chapter 1 | VoLTE as a new voice service 1


2 VoLTE - Technically very simple

From a technical perspective, a VoLTE call is quite straightforward. The first step is for the UE to establish a
data connection on the LTE network. If the UE is VoLTE capable, it then attempts to register on the IMS
(IP Multimedia Subsystem) Server. The IMS server is unique to each carrier and not accessible from the
public domain, unlike a typical VoIP server, which is reachable from the public internet.
Once the UE is registered on the IMS Server, the server manages all voice connections. The
communication protocol between the IMS Server and the UE is SIP (Session Initiation Protocol), carried over
TCP (UDP is also permitted). All of these transactions happen in the background and are usually not visible
to the user. Some UEs show an icon in the display indicating that the device is properly registered on
the IMS Server.
In the case of a call request, the call setup is managed by the IMS server, and the call is connected either to
another LTE device (as shown in Figure 1), to a circuit-switched mobile, or to a fixed-line subscriber.
Figure 1: Basic flow in a VoLTE to VoLTE connection

The voice connection itself uses RTP over UDP between the two mobile phones. For calls from a VoLTE
device to a circuit-switched network, the IMS Server communicates with a gateway. The voice codecs used
are the same as in today's mobile networks, such as AMR and AMR-WB; instead of being carried in
circuit-switched frames, however, the voice stream is packetized. In the near future a new voice codec,
EVS (Enhanced Voice Services), will be available.
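To illustrate what "RTP over UDP" gives a measurement tool to work with, the following Python sketch decodes the 12-byte fixed RTP header defined in RFC 3550. It is a minimal sketch under our own assumptions: `parse_rtp_header` and `RtpHeader` are hypothetical names, header extensions are not parsed, and the payload type it returns is normally a dynamic value (96 to 127) whose mapping to AMR or AMR-WB is negotiated in SDP during SIP call setup.

```python
import struct
from dataclasses import dataclass


@dataclass
class RtpHeader:
    version: int
    padding: bool
    extension: bool
    csrc_count: int
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int
    header_length: int  # bytes: fixed header + CSRC list (any header extension is not included)


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Decode the fixed RTP header (RFC 3550, section 5.1) from a UDP payload."""
    if len(packet) < 12:
        raise ValueError("packet is shorter than the 12-byte fixed RTP header")
    # Network byte order: two single bytes, 16-bit sequence number, 32-bit timestamp, 32-bit SSRC
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    version = b0 >> 6
    if version != 2:
        raise ValueError(f"unexpected RTP version {version}")
    csrc_count = b0 & 0x0F
    header_length = 12 + 4 * csrc_count
    if len(packet) < header_length:
        raise ValueError("packet is truncated inside the CSRC list")
    return RtpHeader(
        version=version,
        padding=bool(b0 & 0x20),
        extension=bool(b0 & 0x10),
        csrc_count=csrc_count,
        marker=bool(b1 & 0x80),
        payload_type=b1 & 0x7F,
        sequence_number=seq,
        timestamp=ts,
        ssrc=ssrc,
        header_length=header_length,
    )


# Usage: a synthetic packet with the marker bit set and dynamic payload type 97
pkt = struct.pack("!BBHII", 0x80, 0x80 | 97, 1234, 320, 0xDEADBEEF) + bytes(33)
hdr = parse_rtp_header(pkt)
print(hdr.payload_type, hdr.sequence_number, hdr.timestamp)  # 97 1234 320
```

For AMR-WB the RTP clock rate is 16 kHz (RFC 4867), so the timestamp advances by 320 per 20 ms speech frame, and gaps in `sequence_number` reveal lost packets.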
The applied audio pre-processing is also the same for legacy calls and VoLTE. Gain control, noise
suppression and DTX are applied the same way.
3 Testing and Benchmarking VoLTE

From the user's perspective, nothing is visible about the technology used for a voice call. The user simply
dials a number and presses the send button, and does not care whether the call is made in 2G/3G or in a
VoLTE environment. All the user really cares about is whether the call goes through, the voice quality, and call
retainability (in our industry we refer to these KPIs as blocks, MOS, and drops). The same should be true from
the end-to-end testing perspective: when VoLTE is tested as a service, it must be benchmarked using the same
KPIs and statistics as a normal voice service.
Operators are interested in the call setup time, audio transmission time, handovers, radio conditions, and
much more.


In general, the meaning of KPIs and metrics is the same for VoLTE as for legacy technologies. This is
necessary if operators are to benchmark a VoLTE service against a legacy 2G/3G voice call under
comparable conditions. Looking at the technical details that enable a VoLTE call to function, a VoLTE to
VoLTE call within the same IMS Server is fairly straightforward to evaluate. Beyond this simple case,
however, complexities quickly arise. The following call types need special attention to ensure KPIs are not
compromised:
- Subscriber is in LTE, having VoLTE service but calling a 2G/3G/PSTN subscriber
- Subscriber is in LTE, having an established VoLTE call but losing LTE coverage
(SRVCC, Single Radio Voice Call Continuity)
- Subscriber is in LTE, having no VoLTE service, just LTE data access
(CSFB, Circuit-Switched Fall-Back)
The following sections provide some real world examples for analysis. These measurement results were
taken from live VoLTE networks using commercially available devices. These numbers illustrate the
important parameters associated with running a quality VoLTE network. They can be used for setting a
benchmark upon which to improve, or to evaluate one carrier vs. another.
VoLTE Call Setup

Call setup time is one of the most important KPIs when measuring voice service performance. From a user
perspective, it is the time between pressing the send button and being connected with the audio channel
open. These call events can be measured in a VoLTE environment by looking into SIP messaging: once the
VoLTE device is registered on the IMS server, call setup and handling are taken care of by SIP. The SIP
connection itself is usually encrypted, although this depends on the operator's individual VoLTE settings. As
shown in Figure 2, SIP provides the main states of the call flow and the trigger points required to calculate
the main KPIs, Call Setup Time and Post Dial Delay. The list on the left shows the SIP messages of the
calling phone and the list on the right those of the receiving phone.
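The SIP-based KPI calculation can be sketched as follows. This is a minimal illustration, not the trigger-point definition behind Figure 2: we assume Post Dial Delay runs from the INVITE sent to the first 180 Ringing for that INVITE, and Call Setup Time from the INVITE to the 200 OK answering it. The trace format, the function `call_setup_kpis` and the timestamps in the usage example are all hypothetical. Matching on the CSeq method matters because a 200 OK can also answer a PRACK or UPDATE during setup.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple

# (timestamp in seconds, SIP request method or response start line, CSeq method)
SipEvent = Tuple[float, str, str]


@dataclass
class SetupKpis:
    post_dial_delay_s: Optional[float]
    call_setup_time_s: Optional[float]


def call_setup_kpis(events: Iterable[SipEvent]) -> SetupKpis:
    """Derive PDD and call setup time from the calling side's SIP trace.

    Assumed (hypothetical) trigger points:
      PDD             = INVITE sent -> first 180 Ringing for that INVITE
      call setup time = INVITE sent -> 200 OK answering that INVITE
    """
    invite_t = ringing_t = answer_t = None
    for t, message, cseq_method in sorted(events):
        if invite_t is None:
            if message == "INVITE":
                invite_t = t
            continue  # ignore REGISTER etc. before the call starts
        if cseq_method != "INVITE":
            continue  # e.g. a 200 OK answering PRACK or UPDATE
        if message.startswith("180") and ringing_t is None:
            ringing_t = t
        elif message.startswith("200") and answer_t is None:
            answer_t = t

    def delta(end: Optional[float]) -> Optional[float]:
        return None if invite_t is None or end is None else end - invite_t

    return SetupKpis(delta(ringing_t), delta(answer_t))


# Usage with an invented trace (timestamps are illustrative, not measured data)
trace = [
    (0.00, "INVITE", "INVITE"),
    (0.08, "100 Trying", "INVITE"),
    (1.10, "183 Session Progress", "INVITE"),
    (1.20, "PRACK", "PRACK"),
    (1.30, "200 OK", "PRACK"),
    (2.40, "180 Ringing", "INVITE"),
    (6.00, "200 OK", "INVITE"),
]
kpis = call_setup_kpis(trace)
```

Here `kpis.post_dial_delay_s` is 2.4 s and `kpis.call_setup_time_s` is 6.0 s; the 200 OK to the PRACK is correctly ignored.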
Figure 2: Example of SIP message flow in a VoLTE to VoLTE call

It’s important for operators to distinguish between call types when analyzing call setup time metrics.
For mobile to mobile calls, VoLTE to VoLTE results should be separated from VoLTE to 2G/3G results.
These two types of voice calls make for an interesting comparison, particularly against a traditional
3G to 3G call. In addition to the mobile to mobile differences, it is important to analyze the differences
between 2G/3G calls to the PSTN and VoLTE calls to the PSTN. As mentioned above, carriers can benefit
from comparing these call types to ensure that newer services such as VoLTE do not give customers a worse
experience than they are used to. In addition, benchmarking one carrier against another can provide
competitive feedback on areas where improvement is needed to outperform the competition.
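Keeping call types apart amounts to grouping setup times by carrier and call type before computing statistics. The sketch below assumes a hypothetical record format of (carrier, call type, setup time in seconds); `summarize_setup_times` is our own name. It reports the median alongside the mean because a few unusually long setups can pull the mean up.

```python
from collections import defaultdict
from statistics import mean, median
from typing import Dict, Iterable, List, Tuple

# (carrier, call type such as "VoLTE-VoLTE" or "3G-PSTN", call setup time in seconds)
SetupRecord = Tuple[str, str, float]


def summarize_setup_times(records: Iterable[SetupRecord]) -> Dict[Tuple[str, str], Dict[str, float]]:
    """Group call setup times per (carrier, call type) and report n, mean and median."""
    groups: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for carrier, call_type, setup_s in records:
        groups[(carrier, call_type)].append(setup_s)
    # Sorted keys give a stable, readable report order
    return {
        key: {"n": len(values), "mean": mean(values), "median": median(values)}
        for key, values in sorted(groups.items())
    }
```

Each (carrier, call type) cell can then be compared with its counterpart, for example VoLTE to PSTN against 3G to PSTN for the same carrier.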
An example drive test campaign conducted in early 2015, running a few hundred calls, shows visible
differences between the categories. The test was conducted across three carriers (A, B and C), all with VoLTE
services enabled.


Figure 3: Call Set-Up times for mobile to PSTN connections

The measured Call Set-Up Times for the 3G to PSTN call scenario are nearly identical across all carriers.
There appears to be a slight advantage for VoLTE to PSTN calls, particularly on Carrier A, but the amount of
data does not allow a statistical significance analysis to confirm this.
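With more calls, one common way to check whether a difference in mean Call Set-Up Time between two scenarios is real is Welch's t-test, which does not assume equal variances. The following is a minimal sketch under our own assumptions (independent calls, approximately normal sample means); `welch_t` is a hypothetical helper. Setup-time distributions are often skewed, so a rank-based test such as Mann-Whitney U may be the safer choice in practice.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom for mean(a) - mean(b)."""
    if len(a) < 2 or len(b) < 2:
        raise ValueError("each sample needs at least two values")
    va = variance(a) / len(a)  # squared standard error of mean(a), using sample variance (n - 1)
    vb = variance(b) / len(b)
    if va + vb == 0:
        raise ValueError("both samples are constant; t is undefined")
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```

For example, `welch_t([1, 2, 3], [4, 5, 6])` gives t of about -3.67 with 4 degrees of freedom; since |t| exceeds the two-sided 5 % critical value of about 2.776 for 4 degrees of freedom, such a difference would count as significant at that level.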
Given that mobile to mobile traffic is on the rise, it is important to evaluate the results of the various call
scenarios that exist. The Call Set-Up Times in mobile to mobile connections can be sub-divided into three
categories: legacy 3G to 3G, VoLTE to VoLTE, and cross-technology calls from CS to VoLTE.
Figure 4: Call Set-Up times for mobile to mobile connections

Analysing these cases reveals a clear advantage for VoLTE. Even calling a VoLTE client from 3G results in a
shorter Call Set-Up Time than a common 3G to 3G call, and a pure VoLTE to VoLTE call setup takes less
than 60% of the time of a 3G to 3G call in this real-world data.
Please consider this only as an example: the data were gained in just three arbitrarily chosen networks, in one
arbitrarily chosen region, during a two-day test drive. However, the Call Setup Time seems to benefit
significantly from a move to VoLTE support.¹

¹ Note that LTE-capable devices camp on LTE in idle mode and fall back to 3G/2G in the event of a voice call.
This strategy, called CSFB, extends the call setup in 3G/2G by around 1 s compared to phones that have no
LTE support at all and therefore do not apply CSFB.


VoLTE Audio Transmission

Technically speaking, the audio processing in VoLTE is no different from that of legacy circuit-switched calls.
The speech codecs are the same, as are frame sizes and loss concealment strategies. Even though VoLTE
supports a wide range of speech codecs, today mainly AMR-WB is deployed. In the near future the EVS
codec may be integrated. AMR-WB offers 7 kHz audio bandwidth, whereas EVS will offer up to 20 kHz audio
bandwidth. For the end user, this will lead to superior audio quality over traditional telephony.
Figure 5: Voice codec information in VoLTE RTP header
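As Figure 5 suggests, the codec mode in use can be read from the RTP stream itself. A hedged sketch for AMR-WB, based on the payload format in RFC 4867: the payload starts with a 4-bit codec mode request (CMR) followed by a table-of-contents (ToC) entry whose frame type (FT) identifies the bit rate. We assume the caller already knows from SDP which dynamic payload type carries AMR-WB and whether bandwidth-efficient or octet-aligned mode was negotiated; `first_toc_entry` is a hypothetical helper that reads only the first ToC entry.

```python
from typing import NamedTuple

# AMR-WB frame type (FT) index -> meaning (3GPP TS 26.201, referenced by RFC 4867); 10-13 are reserved
AMR_WB_FRAME_TYPES = {
    0: "6.60 kbps", 1: "8.85 kbps", 2: "12.65 kbps", 3: "14.25 kbps",
    4: "15.85 kbps", 5: "18.25 kbps", 6: "19.85 kbps", 7: "23.05 kbps",
    8: "23.85 kbps", 9: "SID (comfort noise)", 14: "speech lost", 15: "no data",
}


class AmrToc(NamedTuple):
    cmr: int      # codec mode request sent to the far end (15 = no request)
    follow: bool  # F bit: another ToC entry follows
    ft: int       # frame type of the first frame
    q: bool       # frame quality indicator (False = damaged frame)
    mode: str


def first_toc_entry(payload: bytes, octet_aligned: bool = False) -> AmrToc:
    """Read the CMR and the first ToC entry of an AMR-WB RTP payload (RFC 4867, section 4)."""
    if len(payload) < 2:
        raise ValueError("payload too short")
    b0, b1 = payload[0], payload[1]
    cmr = b0 >> 4
    if octet_aligned:
        # byte 0: CMR(4) + reserved(4); byte 1: F(1) FT(4) Q(1) padding(2)
        follow, ft, q = bool(b1 >> 7), (b1 >> 3) & 0x0F, bool((b1 >> 2) & 1)
    else:
        # bandwidth-efficient: CMR(4) F(1) FT(4) Q(1) packed back to back without padding
        follow = bool((b0 >> 3) & 1)
        ft = ((b0 & 0x07) << 1) | (b1 >> 7)
        q = bool((b1 >> 6) & 1)
    return AmrToc(cmr, follow, ft, q, AMR_WB_FRAME_TYPES.get(ft, "reserved"))
```

For example, a bandwidth-efficient payload starting with the bytes `0xF1 0x40` decodes to CMR 15 (no mode request), a single frame, FT 2 and Q set, i.e. AMR-WB at 12.65 kbps.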

Unlike in a circuit-switched call, the voice transmission in VoLTE is visible at the IP layer. The following graph
illustrates the bit streams in the uplink and downlink directions.
Figure 6: IP Throughput in a VoLTE call

The visible pulse-like transmission is caused by the half-duplex transmission of voice samples, alternately in
either direction. Each sample consists of two sentences, and even this speech-pause-speech pattern can be
recognized in the IP throughput.
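The size of these pulses can be estimated with simple arithmetic. The sketch below rests on our own assumptions, not on the measurement behind Figure 6: AMR-WB 12.65 kbps (253 speech bits per 20 ms frame) in bandwidth-efficient mode, one frame per RTP packet, IPv6, and no ROHC header compression, IPsec or RTCP. During speech pauses, DTX replaces speech frames with a 40-bit SID frame sent every 8th frame (160 ms), which is why the throughput collapses between sentences. `ip_bitrate_kbps` is a hypothetical helper.

```python
from math import ceil

RTP_UDP_BYTES = 12 + 8            # RTP fixed header + UDP header
IP_HEADER_BYTES = {4: 20, 6: 40}  # minimal IPv4 / IPv6 headers, no options or extension headers


def ip_bitrate_kbps(frame_bits: int, interval_ms: float = 20.0, ip_version: int = 6) -> float:
    """IP-layer bit rate for one AMR-WB frame per RTP packet (bandwidth-efficient mode).

    Assumes no ROHC, no IPsec and no RTCP, so the result describes the
    uncompressed IP stream, not the load on the radio interface.
    """
    payload_bytes = ceil((4 + 6 + frame_bits) / 8)  # CMR + one ToC entry + speech bits, padded to octets
    packet_bytes = payload_bytes + RTP_UDP_BYTES + IP_HEADER_BYTES[ip_version]
    return packet_bytes * 8 / interval_ms  # bits per millisecond == kbit/s


speech = ip_bitrate_kbps(253)               # AMR-WB 12.65 kbps: 253 bits every 20 ms
comfort_noise = ip_bitrate_kbps(40, 160.0)  # SID frame (40 bits) every 8th frame during DTX
print(f"{speech:.1f} kbit/s while talking, {comfort_noise:.2f} kbit/s in pauses")
```

Under these assumptions the stream carries about 37.2 kbit/s during speech and about 3.35 kbit/s during pauses (29.2 kbit/s during speech with IPv4). Header compression on the radio bearer would shrink the speech figure considerably, so measured values depend on where in the protocol stack the throughput is captured.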


Audio quality in VoLTE

VoLTE is a new service and, in practice, it is based on AMR-WB. Even though other, older codecs are
technically supported, commercially released devices do not have to fall back to them. This makes VoLTE
different from legacy CS voice services, where AMR narrowband is typically the common ground today.
First, a few typical results of drive-test campaigns are shown for illustration. In each scenario, several
hundred calls were processed and more than 1000 individual test samples were averaged to obtain the mean
score. The data collection was done in summer 2014, mainly in U.S. networks.
The following graph (Figure 7) shows results collected in early drive test campaigns comparing VoLTE to
traditional CS mobile to mobile calls. The figure shows the MOS derived by ITU-T P.863 ‘POLQA’ in super-
wideband mode in different scenarios.
The leftmost column (3G-3G NB) shows the results for legacy mobile to mobile calls with transcoding. The
MOS is lower because there are two coding steps and the transmission is based on AMR (NB) instead of
AMR-WB, i.e. a narrower audio bandwidth.
The next column (3G-3G TrFO) shows the average results for AMR-WB transcoding-free connections (TrFO).
The average score is, of course, higher than for the red column, even though the RF conditions might be the
same. This is due to the reduction to just one compression step and, more importantly, to the higher audio
bandwidth offered by the AMR-WB speech codec.
Of course, a wideband channel can be benchmarked against a narrowband channel, but in this case the
narrowband channel will be scored lower purely because of its band limitation, even if the RF network
conditions are equally good or better. It is important to note that ITU-T P.863, when set to super-wideband
mode, will rate a good-quality traditional (narrowband) mobile phone call with a comparatively low score,
such as 3.23, because it expects wideband audio. From a benchmarking perspective, wideband audio sounds
much better than narrowband audio, hence the significantly better MOS values for wideband calls.
The resulting question is:

When comparing a VoLTE to VoLTE connection directly to a circuit-switched channel, what is the
corresponding counterpart in 2G/3G technology?
When evaluating VoLTE to VoLTE calls, it is fairer to compare them with such transcoding-free AMR-WB
connections. The two rightmost columns are based on VoLTE to VoLTE connections. Both campaigns were
made using the same driving route in the same network, just with different VoLTE devices.
Figure 7: Examples for listening quality for VoLTE in comparison to 3G calls

A VoLTE to VoLTE connection is clearly in the same quality range as a 3G mobile to mobile connection using
TrFO. This seems logical, since the coding technology is exactly the same. So what accounts for the small
difference (3.74 vs 3.66) in voice quality that the VoLTE-based calls have over the 3G calls if the coding is
the same?
The explanation can be found in the distribution of the MOS values shown in Figure 8. In both cases, the
maximum quality is determined by the compression technology used, AMR-WB at 12.65 kbps. The
maximum values recorded for AMR-WB fall in the MOS range of 4.0 to 4.2. Note that there is no
technology-dependent disadvantage for either VoLTE or 3G.
Figure 8: Examples of quality distribution for VoLTE and 3G calls

However, the slope towards lower MOS values is steeper for VoLTE-VoLTE calls in these campaigns.
Essentially, slightly distorted voice clips (MOS values in the range of 3.2 to 3.8) occur less often with
VoLTE-based calls than with 3G-based calls. The occurrence of considerably distorted clips below MOS 3.2 is
again about the same, and such clips are quite rare in both cases.
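This comparison is about how the scores are distributed rather than their mean. A minimal sketch of that evaluation, assuming per-sample P.863 scores are available as a list for each technology: the band edges 3.2 and 3.8 are taken from the text, while the function name `mos_band_shares` and the band labels are our own.

```python
from typing import Dict, Iterable


def mos_band_shares(scores: Iterable[float], low: float = 3.2, high: float = 3.8) -> Dict[str, float]:
    """Share of test samples below `low`, in [low, high) and at or above `high`."""
    values = list(scores)
    if not values:
        raise ValueError("no MOS values given")
    n = len(values)
    below = sum(1 for s in values if s < low)
    slight = sum(1 for s in values if low <= s < high)
    return {
        "considerably_distorted": below / n,
        "slightly_distorted": slight / n,
        "good": (n - below - slight) / n,
    }
```

Applied separately to the VoLTE and 3G sample lists, the "slightly_distorted" shares make the steeper slope described above directly comparable as a single number per technology.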
What could be the reason for that? The typical cause of small impairments is frame replacement by the
decoder. In 3G, erroneous frames are usually discarded and replaced. The same happens in VoLTE, where in
addition packets that arrive too late are discarded. Overall, such problems occurred less often in the tested
LTE connections. This may partly be due to the low load in the live LTE networks; however, these numbers
are just examples and may change when VoLTE is fully used as a service.
Audio quality and transcoding

In a pure VoLTE to VoLTE connection, the voice stream is compressed once with AMR-WB, typically at
12.65 kbps, then transmitted and decoded. The same is true for a 3G TrFO connection. However, transcoding
may still happen today when the VoLTE packet stream is converted into a legacy technology, usually into
AMR 12.2 kbps or G.711 A-/µ-law.
This transcoding not only inserts an additional coding step, it also reduces the audio bandwidth from 7 kHz
wideband to the traditional 3.1 kHz narrowband of telephony. Since VoLTE subscribers are still a minority
today, many of their calls are affected: they experience wideband quality only when calling another VoLTE
subscriber in the same network. When calling a mobile subscriber in 3G or 2G, they lose wideband and suffer
the further disadvantage of additional transcoding to AMR narrowband.
VoLTE technology has only just started to be deployed; we expect to see improvements and better
interoperability over the coming months and years. Transcoding-free IP transmission will ultimately become
standard in core networks and may even serve IP-capable fixed network devices. As evidence of this rapid
evolution, we already see one operator transferring AMR-WB calls from 3G to VoLTE without losing wideband
audio (based on real-world data collected in January 2015).


Audio delay and conversation quality

A fixed audio delay, i.e. the physical transmission time of the voice signal, is not directly related to listening
quality. In a pure listening situation, a fixed delay causes no distortion and is not even perceptible. A long
audio delay becomes annoying in a conversation, where fluent interaction becomes more and more difficult.
Rather than referring to this as audio quality, we refer to this element of voice calls as “conversation quality.”
Conversation quality deteriorates as audio delays increase. In the Public Switched Telephone Network (PSTN),
the audio delay has traditionally been less than 100 ms and was not considered detrimental to typical
conversation quality. Since the 1970s there has been a rapid expansion of interconnected global
telecommunications systems, primarily brought on as a byproduct of a more global economy and the
“shrinking world.” With this came longer-distance connections and especially satellite links, which contributed
to longer audio delays in our speech communications. Audio delays of several hundred milliseconds to more
than one second were not uncommon. Today, even though satellite links are rarely used for such calls
anymore, OTT VoIP applications running over open internet connections can have annoyingly large audio
delays.
Mobile networks experience longer audio delays than pure PSTN connections. With the introduction of 2G and
3G networks, audio delay grew into a moderate problem again. Due to framing, interleaving and the air link
itself, a typical mobile to PSTN connection easily exceeds 150 ms of delay. For mobile to mobile
connections, audio delays can be considerably worse: depending on the network setup, one-way audio delays
may reach the 400 ms threshold.
However, in circuit-switched connections, including 2G and 3G mobile channels, there is a fixed delay but no
further variation. All voice frames are received after a certain transmission time, and this time is essentially
the same for all voice frames.² This means that there is no delay variation when transmitting the speech.
For VoLTE, the framing and queuing are different. Audio is transmitted in a packet stream and is affected by
variations in transmission time, as is typical for packet-switched connections. To deal with these variations in
packet arrival (jitter), VoLTE receivers, like any VoIP receiver, use a so-called jitter buffer. Received packets
are placed in this queue before being decoded, which allows the decoder to take packets out of the queue at
the correct time. Since there are always some packets buffered, the VoLTE client can assemble them
properly for fluid playback of audio to the subscriber.
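The buffer behaviour can be illustrated with a deliberately simple model: a fixed-depth jitter buffer, whereas real VoLTE clients adapt their buffer and time-scale the audio. We assume packet i is sent at i × 20 ms, playout starts a fixed depth after the first packet arrives, and a packet that arrives after its playout instant is effectively lost (the decoder finds the buffer empty). The function name and the arrival times in the tests are hypothetical.

```python
from typing import List, Sequence, Tuple

FRAME_MS = 20.0  # one speech frame per packet, sent every 20 ms


def fixed_jitter_buffer(arrivals_ms: Sequence[float], depth_ms: float) -> Tuple[float, List[int]]:
    """Simulate a fixed-depth jitter buffer.

    Packet i is assumed to be sent at i * FRAME_MS and to arrive at arrivals_ms[i].
    Playout starts depth_ms after the first packet arrives; packet i must be in the
    buffer by its playout instant, otherwise the decoder finds it missing (underrun).
    Returns (delay from sending to playout, indices of packets that arrived too late).
    """
    if not arrivals_ms:
        raise ValueError("no packets")
    playout_start = arrivals_ms[0] + depth_ms
    late = [i for i, t in enumerate(arrivals_ms) if t > playout_start + i * FRAME_MS]
    return playout_start, late  # packet 0 was sent at t = 0, so playout_start is the delay
```

With invented arrivals of 50, 70, 130, 110 and 130 ms (packet 2 held up by 40 ms), a 0 ms buffer gives 50 ms of delay but packet 2 misses its playout instant, while a 40 ms buffer absorbs the jitter at the cost of raising the delay to 90 ms.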
As usual, there is a trade-off. On the one hand, a large jitter buffer can deal well with highly varying packet
delay. On the other hand, a large jitter buffer adds to the audio delay, locking in compromised conversation
quality. Conversely, if a very short jitter buffer is deployed (preserving conversation quality), the buffer is
more likely to run empty, which is referred to as a buffer underrun. A buffer underrun causes a gap in audio
decoding: the buffer is empty, no packet is available, and the decoder cannot deliver voice. In a simple case,
there is just silence, perceived as a gap, ultimately impacting audio quality. Fortunately, VoLTE devices have
sophisticated strategies that help maintain audio quality even when jitter buffers underrun. For instance, if a
jitter buffer is running low, the client can try to place the gaps caused by a buffer underrun at a point in the
audio where there is silence. If there is no silent area in which to insert gaps, the client can repeat packets.
This preserves audio quality but results in received audio that is “stretched” to fill the blanks caused by buffer
underruns. It’s important to understand that the mean delay of the voice itself has no influence on listening
quality, only on conversation quality. However, uncompensated delay jitter may trigger decoder actions that
cause perceptible distortions in audio quality. Carriers implementing VoLTE have to find the balance between
audio quality and conversation quality (audio delay). For this reason, pure MOS measurements must be
supplemented by audio delay measurements to get the complete “conversation quality” picture.
² There are rare cases where one frame is lost or duplicated during a handover, but this results in a delay
variation of only one voice frame (20 ms).


Absolute mean audio delay

The absolute audio delay, or mean delay, describes the mean transmission time from mouth to ear, from one
side of the connection to the other. It influences the interaction in a conversation: in principle, the shorter the
mean audio delay, the better the conversation quality. For a comprehensive explanation of the effect excessive
audio delay has from a human perception standpoint, please refer to ITU-T Recommendation G.114.
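Measuring the mean delay requires aligning the received (degraded) signal with the transmitted reference. As a simplified illustration, and not the alignment used in P.863 or in any particular test tool, the sketch below picks the lag that maximizes a normalized cross-correlation within a search window. The function names are hypothetical, 16 kHz sampling is assumed for the conversion to milliseconds, and the brute-force search is O(N × L); practical implementations typically use FFT-based correlation and additional refinement.

```python
from math import sqrt
from typing import Sequence


def estimate_delay_samples(reference: Sequence[float], degraded: Sequence[float], max_lag: int) -> int:
    """Return the lag in samples (0..max_lag) at which `degraded` best matches `reference`.

    Score = dot(reference, window) / ||window||: cross-correlation normalised by the
    energy of the degraded window, so that loud windows are not favoured. By the
    Cauchy-Schwarz inequality the score peaks where the window is a positively
    scaled copy of the reference.
    """
    n = len(reference)
    best_lag, best_score = None, float("-inf")
    for lag in range(max_lag + 1):
        window = degraded[lag:lag + n]
        if len(window) < n:
            break  # degraded signal too short for this and all larger lags
        energy = sum(x * x for x in window)
        if energy == 0:
            continue  # silent window: no meaningful correlation
        score = sum(r * w for r, w in zip(reference, window)) / sqrt(energy)
        if score > best_score:
            best_lag, best_score = lag, score
    if best_lag is None:
        raise ValueError("no usable lag; check signal lengths and max_lag")
    return best_lag


def samples_to_ms(samples: int, sample_rate_hz: int = 16000) -> float:
    """Convert a lag in samples to milliseconds (16 kHz assumed for wideband audio)."""
    return 1000.0 * samples / sample_rate_hz


# Usage: a toy reference delayed by 3 samples
reference = [1, -1, 2, 0, -2, 1, 3, -1, 0, 2]
degraded = [0, 0, 0] + reference + [0, 0, 0]
lag = estimate_delay_samples(reference, degraded, max_lag=6)  # 3
```

In a real measurement the signals are seconds of speech, and at 16 kHz a delay of 20 ms corresponds to a lag of 320 samples.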
As explained before, in VoLTE the audio delay strongly depends on the length of the jitter buffer in the user’s
device. Early device implementations showed unacceptably long audio delays, but we are now seeing the first
consumer devices on the market with well-designed clients and more than acceptable delays.
Figure 9 below shows example measurements made in early 2015 in live VoLTE and 3G networks in the U.S.
For voice calls between mobiles and a PSTN line, there are two setups to benchmark: 3G to PSTN and
VoLTE to PSTN.
Figure 9: Examples of audio delay in mobile to PSTN calls

There are small differences between the three analyzed operators, but they are not dramatic from a
perceptual point of view. On average, the delay is just below 200 ms regardless of whether the call originates
in 3G or VoLTE.
Mobile to mobile connections can be split into three setups for benchmarking: common 3G to 3G, VoLTE to
VoLTE, and the cross-technology setup 3G to VoLTE.
Figure 10: Examples of audio delay in VoLTE and 3G mobile to mobile calls
As expected, a mobile to mobile connection in circuit-switched 3G with common transcoding shows the
highest audio delay: there are two air links and, in addition, two compression/decompression steps. The
average of 300 ms shown is a fair value, but there are many examples of networks with more than 400 ms of
audio delay, which is barely acceptable. Almost identical audio delays were measured when calling a VoLTE
device from a 3G device.
In both cases, one operator (‘C’) applies Transcoding-Free Operation (TrFO), which enables a much shorter transmission time because one coding/decoding step is avoided. Operator ‘C’ even processes 3G to VoLTE calls without transcoding and preserves wideband audio across the technologies. This results not only in a short audio delay but also in significantly higher audio quality.
For VoLTE to VoLTE calls, the audio delay is much shorter than in a normal 3G call; however, it is about the same as the delay measured in a 3G to 3G call using AMR-WB TrFO.
These figures are real field examples of well-implemented technology and well-designed devices. In practice, delays can be much higher; delay measurements can help to optimize towards these near-ideal values.
Variable audio delay

The absolute audio delay only describes the delay of an audio stream relative to its origin at the sending side as a fixed offset: the signal is sent at a point in time t0 and received at t0 + c, where c is constant. This is not the whole story regarding audio delay. In a VoLTE environment, parts of the voice (packets) can be received with different latencies, which means that c is no longer constant but varies over time. In practice, there is a given base delay and the variation comes on top of it.
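The two cases can be written compactly as follows. The split of c(t0) into a mean delay and a deviation term, and the range R, are notation introduced here for illustration; R corresponds to the minimum-to-maximum range of short-term delays used for the live measurements in this section.

```latex
\begin{aligned}
  t_{\mathrm{rx}} &= t_0 + c, && c \text{ constant (absolute delay only)} \\
  t_{\mathrm{rx}}(t_0) &= t_0 + c(t_0) = t_0 + \bar{c} + \delta(t_0), && \bar{c}: \text{mean delay},\ \delta(t_0): \text{variation} \\
  R &= \max_{t_0} \delta(t_0) - \min_{t_0} \delta(t_0), && \text{range of delay variation}
\end{aligned}
```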
How can this variable delay happen in a real-time circuit-switched voice transmission? It can happen if a voice stream is transmitted via two different transmission channels, A and B, where the transmission time via B is longer than via A. At the far end there is one receiver whose input can be switched from A to B and vice versa. Consequently, there is a jump backwards or forwards in the voice stream. Such jumps can even happen in traditional 2G or 3G networks when an inter-cell or inter-technology handover is performed. Before the handover takes place, the voice stream is delivered via both transmission paths; however, the synchronization is based on frame boundaries, not on sequence numbers. As a result, a frame’s content may occasionally be repeated or skipped. In those (rare) cases, variable delay can be observed even in circuit-switched mobile channels.
In packet-switched networks, variable delay occurs much more often than in circuit-switched networks. The common belief is that this is simply a consequence of jitter in the packet reception time, which is true in principle. However, no current receiver plays out received voice frames exactly as they arrive. Common practice is to use a packet buffer (jitter buffer): incoming packets are stored, and the decoder takes them one after another at fixed time steps. What is observed as variable delay is only the uncompensated part of the packet delay jitter and its effect on the voice stream; this is different from the packet jitter itself.
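The buffering principle described above can be sketched as a fixed-delay jitter buffer. This is a minimal sketch under stated assumptions, not any device’s actual client: each packet is assumed to carry one 20 ms frame, the playout clock is anchored to the arrival of frame 0, and the names `Packet` and `missing_at_playout` are hypothetical. Frames that are absent at their due time are the points where the decoder would have to conceal or stretch.

```python
from dataclasses import dataclass

# Assumption: each RTP packet carries exactly one 20 ms voice frame.
FRAME_MS = 20


@dataclass
class Packet:
    seq: int           # sequence number assigned by the sender
    arrival_ms: float  # time the packet reached the receiving device


def missing_at_playout(packets: list[Packet], buffer_ms: float, n_frames: int) -> list[int]:
    """Simulate a fixed-delay jitter buffer.

    Playout starts buffer_ms after frame 0 arrives; frame n is then due every
    FRAME_MS. Packets are stored by sequence number, so reordered packets are
    fine as long as they arrive before their due time. Returns the sequence
    numbers that were absent (lost or late) when the decoder needed them.
    """
    arrivals = {p.seq: p.arrival_ms for p in packets}
    if 0 not in arrivals:
        raise ValueError("frame 0 is needed to anchor the playout clock")
    start_ms = arrivals[0] + buffer_ms
    missing = []
    for seq in range(n_frames):
        due_ms = start_ms + seq * FRAME_MS
        arrival = arrivals.get(seq)
        if arrival is None or arrival > due_ms:
            missing.append(seq)  # decoder must conceal or stretch here
    return missing


if __name__ == "__main__":
    # Hypothetical arrivals: frame 2 is delayed and frame 3 overtakes it.
    pkts = [Packet(0, 50), Packet(1, 72), Packet(2, 130), Packet(3, 112), Packet(4, 131)]
    for depth in (0, 10, 40):
        print(depth, missing_at_playout(pkts, depth, 5))
    # 0 [1, 2, 3, 4]
    # 10 [2]
    # 40 []
```

With these inputs the same packet jitter is fully compensated by a 40 ms buffer, only partly by a 10 ms buffer, and hardly at all without buffering, which is why the observed variable delay depends on the receiver rather than on the packet jitter alone.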
Figure 11: Basic principle of a jitter buffer to compensate packet delay jitter (diagram: incoming packets pass through the jitter buffer to the decoder)
In case of very long packet delay or packet loss, the buffer may run empty (buffer underrun), leaving the decoder with no information to decode. Several strategies are used to handle this. Intelligent decoders recognize an upcoming underrun and extend speech pauses so that the buffer empties more slowly, in the hope that new packets arrive in the meantime. If there are no speech pauses, the speech information itself is stretched, either by time-warping or by creating additional voice frames, similar to the frame replacement strategies used in voice decoders. All these smoothing strategies try to gain time by bridging long delays of the next expected packet. However, they can only bridge a limited time once the buffer runs empty. In the worst case, the smoothing strategies reach their limits and the voice is interrupted due to excessive buffer underrun: the decoder simply has nothing to work with. Logically, all these issues are less common with a longer jitter buffer and more common with a short one. There is a direct relation to the overall audio delay discussed in the previous section: a long jitter buffer can avoid warping the voice signal, but at the cost of a generally higher audio delay.
However, each stretching or pausing step accumulates more and more delay in the signal. Once packets are received again and the buffer refills, decoders try to remove the accumulated delay, usually by shortening speech pauses.
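The trade-off between buffer length and audio delay can be made concrete with a deliberately simplified static model. The sketch assumes (our assumption, not a real adaptive client) that every packet is played out a fixed time after it was sent, namely the fastest observed network delay plus the buffer depth; packets slower than that arrive too late and must be bridged by concealment or warping. The function name and the delay values in the example are hypothetical.

```python
def buffer_tradeoff(network_delays_ms: list[float],
                    depths_ms: list[float]) -> list[tuple[float, float, float]]:
    """Return (depth, playout_delay_ms, late_fraction) for each buffer depth.

    Static model: each packet is played out min(network delay) + depth after
    it was sent. A packet whose network delay exceeds that deadline is late.
    """
    if not network_delays_ms:
        raise ValueError("need at least one delay sample")
    base = min(network_delays_ms)
    n = len(network_delays_ms)
    rows = []
    for depth in depths_ms:
        playout_delay = base + depth
        late = sum(1 for d in network_delays_ms if d > playout_delay)
        rows.append((depth, playout_delay, late / n))
    return rows


if __name__ == "__main__":
    # Hypothetical per-packet one-way network delays in ms.
    delays = [40, 42, 45, 60, 41, 43, 90, 44, 46, 41]
    for depth, playout, late in buffer_tradeoff(delays, [0, 10, 30, 60]):
        print(f"depth={depth:>2} ms  playout delay={playout:>3} ms  late={late:.0%}")
```

For these sample delays the late share falls from 90% to 20%, 10% and finally 0% as the buffer grows, while the delay contribution rises from 40 ms to 100 ms. Real clients adapt the buffer depth over time, which is exactly what produces the variable delay discussed here.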
The effects on the voice differ considerably. Extending or shortening speech pauses is almost imperceptible, because it does not change the speech signal itself. The situation is different when the voice signal itself is stretched or compressed, which makes the talk spurts longer or shorter. A pure extension or compression by re-sampling also changes the spectral distribution, since the voice pitch becomes lower or higher and may sound unnatural. Therefore, time-warping with pitch preservation is used, which makes moderate stretching or compression of the voice much more acceptable.
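To illustrate why pure re-sampling is avoided, the sketch below stretches a 200 Hz test tone by a factor of 1.25 using naive linear-interpolation re-sampling and estimates its frequency from zero crossings. The tone, sampling rate and function names are hypothetical, and the sketch deliberately shows only the unwanted effect; a pitch-preserving time-warping method (for example one of the overlap-add family) is not implemented here.

```python
import math


def tone(freq_hz: float, fs_hz: int, seconds: float, phase: float = 0.3) -> list[float]:
    """Pure sine test tone (a stand-in for a voiced speech segment)."""
    n = int(fs_hz * seconds)
    return [math.sin(2 * math.pi * freq_hz * i / fs_hz + phase) for i in range(n)]


def stretch_by_resampling(x: list[float], factor: float) -> list[float]:
    """Naive time stretch: linearly interpolate onto a grid `factor` times denser
    and play it back at the original rate. Duration grows by `factor`, but every
    frequency (and hence the pitch) shrinks by the same factor."""
    last = len(x) - 1
    out = []
    for j in range(int(last * factor) + 1):
        pos = j / factor
        i = int(pos)
        frac = pos - i
        nxt = x[min(i + 1, last)]
        out.append(x[i] * (1 - frac) + nxt * frac)
    return out


def estimate_freq(x: list[float], fs_hz: int) -> float:
    """Rough frequency estimate from positive-going zero crossings per second."""
    crossings = sum(1 for a, b in zip(x, x[1:]) if a < 0 <= b)
    return crossings / (len(x) / fs_hz)


if __name__ == "__main__":
    fs = 8000
    x = tone(200, fs, 1.0)
    y = stretch_by_resampling(x, 1.25)
    print(f"original:  {len(x) / fs:.2f} s, {estimate_freq(x, fs):.1f} Hz")
    print(f"stretched: {len(y) / fs:.2f} s, {estimate_freq(y, fs):.1f} Hz")
```

The stretched tone is 25% longer, but its frequency drops from about 200 Hz to about 160 Hz; in speech this is the audible pitch shift that pitch-preserving time-warping avoids.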
As a result, packet delay jitter is largely compensated, and the remaining uncompensated delays can, to some extent, be masked by smart voice processing. Uncompensated packet delay jitter affects the temporal structure of the voice stream and can be measured physically, but its effect on speech quality depends on the processing strategy used in the decoder and can only be evaluated with a speech quality algorithm.
Measuring and evaluation of variable audio delay

The following measurements illustrate the occurrence of variable audio delay. Variable audio delay can only be measured meaningfully if a voice stream is transferred. To emulate talk spurts as in a real conversation, human speech consisting of words forming sentences must be used. Only then do the occurrence and proportion of speech pauses and active talk spurts match those of a human conversation.
Variable audio delay can be measured physically by subdividing the original signal into short segments, locating each segment in the received signal and measuring its delay. The delay can be given as an absolute value per segment or, more illustratively, relative to the mean delay of the evaluated talk spurt or group of sentences.
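The segment-based measurement described above can be sketched with a plain cross-correlation search. This is not the ITU-T P.863 time-alignment algorithm; it is a simpler stand-in chosen for illustration. It assumes non-overlapping fixed-length segments, a received signal that only lags the reference, and a noise signal in place of real speech; all function names and the 8 kHz rate are hypothetical.

```python
import random

FS_HZ = 8000  # sampling rate of the example signals (assumption)


def segment_delays(ref: list[float], deg: list[float], seg_len: int, max_lag: int) -> list[int]:
    """Per-segment delay (in samples) of ref within deg via cross-correlation.

    Each non-overlapping reference segment is compared against deg at every
    lag in [0, max_lag]; the lag with the highest correlation is taken as
    that segment's delay.
    """
    delays = []
    for start in range(0, len(ref) - seg_len + 1, seg_len):
        seg = ref[start:start + seg_len]
        best_lag, best_score = 0, float("-inf")
        for lag in range(max_lag + 1):
            lo = start + lag
            if lo + seg_len > len(deg):
                break
            score = sum(a * b for a, b in zip(seg, deg[lo:lo + seg_len]))
            if score > best_score:
                best_lag, best_score = lag, score
        delays.append(best_lag)
    return delays


def relative_delays(delays: list[int]) -> list[float]:
    """Delays relative to their mean, as plotted per talk spurt."""
    mean = sum(delays) / len(delays)
    return [d - mean for d in delays]


def delay_range(delays: list[int]) -> int:
    """Difference between the largest and smallest short-term delay."""
    return max(delays) - min(delays)


if __name__ == "__main__":
    # Synthetic signal: noise stands in for speech (an assumption).
    rng = random.Random(1)
    ref = [rng.uniform(-1, 1) for _ in range(1600)]
    # Received copy: 5 ms initial delay, then a pause extended by 20 ms mid-way.
    deg = [0.0] * 40 + ref[:800] + [0.0] * 160 + ref[800:] + [0.0] * 200
    d = segment_delays(ref, deg, seg_len=160, max_lag=300)
    print([lag * 1000 / FS_HZ for lag in d])
    print(delay_range(d) * 1000 / FS_HZ, "ms range")
```

In this synthetic case the first five 20 ms segments are found at 5 ms and the last five at 25 ms, giving a 20 ms range of delay variation; reporting the values relative to their mean gives the kind of relative delay curve shown in the figures of this section.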
For these measurements, an intermediate result of ITU-T P.863 (POLQA) can be used. POLQA aligns each speech segment of the received signal with the corresponding segment in the reference. The positions of the segments in the two signals can be used as a measure of variable delay. If all segments have the same offset, the delay is constant for all segments; if the offset is not constant, the delay varies. Figure 12 shows an extreme example of variable delay. The top line is the reference audio clip; the lower line, the received audio clip, contains extended pauses and, at the end, a skipped part of speech. The part of the reference clip shown in red is the part that was skipped. The white areas in the received clip show where pauses have been extended.
Figure 12: Example of an aligned pair of reference and degraded signal

This example is taken directly from the POLQA time-alignment procedure. Consequently, POLQA can also deliver information about the occurrence and amount of variable delay.


The following measurements in real VoLTE and 3G calls show the range of delay variation within 6 s talk spurts, i.e. the difference between the minimum and maximum of the measured short-term delays.
Figure 13 illustrates in detail the relative delay during a speech sample transmitted over a VoLTE channel. The delay increases stepwise during the first sentence and is readjusted at the end of the speech pause.
Figure 13: Example of variable delay in a VoLTE live test sample

In CS 3G voice channels, delay changes are very rare but can occur in conjunction with inter-cell handovers. The example in Figure 14 shows a delay change of 20 ms, i.e. one AMR frame.
Figure 14: Example of variable delay in a 3G live test sample

However, most delay changes in circuit-switched networks occur during inter-technology handovers, and especially when switching from transcoding-free AMR-WB to regular AMR. Such a handover usually causes an interruption in the voice stream. Without this interruption, a perceptible ‘glitch’ could result, since the coded, differential voice information of the two consecutively received channels does not match. In this case, a listener does not perceive the delay variation itself, but rather the desynchronized decoder or simply a gap.
Figure 15 shows the occurrence of delay changes for the two data sets, VoLTE and 3G. The data, obtained in summer 2014, is the same as that used for the analysis in section ‘Audio quality in VoLTE’.


Figure 15: Occurrence of delay changes for VoLTE and 3G live network samples
In 3G, almost all samples (94%) show no delay change beyond the threshold of 20 ms, which corresponds to one AMR frame; only 6% of the measurements exceed that value. For VoLTE, the picture is different: about 70% of the samples show delay changes of more than 20 ms.
But does this variable delay matter to the user in terms of audio quality? This cannot be analysed in isolation, since the test samples are also affected by other live distortions. Especially in 3G, delay changes usually coincide with handovers and potential gaps or short muted speech parts, which also have an obvious effect on MOS values.
However, an indication of speech quality degradation can be derived if the MOS scores are analysed separately for different amounts of variable delay. The diagram in Figure 16 shows the average P.863 scores for samples grouped by their range of delay change, i.e. the amount of time warping.
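The grouping described above, together with the occurrence shares quoted for Figure 15, can be sketched as a simple binning step. The bin edges (20 ms and 60 ms), the function names and the sample values in the example are hypothetical; only the 20 ms threshold (one AMR frame) comes from the text. Each sample is assumed to be a pair of (range of delay change in ms, P.863 MOS-LQO).

```python
from statistics import mean

# Hypothetical bin edges in ms; 20 ms is one AMR frame, as in the text.
BIN_EDGES_MS = (20.0, 60.0)


def bin_label(range_ms: float) -> str:
    """Assign a sample's range of delay change to a bin."""
    if range_ms <= BIN_EDGES_MS[0]:
        return "<=20 ms"
    if range_ms <= BIN_EDGES_MS[1]:
        return "20-60 ms"
    return ">60 ms"


def mos_by_delay_change(samples: list[tuple[float, float]]) -> dict[str, tuple[float, float]]:
    """Group (delay_range_ms, mos_lqo) pairs by delay-change bin.

    Returns {bin: (share_of_samples, mean_mos)}, i.e. how often each amount
    of delay change occurs and the average P.863 score within that bin.
    """
    groups: dict[str, list[float]] = {}
    for range_ms, mos in samples:
        groups.setdefault(bin_label(range_ms), []).append(mos)
    total = sum(len(v) for v in groups.values())
    return {label: (len(v) / total, mean(v)) for label, v in groups.items()}


if __name__ == "__main__":
    # Hypothetical samples, not measured data.
    demo = [(0, 4.0), (10, 3.8), (40, 3.9), (80, 3.6)]
    for label, (share, avg) in mos_by_delay_change(demo).items():
        print(f"{label:>8}: {share:.0%} of samples, mean MOS-LQO {avg:.2f}")
```

As the text notes, such a grouping only gives an indication: other live distortions that coincide with delay changes also end up in the per-bin averages.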
Figure 16: P.863 MOS-LQO statistics in relation to delay changes in the test sample
For VoLTE, the quality remains quite stable in the presence of variable delay. Even where variable delay can be measured physically in the signal, it hardly reduces speech quality. This underlines that jitter compensation strategies, such as concentrating adjustments in speech pauses and moderate stretching with pitch preservation, can maintain good audio quality.


4 Conclusion
Voice over LTE is becoming a reality in more and more live networks. During this transition period, careful benchmarking of the VoLTE services themselves as well as of legacy technologies is extremely important to guarantee a quality standard comparable to the well-known 2G and 3G services. Both technologies will be used in parallel for quite a while, so it is important not only to benchmark the technologies against each other but also to benchmark cross-technology connections. These connections make up the majority of VoLTE users’ daily calls, and in the near future VoLTE to CS interconnections will account for the majority of mobile calls in general.
This transition period may last for the next few years and requires close supervision by benchmarking and optimization teams. Measurements on the application layer using POLQA help to guarantee the best voice call experience. As the data presented above shows, optimizing elements such as jitter buffers can improve conversation quality and, when done correctly, can do so without a significant compromise in audio quality. This report gives first example values for voice call performance and highlights the trade-off between speech quality and audio delay. The results indicate that VoLTE may provide the best overall voice call performance, appearing to be better than today’s 3G networks.
However, the integration of VoLTE functionality into handsets plays a much more important role in the end-user voice experience than in the past. Benchmarking systems used to test VoLTE networks must use commercially available devices designed for the network under test. It is also critical for the measurement system to emulate real-world voice call use cases and to provide the full complement of RF (Layer 1, 2 and 3), SIP, IMS, RTP and other TCP/IP measurement capabilities, in order to really understand what is behind good or bad network performance. This ensures that network operators get a true picture of user satisfaction with VoLTE.
The results collected and analyzed in this study already show impressive and rapid improvements within the months since the early commercial launches. The data was collected during a period in which one operator had already switched to transcoding-free AMR-WB processing across technologies, while others had yet to take this step. This advantage could be recognized on the application layer by measuring high quality in wide audio bandwidth in combination with a short transmission delay. Both are valid indicators that this technology is enabled and will live up to expectations.
Nevertheless, the road ahead to a 100% IP-based communications world will be full of pitfalls. The changeover from 2G/3G systems to all-IP LTE is happening now, and with it will come more re-farming of spectrum, a continued explosion in the number of deployed LTE bands, carrier aggregation, Wi-Fi offloading and more. As network operators strive to deliver on all the promises of high-speed data networks, it will be critical for them to keep an eye on voice performance. Nothing could be more damaging than customer churn caused by a poor VoLTE experience, for a service that consumes only 12 kbps of data bandwidth.