
Voice over IP

Sounding good on the Internet

Princy Mehta and Sanjay Udani

The bulk of information conveyed over public telecommunication networks is voice. To carry it, circuit-switched networks are employed. Circuit switching provides adequate voice quality, but it can be highly inefficient. In contrast, the Internet's packet-switched networks are much more efficient but ill suited for voice without judicious implementation. Voice over Internet Protocol (VoIP) aims to provide the efficiency of a packet-switched network while rivaling the voice quality of a circuit-switched network.

Because voice applications are real time, they are intolerant of lengthy delays, packet losses, out-of-order packets and jitter. All these problems gravely degrade the quality of the voice transmitted to the recipient. Unfortunately, wireless networks exacerbate the problems that are intrinsically prevalent in their wireline counterparts: a higher frequency of dropped packets, larger latency and more jitter.

VoIP can be implemented in several ways. A Public Switched Telephone Network (PSTN)-based telephone can communicate with a VoIP application, and vice versa. These telephones can also communicate with each other when part of the call is routed over the Internet instead of solely over a dedicated circuit. Finally, two VoIP applications can communicate directly without accessing the PSTN.

Components of VoIP

The PSTN is the collection of all the switching and networking equipment that belongs to the carriers involved in providing telephone service. In this context, the PSTN is primarily the wired telephone network and its access points to wireless networks, such as cellular. The overall technology requirements of an Internet Protocol (IP) telephony solution can be split into four categories: signaling, encoding, transport and gateway control.

The signaling protocol creates and manages connections between endpoints, as well as the calls themselves. Next, when the conversation commences, the analog signal produced by the human voice must be encoded in a digital format suitable for transmission across an IP network. The IP network itself must then transport the real-time conversation across the available media in a manner that produces acceptable voice quality. Finally, a gateway may need to convert the IP telephony stream to another format. This conversion is needed either for interoperation with a different IP-based multimedia scheme or because the call is being placed onto the PSTN.

Figure 1 displays the processing that must occur between the user's voice input and output. The diagram illustrates the steps needed to achieve packetized voice, from data processing by the digital signal processor (DSP) to transmitting packets over IP. Numerous packet-handling processes must be performed. As a result, a nontrivial amount of latency (time delay) is present, and this latency affects perceived voice quality.

Once a user dials a telephone number (or clicks a name hyperlinked to a telephone number), signaling is required to determine the status of the called party (available or busy) and to establish the call. Signaling System 7 (SS7) is the set of protocols (standards for signaling) used for call setup, teardown, and maintenance in the PSTN. It is currently used in North America to establish and terminate telephone calls. SS7 is implemented as a packet-switched network and typically uses dedicated links, nodes and facilities. In general, it is a non-associated, common-channel, out-of-band signaling network that allows switches to communicate during a call. SS7 signals may traverse real or virtual circuits on links that also carry voice traffic.

However, the industry is moving toward a converged network infrastructure. Such an infrastructure handles increased call volumes more efficiently and effectively, and it can deliver new and enhanced services. The integration of SS7 and IP will provide significant benefits. Figure 2 depicts a type of VoIP network that uses an SS7-to-IP gateway. SS7 provides call control on either side of the traditional PSTN, while H.323 or the Session Initiation Protocol (SIP) provides call control in the IP network. (Neither H.323 nor SIP alone has a complete set of IP telephony protocols.) The media gateway provides circuit-to-packet conversion.

H.323

H.323, ratified by the International Telecommunication Union Telecommunication Standardization Sector (ITU-T), is a set of protocols for voice, video, and data conferencing over packet-based networks, such as the Internet.

36 0278-6648/01/$10.00 © 2001 IEEE IEEE POTENTIALS

The H.323 protocol stack is designed to operate above the transport layer of the underlying network. Therefore, H.323 can be used on top of any packet-based network transport, for instance TCP/IP, to provide real-time multimedia communication.

H.323 specifies protocols, including Q.931, H.225 and H.245 (whose messages are encoded in ASN.1), for real-time point-to-point audio communication between two terminals. These terminals sit on a packet-based network that does not provide a guaranteed quality of service (QoS). The scope of H.323, however, is much broader. It encompasses multipoint conferencing among terminals that support not only audio but also video and data communications.

A general H.323 implementation requires three logical entities: gateways, gatekeepers and multipoint control units (MCUs). Terminals, gateways, and MCUs are collectively known as endpoints. An H.323-enabled network can be established with just terminals, which are H.323 clients. However, a conference of more than two endpoints requires an MCU. The MCU can be combined with a terminal, gateway or gatekeeper.

Fig. 1 VoIP processing and handling (stages: input/output, digital signal processing, packet handling; blocks: DSP, coding, buffering and packetization, jitter buffer, TCP/IP protocol stack, network interface, PHY layer)

Session Initiation Protocol

SIP, defined by the Internet Engineering Task Force (IETF), is a signaling protocol for telephone calls over IP. Unlike H.323, however, SIP was designed specifically for the Internet. It exploits the manageability of IP and makes developing a telephony application relatively simple. SIP is an application-layer control (signaling) protocol for creating, modifying and terminating sessions with one or more participants.

SIP can be employed to initiate sessions and to invite members to sessions that have been advertised by other means, such as multicast protocols. The protocol transparently supports name mapping and redirection services, which allow the implementation of intelligent network telephony subscriber services. These facilities also enable personal mobility. Personal mobility is the ability of end users to originate and receive calls, and to access subscribed telecommunication services, on any terminal in any location. Wireless VoIP can augment this mobility.

SIP supports five facets of establishing and terminating multimedia communications:
• User location: determination of the end system to be used for communication;
• User capabilities: determination of the media and media parameters to be used;
• User availability: determination of the called party's willingness to engage in communications;
• Call setup: "ringing," establishing call parameters at both called and calling party;
• Call handling: including transfer and termination of calls.

SIP can also initiate multiparty calls using an MCU or a fully meshed interconnection instead of multicast. Gateways that connect PSTN parties can also use SIP to set up calls between them. The protocol is designed as part of the overall IETF multimedia data and control architecture. That architecture incorporates many protocols needed for proper functionality and operation, for example the Resource Reservation Protocol (RSVP) and the Real-time Transport Protocol (RTP).

H.323 vs. SIP

H.323 and SIP are competing for dominance of IP telephony signaling. Currently, there is no clear-cut winner. However, the standards appear to be evolving so that the best features of each are being implemented in the other's protocol. For instance, the evolution of H.323 from version 1 through version 4 has focused on decreasing call setup delay. Setup used to take several round trips and now matches SIP's 1.5 round trips, which reduces H.323's signaling overhead. This convergence is clearly desirable. (Both support the majority of required end-user functions about equally well. These functions include call setup, teardown, call holding, call transfer, call forwarding, call waiting and conferencing.)

Voice coders

An efficient voice encoding and decoding mechanism is vital for using packet-switched technology. A voice coder (vocoder), also referred to as a codec (coder/decoder), takes the analog signal (human speech) and transforms and compresses it into digital data. A number of factors must be taken into account, including:
• bandwidth usage;
• silence compression;
• intellectual property;
• look-ahead and frame size;
• resilience to loss;
• layered coding; and
• fixed-point vs. floating-point digital signal processors (DSPs).

The bit-rate of available narrowband vocoders ranges from 1.2 to 64 kbps, with an inevitable effect on the quality of the reconstructed voice. There is ordinarily, but not always, a trade-off between voice quality and bandwidth used. Today's most efficient vocoders achieve quasi-toll quality at bit-rates as low as 5 kbps. Toll quality is recognized as the standard of a long-distance PSTN call. As newer and more sophisticated algorithms are developed, this bit-rate will decrease. Speech will then be compressed more efficiently with minimal, if any, sacrifice in quality.

The algorithmic delay introduced by a coding/decoding sequence is the frame length plus the look-ahead size. A vocoder with a small frame length has a shorter delay than one with a longer frame length, but it introduces a larger overhead. Most implementations choose to send multiple frames per packet. Thus, the real frame length to take into account is the sum of all frames stacked in a single IP packet. The smaller the frame size, the more frames an IP packet carries, so the frame size itself has minimal influence on latency.
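The frame-length trade-off can be made concrete with a short calculation. The minimal Python sketch below computes the algorithmic delay (frames per packet times frame length, plus look-ahead) and the resulting on-the-wire bit-rate. It rests on several assumptions that are not from the article:
• a G.729A look-ahead of 5 ms;
• a per-packet IPv4/UDP/RTP header of 20 + 8 + 12 = 40 bytes; and
• no link-layer framing or header compression.
The class and function names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

# Assumed per-packet header cost: IPv4 (20 bytes) + UDP (8) + RTP fixed header (12).
# Link-layer framing and header compression are ignored in this sketch.
HEADER_BYTES = 20 + 8 + 12


@dataclass(frozen=True)
class Vocoder:
    name: str
    bit_rate_bps: int     # codec output bit-rate
    frame_ms: float       # length of one coded frame
    lookahead_ms: float   # look-ahead (assumed value, not from the article)


def algorithmic_delay_ms(codec: Vocoder, frames_per_packet: int) -> float:
    """All frames stacked in one packet plus the codec look-ahead."""
    return frames_per_packet * codec.frame_ms + codec.lookahead_ms


def wire_rate_bps(codec: Vocoder, frames_per_packet: int) -> float:
    """Bit-rate on the wire once the fixed per-packet headers are added."""
    packet_ms = frames_per_packet * codec.frame_ms
    payload_bytes = codec.bit_rate_bps * packet_ms / 1000 / 8
    packets_per_second = 1000 / packet_ms
    return (payload_bytes + HEADER_BYTES) * 8 * packets_per_second


if __name__ == "__main__":
    g729a = Vocoder("G.729A", 8000, 10.0, 5.0)
    for n in (1, 2, 4):
        print(f"{n} frame(s)/packet: "
              f"delay {algorithmic_delay_ms(g729a, n):.0f} ms, "
              f"wire rate {wire_rate_bps(g729a, n) / 1000:.0f} kbps")
```

Under these assumptions, stacking two 10 ms frames per packet raises the delay from 15 ms to 25 ms. It also cuts the wire rate from 40 kbps to 24 kbps, because the fixed 40-byte header is spread over more payload.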

ITU-T specs

The International Telecommunication Union Telecommunication Standardization Sector (ITU-T) has a rigorous process for approving vocoders. Before a vocoder is chosen, the ITU evaluates its mean opinion score (MOS) and often requires toll quality or better. To determine the MOS, trained evaluators rate the overall quality of speech samples and assign a subjective score. Several popular ITU-approved vocoders are summarized in Table 1. The expected MOS is rated on a scale from 1 (bad) to 5 (excellent).

Table 1 Summary of ITU vocoders
Voice coder          Bit-rate    Frame length   Expected MOS
G.711 (PCM)          64 kbps     1 ms           4.1
G.723.1 (MP-MLQ)     6.3 kbps    30 ms          3.9
G.723.1 (ACELP)      5.3 kbps    30 ms          3.65
G.726 (ADPCM)        32 kbps     0.125 ms       3.85
G.729A (CS-ACELP)    8 kbps      10 ms          3.7

Future voice coders

One recently established vocoder is the Mixed Excitation Linear Predictive (MELP) vocoder, which uses a minuscule 2.4 kbps. Another high-quality speech vocoder is being developed based on the Multi-Band Excitation (MBE) model, operating at both 2.4 kbps and 1.2 kbps. The industry trend appears to be toward vocoders that use less bandwidth than their predecessors.

Since the early 1990s, the ITU has forged ahead from the 64 kbps G.711 to the more recent G.723.1 specification. At 5.3 kbps, G.723.1 consumes merely one-twelfth of G.711's bandwidth. These bandwidth savings commonly come at the cost of lower quality and less robustness in hostile network environments. The average user's bandwidth will inevitably increase over time. Given that increase, perhaps this effort would be better directed at improving quality first and addressing bandwidth afterward.

Transport

Once signaling and encoding occur, the Real-time Transport Protocol (RTP) and the RTP Control Protocol (RTCP) are used to move the voice packets. Media streams are packetized according to a predefined format. RTP provides delivery monitoring of its payload types through sequencing and time stamping. RTCP offers insight into the performance and behavior of the media stream, such as voice stream jitter. RTP and RTCP are intended to be independent of the signaling protocol, the encoding schemes, and the network layers implemented.

RTP

RTP provides end-to-end delivery services for data with real-time characteristics. Those services include payload type identification, sequence numbering, time stamping and delivery monitoring. Applications typically run RTP on top of the User Datagram Protocol (UDP) to make use of its multiplexing and checksum services. In fact, both protocols contribute parts of the transport protocol functionality. However, RTP may also be used with other suitable network-layer or transport-layer protocols.

RTP does not intrinsically provide any mechanism to ensure timely delivery or to provide other QoS guarantees. Instead, RTP relies on lower-layer services to provide them. A signaling protocol must also set up the connection and negotiate the media format that will be used. RTP does not guarantee delivery or prevent out-of-order delivery. Nor does it assume that the network can reliably deliver packets in sequence.

RTCP is based on the periodic transmission of control packets to all participants in the session, using the same distribution mechanism as the data packets. The underlying protocol must provide multiplexing of the data and control packets. RTCP performs the following functions:
• Provides feedback on the quality of the data distribution (its primary function);
• Carries a persistent transport-layer identifier for an RTP source, called the canonical name;
• Controls the RTCP transmission rate so that RTP can scale up to a large number of participants; and
• Conveys minimal session control information.

Fig. 2 SS7-based VoIP network (blocks: PSTN call originator and call terminator, SS7 signaling, VoIP gateways, IP network, gatekeeper, H.323/SIP signaling)

Gateway control

Gateways are responsible for converting packet-based audio formats into protocols understandable by PSTN systems. The signaling protocols described above provide more services than gateways need. For example, service creation and user authentication are irrelevant for gateways. Vendors have therefore gravitated toward simplified device control protocols rather than all-encompassing signaling protocols.

The Media Gateway Control Protocol (MGCP), published by the IETF, is a merger of the Internet Protocol Device Control and the Simple Gateway Control Protocol. The Megaco protocol (H.248), which is still evolving, is MGCP's progeny. It contains all of MGCP's functionality. It also adds superior controls over analog telephone lines and the ability to transport multiple commands in a single packet.
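A device control protocol can be quite lightweight, as the minimal Python sketch below illustrates. It formats an MGCP-style CreateConnection (CRCX) command as plain text. The sketch assumes the following command layout from the MGCP specification:
• a command line carrying the verb, a transaction identifier, the endpoint name and the protocol version; and
• one "code: value" line per parameter, with C for the call identifier, L for local connection options and M for the connection mode.
The endpoint name, call identifier and helper function are hypothetical, and this is not a conformant MGCP implementation.

```python
from __future__ import annotations


def build_mgcp_command(verb: str, transaction_id: int, endpoint: str,
                       params: dict[str, str]) -> str:
    """Format a simplified, text-based MGCP-style command (hypothetical helper)."""
    # Command line: verb, transaction identifier, endpoint name, protocol version.
    lines = [f"{verb} {transaction_id} {endpoint} MGCP 1.0"]
    # One "code: value" parameter line per entry, in insertion order.
    lines += [f"{code}: {value}" for code, value in params.items()]
    # This sketch terminates every line with CRLF.
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    # Hypothetical call agent asking a gateway's first analog line to create a
    # receive-only connection using PCMU (G.711 mu-law) with 10 ms packets.
    command = build_mgcp_command(
        "CRCX", 1204, "aaln/1@gw.example.net",
        {"C": "A3C47F21456789F0", "L": "p:10, a:PCMU", "M": "recvonly"},
    )
    print(command)
```

The call agent tells the gateway only which line to use, which codec and packetization period to apply, and in which mode to connect. It carries none of the service-creation or authentication machinery of a full signaling protocol.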

Media gateways will be the junctions that provide voice paths between switched and packet networks. When media gateways are initially set up for communication, a vocoder approach is normally used. Megaco-related standards will enable existing and new telephone service applications over hybrid telephone networks that contain an assortment of technologies.

Wireless networks

An emerging trend for implementing VoIP, and for connecting computing devices in general, is the use of wireless networks. A wireless local area network (WLAN) is a data transmission system designed to provide location-independent network access between computing devices. It uses radio waves rather than a cable infrastructure. WLANs give users wireless access to the full resources and services of the LAN across a building or campus environment.

For voice applications, wireless networks aggravate the problems already prevalent in wireline networks: a higher frequency of dropped packets, larger latency and more jitter. Wireless networks also raise additional security issues, because it is relatively easy for an unauthorized device to surreptitiously eavesdrop on a conversation. Finally, interference between different wireless technologies must be considered when they operate on the same frequency band.

QoS

The basic routing philosophy on the Internet is "best effort." This approach serves most users acceptably. However, it is not adequate for the time-sensitive, continuous stream transmission that VoIP requires.

Quality of Service (QoS) refers to the ability of a network to provide better, more predictable service to selected network traffic over various underlying technologies, including IP-routed networks. QoS features are implemented in network routers by:
• Supporting dedicated bandwidth;
• Improving loss characteristics;
• Avoiding and managing network congestion;
• Shaping network traffic; and
• Setting traffic priorities across the network.

Voice applications have different characteristics and requirements from those of traditional data applications. Because voice applications are innately real-time, they tolerate minimal delay in the delivery of their packets. They are also intolerant of packet loss, out-of-order packets, and jitter. To transport voice traffic over IP effectively, mechanisms are required that reliably convey packets with low and controlled latency.

Another approach uses the Resource Reservation Protocol (RSVP), a relatively new protocol developed to let the Internet support QoS. Using RSVP, a VoIP application can reserve resources along a route from source to destination. RSVP-enabled routers then schedule and prioritize packets to fulfill the requested QoS. RSVP is part of the Internet Integrated Services (IntServ) model, which provides best-effort service, real-time service, and controlled link sharing.

QoS is an extension to IPv4, the current version of IP. By contrast, IPv6, the successor of IPv4, will inherently support QoS. However, IPv6 also has a much larger packet header. So although QoS may alleviate much of the jitter and congestion that voice packets presently suffer, it could come at the cost of increased latency. IPv6 headers require 40 bytes, compared with 20 bytes for IPv4 headers, which doubles the IP header overhead. This may pose trouble for vocoders that succeed only with small packets. Nevertheless, the larger packet overhead can be partially offset if IPv6 provides efficient header compression schemes.

Packet loss

UDP cannot guarantee that packets will be delivered at all, much less in order. Packets will be dropped under peak loads and during periods of congestion. Because voice transmissions are time sensitive, the normal TCP-based retransmission schemes are not appropriate. Two approaches are used to compensate for packet loss. One is to interpolate speech by replaying the last packet, and the other is to send redundant information. Packet losses greater than 10 percent are generally intolerable, unless the encoding scheme provides extraordinary robustness.

Jitter

IP networks cannot guarantee the delivery time of data packets (or their order), so the data will arrive at very inconsistent rates. The variation in inter-packet arrival rate is jitter, which is introduced by variable transmission delays over the network. Removing jitter to produce an even stream requires collecting packets and storing them long enough for the slowest packets to arrive in time to be played in the correct sequence. The jitter buffer removes the delay variation that each packet encounters while transiting the network. Each jitter buffer adds to the overall delay.

Latency

Latency is the time delay incurred in speech by the IP telephony system. One-way latency is the time measured from the moment the speaker utters a sound until the listener hears it. Round-trip latency is the sum of the two one-way latency figures that make up the user's call. The lower the latency, the more natural interactive conversation becomes, and the less noticeable the additional delay incurred by the VoIP system. In PSTN calls, the round-trip latency of calls originating and terminating within the continental United States is under 150 ms.

In a VoIP implementation used to reduce costs, studies suggest that users will tolerate one-way latency of up to 200 ms. The 1996 ITU Recommendation G.114 sets the following limits for one-way end-to-end transmission time:
• Under 150 ms: acceptable for most user applications;
• 150 to 400 ms: acceptable provided administrators are aware of the transmission time impact on the quality of user applications; and
• Over 400 ms: unacceptable for general network planning purposes.

A high end-to-end delay in a voice network causes two difficulties: echo and talker overlap. Echo occurs when the speaker's voice is reflected back, and it becomes a problem when the round-trip delay exceeds 50 ms. Echo is perceived as a significant quality obstacle, so the VoIP system must provide echo control by implementing echo cancellation. Talker overlap is the problem of one caller stepping on the other talker's speech. It becomes worse when the one-way delay exceeds 250 ms. The end-to-end delay budget is therefore the major constraint on, and the driving requirement for, reducing latency through a packet network.
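The thresholds above can be combined into a simple delay-budget check. The minimal Python sketch below sums hypothetical one-way delay components and classifies the total against the G.114 bands quoted in the text. It then flags the echo threshold (round trip over 50 ms) and the talker-overlap threshold (one way over 250 ms). The component names and values are illustrative assumptions. The sketch also assumes a symmetric path, so that round-trip delay is twice the one-way delay.

```python
from __future__ import annotations


def g114_band(one_way_ms: float) -> str:
    """Classify one-way delay using the G.114 (1996) bands cited in the text."""
    if one_way_ms < 150:
        return "acceptable for most user applications"
    if one_way_ms <= 400:
        return "acceptable if administrators know the quality impact"
    return "unacceptable for general network planning"


def assess_budget(components_ms: dict[str, float]) -> dict[str, object]:
    """Sum one-way delay components and apply the thresholds from the text."""
    one_way = sum(components_ms.values())
    round_trip = 2 * one_way  # assumes a symmetric path
    return {
        "one_way_ms": one_way,
        "band": g114_band(one_way),
        "needs_echo_cancellation": round_trip > 50,
        "talker_overlap_risk": one_way > 250,
    }


if __name__ == "__main__":
    # Hypothetical one-way budget for a G.729A call with two frames per packet.
    budget = {
        "coding (2 x 10 ms frames + 5 ms look-ahead)": 25,
        "network transit": 60,
        "jitter buffer": 40,
    }
    for key, value in assess_budget(budget).items():
        print(f"{key}: {value}")
```

With these made-up numbers, the 125 ms one-way total falls in the first G.114 band. The 250 ms round trip still exceeds the 50 ms echo threshold, however, which is why the text treats echo cancellation as a requirement.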

Bit-rate vs. voice quality

As previously mentioned, many developers have focused on designing vocoders that consume progressively lower bandwidth. Moreover, many algorithms were created for carrying voice over a reliable circuit-switched connection rather than over the packet-based network the Internet uses. This effort might be misdirected. Most applications of VoIP rely on connectivity to the Internet, where the vast majority of users have a connection of 28.8 kbps or faster. Nonetheless, developers are still pursuing ultra-low bandwidth vocoders instead of improving the quality of existing low-bandwidth vocoders. Perhaps this effort is intended to let users concurrently enjoy other bandwidth-consuming applications, such as browsing the World Wide Web.

Some developers are instead constructing higher-quality vocoders that consume more bandwidth, and they are willing to trade off bandwidth to achieve this quality. It is also critical for the vocoder to tolerate the mishandled, dropped and out-of-order packets intrinsic to the User Datagram Protocol (UDP). Of equal importance, one-way latency should be confined to one-quarter of a second. Finally, the vocoder should maintain an optimally sized buffer to restrain jitter, echo, and talker overlap.

Users may not endure inferior performance, so the focus should be on high quality instead of ultra-low bit-rate. Admittedly, 64 kbps is too high for users who connect to the Internet through analog dial-up modems. Nevertheless, a higher-quality vocoder could be preferable to a low-quality one. In a corporate or broadband environment, the average user is allotted hundreds, if not thousands, of kilobits per second, so even 64 kbps is negligible.

Another possibility is to develop higher-bandwidth vocoders that allow something the traditional telephone system can never do: transport high-fidelity stereo audio. In one potential application, users could call another VoIP application to listen to high-quality compressed music. In MP3 format, for instance, such music consumes a mere 128 kbps. Of course, other issues are involved, such as the server's ability to provide music at this fidelity while being able to scale.

Summary

It remains to be seen when VoIP can emerge from a specialized application to mainstream voice communication. VoIP technology may have progressed admirably, as gauged by protocol and vocoder maturity. However, it still has plenty of room for improvement, as the following drawbacks indicate:
• Erratic quality of voice transmissions;
• Unreliability of IP networks;
• Standards battles;
• Encroaching/competing wireless technologies; and
• Confusing human usability factors.

Reliability cannot be overemphasized. The PSTN operates with at least 99.999 percent specified availability and remains available even during power outages. This cannot be said of modern VoIP applications. Consequently, VoIP's reliability must improve in the near future for it to gain wide acceptance and let users sound good on the Internet.

About the authors

Princy Mehta earned his MS degree in Telecommunications and Networking Engineering from the University of Pennsylvania and his BS degree in Computer Engineering from Rutgers University. He is currently employed with Lockheed Martin Naval Electronics & Surveillance Systems-Surface Systems as a Member of Engineering Staff. He is a graduate of the company's Engineering Leadership Development Program. His professional endeavors include Voice over IP and network systems security, and he is SANS GIAC certified in network systems security. Before his recent responsibilities, Princy programmed in C and shell script, administered a variety of Unix systems, and developed embedded DSP applications for multiprocessors. E-mail: <prmehta@seas.upenn.edu>.

Sanjay Udani received his PhD, MSE and BSE/BS Economics degrees from the University of Pennsylvania. He is currently a Distinguished Member of Technical Staff in Verizon's Technology organization in Arlington, VA. He is also an adjunct faculty member at the University of Pennsylvania, where he teaches a graduate telecommunications course. Before joining Verizon, he was involved in VLSI and ASIC design at Intel. While in graduate school, he dabbled in large-scale virtual environment network design as well as power management for mobile computing. E-mail: <udani@seas.upenn.edu>.

Note: The full (unabridged) version of this paper is posted at http://www.cis.upenn.edu/~techreports/ and is listed as Technical Report MS-CIS-01-31.