
Voice Fundamentals

Foreword
After more than 100 years of experience in providing global telecommunications solutions, Nortel (Northern Telecom) has acquired Bay Networks, Inc., adding world-class, IP-based data communications capabilities that complement and expand Nortel's acknowledged strengths. This precedent-setting union creates Nortel Networks, a new company with a widely respected heritage and a unique market position: Unified Networks.

Unified Networks create greater value for customers worldwide through network solutions that integrate data networking and telephony. The Unified Networks strategy extends to solutions, products, and services, delivering new economics in networking by reducing costs, introducing higher-revenue services, and delivering new value derived through networking. The emergence of the World Wide Web and increasing market deregulation have created a strong demand for networks that provide increased profitability and higher service levels for organizations of all types. Unified Networks from Nortel Networks deliver cutting-edge solutions to reach these new levels of economics.

To meet increased needs, Nortel Networks is delivering a new class of customer relationships. Extranets, intranets, Web access, e-mail, call centers, and old-fashioned personal attention are combined to help customers deal with a wide range of new, challenging, and potentially confusing issues. Whether a customer is at the enterprise level, a service provider, or a small business, Nortel Networks delivers Unified Networks solutions designed to meet their unique business challenges.

Solutions based on Unified Networks strategies can take many forms, involving many different products and technologies, including those that existed prior to the merger with Bay Networks and many launched since. Unified Networks solutions are differentiated only by their size, scope, and ambition.
Each solution is tailored to the unique needs of the customer, and integrates a variety of products, technologies, and services, some of which are described below. Accelar brings together switching and routing into a low-cost, very high-performance package. CallPilot unifies disparate messaging systems, user interfaces, and presentation formats, making messaging more intuitive and easier to use.


Integrated Network Management equips service providers with a common network management platform. IPConnect brings sophisticated telephony services to IP networks. Optivity offers a portfolio of advanced enterprise network management applications and services. Passport supplies enterprises with wide area networking (WAN) solutions that consolidate multiple services, including voice and data, onto a single service provider networking infrastructure. PowerTouch terminal products and advanced telephony and information appliances provide customers with easy and convenient access to network services. Reunion broadband wireless access solutions supply customers with high-speed data, telephony, and Internet services. Symposium advanced customer transaction centers enhance and expedite the way businesses interact with their client base. TransportNode delivers the capacity and reliability of the Optical Internet. Wireless products bring voice, data, and mobility together with an Optical Internet. Network services help to quickly transform disparate networks, architectures, and implementations into high-value Unified Networks.

Regardless of the products, applications, and services used, Unified Networks solutions deliver an integrated approach designed to meet the challenges faced by businesses and organizations of all sizes. Nortel Networks offers a high level of expertise, integrating networks, products, applications, services, and technologies to address the most pressing networking issues faced by businesses today. Unified Networks deliver a new level of economics - based on higher performance, lower cost of ownership, and a range of new high-value applications - designed to improve how business gets done.

Best in class

- #1 ATM enterprise switch (VSG, 1998)
- #2 Overall ATM equipment (VSG, 1998)
- #2 ATM LAN switch (VSG, 1998)
- #1 Frame relay enterprise switch (VSG, 1998)
- #1 Frame relay switches sold to enterprises (In-Stat, 1H98)
- #1 Worldwide frame relay switch (VSG, 1998)
- #1 Overall frame relay equipment (VSG, 1998)
- #1 Worldwide enterprise switch (VSG, 1998)
- #1 Overall multiservice switch in 1997 (Dell'Oro)
- #1 Multiservice switches sold to enterprises (Dell'Oro, 1H98)
- #3 Multiservice switches sold to ISPs (Dell'Oro, 1H98)
- #1 FRADs (Dataquest, 1998)
- #1 FRAD revenues (Dell'Oro, 1H98)
- #1 Packet switch (Dataquest, 1998)
- #1 in PADs (Dataquest, 1998)

Voice Communication
Voice communication has long existed in the world of analog and digital telephone exchanges. Fixed, dedicated, switched services have provided the user with the ability to place telephone calls to practically anywhere in the world.

The way that voice is carried and switched within both public and private networks is changing with the evolution towards the Broadband Integrated Services Digital Network (B-ISDN), based upon Asynchronous Transfer Mode (ATM) technology. This evolution is accelerating the shift to enterprise networks capable of handling voice, video, and data transmissions over a single, integrated infrastructure. It delivers numerous benefits, including more efficient use of network bandwidth and the ability to offer many different types of voice service. However, there are many issues that first need to be understood, and then overcome, before these benefits can be realized.

This booklet examines these issues, highlighting voice communication techniques in use today and investigating those that may become commonplace in the future. With the limited space available, not every technological aspect can be covered in depth. Instead, the booklet is designed to serve as a useful introduction to the main subject areas, and additional references have been included for individuals interested in learning more. References are given by a number in square brackets, e.g. [3], and listed in Appendix D.

Intended Audience
This booklet is intended for a broad range of readers who are involved in the design, implementation, operation, management, or support of enterprise networks carrying both voice and data traffic. It has particular relevance to those with a background in data and/or limited experience in voice technologies, although it will also serve as a useful reference for anyone involved in voice networks today.


Table of Contents

1 Introduction
  1.1 A Short History of the Telephone

2 The Telephone
  2.1 Key Voice Fundamentals
    2.1.1 Frequency
    2.1.2 Levels
  2.2 Basic Operation of the Telephone
    2.2.1 Basic Telephony - Signaling
    2.2.2 Basic Telephony - The Speech Path
  2.3 Other Types of Telephones
    2.3.1 "Two-wire" vs. "Four-wire" Telephony

3 PBX Phone Systems
  3.1 Introduction to the PBX
  3.2 Call Routing in a PBX
  3.3 Voice Interfaces on a PBX

4 Introduction to Digital Voice
  4.1 The Channel Bank
  4.2 Digital Voice - Pulse Code Modulation (PCM) G.711
    4.2.1 A-law and µ-law PCM
    4.2.2 Power of a Digital Signal
    4.2.3 Distortion Resulting from the Digitization Process
  4.3 The Digital 1.544 Mbps PBX Interface (DS-1)
    4.3.1 Physical Interface
    4.3.2 Framing - D4
    4.3.3 Framing - Extended Superframe (ESF)
    4.3.4 Channel Associated Signaling (CAS) on DS-1
    4.3.5 Common Channel Signaling on DS-1
    4.3.6 DS-1 Alarms
  4.4 The Digital 2.048 Mbps PBX Interface (E1)
    4.4.1 Physical Interface - G.703
    4.4.2 Framing Structure - G.704
    4.4.3 Channel Associated Signaling (CAS) on E1
    4.4.4 Common Channel Signaling (CCS) on E1
    4.4.5 E1 Alarms
  4.5 The Need for PBX Synchronization
    4.5.1 PBX Systems Without Synchronization
    4.5.2 PBX Systems With Synchronization

5 Speech Compression
  5.1 Different Coding Types
  5.2 Adaptive Differential Pulse Code Modulation (ADPCM)
  5.3 Code Excited Linear Prediction (CELP)
  5.4 Low Delay-CELP (LD-CELP) ITU-T G.728
  5.5 Conjugate Structure-Algebraic CELP (CS-ACELP) ITU-T G.729
  5.6 Other Compression Techniques
  5.7 Speech Compression Impairments
    5.7.1 Mean Opinion Score (MOS)
    5.7.2 Quantization Distortion Units (QDUs)
    5.7.3 Speech Compression and Voice-Band Data
  5.8 Fax Relay

6 Echo and Echo Control
  6.1 What is Echo?
    6.1.1 Causes of Echo
  6.2 Echo Control Devices
    6.2.1 When Is Echo Control Required?
    6.2.2 Echo Control Devices
  6.3 Echo Suppressors
  6.4 Echo Cancellers
    6.4.1 Nonlinear Processor
    6.4.2 Tail Circuit Considerations
    6.4.3 Types of Echo Cancellers
    6.4.4 Tone Disabling of Echo Cancellers and Echo Suppressors
    6.4.5 G.168 (Improved Echo Canceller)

7 Introduction to Signaling Systems
  7.1 Analog Signaling Systems
    7.1.1 Ground Start: 2 Way PBX to Public Exchange Trunk Circuit
    7.1.2 E&M Trunk
    7.1.3 AC Signaling Schemes
    7.1.4 Manual Signaling
  7.2 Digital Signaling Systems
    7.2.1 Channel Associated Signaling (CAS)
    7.2.2 Common Channel Signaling (CCS)
      7.2.2.1 Private-to-Public Networking Protocols
      7.2.2.2 Public-to-Public Networking Protocol
      7.2.2.3 Private Networking Protocols
      7.2.2.4 How Does CCS Work?

8 Voice Within the Enterprise Network
  8.1 What is an Enterprise Network?
    8.1.1 Different Types of Enterprise Networks
  8.2 Time Division Multiplexers (TDM)
    8.2.1 TDM Synchronization

9 Voice Over Asynchronous Transfer Mode (ATM)
  9.1 Introduction to ATM
    9.1.1 The ATM Cell
    9.1.2 The ATM Adaptation Layers
    9.1.3 ATM Service Categories
    9.1.4 Statistical Multiplexing with ATM
  9.2 Voice and Telephony Over ATM (VTOA)
    9.2.1 Voice Sample Cellification
    9.2.2 Speech Activity Detection
  9.3 PBX Synchronization Across an ATM Network
    9.3.1 Synchronous Residual Time Stamp (SRTS)
    9.3.2 Adaptive Clock Recovery (ACR)
    9.3.3 Independent Timing

10 Voice Over Frame Relay
  10.1 Introduction to Frame Relay
  10.2 Voice Over Frame Relay (VoFR)
    10.2.1 Delay on Frame Relay Networks
    10.2.2 VoFR Standards
  10.3 Benefits and Issues of VoFR
    10.3.1 Benefits
    10.3.2 Issues

11 Voice Over IP
  11.1 What is IP?
    11.1.1 How Voice Over IP Works
    11.1.2 Benefits and Issues of Voice Over IP
    11.1.3 Standards

12 Voice Switching in the Enterprise Network
  12.1 The Evolution of Voice Networks
  12.2 What is Voice Switching?
  12.3 Why Perform Voice Switching?

APPENDIX A - Company and Product Overview
APPENDIX B - Introduction to Decibels
APPENDIX C - Glossary of Terms
APPENDIX D - References


1 Introduction
This booklet provides an introduction to many of the fundamentals of voice communication, beginning with analog techniques and concluding with voice over ATM. It discusses some of the ways that enterprise networks can be created, and how to make the most efficient use of available network bandwidth through techniques such as speech compression and speech activity detection (also known as silence suppression). Many other issues are covered, including how and why echo occurs and the techniques that can be used to overcome it, allowing network managers to migrate their voice networks onto ATM.

1.1 A Short History of the Telephone


In the mid-1870s, while trying to understand sound and sound communications, Scottish-born inventor Alexander Graham Bell had an idea for a device that would transmit sound over long distances by converting the sound to an electrical signal. This device was later called the telephone, derived from the Greek words meaning 'far' (tele) and 'sound' (phone). Bell was not the only person of the time developing a telephone device; however, he was the first to patent the device, in 1876.

Further developments were made to the telephone during the late 1870s. Bell created the induction-based earpiece, and Thomas Edison was responsible for the design of the carbon microphone. The incorporation of these enhancements produced a truly practical instrument.

Initially the telephone had no mechanism for dialing another number. To make a call, a handle needed to be turned, which generated an electric current. This current signaled the operator of the local exchange. To connect a caller to the called party, the operator would manually insert a jack plug into the corresponding jack socket.

It wasn't until 1889 that Almon B. Strowger developed the automatic telephone exchange. The most unlikely of people to be involved in telephony, Strowger developed the exchange as a way of beating his business rival in Kansas City, USA. The wife of Strowger's main competitor was the operator of the local exchange, and whenever a call came in asking for an undertaker, naturally she passed it on to her husband. To alleviate this problem, Strowger developed the first automatic telephone exchange and the dial telephone, eliminating the need for an operator.

Telephone networks have undergone many changes since those early days. However, many of the underlying principles remain the same. The basic "two-wire" telephone used


in most domestic homes today still operates in essentially the same way as the telephones of over 100 years ago.


2 The Telephone
In this section, the basic operation of the telephone is examined with a look at the two basic functions that it offers: signaling and speech transmission. To better understand this critical piece of equipment, it is important to appreciate how human voice and hearing function. To complete this section, other types of telephones will be examined, including proprietary designs and digital telephone sets.

2.1 Key Voice Fundamentals


2.1.1 Frequency
Human speech occurs as a result of air being forced from the lungs, through the vocal cords, and along the vocal tract, which extends from an opening in the vocal cords to the mouth and nose. Speech consists of a number of different types of sounds, including voiced, unvoiced, and plosive sounds. Voiced sounds result from the vocal cords vibrating, interrupting the flow of air from the lungs and producing sounds in the frequency range of approximately 50 to 500 Hertz (Hz). Unvoiced sounds result when the air passes some obstacle in the mouth or a constriction in the vocal tract. Finally, plosive sounds result from air being let out with a sudden burst, for example when the vocal tract is closed then suddenly released, or when the mouth is closed and suddenly opened. A person's nasal cavities and sinuses also modify all of these sounds, and all contribute to what we know as normal human speech.

The range of frequencies that result from these sound sources, combined with the structure of the vocal tract, nasal cavities, and sinuses, varies depending upon who is actually speaking. The resulting mix of frequencies determines the unique sound of a person's voice.

As explained above, the range of frequencies produced by speech varies significantly from one person to another. Normally, frequencies from about 50 Hz upward are generated, with the majority of the energy concentrated between about 300 Hz and 3 kilohertz (kHz). The human ear, on the other hand, can detect sounds over a range of frequencies from around 20 Hz to 20 kHz, with maximum sensitivity in the region between 300 Hz and 10 kHz. Taking these two factors into account, as well as the results of practical testing, the frequency band of 300 Hz to 3.4 kHz has been found to be the key sonic range for speech intelligibility and voice recognition.
Reducing this bandwidth quickly reduces intelligibility, while increasing it adds quality yet does not significantly improve intelligibility or voice recognition.


As a result, the frequency band used in telephone systems is limited to between 300 Hz and 3.4 kHz, delivering a system that provides speech transmission that is quickly recognized and easily understood.

2.1.2 Levels
It is important to ensure that voice signals are transmitted at the correct level across a network, so that end-to-end performance is maintained. Too low a level can result in speech merging into background noise, creating an environment where the listener finds it hard to hear the talker and is encouraged to talk loudly. On the other hand, too high a level will encourage the listener to talk too quietly.

Today, international voice communication is part of everyday life. People need to be able to communicate with others anywhere in the world as effectively as if they were in their own country, or even in their own office. This goal is complicated by the way telephone systems have evolved differently in different countries. For example, an analog telephone (the term analog is described in Section 4.2) from North America transmits a lower-level electrical signal for a given acoustic volume than a telephone in the UK.

Signal levels are discussed in this booklet in terms of decibels (dB), and related terms such as dBm, dBm0, and dBr. Readers unfamiliar with these terms, or who simply need a refresher, should refer to Appendix B.
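The decibel arithmetic behind these level measurements can be sketched in a few lines of code. This example is illustrative only and not part of the original booklet; the function names are our own, and Appendix B remains the reference for the terms themselves.

```python
import math

def watts_to_dbm(power_watts: float) -> float:
    """Express an absolute power as dBm (decibels relative to 1 milliwatt)."""
    return 10 * math.log10(power_watts / 0.001)

def dbm_to_watts(level_dbm: float) -> float:
    """Convert a dBm level back to an absolute power in watts."""
    return 0.001 * 10 ** (level_dbm / 10)

def gain_db(power_out: float, power_in: float) -> float:
    """Relative gain (positive) or loss (negative) between two powers, in dB."""
    return 10 * math.log10(power_out / power_in)

# 1 mW is 0 dBm by definition; doubling the power adds roughly 3 dB.
print(watts_to_dbm(0.001))          # 0.0
print(round(gain_db(2.0, 1.0), 1))  # 3.0
```

The key point the sketch illustrates is that dBm is an absolute measure (referenced to 1 mW), whereas plain dB expresses a ratio between two levels.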

2.2 Basic Operation of the Telephone


Telephones come in many varieties, yet they fall into two main categories: analog and digital. The original sets designed by A. G. Bell were analog; in fact, most telephones used in domestic environments are still analog.

The simplest form of telephone today is the two-wire "loop-disconnect" telephone. It is also known by various other names, including "loop-start" and "POTS" (Plain Old Telephone Service) telephone. It connects to the telephone exchange via two wires that carry the voice signals in both directions, hence the term two-wire telephone. The wires also carry the dialed digits to the exchange and the incoming ringing voltage to the phone. The exchange places a voltage of about 48 volts across the pair of wires to power the telephone and monitor the on-hook, off-hook, and pulse dialing activity.
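As a rough illustration of how the exchange battery powers the set, Ohm's law gives the loop current that flows once the handset is lifted. This sketch is not from the booklet, and every resistance figure in it is an assumed, illustrative value: real exchanges vary in feed voltage, feed-bridge resistance, and loop length.

```python
# Rough loop-current estimate for a POTS line (all values illustrative).
BATTERY_VOLTS = 48.0
FEED_RESISTANCE_OHMS = 400.0   # exchange feed bridge (assumed)
LOOP_RESISTANCE_OHMS = 500.0   # copper pair out and back (assumed)
PHONE_RESISTANCE_OHMS = 200.0  # off-hook telephone set (assumed)

def loop_current_ma(battery=BATTERY_VOLTS, feed=FEED_RESISTANCE_OHMS,
                    loop=LOOP_RESISTANCE_OHMS, phone=PHONE_RESISTANCE_OHMS):
    """Ohm's law: the current (in mA) flowing when the handset is lifted."""
    return 1000.0 * battery / (feed + loop + phone)

# On-hook, the set presents a near-open circuit, so essentially no current
# flows; off-hook, the exchange sees tens of milliamps and treats that
# current flow as line seizure.
print(round(loop_current_ma(), 1))
```

With the assumed values the loop draws a few tens of milliamps, which is the order of magnitude the exchange's detector is built to recognize.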

2.2.1 Basic Telephony - Signaling


To initiate a call, the user lifts the handset. This action closes a switch in the telephone and causes current to flow in a loop, hence the term "loop-start." The exchange detects this current as an incoming call and provides a dial tone to the line. The dial tone signals


to the user that they may now start to dial. Dialing before hearing the dial tone may result in digits being missed by the exchange; however, modern exchanges will usually return dial tone immediately after detecting current flow.

Upon hearing the dial tone, the user begins to dial the called number. If the telephone is set to pulse dial, it rapidly opens and closes the loop at a rate of approximately 10 or 20 PPS (Pulses Per Second). This is also referred to as loop-disconnect dialing. Figure 2.1 shows the progress of a call from the handset being lifted and dial tone being returned, to the first digit being dialed (a 3 in this case).

Figure 2.1 Operation of Loop-Disconnect Dialing


The dial speed and the make/break ratio are standards that were set in the past, reflecting the characteristics of switching equipment and direct-control switches. The make/break ratio varies according to the different dial pulse receivers used in different countries (e.g. North America: 61/39, UK: 67/33, Germany: 60/40). A 50/50 ratio was not chosen because it did not match the characteristics of the mechanical relays and switches in the switching systems.

An alternative way of sending dialing information, called Dual Tone Multi-Frequency (DTMF), is much more common today. In this form of signaling, each number is represented by two tones that are transmitted simultaneously on the voice path for a short period of time.
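The timing implied by these dial speeds and ratios can be sketched as follows. This is an illustrative calculation, not a specification: the 700 ms inter-digit pause is an assumed value, the helper names are our own, and the first figure of each quoted ratio is treated as the break (loop-open) percentage, which matches the usual convention for these figures.

```python
def pulse_timing_ms(pps: int, break_percent: float):
    """Return (break_ms, make_ms) for one dial pulse at the given rate."""
    period_ms = 1000.0 / pps
    break_ms = period_ms * break_percent / 100.0
    return break_ms, period_ms - break_ms

def digit_duration_ms(digit: int, pps: int = 10,
                      interdigit_pause_ms: float = 700.0):
    """Time to pulse out one digit ('0' is sent as ten pulses) plus an
    assumed inter-digit pause."""
    pulses = 10 if digit == 0 else digit
    return pulses * (1000.0 / pps) + interdigit_pause_ms

# UK 67/33 at 10 PPS: roughly 67 ms break, 33 ms make per pulse.
print(pulse_timing_ms(10, 67.0))  # (67.0, 33.0)
print(digit_duration_ms(3))       # 300 ms of pulses plus the pause
```

The sketch also makes clear why pulse dialing is slow and digit-dependent: a "0" takes over a second, which is part of the motivation for DTMF.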

Figure 2.2 DTMF Frequencies


The frequencies used are shown in Figure 2.2 and defined in ITU-T Recommendation Q.23 [1]. DTMF transmits digits much faster than pulse dialing, and the time taken to send each digit is independent of the digit being sent. An additional benefit of DTMF is that once the call is established, pressing a key on the phone will still transmit the tones over the voice path, enabling DTMF to be used to access voice mail, home banking systems, and other tone-based services.

When an incoming call arrives at the telephone set, the exchange applies an AC ringing voltage to the pair of wires. To answer the incoming call, the user picks up the handset. This action applies a loop to the line that is detected by the exchange, which then removes the ringing and connects the voice path through.

Recall is a function usually available on a simple two-wire analog telephone (except for older models). It is often accessed with a button marked "R", and can be used for a number of functions, such as accessing additional features from a telephone exchange or swapping between calls on the same line. There are two types of Recall, namely Timed Break Recall (TBR) and Earth Recall (ER). With TBR, pressing the Recall button while the handset is off-hook causes the phone to put a timed break on the line (similar to dialing a "1"). With the phone set to Earth Recall, the phone momentarily applies a ground (earth) to one of its leads, known as the B lead.
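For readers who like to experiment, the Q.23 key-to-frequency mapping and the two-tone signal itself are easy to reproduce in code. This sketch is not part of the booklet; the function names are our own, and the 8000 Hz sample rate anticipates the PCM discussion in Section 4.

```python
import math

# DTMF frequency pairs from ITU-T Q.23: each key sends one low-group
# (row) and one high-group (column) tone simultaneously.
LOW = (697, 770, 852, 941)        # Hz, one per keypad row
HIGH = (1209, 1336, 1477, 1633)   # Hz, one per keypad column
KEYPAD = ["123A", "456B", "789C", "*0#D"]

def dtmf_pair(key: str):
    """Return the (low, high) frequency pair for a keypad character."""
    for row, keys in enumerate(KEYPAD):
        col = keys.find(key)
        if col != -1:
            return LOW[row], HIGH[col]
    raise ValueError(f"not a DTMF key: {key!r}")

def dtmf_samples(key: str, duration_s=0.1, rate_hz=8000):
    """Synthesize the two-tone signal as a list of floats in [-1, 1]."""
    f_low, f_high = dtmf_pair(key)
    return [0.5 * math.sin(2 * math.pi * f_low * n / rate_hz)
            + 0.5 * math.sin(2 * math.pi * f_high * n / rate_hz)
            for n in range(int(duration_s * rate_hz))]

print(dtmf_pair("3"))  # (697, 1477)
```

Summing two sinusoids at half amplitude each keeps the composite signal within range, which mirrors how a real set keeps the combined tone at an acceptable line level.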

Figure 2.3 Two-wire Telephone Set Interface


2.2.2 Basic Telephony - The Speech Path


Apart from transmitting dialing information, the main function of the telephone is to provide voice communications. As already mentioned, the simple telephone has to provide simultaneous voice paths in both directions even though there are only two wires. It achieves this through the use of a hybrid circuit, the purpose of which is to take four-wire speech (i.e. separate paths for transmit and receive) and combine the two onto a single two-wire path. Figure 2.3 shows a simplified view of the interface between a two-wire telephone and the exchange.

Speech from the mouthpiece will pass through the hybrid and onto the telephone line. However, for reasons described below, a certain amount of the speech signal will be reflected back to the earpiece; this is referred to as sidetone.

The telephone, the telephone line, and the exchange interface each present an impedance that determines the relationship between the voltage and current on the line. To enable the maximum amount of power to be transferred from the telephone to the line and into the exchange interface (and vice versa), the impedances should be as close as possible to one another. This is achieved through the use of a balance impedance. As long as the balance impedance closely matches the impedance presented by the line and the exchange interface, a minimal amount of signal will be reflected back to the telephone earpiece. If there is a significant impedance "mismatch", then sidetone increases as more signal is reflected back to the earpiece.

At the exchange interface, the speech signal originating from the telephone mouthpiece is directed towards caller "Y" by the hybrid. In the opposite direction, a speech signal from caller "X" at the exchange is transmitted to the two-wire line, and as long as the exchange balance impedance matches the line and telephone set, little of this signal is reflected back to caller "Y".
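The effect of an impedance mismatch can be quantified with the reflection coefficient and the corresponding return loss. The following sketch is not part of the original text and assumes purely resistive impedances for simplicity; real line impedances are complex and frequency-dependent.

```python
import math

def reflection_coefficient(z_line: float, z_balance: float) -> float:
    """Fraction of the signal voltage reflected at a resistive mismatch."""
    return abs(z_line - z_balance) / (z_line + z_balance)

def return_loss_db(z_line: float, z_balance: float) -> float:
    """Higher return loss means a better match and less sidetone/echo."""
    rho = reflection_coefficient(z_line, z_balance)
    if rho == 0:
        return float("inf")  # perfect match: nothing reflected
    return -20 * math.log10(rho)

print(return_loss_db(600.0, 600.0))            # perfect match
print(round(return_loss_db(600.0, 900.0), 1))  # 20% of the signal reflected
```

A 600 ohm balance against a 900 ohm line reflects one fifth of the signal (about 14 dB of return loss), which is the kind of mismatch that makes sidetone, and ultimately echo, noticeable.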

2.3 Other Types of Telephones


While the simple two-wire telephone set is used extensively in the domestic environment, it is less common in private networks based on Private Branch Exchange (PBX) systems. In this environment, it is common to find feature telephone sets that offer a wider range of facilities than the basic telephone. These feature sets vary from manufacturer to manufacturer, each usually being of a proprietary design; some are analog, while others are digital. Most PBX systems today are digital, although this does not necessarily imply the use of a digital telephone set, as most allow both analog and digital phones to be used. With an analog phone, the interface in the


PBX converts the analog signals to digital PCM (see Section 4.2). A digital telephone performs the analog/digital and digital/analog conversions within the set itself. Some digital telephones are of a proprietary nature, where the format of the digital data is manufacturer-specific. Many digital sets, however, conform to the Basic Rate ISDN standards as outlined in ITU-T Recommendation I.420 [2]. This defines a digital interface that carries two channels operating at 64 kilobits per second (Kbps), known as the B or Bearer channels, and a 16 Kbps signaling channel, known as the D or Delta channel.

The B channels can be used to carry data or digitized voice in a similar manner to the Primary Rate interfaces described in Sections 4.3 and 4.4; the main difference is that on a Basic Rate interface only two B channels are available. The D channel is normally used to carry Common Channel Signaling (CCS) information for call control, in a similar manner to CCS on Primary Rate interfaces (see Section 7.2). However, the specifications also allow the D channel to be used to carry user data, including packet-switched and frame-based data.
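The arithmetic of this "2B+D" arrangement is simple enough to verify directly. A minimal sketch, with helper names of our own choosing; note that the actual line rate on the wire is higher once framing overhead is added.

```python
# Basic Rate ISDN capacity: two 64 Kbps bearer (B) channels plus one
# 16 Kbps signaling (D) channel, commonly summarized as "2B+D".
B_CHANNEL_KBPS = 64
D_CHANNEL_KBPS = 16

def bri_payload_kbps(b_channels: int = 2) -> int:
    """Usable bandwidth on a Basic Rate interface (B channels + D channel)."""
    return b_channels * B_CHANNEL_KBPS + D_CHANNEL_KBPS

print(bri_payload_kbps())  # 144 Kbps of 2B+D payload
```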

2.3.1 "Two-wire" vs. "Four-wire" Telephony


References are frequently made to two-wire or four-wire telephone sets. Care should be taken when interpreting the meaning of these terms, since they can be confusing. A two-wire telephone means that speech is carried in both directions on the same pair of wires, and requires hybrid circuits to split the two paths into separate transmit and receive functions at the telephone set. It is also possible, particularly in some proprietary telephone sets, to have additional wires for signaling purposes. In this case, a two-wire telephone may actually be connected to the exchange by more than two wires. A four-wire telephone strictly means that speech is carried in each direction on separate pairs of wire, and no hybrid circuit is necessary. However, a better definition of four-wire telephony is that the speech is carried on separate paths, which may be pairs of wire or might even be separate channels in a digital system. For example, once into a digital telephone exchange, speech is carried as "four-wire" even though it is carried in timeslots in high-speed digital signals rather than actually on four wires.


3 PBX Phone Systems


This section is not intended to be a detailed description of the operation of PBX phone systems. However, in order to better understand some of the other sections presented in this booklet, some key aspects will be covered. The PBX system will be examined, followed by a quick look at how telephone calls are routed within a PBX network. Finally, the different types of voice interfaces normally available on a PBX will be introduced.

3.1 Introduction to the PBX


Simply put, a PBX is the telephone exchange that is privately owned by an organization. Its objective is to provide voice (and often data) communications to its users. In addition to offering simple call setup facilities, it also offers many other features and facilities to make life easier for its users. Users can place calls to others in their organization by simply dialing their extension number. To place a call to a person not connected to the same PBX network requires the call to be routed via the Public Switched Telephone Network (PSTN). This usually involves dialing an access code, such as a "9" or "0", followed by the complete destination number including country code and area code if appropriate.

Figure 3.1 Typical Components of a Digital PBX


Most larger PBX systems today are digital. This means that they route connections in a digital form, with speech first being converted from analog into PCM (see Section 4.2). Figure 3.1 shows the typical components used within a digital PBX. The "core" of the PBX is the common control and the switching matrix. The common control acts as the "brains", and controls the overall operation of the PBX. It performs functions such as recognizing that a phone has been taken off hook and connecting a dial-tone generator to the phone, interpreting the dialed digits and routing the call to a particular trunk or line interface, and so on. The switching matrix takes in 1.544 Mbps bit streams comprising multiple 64 Kbps channels and allows them to be connected, or switched, to any other 64 Kbps channel on any other interface. Interfaces to the PBX come in two main types: lines and trunks. Lines connect user devices such as analog or digital telephone sets, or other devices such as data terminals. Trunks are shared links and can carry connections originating from line interfaces on the same PBX or from other trunks also connected to the PBX. Analog trunks can only support one connection at a time, while digital trunks can support many connections simultaneously (see Sections 4.3 and 4.4). The trunks can then be broken down into two additional types: PSTN trunks (also called Central Office trunks) and private trunks. PSTN trunks connect the PBX to the public telephone network, and private trunks (also called "Tie lines" or "Tie trunks") connect the PBX to other PBXs as part of an overall private network.

3.2 Call Routing in a PBX


When a user dials a destination number, the PBX needs to determine how to route the connection in the most efficient manner. The PBX needs to consider many factors, including: Is the number valid? Is this user allowed to connect to the specified destination? Which is the cheapest trunk to use? Is there a trunk free? Figure 3.2 shows a network of PBX systems connected together with inter-PBX trunks, which could be analog, digital, or both. To place a call the user takes Telephone A "offhook" and dials the number for Telephone B. PBX 1 inspects the dialed digits and makes a decision as to which trunk to route the call on. In this case it chooses Trunk 1. PBX 1 seizes Trunk 1 (or a single timeslot of a digital trunk) and passes dialing information across it. The method used for dialing will depend upon what type of trunk it is (see Section 7 for more details on signaling). PBX 2 receives the call, inspects the dialed number, and routes the call accordingly onto PBX 3. PBX 3 inspects the digits, identifies the destination to be located on the exchange, and alerts the user to an incoming call by making the phone ring.
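The digit-analysis step performed by PBX 1 above can be sketched as a longest-prefix lookup against a routing table. The table below is entirely hypothetical; real dial plans are vendor- and site-specific.

```python
# A sketch of the digit-analysis step a PBX performs: match the dialed
# digits against a table of prefixes and route on the longest match.
# The routing table below is entirely hypothetical.

ROUTES = {
    "2":  "local extensions",   # 2xxx extensions on this PBX
    "8":  "Trunk 1",            # tie trunk towards PBX 2
    "84": "Trunk 2",            # a more specific route, e.g. towards PBX 3
    "9":  "PSTN trunk",         # public network access code
}

def route(dialed: str) -> str:
    """Return the route for the longest prefix matching the dialed digits."""
    for length in range(len(dialed), 0, -1):
        if dialed[:length] in ROUTES:
            return ROUTES[dialed[:length]]
    raise ValueError("number not valid in this dial plan")

print(route("2041"))       # local extensions
print(route("8435"))       # Trunk 2 ("84" beats the shorter "8")
print(route("95551234"))   # PSTN trunk
```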


Figure 3.2 Call Routing in a PBX Private Network

3.3 Voice Interfaces on a PBX


The types of voice interface available on PBX systems are many and varied. They typically fall into three categories:

Line Interfaces - These are the interfaces on the PBX that connect to desktop telephones. Line interfaces can include any of the types discussed in Sections 2.2 and 2.3, including:
- 2-wire analog loop disconnect/loop start
- 2- or 4-wire analog proprietary feature set
- 4-wire digital set, either proprietary or conforming to Basic Rate ISDN standards

Private Trunk Interfaces - These interfaces provide the links between PBXs within a multi-PBX private network. They allow calls to be routed from one PBX to another without the need to involve the public network, avoiding extra call cost. Private trunk interfaces typically include:
- 2- or 4-wire analog with Ear and Mouth (E&M) signaling
- 4-wire analog with AC15 signaling
- Digital trunk supporting CAS or CCS signaling

Public Trunk Interfaces - These provide access from the PBX to the PSTN (Public Switched Telephone Network) for outgoing and/or incoming calls. Public trunk interfaces typically include:
- Ground Start analog trunk: 2-wire, both-way calling
- Analog Direct Dial In (DDI): 2-wire, typically incoming calls only
- Digital trunk supporting CAS or CCS signaling


4 Introduction to Digital Voice


As previously discussed, many of today's voice interfaces rely on analog technology. However, once received into a PBX or a wide area network, analog voice is converted to a digital format in order to derive all the available benefits. This section looks at the background to digital voice, followed by a more detailed look at how analog speech is actually converted to a digital format. The integration of digital voice channels into primary rate digital interfaces is then presented, and finally the need for synchronization in a PBX network is examined.

4.1 The Channel Bank


The channel bank was one of the first devices to make use of digital voice in a practical environment. It is a device that takes multiple analog voice channels, digitizes them, and then multiplexes them onto a high-speed digital link. Two main types of channel bank exist today: one supporting up to 24 voice channels multiplexed onto a 1.544 Mbps digital link, and another supporting 30/31 channels multiplexed onto a 2.048 Mbps digital link. The first channel bank was developed in North America in 1962, and was known as the D1 channel bank. It provided 24 analog inputs, each of which was converted to 8-bit PCM (although the least significant bit was then ignored). The resulting seven bits were used for each voice sample, and one bit was used for signaling, providing a combined data rate of 1.544 Mbps. It was later found that this seven-bit PCM gave unsatisfactory voice quality. Subsequent generations have used eight-bit PCM with "robbed bit" signaling (see Section 4.3.4). Channel banks have evolved beyond simply supporting analog voice circuits. Today, it is common for a channel bank to support multiple interface types other than just the analog voice interface. For example:
- 2-wire speech with loop disconnect signaling (incoming and outgoing)
- 2-/4-wire speech with E&M signaling
- Data 0 - 64 Kbps
- Data n x 64 Kbps

4.2 Digital Voice - Pulse Code Modulation (PCM) G.711


Digital voice is a representation of analog voice signals that uses binary "1"s and "0"s, also known as bits.


Figure 4.1 Progress of speech signal Analogue to Digital


Figure 4.1 shows the progress of a speech signal entering the telephone, being converted into an analog electrical signal, and then being converted into a digital form. When the talker speaks, they create variations of pressure in the air. The telephone picks up these pressure changes and turns them into an electrical signal that is analogous to the acoustic signal from the talker (hence the term analog). This analog signal is then converted into a digital stream of data bits which represents the digital voice signal. But why transport voice in a digital format? There are a number of reasons, including:

Digital transmission is independent of distance - When an analog signal is transmitted on a transmission channel, it is attenuated by signal losses in the cable. In addition, noise is picked up that will affect the quality of the voice transmission. The signal that arrives at the destination is made up of a combination of the original signal and line noise. Amplifiers can be used to boost the signal back to the original level, but there is no easy way for the amplifier to distinguish between the original signal and line noise, so the noise is also amplified. Digital signals, by contrast, take on one of two levels, represented by binary "0"s and binary "1"s. When noise is introduced into the digital signal, it can easily be removed by regenerating equipment. Thus, the signal that arrives at the destination is an identical replica of the signal transmitted from the source.

Multiplexing of voice and data - Since the digitized PCM voice is essentially a data stream running at a bit rate of 64 Kbps, it can be readily integrated with other PCM channels to make up an aggregate connection combining many voice channels in one physical connection. Since it is essentially data, it can be combined with other 'real' data and transmitted over a common transmission medium.

This approach is used for the digital representation of analog voice signals in telephone systems and is defined in ITU-T Recommendation G.711 [3]. A simplified view of the process can be seen in Figure 4.2.

Figure 4.2 Analogue to PCM Conversion

The analog signal is sampled at a rate of 8000 times per second. This rate is derived from a theory developed by Harry Nyquist, which states that the sampling rate must be at least twice the maximum frequency of the signal being sampled (i.e. > 2 x 3.4 kHz). This results in Pulse Amplitude Modulation (PAM), which is simply a series of pulses that represent the amplitude of the analog signal at each sample time. Each PAM sample is compared to a range of fixed quantization levels, each of which is represented by a fixed binary pattern. The binary pattern of the closest quantization level is then used to represent the PAM sample.
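The sample-and-quantize step just described can be sketched in a few lines. Linear 8-bit quantization is assumed here purely for illustration; the real G.711 coder uses companded coding, as described in the following sections.

```python
# A minimal sketch of sampling and quantization: take 8000 PAM samples
# per second and round each to the nearest fixed quantization level.
# Linear 8-bit quantization is assumed purely for illustration; G.711
# actually uses companded coding.
import math

SAMPLE_RATE = 8000        # samples per second (Nyquist: > 2 x 3.4 kHz)
LEVELS = 256              # 8-bit linear quantization

def sample_and_quantize(freq_hz: float, n_samples: int) -> list[int]:
    codes = []
    for n in range(n_samples):
        t = n / SAMPLE_RATE
        pam = math.sin(2 * math.pi * freq_hz * t)        # PAM sample, -1..+1
        code = round((pam + 1.0) / 2.0 * (LEVELS - 1))   # nearest level, 0..255
        codes.append(code)
    return codes

# A 1 kHz test tone repeats every 8 samples at an 8 kHz sampling rate.
print(sample_and_quantize(1000.0, 8))
```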

Figure 4.3 Linear/Uniform and Non-linear/Non-uniform Quantisation


Because only a finite number of quantization levels is available, this process introduces an error into the digital representation of the analog signal. Using more bits in each sample provides additional quantization levels and less error. To achieve reasonable quality over the range of speech amplitudes found in networks requires a minimum of 12-bit PCM samples, assuming linear quantization (also known as uniform quantization). A view of linear quantization is given in Figure 4.3a. In practice, this number of levels is unnecessary for two reasons. First, average signal levels are normally small, and only the lower quantization levels actually get used. Second, the human ear operates in a logarithmic manner, being more sensitive to distortion in low-level signals than in high-level signals. As a result, a technique known as companding is used. This reduces the total number of quantization levels by retaining many closely spaced levels at low signal amplitudes and fewer, more widely spaced levels at high amplitudes. This process of companding is shown in Figure 4.3b.

4.2.1 A-law and µ-law PCM


There are two common types of PCM, µ-law and A-law, each of which uses a different rule for the companding process. North America and Japan mostly use µ-law, whereas other areas of the world require the use of A-law. Both types are defined in the G.711 Recommendation, yet differ in a number of ways. The companding laws are different, and the allocation of PCM codes differs in relation to the amplitude of the PAM samples. With A-law, after converting from PAM to PCM, the even bits of each sample are inverted before being transmitted onto the digital transmission path. This bit inversion was originally used to ensure that a sufficient number of "1"s existed in the digital stream, because any channel that was idle would otherwise produce a pattern of only "0"s. In fact, on a 2.048 Mbps PBX interface this inversion is unnecessary, since the problem of too many "0"s is dealt with at the physical layer (see Section 4.4.1). For these reasons, when operating an international network where both µ-law and A-law PCM systems are used, it is important to perform a proper conversion between the two. The conversion process is given in G.711, which defines that digital paths between countries should carry signals encoded using A-law PCM. Where both countries use the same law, then that law should be used on digital paths between them. Any necessary conversion will be done in the countries using µ-law PCM.
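The companding idea can be illustrated with the textbook continuous µ-law formula. This is a sketch only; the actual G.711 encoder uses a segmented approximation of this curve, and A-law uses a different formula.

```python
# The textbook continuous mu-law companding formula (illustration only;
# the real G.711 encoder uses a segmented approximation of this curve).
import math

MU = 255   # mu-law constant

def mu_law_compress(x: float) -> float:
    """Map a sample in -1..+1 to a companded value in -1..+1."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

# A signal at 1% of full scale is expanded to about 23% after companding,
# so a large share of the 8-bit code space is devoted to quiet signals.
print(round(mu_law_compress(0.01), 2))   # 0.23
print(round(mu_law_compress(1.0), 2))    # 1.0
```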

4.2.2 Power of a Digital Signal


It is easy to quantify the power levels for an analog interface, since this is something that can be measured directly with a power meter. In the PCM world, there is no equivalent direct method of measurement. Instead, a specific relationship is defined between an analog signal and a digital sequence. The relationship of power to the digital signal is defined in G.711 through two tables (one for µ-law and one for A-law) that define a sequence of PCM samples. When decoded, these samples result in a sine wave signal of 1 kHz, at a nominal level of 0 dBm0. This provides a theoretical maximum level of +3.17 dBm0 for µ-law, and +3.14 dBm0 for A-law. Any attempt to exceed these levels will result in distortion of the signal, simply because there are no more quantization levels.

4.2.3 Distortion Resulting from the Digitization Process


When PCM values are allocated to the PAM samples, a certain amount of distortion results because of the finite number of quantization levels available to quantize the analog signal. Distortion is covered in more depth in Section 7.7.

4.3 The Digital 1.544 Mbps PBX Interface (DS-1)


The 1.544 Mbps PBX interface is common to North America and Japan, and is often referred to as a "T1" or a "DS-1" interface. (In practice, these two terms are often used interchangeably, although this is wrong. DS-1 refers to a particular speed of 1.544 Mbps, while T1 refers to a digital transmission system.) It offers 23 or 24 traffic timeslots depending upon the type of signaling being used. In countries supporting DS-1 interfaces, such as in North America, various types of T1 transmission facilities are offered. AMI facilities expect the attached device (e.g. PBX) to provide an AMI electrical signal as described in Section 4.3.1. The main problem with this is that long strings of "0"s do not provide any electrical voltage transitions. This can result in loss of repeater synchronization on the transmission facility. It is therefore the responsibility of the attached equipment to ensure that sufficient "1"s exist to maintain synchronization. The proportion of "1"s to "0"s is known as the "1"s density. An alternative type of facility supports Bipolar Eight Zeroes Substitution (B8ZS), where violation pulses are introduced into the data stream when a string of eight consecutive "0"s is detected. This technique is similar in principle to the HDB3 technique used on E1 (see Section 4.4.1), and is described in Section 4.3.1 below.

4.3.1 Physical Interface


Figure 4.4 HDB3 Coding

DS-1 is supported via twisted pair cable only, unlike E1, which is supported on both unbalanced coaxial cable and balanced twisted pair cables. DS-1 uses Alternate Mark Inversion (AMI) line coding to electrically encode the signal on the line. However, to overcome any problem of low "1"s density, a process called B8ZS is normally used instead of the E1 HDB3 process. The operation of B8ZS, shown in Figure 4.7, works by replacing strings of eight consecutive binary zeroes with a code that introduces bipolar violations into the 4th and 7th bit positions. This ensures that a sufficient number of voltage transitions exist, while retaining the DC-balanced nature of the signal.
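The substitution rule above can be sketched at the symbol level. This is an illustration only: pulses are modeled as +1/-1, zeros as 0, and the startup pulse polarity is assumed; real interfaces add further electrical detail.

```python
# A simplified, symbol-level sketch of B8ZS (illustration only; the
# startup pulse polarity is assumed). Each run of eight zeros becomes
# 000VB0VB, with violations (V) in the 4th and 7th positions of the
# substituted block.

def b8zs_encode(bits: list[int]) -> list[int]:
    out: list[int] = []
    last = -1      # polarity of the most recent pulse (assumed start)
    zeros = 0
    for bit in bits:
        if bit == 0:
            out.append(0)
            zeros += 1
            if zeros == 8:
                # Substitute 000VB0VB: V repeats `last` (a violation),
                # B takes the opposite polarity as a normal AMI pulse.
                out[-8:] = [0, 0, 0, last, -last, 0, -last, last]
                zeros = 0
        else:
            last = -last               # normal AMI: marks alternate polarity
            out.append(last)
            zeros = 0
    return out

print(b8zs_encode([1, 0, 0, 0, 0, 0, 0, 0, 0, 1]))
# [1, 0, 0, 0, 1, -1, 0, -1, 1, -1] - violations at the 4th and 7th zeros
```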

4.3.2 Framing - D4
There are two common types of framing used on a DS-1 interface, D4 and Extended Superframe (ESF). D4 framing is shown in Figure 4.8. This consists of a frame of 193 bits with a repetition rate of 8000 frames per second, giving a data rate of 1.544 Mbps and a 125 µs frame duration. Each frame contains 24 eight-bit timeslots, named timeslot 1 through to timeslot 24, and a single bit called the F or framing bit. All 24 timeslots are normally available for traffic except when CCS is carried. In this case, timeslot 24 is reserved for the signaling channel. Framing is achieved using the F bit over a sequence of 12 frames, which is also called a superframe. In odd-numbered frames the F bit is called Ft, for terminal framing, and performs frame alignment. In even-numbered frames the F bit is called Fs and performs superframe alignment.
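The D4 numbers quoted above check out arithmetically:

```python
# A quick arithmetic check of the D4 figures: 24 eight-bit timeslots plus
# one F bit give 193 bits per frame, and 8000 frames per second give the
# 1.544 Mbps line rate and 125 microsecond frame duration.

TIMESLOTS = 24
BITS_PER_SLOT = 8
F_BITS = 1
FRAME_RATE = 8000                                     # frames per second

bits_per_frame = TIMESLOTS * BITS_PER_SLOT + F_BITS   # 193 bits
line_rate = bits_per_frame * FRAME_RATE               # bits per second
frame_duration_us = 1_000_000 / FRAME_RATE            # microseconds

print(bits_per_frame, line_rate, frame_duration_us)   # 193 1544000 125.0
```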

4.3.3 Framing - Extended Superframe (ESF)


Today, ESF is more common than D4 due to its capabilities for monitoring the performance of an in-service T1 link. This was not easily possible with D4, since the link would need to be taken out of service in order for performance testing to be carried out. The extended superframe, as shown in Figure 4.9, does exactly what its name implies and extends the 12-frame superframe to 24 frames. The use of the F bit is also changed.


Figure 4.5 G.704 Frame Structure for 2.048Mbit/s E1


Only 6 of the 24 F bits in the ESF are now used for synchronization purposes. Of the remaining 18 F bits, six are utilized for CRC checking to verify the integrity of the ESF, and 12 make up a Facility Data Link (FDL). The FDL is also known as the Data Link (DL) and is sometimes called the Embedded Operations Channel (EOC). The FDL is available for the communication of alarms, loopbacks, and general performance information between terminating devices such as Customer Service Units (CSUs), which terminate the T1s at the customer premises.
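The 6/6/12 split of the ESF F bits can be laid out per frame. The exact assignment below (framing pattern in every fourth frame, CRC-6 in the remaining even frames, FDL in the odd frames) follows the standard ESF layout rather than anything stated explicitly in this booklet.

```python
# Per-frame F-bit roles in the 24-frame extended superframe, following
# the standard ESF layout (an assumption here, consistent with the
# 6-sync / 6-CRC / 12-FDL split described in the text).

def esf_f_bit_role(frame: int) -> str:
    """Role of the F bit in frame 1..24 of the extended superframe."""
    if frame % 4 == 0:
        return "sync"    # framing pattern sequence: frames 4, 8, ..., 24
    if frame % 2 == 0:
        return "crc"     # CRC-6 bits: frames 2, 6, ..., 22
    return "fdl"         # Facility Data Link: all odd frames

roles = [esf_f_bit_role(f) for f in range(1, 25)]
print(roles.count("sync"), roles.count("crc"), roles.count("fdl"))   # 6 6 12
```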

4.3.4 Channel Associated Signaling (CAS) on DS-1


The technique for CAS on a DS-1 is shown in Figures 4.8 and 4.9. The basic process is the same for both D4 and ESF framing. However for D4, only two signaling bits, A and B, are used for each traffic timeslot. With ESF, four bits are used: A, B, C, and D. The process used is called bit robbing, because the least significant bit of each traffic timeslot in every sixth frame is taken away to carry signaling information rather than traffic. Meanwhile, the other seven bits are left alone and continue to carry traffic such as PCM. Any distortion introduced to PCM voice traffic by this bit-robbing technique is negligible and can be ignored. However, for data the distortion can be significant. This is why data support is typically only 56 Kbps rather than 64 Kbps, with only the seven most significant bits being used.
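The bit-robbing mechanism above can be sketched as a small function; the frame-numbering convention here is simplified for illustration.

```python
# A sketch of robbed-bit signaling: in every sixth frame the least
# significant bit of a traffic timeslot is overwritten with a signaling
# bit; in all other frames the full eight-bit sample passes untouched.

def rob_bit(sample: int, signaling_bit: int, frame_number: int) -> int:
    """Return the timeslot byte actually transmitted in the given frame."""
    if frame_number % 6 == 0:                          # every sixth frame
        return (sample & 0xFE) | (signaling_bit & 1)   # LSB carries signaling
    return sample

pcm = 0b10110101
print(bin(rob_bit(pcm, 0, frame_number=6)))   # 0b10110100 (LSB robbed)
print(bin(rob_bit(pcm, 0, frame_number=5)))   # 0b10110101 (unchanged)

# This is also why switched data uses only the seven safe bits per sample:
print(7 * 8000)                               # 56000 bps
```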


4.3.5 Common Channel Signaling on DS-1


Common Channel Signaling utilizes timeslot 24 to carry signaling information as HDLC-based data messages. See Section 5.2.2 for more details and examples.

4.3.6 DS-1 Alarms


DS-1 provides the same alarm conditions as E1, but in a different manner and with a different naming convention. Figure 4.10 gives a comparison between DS-1 and E1 alarm conditions and the naming conventions associated with each. The method used by DS-1 to provide a remote alarm indication differs depending on whether D4 or ESF framing is being used. With D4 trunks, a remote alarm indication, also called a yellow alarm, is given by transmitting a "0" in the bit 2 position of every timeslot. Putting this alarm indication in the traffic timeslots has two implications. First, it destroys any valid information carried in the traffic timeslots. Second, the receiver must validate the indication for a period of time (typically about 600 milliseconds (ms)) before taking any action, since it is possible that normal traffic could briefly mimic it. With ESF trunks, a remote alarm indication (yellow alarm) is given by using the F-bit Facility Data Link to transmit an alternating pattern of eight "0"s, followed by eight "1"s, then eight "0"s, and so on.

4.4 The Digital 2.048 Mbps PBX Interface (E1)


A digital PBX interface running at 2.048 Mbps, sometimes called the "E1" interface, is designed to conform to ITU-T Recommendation G.732 "Characteristics of Primary PCM Multiplex Equipment Operating at 2.048 Mbps" [4]. This in turn refers to the following recommendations:
- G.703: "Physical/Electrical Characteristics of Hierarchical Digital Interfaces"
- G.704: "Synchronous Frame Structures Used at Primary and Secondary Hierarchical Levels"
- G.711: "Pulse Code Modulation (PCM) of Voice Frequencies"

4.4.1 Physical Interface - G.703


ITU-T Recommendation G.703 [5] defines the electrical characteristics for many types and speeds of interfaces, including 64 Kbps, 1.544 Mbps, 6.312 Mbps, 32.064 Mbps, 44.736 Mbps, 2.048 Mbps, 8.448 Mbps, 34.368 Mbps, 139.264 Mbps, 97.728 Mbps, and 155.52 Mbps. European voice applications primarily use the 2.048 Mbps interface.


Figure 4.6 E1 Alarms

One of two types of physical interface may be used: the 75-ohm unbalanced coaxial interface, or the 120-ohm balanced twisted pair interface. Parameters including voltage levels, voltage and frequency tolerances, and others are specified in G.703. The actual data bits are transmitted using Alternate Mark Inversion (AMI) with High Density Bipolar 3 (HDB3) encoding. The objective of using these is twofold: first, to remove any DC component from the transmitted signal (AMI performs this function), and second, to ensure that there are a sufficient number of voltage transitions in the signal. This is referred to as "1"s density, and is important so that the receiving device can derive synchronization, or timing, from the signal (HDB3 performs this function). Figure 4.4 shows AMI and HDB3 encoding. AMI employs a three-level signal where binary zeroes are encoded using 0 volts, and successive binary "1"s (marks) are encoded using alternating voltages of 2.37 V for the unbalanced interface and 3 V for the balanced interface. HDB3 is defined in G.703, and works by replacing each block of 4 successive zeroes with a pattern of either 000V or B00V, where B represents an inserted pulse that conforms to the AMI rule and V represents an AMI violation. A violation is where two successive "1"s use the same electrical polarity. The choice of which pattern is used ensures that the number of B pulses between consecutive V pulses is odd, thus retaining the DC-balanced nature of the signal. This is important to help ensure error-free transmission.
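The HDB3 substitution rule can be sketched at the symbol level. This is an illustration only: pulses are modeled as +1/-1, zeros as 0, startup polarities are assumed, and real G.703 interfaces add further electrical detail.

```python
# A simplified, symbol-level sketch of HDB3 (illustration only; startup
# polarities are assumed). Each run of four zeros becomes 000V or B00V,
# chosen so that successive violations alternate polarity, keeping the
# signal DC-balanced.

def hdb3_encode(bits: list[int]) -> list[int]:
    out: list[int] = []
    last = -1      # polarity of the most recent pulse (assumed start)
    last_v = -1    # polarity of the most recent violation (assumed start)
    zeros = 0
    for bit in bits:
        if bit == 1:
            last = -last                   # normal AMI alternation
            out.append(last)
            zeros = 0
        else:
            out.append(0)
            zeros += 1
            if zeros == 4:
                if last != last_v:         # odd number of marks since last V
                    v = last               # 000V: V repeats the last pulse
                    out[-4:] = [0, 0, 0, v]
                else:                      # even: insert a balancing B pulse
                    v = -last              # B00V: B and V share polarity
                    out[-4:] = [v, 0, 0, v]
                last = last_v = v
                zeros = 0
    return out

print(hdb3_encode([1, 0, 0, 0, 0, 1]))           # [1, 0, 0, 0, 1, -1]  (000V)
print(hdb3_encode([1, 0, 0, 0, 0, 0, 0, 0, 0]))  # [1, 0, 0, 0, 1, -1, 0, 0, -1]
```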

4.4.2 Framing Structure - G.704


ITU-T Recommendation G.704 [6] defines the frame structures for a number of different speed links, including 1.544 Mbps, 6.312 Mbps, 2.048 Mbps, and 8.448 Mbps. As shown in Figure 4.5, a frame of 256 bits is defined with a repetition rate of 8000 frames per second, giving a data rate of 2.048 Mbps and a 125 µs frame duration. Each frame comprises 32 eight-bit timeslots named timeslot 0 through to timeslot 31. Timeslot 0 is used for a number of purposes, including frame synchronization and alarm reporting. Timeslot 16 is normally used to carry signaling information, although in some circumstances it may be used to carry a traffic channel. Timeslots 1 to 15 and 17 to 31 are used to carry 30 traffic channels, normally PCM in the case of PBX systems. However, since these timeslots simply represent a 64 Kbps channel, they can be used to carry any form of traffic, including data. Alternate timeslot 0s carry different information: as shown in Figure 4.5, even frames carry the frame alignment signal 0011011, and odd frames carry the A (alarm) bit. The purpose of the frame alignment signal is to allow the devices at each end of a link to synchronize to the frame, enabling them to know where the frame starts and which bits refer to which timeslot. The A-bit, also known as the Remote Alarm Indication (RAI), is set to "0" in normal operation. In the event of a fault condition the A-bit will be set to a binary "1". See Section 4.4.5 for more details on alarms. The other bits, Si and Sa4-Sa8, are of less significance, although still important. Si is reserved for international use. One specific use, as given in G.704, is to carry a Cyclic Redundancy Check that can be used for enhanced error monitoring on the link. It is important to note that both ends of a link must be configured in the same way, either with CRC enabled or CRC disabled. Sa4 to Sa8 are additional spare bits that can be used for a number of purposes as defined in G.704. For example, Sa4 can be used as a message-based link for operations, maintenance, and performance monitoring.

Figure 4.8 D4 Framing
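The G.704 figures above check out arithmetically:

```python
# A quick arithmetic check of the G.704 frame structure: 32 eight-bit
# timeslots give a 256-bit frame, 8000 frames per second give 2.048 Mbps,
# and removing timeslot 0 (framing) and timeslot 16 (signaling) leaves
# 30 traffic channels.

TIMESLOTS = 32
BITS_PER_SLOT = 8
FRAME_RATE = 8000                                 # frames per second

bits_per_frame = TIMESLOTS * BITS_PER_SLOT        # 256 bits
line_rate = bits_per_frame * FRAME_RATE           # bits per second
traffic_channels = TIMESLOTS - 2                  # minus TS0 and TS16

print(bits_per_frame, line_rate, traffic_channels)   # 256 2048000 30
```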

4.4.3 Channel Associated Signaling (CAS) on E1


Figure 4.5 demonstrates a form of signaling known as Channel Associated Signaling (CAS). With CAS, specific bits of data within the frame are defined to carry signaling information for each of the traffic timeslots. The information for each timeslot is transmitted as a group of four bits, designated A, B, C and D. Since timeslot 16 only has eight bits available to support 30 traffic timeslots, a multiframe structure of 16 frames (designated frame 0 to frame 15) is defined to allow A, B, C, and D bits to be carried for all 30 timeslots. Frame 0 carries a Multiframe Alignment Signal (MFAS) of four zeroes. This allows the receiving system to identify which frame is which, and associate a traffic timeslot with its correct signaling bits. Frame 0 also carries a remote alarm indicator to signify loss of multiframe alignment. Timeslot 16 of frame 1 then carries A, B, C, and D bits for timeslots 1 and 17, timeslot 16 of frame 2 carries A, B, C, and D bits for timeslots 2 and 18, and so on up to frame 15 of the multiframe, which carries A, B, C, and D bits for timeslots 15 and 31, after which the sequence repeats. A common problem arises when the A, B, C, and D bits in timeslot 16 associated with any of the traffic channels 1 to 15 are set to 0000. If this is the case, a false multiframe alignment signal will result, causing the whole signaling mechanism to fail. This situation is most common in a configuration where the PBX system is connected to a multiplexer network that provides an idle pattern of 0000 to the PBX for channels that are not routed. In fact, ITU-T G.704 recommends against the use of 0000 for any signaling purposes for timeslots 1 to 15. It also recommends that if B, C, and D are not used, then they should be set to B=1, C=0 and D=1.

Figure 4.9 ESF Framing
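The timeslot-16 multiframe mapping described above can be expressed directly: frame n (1 to 15) carries the A, B, C, and D bits for traffic timeslots n and n + 16, while frame 0 carries the multiframe alignment signal instead.

```python
# The E1 CAS multiframe mapping: frame n (1..15) carries the ABCD bits
# for traffic timeslots n and n + 16 in the two nibbles of timeslot 16;
# frame 0 carries the multiframe alignment signal.

def cas_timeslots_for_frame(frame: int) -> tuple[int, int]:
    """Which two traffic timeslots have their ABCD bits in this frame's TS16."""
    if not 1 <= frame <= 15:
        raise ValueError("frame 0 carries the multiframe alignment signal")
    return frame, frame + 16

print(cas_timeslots_for_frame(1))    # (1, 17)
print(cas_timeslots_for_frame(2))    # (2, 18)
print(cas_timeslots_for_frame(15))   # (15, 31)
```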

4.4.4 Common Channel Signaling (CCS) on E1


Common Channel Signaling utilizes timeslot 16, as does CAS. However, rather than defining specific bits to carry signaling information for each of the traffic timeslots, signaling information is sent as High Level Data Link Control (HDLC)-based data messages. See section 5.2.2 for more details and examples.

Figure 4.10 DS-1 and E1 Alarm Comparison

4.4.5 E1 Alarms
Recommendation G.732 describes various fault conditions and the subsequent actions that should be taken. It includes such conditions as power supply failure and codec failure, although this booklet will only describe problems associated with the link itself. Frame Level Alarms - (in reference to Figure 4.6) In the event of one of the following problems occurring on the received signal (Rx), bit 3 in timeslot 0 of odd-numbered frames should be set to "1" on the transmit (Tx) signal of that PBX system:
- Loss of incoming signal
- Loss of frame alignment
- Excessive error ratio (worse than 1 x 10^-3)


Loss of frame alignment will occur for a number of reasons, including cabling or equipment faults. In the event of a failure in the transmission system between the two PBXs, the transmission system should automatically apply an Alarm Indication Signal (AIS), a continuous stream of "1"s, to the line. Multiframe Alarm - When running Channel Associated Signaling, bit 6 of timeslot 16 in frame 0 of the multiframe is used to indicate loss of multiframe alignment. If multiframe alignment is lost on the receive signal, the PBX will set bit 6 to "1" in its transmit signal to alert the far end. Loss of multiframe alignment is a rare situation, since in a normal failure condition loss of frame alignment is more likely. A common cause of such a condition is misconfiguration of one end of the link, as described in Section 4.4.3 above.

4.5 The Need for PBX Synchronization


Similar to many digital systems, digital PBX systems operate internally in a synchronous manner where data is moved from one place to another using a common clock, or timing source. When two PBXs are connected together via a digital link, they must be synchronized in order to avoid bit slips and loss of frame synchronization. Digital PBX interfaces typically have some form of buffering mechanism in order to overcome problems associated with jitter, wander, and slight timing inaccuracies. However, in the event that the timing mismatch between two PBXs is too great, the buffer will fill. Once full, it must be emptied, resulting in loss of data and loss of frame synchronization. Frame synchronization should be regained rapidly, so long as the timing mismatch is not too great.

Figure 4.11 Lack of Synchronisation


Section 4 - Introduction to Digital Voice 25

Voice Fundamentals

To understand why synchronization is required, consider the following two examples. The first example shows two PBX systems connected without synchronization, while the second shows PBX 2 synchronized to PBX 1.

4.5.1 PBX Systems Without Synchronization


(With reference to Figure 4.11) The two PBXs are connected via a 1.544 Mbps link. This is only a nominal speed, and some deviation is inevitable. It is quite possible that the transmit clock from PBX 1 is running slightly fast, say 1.544001 Mbps. This represents an accuracy of about 5 parts in 10 million, which is not uncommon. PBX 2, on the other hand, is running slightly slow at 1.543999 Mbps, again an accuracy of about 5 parts in 10 million. Every second, PBX 1 transmits 1,544,001 bits of data onto the trunk, while PBX 2's receive clock accepts only 1,543,999 bits, leaving two bits to be absorbed in a buffer at the input to PBX 2. This will continue, with two bits being absorbed in the buffer every second, until the buffer is full, at which point it will be emptied, causing loss of frame synchronization. The effect of this is hard to predict, but it will probably cause clicks on any voice call currently in progress across the trunk, or even disconnection.
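The slip arithmetic from the example above can be checked directly, assuming a hypothetical 64-bit receive buffer (buffer sizes vary by equipment; 64 bits is used here purely for illustration).

```python
# The slip arithmetic from the example above, with a hypothetical 64-bit
# receive buffer (an assumption for illustration; real buffer depths vary
# by equipment).

tx_rate = 1_544_001      # bits per second transmitted by PBX 1
rx_rate = 1_543_999      # bits per second clocked in by PBX 2
buffer_bits = 64         # assumed receive buffer depth

surplus_per_second = tx_rate - rx_rate               # 2 bits accumulate per second
seconds_to_overflow = buffer_bits / surplus_per_second

print(surplus_per_second, seconds_to_overflow)       # 2 32.0
```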

Figure 4.12 PBX Synchronisation (both ends of the link now run at 1.544001 Mbit/s)

4.5.2 PBX Systems With Synchronization


(With reference to Figure 4.12) The system has been rearranged to allow PBX 2 to synchronize its internal clock to the data arriving at its digital trunk port. PBX 1 still transmits slightly fast, at 1.544001 Mbps, but now PBX 2 also receives data at 1.544001 Mbps. The buffer does not fill, and frame synchronization is preserved. This approach is effective in a point-to-point situation, but synchronization must also be maintained in a full network scenario such as that shown in Figure 4.13.


There are a number of rules to follow with synchronization, one of which is that the whole network should be synchronized back to the same clock source. Figure 4.13 shows a small network of five PBX systems with PBX 1 taking a high-accuracy clock from an ISDN connection to the public network. An ideal clocking arrangement for this network would be for PBX 2 to synchronize its system clock to the data on Link 1 coming from PBX 1. PBX 3 would then synchronize to the data coming from PBX 2 via Link 2, PBX 4 would synchronize to PBX 1 via Link 3, and PBX 5 to PBX 4 via Link 6. In this way, all PBX systems are synchronized together. However, this scenario alone would not compensate for a failure: if Link 6 were to fail, PBX 5 would lose the clock to which it is synchronized. To overcome this problem, it is normal to have a clock fallback list in each PBX, so that in the event of a problem the PBX will search for another valid source. One key criterion when creating clock fallback lists is to ensure that the network cannot get into a clocking loop, for example where PBX 4 takes its clocking signal from PBX 3, PBX 3 from PBX 5, and PBX 5 from PBX 4. In this scenario it is likely that errors would occur, causing loss of synchronization on the links. The errors could reach the point where synchronization cannot be regained, resulting in a complete loss of one or more trunks. For this reason, it is very important to follow the manufacturer's guidelines when configuring network clocking.

Figure 4.13 Synchronisation in a Network of PBXs
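The loop rule above can be sketched as a simple check on a clock-distribution plan. The PBX names and the plans below are hypothetical examples, not a vendor procedure.

```python
# A minimal sketch of checking a clock-distribution plan for loops.
# Each PBX maps to the PBX it takes its clock from; None marks a root
# (e.g. a PBX locked to the public network). Names are illustrative.

def find_clock_loop(sources: dict) -> list:
    """Follow each PBX's clock source; return the PBXs forming a loop,
    or an empty list if every chain traces back to a root."""
    for start in sources:
        seen = []
        node = start
        while node is not None and node not in seen:
            seen.append(node)
            node = sources.get(node)
        if node is not None:
            return seen[seen.index(node):]  # the cycle that was found
    return []

# The ideal arrangement from the text: everything traces back to PBX 1.
good_plan = {"PBX1": None, "PBX2": "PBX1", "PBX3": "PBX2",
             "PBX4": "PBX1", "PBX5": "PBX4"}
# The forbidden arrangement: PBX 4 -> PBX 3 -> PBX 5 -> PBX 4.
bad_plan = {"PBX3": "PBX5", "PBX4": "PBX3", "PBX5": "PBX4"}

print(find_clock_loop(good_plan))  # []
print(find_clock_loop(bad_plan))
```

A real PBX validates fallback lists internally; this sketch only shows why a plan with a cycle can never settle on a single timing source.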


5 Speech Compression
Speech compression, also referred to as voice compression, describes the process of digitizing speech to a bit rate of less than 64 Kbps. It is normal, however, to start with PCM at 64 Kbps and compress it to a lower rate. Ideally, the resulting speech quality would not be affected; in practice, however, there is some degradation that may or may not be apparent to the users. There are many different techniques with different characteristics used for speech compression, resulting in bit rates from a few Kbps up to 40 Kbps. PCM speech can be compressed because a large portion of the 64 Kbps bit stream is redundant. Indeed, it is thought that speech of reasonable quality could be provided at rates as low as 1 Kbps. This has not yet been achieved, in large part because current understanding of the way speech works is less than complete. As time goes by, new and more efficient techniques are being developed to drive the bit rate lower and lower while maintaining acceptable quality. This section looks at a number of speech compression techniques in common usage today, and at other new systems poised for entry into the marketplace. The section concludes with information on speech compression impairments, including the negative effects that are introduced into voice telephony when speech compression is used.

5.1 Different Coding Types


Speech compression schemes can be classified into one of three categories: Waveform Coding, Source Coding, and Hybrid Coding.

Waveform Coding - Waveform coders attempt to reconstruct a waveform in a form as close to the original signal as possible, based on samples of the original waveform. In theory, this means that waveform coders are signal-independent and should work with non-voice signals such as modem and fax traffic. Typically, waveform coders are relatively simple to implement and produce acceptable quality speech at rates above 16 Kbps; below this, the reconstructed speech quality degrades rapidly. PCM is an example of a waveform coding technique. If linear quantization is used, then at least 12 bits per PCM sample are needed to reproduce good quality speech, resulting in a bit rate of 96 Kbps (8000 samples per second x 12). However, the nature of speech and human hearing does not follow a linear pattern. Much of a speech signal is at low levels, and human ears are not sensitive to the absolute amplitude of sounds, but to the log of the waveform amplitude. Therefore, in representing speech digitally, more bits are allocated at the lower levels than at the higher levels. This


process, called companding, results in digitized speech of 64 Kbps. Therefore, even PCM effectively represents a compressed digital speech form. Companding is discussed in more detail in Section 4.2. Other common waveform coding techniques include Adaptive Differential Pulse Code Modulation (ADPCM) and Continuously Variable Slope Delta (CVSD).

Source Coding - Source coders (also known as vocoders) are more complex than waveform coders, but can compress speech to bit rates of 2.4 Kbps and below. To achieve these compression rates, knowledge of the speech generation process is required. In principle, the speech signal is analyzed and a model of the source is generated. A number of parameters are then transmitted to the destination to allow it to rebuild the model and thus recreate the speech. These parameters include such information as whether a sound is voiced (such as vowels) or unvoiced (such as most consonants), amplitude information, and pitch. While source coders provide very low bit rates, the subjective quality of the regenerated speech can be poor. Often, although the speech is understood, recognition of who is speaking, referred to as talker recognition, is very poor. Furthermore, source coders do not carry non-speech signals, such as modem or fax signals, very well. Because of these factors, source coders are not typically used in commercial applications. Their main use has been in military applications, where natural-sounding speech is not as important as a very low bit rate, which can then be encrypted for security purposes.

Hybrid Coding - As the name suggests, hybrid coding uses aspects of both waveform and source coding, bringing together the high-quality speech of waveform coders and the low bit rates of source coders. Hybrid coders operate in a similar manner to source coders, in that a model is built from the parameters of the speech signal. Rather than transmit these parameters directly to the destination, however, hybrid coders use them to synthesize a number of new signals. These new signals are then compared with the original signal to find the best match. The modeled parameters, along with the excitation signal, which represents how the synthesized signal was produced, are then transmitted to the destination where the speech is reproduced. Examples of hybrid coders include Code Excited Linear Prediction (CELP) and its derivatives, some of which are described later in this section.
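The logarithmic allocation of bits described under Waveform Coding can be illustrated with the continuous µ-law companding curve (µ = 255, as used with North American PCM). This is the smooth mathematical form only; real G.711 codecs use a segmented 8-bit approximation of it.

```python
import math

# A sketch of continuous mu-law companding (mu = 255): the curve assigns
# more of the coded range to low-level signals than to high-level ones,
# matching the ear's roughly logarithmic sensitivity. Real G.711 uses a
# segmented 8-bit approximation of this curve.

MU = 255.0

def mu_law_compress(x: float) -> float:
    """Compress a linear sample in [-1, 1] to the companded range [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mu_law_expand(y: float) -> float:
    """Inverse of mu_law_compress."""
    return math.copysign(((1.0 + MU) ** abs(y) - 1.0) / MU, y)

# A quiet sample at 1% of full scale is mapped well over 20% of the way
# up the companded range, so it keeps far more quantization levels than
# it would under linear coding.
print(round(mu_law_compress(0.01), 3))
print(round(mu_law_expand(mu_law_compress(0.5)), 6))  # round trip recovers 0.5
```

This is why 8 companded bits per sample (64 Kbps) can match the quality of roughly 12 linear bits per sample (96 Kbps).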


5.2 Adaptive Differential Pulse Code Modulation (ADPCM)


ADPCM is the most common type of speech compression technique in use today. It offers good quality speech at a range of bit rates from 16 Kbps through 40 Kbps. Its performance is well understood, and while non-standard versions exist, it has been standardized in a number of ITU-T Recommendations. These include ITU-T G.721, which covers ADPCM at 32 Kbps, and G.723, which covers ADPCM at 16, 24, and 40 Kbps. Today, G.721 and G.723 have been combined and replaced by G.726, which covers all four bit rates: 16, 24, 32, and 40 Kbps. 32 Kbps ADPCM provides speech quality that is practically indistinguishable from 64 Kbps PCM, although quality does degrade at the 24 Kbps and 16 Kbps levels. The 40 Kbps rate was originally implemented to cope with speech-band data, such as modems and fax machines, but it only supports modem speeds up to about 12 to 14.4 Kbps. This means that if a V.34 modem (normally 28.8 Kbps) were used across a 40 Kbps ADPCM link, the modem would drop down to about 12 to 14.4 Kbps. ADPCM imposes an extremely low delay, with less than 1 ms encode and decode times. Its low delay, along with its reasonable quality at the higher bit rates, has made it particularly attractive for carrying speech across enterprise networks, and ADPCM remains the most popular compression algorithm in use today.

Figure 5.1 Differential Pulse Code Modulation (DPCM)


ADPCM coders are waveform coders which, instead of quantizing the speech signal directly, quantize the difference between the speech signal and a prediction of what the speech signal will be. This is achieved by assuming that adjacent samples are usually similar to each other. The value of a sample can be predicted with some accuracy by considering the values of preceding samples. For example, the simplest prediction is that the next sample will be the same as the current one. In this case, Differential Pulse Code Modulation (DPCM) is produced, which is the basis of ADPCM. Figure 5.1 shows


Figure 5.2 Adaptive Differential Pulse Code Modulation (ADPCM)


DPCM in action, where only the difference between one sample and the next is transmitted from the source to the destination. For PCM, the analog speech signal is sampled at regular points, and the absolute level of the sample is encoded. With DPCM, the difference between one sample and the next is encoded, so fewer bits are needed to encode the signal. In practice, speech samples do not usually change rapidly, but when they do, the changes in signal level are often greater than can be encoded using DPCM. This results in distortion of the signal, as shown in Figure 5.1. With ADPCM, however, the amplitude range over which a given number of bits are used to encode a sample varies, or adapts, depending upon the range of amplitudes occurring at the time. This can be seen in Figure 5.2, which shows the principle of encoding the difference between the actual signal and a prediction of what it will be, based on the prediction that the next sample will be the same as the current one. The general rule used by ADPCM is as follows: "When signals are quantized towards the limit of the current range, the range used to quantize the next sample is changed." During the first few samples, the difference between one and the next is relatively small. These differences are encoded using the full range of quantization levels that are available, given the number of bits available for each sample. For example, at 32 Kbps the difference is encoded in four bits, one for the sign (whether the next sample is more positive or negative than that predicted) and three for the magnitude of the difference. A few samples further on, the difference between one sample and the next is greater, yet can still be encoded using the full range of quantization levels. This is achieved through the


process of adaptation, where the amplitude range that can be encoded with four bits changes depending upon the amplitudes at the time. As previously mentioned, ADPCM supports four different bit rates. The different rates are achieved through the number of bits used to encode the difference between one sample and the next, as seen in Figure 5.3.

Figure 5.3 ADPCM bits
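The sign-plus-magnitude coding and step adaptation described above can be sketched as a toy coder. This is an illustration of the principle only, not the G.726 algorithm; the adaptation rule and step sizes below are invented for the example.

```python
# A toy sketch of the ADPCM idea: encode only the difference between each
# sample and a prediction ("next sample equals the current one"), as a
# sign bit plus a 3-bit magnitude, and adapt the quantizer step when the
# difference nears the limit of the current range. Not the G.726 algorithm.

def adpcm_encode(samples, step=1.0):
    prediction, codes = 0.0, []
    for s in samples:
        diff = s - prediction
        mag = min(7, int(abs(diff) / step))      # 3-bit magnitude, 0..7
        negative = diff < 0                       # 1-bit sign
        codes.append((negative, mag))
        prediction += -mag * step if negative else mag * step
        # adapt: widen the range near its limit, shrink it near zero
        step = step * 2 if mag >= 6 else max(step / 2, 0.25) if mag <= 1 else step
    return codes

def adpcm_decode(codes, step=1.0):
    prediction, out = 0.0, []
    for negative, mag in codes:
        prediction += -mag * step if negative else mag * step
        out.append(prediction)
        step = step * 2 if mag >= 6 else max(step / 2, 0.25) if mag <= 1 else step
    return out

signal = [0.0, 1.0, 2.5, 6.0, 14.0, 15.0, 14.0, 8.0]
decoded = adpcm_decode(adpcm_encode(signal))
print([round(v, 2) for v in decoded])  # tracks the input with 4 bits/sample
```

Because the decoder applies the same adaptation rule to the same code stream, both ends stay in step without the step size ever being transmitted, which is the essence of the adaptive scheme.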

5.3 Code Excited Linear Prediction (CELP)


At bit rates of around 16 Kbps and below, the quality of waveform coders falls rapidly. We have also seen that source coders, while operating at very low bit rates, tend to reduce talker recognition substantially. Therefore, hybrid schemes, especially CELP coders and their derivatives, tend to be used today for sub-16 Kbps compression. Many CELP implementations are proprietary to the manufacturer, although we shall also discuss two standardized versions, known as LD-CELP as defined in ITU-T G.728 and CS-ACELP as defined in ITU-T G.729. The essence of a CELP encoder is to analyze the incoming speech, and then transmit a number of parameters to the decoder so that the original speech can be reproduced as accurately as possible. These parameters include the mathematical model of a filter which simulates the talker's vocal tract (the vocal characteristics that make a person sound unique), gain information giving the level of the speech, and a codebook index. The codebook index is used to point to a sequence of pre-defined speech samples, known as vectors, which is common to both the transmitter and the receiver. The number of codebook entries and the number of samples within each entry depend upon the actual CELP implementation. At the transmitter, groups of PCM speech samples (vectors) from the input speech, typically up to 20 ms in length, are compared to the vectors stored in the codebook. This is done by generating a synthetic speech signal for every entry in the codebook and comparing it to the actual speech input vector. The index of the vector that produces the best match with the input speech waveform is then transmitted to the receiver. At the receiver, this waveform is extracted from the codebook and filtered, using the mathematical model of the original talker's vocal tract. This produces highly recognizable, high-quality speech.
Due to the complexities of speech and the wide range of different human voices, the processing required for CELP is very intensive, typically of the order of 15 million instructions per second (MIPS) or more for one voice channel. However, CELP is a very


popular form of speech compression because of its high speech quality and low bit rates, typically between 4.8 Kbps and 16 Kbps. The practical drawbacks of CELP are evident in two main areas. First, CELP often produces end-to-end delays on the order of 50 to 100 ms, due to a combination of the processing overhead and the number of speech samples that must be buffered for analysis. Such high delays can cause transmission problems (see Section 6 on echo). Second, since CELP is tuned to human speech, it does not support voice-band data well and can cause problems with modems and fax machines, as well as with the transmission of DTMF tones.

5.4 Low Delay-Code Excited Linear Prediction (LD-CELP) ITU-T G.728


LD-CELP is based upon CELP, and provides speech quality similar to that of 32 Kbps ADPCM at a rate of 16 Kbps. It also incurs much less delay, typically under 2 ms, compared with normal CELP delays of 50 to 100 ms. LD-CELP uses backward adaptation to produce its filtering characteristics, which means that the filter is derived from previously reconstructed speech. At the encoder, A-law or µ-law PCM is first converted to linear PCM. The input signal is then partitioned into blocks of five consecutive input signal samples. The encoder compares each of 1024 codebook vectors with each input block, and the 10-bit codebook index of the best-match codebook vector is transmitted to the decoder. The decoding operation is also performed on a block-by-block basis. Upon receiving each 10-bit index, the decoder performs a table look-up to extract the corresponding code vector from the codebook. The extracted code vector is then filtered to produce the current decoded signal vector, and the five samples of the postfilter signal vector are converted to five A-law or µ-law PCM output samples. Please note that this is a somewhat simplified description of the operation of LD-CELP; a more detailed description can be found in G.728 [15].
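The codebook search at the heart of this scheme can be sketched as a nearest-vector lookup. The tiny four-entry codebook below is a made-up illustration; the real G.728 codebook holds 1024 trained vectors, so each 5-sample block costs only a 10-bit index (8000 samples/s ÷ 5 × 10 bits = 16 Kbps).

```python
# A sketch of a codebook search: for each 5-sample block, transmit only
# the index of the closest codebook vector. The codebook below is a
# hypothetical toy; G.728 uses 1024 trained vectors (a 10-bit index).

def closest_index(block, codebook):
    """Index of the codebook vector with minimum squared error vs. block."""
    def sqerr(vec):
        return sum((a - b) ** 2 for a, b in zip(block, vec))
    return min(range(len(codebook)), key=lambda i: sqerr(codebook[i]))

codebook = [
    [0, 0, 0, 0, 0],        # near-silence
    [1, 1, 1, 1, 1],        # steady level
    [1, 2, 3, 2, 1],        # rise and fall
    [-1, -2, -3, -2, -1],   # inverted rise and fall
]

block = [0.9, 2.1, 2.8, 1.9, 1.2]
idx = closest_index(block, codebook)
print(idx)  # 2: the rise-and-fall vector matches best
```

The decoder holds the same codebook, so the single transmitted index recovers a full 5-sample vector, which is then shaped by the backward-adapted filter.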

5.5 Conjugate Structure-Algebraic Code Excited Linear Prediction (CS-ACELP) ITU-T G.729
CS-ACELP is another speech compression technique that is based upon CELP. It was originally designed for packetized voice support on mobile networks, although a different scheme known as RPE-LTP has been adopted on the GSM mobile telephone system (see Section 5.6).


CS-ACELP operates at a rate of only 8 Kbps, yet still provides speech quality similar to that of ADPCM at 32 Kbps. Furthermore, it has been shown to perform well in tests even when packets are lost. The coder operates on speech frames of 10 ms, corresponding to 80 samples at a sampling rate of 8000 samples per second. For every 10 ms frame, the speech signal is analyzed to extract the parameters of the CELP model (linear prediction filter coefficients, adaptive and fixed codebook indices, and gains). These parameters are then transmitted to the destination in a specified frame format. At the decoder, the filter and gain parameters are used to reconstruct the filter. The speech is then rebuilt by taking the codebook entry of 80 samples and passing it through the reconstructed filter, and is finally converted to A-law or µ-law PCM for transmission to the interface. Please note that this is a somewhat simplified description of the operation of CS-ACELP; a more detailed description can be found in G.729 [16].
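The frame figures above imply a very tight bit budget, which the following arithmetic makes explicit: at 8 Kbps, a 10 ms frame of 80 samples must be described in just 80 bits, versus 640 bits for the same frame as 64 Kbps PCM.

```python
# Bit-budget arithmetic for a G.729-style coder: a 10 ms frame at an
# 8000 Hz sampling rate holds 80 samples, and an 8 kbit/s channel leaves
# exactly 80 bits to describe them (vs. 640 bits for 64 kbit/s PCM).

SAMPLE_RATE_HZ = 8000
FRAME_MS = 10
BIT_RATE_BPS = 8000

samples_per_frame = SAMPLE_RATE_HZ * FRAME_MS // 1000
bits_per_frame = BIT_RATE_BPS * FRAME_MS // 1000
pcm_bits_per_frame = samples_per_frame * 8  # the 64 kbit/s PCM equivalent

print(samples_per_frame)   # 80
print(bits_per_frame)      # 80
print(pcm_bits_per_frame)  # 640
```

In other words, the CELP model parameters stand in for the samples at an 8:1 saving over PCM, which is only possible because the decoder regenerates the waveform rather than receiving it.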

5.6 Other Compression Techniques


Continuously Variable Slope Delta (CVSD) - CVSD was one of the earlier speech compression schemes designed into TDM multiplexers, and was very popular in the early 1980s. It is a waveform coder operating directly on the waveform of the signal; however, unlike many other compression schemes it does not start with PCM, and often relies on analog rather than digital techniques. With CVSD coding, the sending end compares the analog input voltage to an internal reference voltage. If the input signal is greater than the reference, a "1" is transmitted and the reference voltage is increased; if the input signal is less than the reference, a "0" is transmitted and the reference voltage is decreased. Through this process, the receiver can reconstruct the sending end's reference voltage, which itself approximates the input analog signal, so the talker's analog signal can also be reconstructed. CVSD techniques are not standardized and are proprietary to individual manufacturers. CVSD can be used at practically any speed because the bit rate is a function of the chosen sampling rate. The subjective quality at 32 Kbps is not as good as that of ADPCM at 32 Kbps, which has therefore tended to replace CVSD. The quality degrades even further at lower bit rates, making CVSD practically unusable at rates of less than 16 Kbps. CVSD also has poor voice-band data support, with DTMF tone transmission becoming unreliable at rates of about 16 Kbps.


Digital Speech Interpolation (DSI) - DSI is the generic term for systems that exploit the gaps in speech by stopping the transmission of digitized voice samples, thus reducing the effective bit rate of the voice channel. It relies on the speech activity factor (SAF) of a conversation being less than 1. The SAF is the proportion of time that one person is talking in an average conversation, and is normally about 0.4, or 40%. An example of where DSI is utilized is in DCME (Digital Circuit Multiplication Equipment) as defined in ITU-T G.763. If a particular voice channel is compressed using 32 Kbps ADPCM, applying DSI with an SAF of 0.5 reduces the effective bit rate to 16 Kbps.

Regular Pulse Excitation-Long Term Prediction (RPE-LTP) - RPE-LTP is a coding scheme that has been adopted for use on the GSM (an acronym for both Groupe Spécial Mobile and Global System for Mobile communications) mobile telephone system. RPE-LTP operates at 13 Kbps (a recent version operates at 6.5 Kbps). The speech quality is reasonable, although not as good as that offered by LD-CELP or CS-ACELP. Its main advantage over these other compression schemes is its relative simplicity. For this reason, it has also been adopted as a de facto compression scheme for Internet telephony. It is well suited to this application since compression can be performed in real time on PC platforms with relatively low processing power, such as 66 megahertz (MHz) 486 computers.
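The DSI saving described above is simple multiplication: the effective bit rate is the coder rate scaled by the speech activity factor.

```python
# The DSI arithmetic from the text: effective bit rate is the coder
# rate multiplied by the speech activity factor (SAF).

def effective_bit_rate(coder_kbps: float, saf: float) -> float:
    """Average bandwidth used once silence suppression is applied."""
    if not 0 < saf <= 1:
        raise ValueError("SAF must be in (0, 1]")
    return coder_kbps * saf

# 32 kbit/s ADPCM with an SAF of 0.5 gives an effective 16 kbit/s;
# the typical conversational SAF of 0.4 would give 12.8 kbit/s.
print(effective_bit_rate(32, 0.5))  # 16.0
print(effective_bit_rate(32, 0.4))  # 12.8
```

Note that this is a statistical average across many channels; a single channel still bursts at the full coder rate while its talker is active.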

5.7 Speech Compression Impairments


The two main impairments introduced by speech compression are delay and distortion. Delay can be measured easily, and its effects on echo are discussed in more detail in Section 6. Distortion, on the other hand, is very difficult to measure since it is very subjective and can be perceived differently by different listeners. In addition, distortion affects more than just speech. Many speech compression systems are optimized for speech, and do not carry signaling tones or the tones from modems and facsimile machines particularly well. In order to quantify the amount of distortion resulting from speech compression, two methods are typically applied: Mean Opinion Score (MOS), and Quantization Distortion Units (QDUs).

5.7.1 Mean Opinion Score (MOS)


ITU-T Recommendation P.80 ("Methods for Subjective Determination of Transmission Quality") [17] describes methods that are suitable for determining the level at which


telephone connections may be expected to perform. It includes definitions of Mean Opinion Scores (MOS), where a number is allocated to the quality of a speech path, with the meaning of the number given in Figure 5.4. To determine this number, a series of spoken sentences is transmitted to listeners via the speech compression system under scrutiny. For each sentence, the listeners must allocate a number according to the table in Figure 5.4. The numerical average of the results from all the listeners is then used to provide a MOS score for the particular speech compression system (see ITU-T P.80 for a more accurate description of this process).

Figure 5.4 Mean Opinion Scores (MOS)

While not exhaustive, Figure 5.5 shows a graph of the speech quality achieved with a number of different speech coding techniques in terms of their Mean Opinion Scores. Note that the use of the term CELP covers a range of different techniques, including proprietary systems and standardized ones such as LD-CELP and CS-ACELP. By using compression schemes such as CS-ACELP, a similar quality to ADPCM at 32 Kbps and regular PCM at 64 Kbps can be achieved. This general quality of speech is often known as "toll quality".

Figure 5.5 MOS for Different Speech Coding Schemes
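The averaging step described above is straightforward to express in code. The listener scores below are made-up illustration data, and the 1-to-5 scale is the conventional reading of the Figure 5.4 opinion table (1 = bad, 5 = excellent).

```python
# MOS as described in the text: each listener rates each sentence on the
# opinion scale, and the scheme's MOS is the numerical average. Scores
# here are fabricated illustration data; the 1..5 scale is assumed.

def mean_opinion_score(scores) -> float:
    if not scores or not all(1 <= s <= 5 for s in scores):
        raise ValueError("opinion scores must be on the 1..5 scale")
    return sum(scores) / len(scores)

listener_scores = [4, 4, 5, 3, 4, 4, 3, 5]
print(round(mean_opinion_score(listener_scores), 2))  # 4.0
```

A real P.80 assessment controls listening conditions, sentence material, and listener panels carefully; the code only captures the final arithmetic.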


5.7.2 Quantization Distortion Units (QDUs)


ITU-T Recommendation G.113 ("Transmission Impairments") [18] describes a method of quantifying distortion on a telephone connection. It defines the Quantization Distortion Unit (QDU), which is the distortion resulting from one conversion of analog to PCM (A-law or µ-law) and back to analog again. ITU-T Recommendation G.113 recommends that no more than 14 QDUs should be introduced in an international telephone connection. Currently, there is no method to determine QDU values objectively; they can only be measured subjectively, using methods such as those found in ITU-T Recommendation P.83 ("Subjective Performance Assessment of Telephone-Band and Wideband Digital Codecs") [19]. When applied to speech compression, the QDU value given represents the equivalent number of analog-PCM-analog conversions necessary to produce the same subjective quality as the speech compression scheme. For example, 32 Kbps ADPCM has a QDU rating of 3.5 (as given in ITU-T G.113). This means that a conversion from analog to ADPCM and back to analog gives the same subjective quality as 3.5 tandem conversions of analog-PCM-analog (the .5 is derived from an averaging process used in the assessment of the QDU value). Some speech compression schemes have been allocated QDU values in G.113, a selection of which is shown in Figure 5.6. The values for ADPCM and LD-CELP include a conversion from analog to PCM to ADPCM/LD-CELP to PCM and back to analog. For telephone connections that incorporate a number of speech coding/compression processes, the QDU values of the individual processes can be added together to determine the overall end-to-end quantization distortion. For example, Figure 5.7 shows a telephone call between telephones A and B which traverses PBX systems 1, 2, and 3. En route, the speech is compressed twice in multiplexing systems. If 32 Kbps ADPCM is used, then the overall QDU value allocated to this speech path is 6,

Figure 5.6 Some QDU Values


Figure 5.7 Tandem Calculation of QDU Values


comprised of a value of 2.5 from each of the PCM-ADPCM-PCM conversions, and a value of 1 from the conversion from analog to PCM and back to analog again.
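The tandem addition in this example is easy to check mechanically. The stage values below are the G.113 figures quoted in the text (3.5 QDU for analog-ADPCM-analog, of which 1 QDU is the analog/PCM leg, leaving 2.5 QDU for each digital PCM-ADPCM-PCM compression).

```python
# End-to-end QDU as described in the text: sum the QDU values of each
# coding process on the path. Stage values are those quoted from G.113.

QDU = {
    "analog-PCM-analog": 1.0,
    "PCM-ADPCM32-PCM": 2.5,  # 3.5 for analog-ADPCM-analog minus the 1 QDU PCM leg
}

def total_qdu(stages) -> float:
    return sum(QDU[s] for s in stages)

# Figure 5.7's call: one analog/PCM conversion plus two ADPCM compressions.
path = ["analog-PCM-analog", "PCM-ADPCM32-PCM", "PCM-ADPCM32-PCM"]
print(total_qdu(path))  # 6.0, comfortably inside the 14 QDU limit
```

The same function makes it obvious how quickly repeated compression eats the 14 QDU international budget: six tandem ADPCM stages alone would exceed it.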

5.7.3 Speech Compression and Voice-Band Data


ITU-T Recommendation G.113 discusses the effect of speech compression on voice-band data. Currently, only three schemes are covered: LD-CELP at 16 Kbps and ADPCM at 32 Kbps and 40 Kbps, although this list will be expanded in the future. The following is an extract from G.113 Annex D:

1) The 16 Kbps LD-CELP (Recommendation G.728) algorithm supports voice-band data only up to 2.4 Kbps.
2) The 32 Kbps ADPCM (Recommendation G.726) supports voice-band data up to 4.8 Kbps.
3) The 40 Kbps ADPCM (Recommendation G.726) supports voice-band data up to 9.6 Kbps, with 14.4 Kbps for non-tandem connections only.

This list is intended as general guidance only. In some circumstances, the actual data rates supported may exceed those given, although it is equally likely that lower rates will result. To put this into context, imagine a V.34 modem which normally operates at a rate of 28.8 Kbps. If this device is operated over an ADPCM link at 32 Kbps, the modem may downspeed all the way to 4.8 Kbps.

5.8 Fax Relay


The ITU-T defines a number of different types of facsimile (fax) transmission, known as Groups 1, 2, 3, and 4. The most commonly used is Group 3, as defined in ITU-T Recommendation T.4, which operates at a data rate of up to 14.4 Kbps.


Figure 5.8 Fax Relay Operation


After normal telephone conversations, facsimile represents the second most common type of speech-band traffic carried on enterprise networks today. However, as covered in Section 5.7, if a fax is carried on circuits using speech compression, the audio tones used for fax transmission are sufficiently distorted to reduce the effective data throughput. The amount of this distortion depends on the type of speech compression being used. Two approaches can be taken to overcome this problem. First, fax traffic can be transmitted on 64 Kbps PCM circuits, although this is very inefficient in terms of bandwidth usage. Second, a technique such as fax relay can be used, where the fax tones are first demodulated and transmitted as actual data rather than as digitized speech-band tones. This approach supports full-speed fax transmission, and also offers additional bandwidth savings over the use of some speech compression schemes. Fax relay is highlighted in Figure 5.8. The speech interfaces at the two multiplexers can be either analog or digital, depending upon the system in use; if a digital interface is used, the fax relay function applies to all channels carried in the digital signal. Audio signals arrive at the interface of Multiplexer 1 (or Multiplexer 2 if the call originates from that end). If the signal is speech, it is automatically directed towards the speech compression section. However, if it is a fax signal, it is directed towards the fax demodulation function, which demodulates the fax signal into a corresponding data signal for transmission over the wide area network as data. At the destination, this data is converted back to an audio fax signal for connection to the destination fax machine. The automatic detection of the fax signal is achieved by monitoring the audio signal for a 2100 Hz tone followed by a fax call preamble. This preamble represents data communicated between the two end fax machines, and is used to set up the call.
It carries such information as the fax machine's telephone number, its speed capabilities, and so on.
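One common way to detect the 2100 Hz answer tone mentioned above is the Goertzel algorithm, which measures the energy at a single frequency far more cheaply than a full FFT. The text does not specify how detection is implemented in any particular product; the block size and threshold below are illustrative choices, not values from a standard.

```python
import math

# A sketch of single-tone detection for the 2100 Hz fax answer tone,
# using the Goertzel algorithm on 8 kHz samples. Block size (100 ms)
# and the power threshold are illustrative assumptions.

def goertzel_power(samples, freq_hz, sample_rate_hz=8000.0):
    """Relative power of one frequency component in a block of samples."""
    w = 2.0 * math.pi * freq_hz / sample_rate_hz
    coeff = 2.0 * math.cos(w)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev2 ** 2 + s_prev ** 2 - coeff * s_prev * s_prev2

def looks_like_fax_tone(samples, threshold=1000.0):
    return goertzel_power(samples, 2100.0) > threshold

n = 800  # 100 ms of audio at 8000 samples per second
tone = [math.sin(2 * math.pi * 2100 * t / 8000) for t in range(n)]
silence = [0.0] * n
print(looks_like_fax_tone(tone))     # True
print(looks_like_fax_tone(silence))  # False
```

A production detector would also verify tone duration and then look for the V.21 preamble before switching the channel into relay mode, as the text describes.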


6 Echo and Echo Control


With the introduction of more voice networks running on frame- or cell-based systems, combined with the growth in the number of international networks and mobile phone networks (all of which introduce delay), echo and echo control are becoming more significant issues. Anyone who has heard echo on a telephone circuit will appreciate how annoying it can be, and how it can make a speech path unusable. This section starts by looking at what echo is and how it is caused, followed by a description of how echo can be removed from voice circuits.

6.1 What is Echo?


When a person is taking part in a telephone conversation, some of the energy of their speech is reflected back to their telephone set and can be heard in the earpiece. The effect this has depends upon two key factors: the level of the reflected signal, and the length of time it takes to return to the earpiece (the round-trip delay). If the delay is sufficiently small, the sound will simply be perceived as sidetone (see Section 2.2.2). However, if the delay exceeds a certain threshold, the reflected signal will start to be perceived as an echo. This threshold is fairly subjective and will vary from person to person. Typically, a round-trip delay of up to about 10 ms will be heard as sidetone. From 10 ms to about 30 ms, the delay will add a hollow or "tunnel" sound to the speech, and above 30 ms the reflection will be perceived as a distinct echo. The level of the echo signal is also significant, since the lower the returned signal, the more tolerant we are of it. Figure 6.1 shows a rough guide to the amount of echo a talker can tolerate, in relation to the round-trip delay. The Echo Return Loss (ERL) indicates how much of the talker's speech is reflected; a small number represents a large echo.

Figure 6.1 Tolerance to Echo
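The perception thresholds just described can be written as a simple classifier. The 10 ms and 30 ms boundaries are the approximate figures given in the text; real tolerance also depends on the echo level (ERL), which this sketch ignores.

```python
# The round-trip delay thresholds from the text as a classifier.
# Boundaries are approximate, and real tolerance also depends on ERL.

def classify_reflection(round_trip_ms: float) -> str:
    """How a reflected signal is typically perceived at a given delay."""
    if round_trip_ms <= 10:
        return "sidetone"
    if round_trip_ms <= 30:
        return "hollow"  # the 'tunnel' sound described in the text
    return "echo"

print(classify_reflection(5))   # sidetone
print(classify_reflection(20))  # hollow
print(classify_reflection(45))  # echo
```

The usefulness of the rule is in network planning: any path whose round-trip delay lands in the "echo" band needs echo control, whatever its ERL.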



6.1.1 Causes of Echo


There are two main factors that contribute to echo: the delay introduced by the network carrying the speech communications path, and any point in the network that reflects the speech signal. Without the delay, the reflection is not a problem, and without the reflection the delay is not a problem. Figure 6.2 displays a typical speech path between two telephone sets and the most common cause of echo, hybrid echo. Both telephones in this example are two-wire sets, which means they connect to the exchange over a single pair of wires. At the exchange, the two-wire path is converted into a four-wire path using a hybrid (see Section 2.2), and the speech is transported over the wide area network to another exchange using a four-wire path. At the other exchange, the four-wire path is converted back to two-wire for connection to the other telephone. Section 2.2 showed that this conversion from a four-wire to a two-wire path, and vice versa, is imperfect. If the impedance presented by the phone and the line does not perfectly match the impedance of the exchange interface, then a certain amount of the signal will be reflected.

Figure 6.2 Causes of Echo


This can be seen in Figure 6.2, where the speech signal from Telephone A is converted from the two-wire path to the four-wire path, transported across the telephone network, and converted back to a two-wire path onto Telephone B at the destination. At Telephone B's hybrid, some of the signal is reflected back towards Telephone A. In practice, there are multiple points of reflection in a network, as shown in Figure 6.3. This example assumes a four-wire path between Telephones A and B. Even for a simple call there are several potential points of reflection: a) represents the reflected signal at the hybrid internal to Telephone A, which should simply be perceived as sidetone; b) is the signal reflected at the hybrid of the telephone exchange to which Telephone A is connected, and again should be perceived as sidetone; c) is the reflected signal from the hybrid at the far-end exchange interface; and d) is the reflected signal from the hybrid in


Section 6 - Echo and Echo Control


Figure 6.3 Various places that echo occurs


Telephone B. Both c) and d) represent reflected signals that will be perceived as echo if the round trip delay exceeds a threshold of about 30 ms as discussed in Section 10.1. A common misconception is that these are the only causes of reflections. While the reflections at c) and d) are normally the dominant ones contributing to echo, there is also a significant effect from the acoustic coupling between the mouthpiece and earpiece of Telephone B. Furthermore, with today's increased use of hands-free phones in both static and mobile systems, acoustic echo is becoming more of an issue. Note that this example shows a four-wire path between the two ends. In today's digital networks, this is the most likely arrangement. However, if for any reason the transmission path includes any other two-wire circuits, then these will create additional reflections back to the source telephone.

6.2 Echo Control Devices


6.2.1 When Is Echo Control Required?
In practice, echo control is needed when the round-trip delay of a voice communications system exceeds 20 to 30 ms. While not an exhaustive list, the following are some of the primary causes of delay in voice networks today:

International circuits - Due to the sheer distances involved, some delay is inevitable, and these circuits will usually require echo control. Satellite circuits definitely require echo control, since the round-trip delay caused by a single satellite hop is typically in excess of 500 ms.

Frame/Cell-based systems - Many frame/cell-based systems, such as frame relay or ATM, require echo control because of the delay imposed by storing voice samples


before transmission. Furthermore, buffers used to reduce jitter impose additional delay. These buffers are used at the output of the network to overcome problems caused by frames/cells arriving from the network at irregular times.

Speech Compression - Some speech compression techniques impose significant delays because of the time it takes to perform compression, as well as the need to buffer speech samples. For example, CS-ACELP (ITU-T G.729), which compresses speech to 8 Kbps, introduces a one-way delay of approximately 15 ms.

Some typical one-way delays from various causes are as follows:

Cable (copper or fiber): 1 ms per 200 km
Digital switch: 0.75 to 1.6 ms
Speech compression: 0.5 to 100 ms
Satellite hop: 250 ms
Frame/Cell-based system: up to 50 ms
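The figures above can be combined into a simple round-trip budget to decide whether echo control is needed. The following sketch uses the typical values quoted in the text; the function names and the example network (4000 km of cable, four digital switches, G.729 compression) are illustrative assumptions.

```python
# Rough round-trip delay budget using the typical one-way figures from the
# text. All names and the example network below are illustrative assumptions.

ONE_WAY_MS = {
    "cable_per_200km": 1.0,    # copper or fiber: ~1 ms per 200 km
    "digital_switch": 1.6,     # worst case of the 0.75-1.6 ms range
    "g729": 15.0,              # CS-ACELP (G.729) one-way delay
}

def round_trip_delay_ms(distance_km, switches, compressed=True):
    """Estimate round-trip delay as twice the summed one-way delays."""
    one_way = (distance_km / 200.0) * ONE_WAY_MS["cable_per_200km"]
    one_way += switches * ONE_WAY_MS["digital_switch"]
    if compressed:
        one_way += ONE_WAY_MS["g729"]
    return 2.0 * one_way

def needs_echo_control(rtt_ms, threshold_ms=30.0):
    """Echo control is needed above roughly 20-30 ms round trip."""
    return rtt_ms > threshold_ms

# 4000 km path, four switches, G.729 compression: well over the threshold.
rtt = round_trip_delay_ms(distance_km=4000, switches=4)
print(round(rtt, 1), needs_echo_control(rtt))   # 82.8 True
```

Note how the compression delay alone (15 ms each way) uses up the entire 30 ms budget, which is why compressed-voice systems almost always need echo control.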

6.2.2 Echo Control Devices


There are two techniques that are primarily used for controlling echo: echo suppression and echo cancellation. In both cases, the echo suppressor and echo canceller sit in the four-wire path of a communications network as shown in Figure 6.4. As indicated earlier in this document, the primary source of echo is the 2- to 4-wire hybrid at the two-wire telephone interface. Speech from Telephone A is reflected as

Figure 6.4 Position of Echo Control Device


echo at the 2- to 4-wire hybrid at exchange 2, and vice versa. It is the responsibility of echo control device 1 to remove the echo caused at exchange 1, and similarly for echo control device 2 to remove the echo caused at exchange 2. In other words, the echo heard by a particular user is removed by the far-end echo control device. This raises an interesting point. Since the echo control device removes echo from its local tail, the transmission delay (the delay between echo control devices) has no bearing on its performance. In fact, the transmission delay could be many


seconds and the echo control device will still work, although in practice, a delay of this length would be unusable from the point of view of the users.

6.3 Echo Suppressors


Echo suppressors are relatively simple devices used for echo control. First devised some 30 years ago, they are still used in some transmission networks today. They have been standardized in ITU-T Recommendation G.164 [20] and basically operate as follows. An echo suppressor is designed to remove the echo as shown in Figure 6.5. This is accomplished by placing a large loss (attenuation) in the transmit path when a signal is detected on the receive path. While this reduces the level of the returned echo, its main

Figure 6.5 Echo Suppressor


drawback is that the echo detector will also prevent the speech from Telephone A from reaching the remote party when both parties are talking simultaneously (termed "double-talking"). To reduce this effect during double-talk, the echo suppressor must be able to operate in a second mode where the transmit attenuation is reduced to a very small value, and a small amount of attenuation is also placed in the receive path. In this approach, while the received speech level is slightly reduced, the echo is effectively attenuated twice due to both the receive and transmit attenuations. Furthermore, during double-talk, the echo will tend to be masked by the speech from Telephone A. Nevertheless, echo suppressors impair the quality of speech communication, and have been superseded to a large extent by the superior echo canceller.
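The two operating modes described above reduce to a small decision function. This is a sketch of the behaviour only; the specific loss values are illustrative assumptions, not figures taken from G.164.

```python
def suppressor_losses_db(receive_active, transmit_active,
                         suppression_loss_db=50.0, doubletalk_loss_db=6.0):
    """Return (tx_loss_db, rx_loss_db) inserted by an echo suppressor.

    Mode 1: far-end speech only -> large loss in the transmit path.
    Mode 2: double-talk -> small loss in both paths, so the echo is
    attenuated twice while received speech is only slightly reduced.
    Loss values here are illustrative, not taken from G.164.
    """
    if receive_active and transmit_active:      # double-talk
        return doubletalk_loss_db, doubletalk_loss_db
    if receive_active:                          # far-end speech only
        return suppression_loss_db, 0.0
    return 0.0, 0.0                             # idle or near-end only

print(suppressor_losses_db(True, False))   # (50.0, 0.0): echo blocked
print(suppressor_losses_db(True, True))    # (6.0, 6.0): double-talk mode
```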

6.4 Echo Cancellers


As with the echo suppressor, an echo canceller is a voice-operated device connected on the four-wire side of a circuit. Its operating performance has been standardized in ITU-T G.165 and G.168 [21]. However, rather than attenuating the echo, it calculates an estimate of what the echo will be, and then subtracts this from the actual returned signal. The result is that only the echo is removed, while the desired signal is unaffected.


Figure 6.6 Echo Canceller


Figure 6.6 shows the operating principles of an echo canceller, which cancels the echo on its local tail circuit, with the echo being returned from the hybrid towards Sin. The echo canceller operates by developing a mathematical model of the echo that it anticipates will occur, and subtracting this from the signal in the echo return path (the signal into Sin). This model is based upon two main factors: first, the difference between the signal level out of Rout and the echo returned to Sin, also known as the Echo Return Loss (ERL); and second, the round-trip delay of the tail circuit. The process of developing the model is known as convergence. (Note that this is a fairly simplistic view, because echo cancellers also need to be able to cope with multiple echoes occurring at different times and at different levels.) For example, if the round-trip tail circuit delay is 5 ms, and the ERL is 10 dB (the echo into Sin is 10 dB less than the signal out of Rout), then the echo canceller will store the signal seen on Rin in memory for 5 ms and replay it into the subtractor at a level 10 dB lower than at Rin. In this way, only the echo is cancelled, without affecting the speech from Telephone A. In practice, this cancellation process is not perfect and a small amount of echo, known as residual echo, usually remains at the output of the subtractor. An additional function known as the nonlinear processor (NLP) is used to remove this residual echo.
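The worked example (5 ms tail delay, 10 dB ERL) can be reproduced numerically. The sketch below, using NumPy, assumes a single reflection with a known delay and ERL; a real canceller estimates both adaptively during convergence and copes with multiple reflections. All function and variable names are illustrative.

```python
import numpy as np

def synthetic_echo(r_in, delay_samples, erl_db):
    """Delayed, attenuated copy of the Rin signal: the canceller's model."""
    gain = 10.0 ** (-erl_db / 20.0)
    delayed = np.concatenate([np.zeros(delay_samples), r_in])[:len(r_in)]
    return delayed * gain

def cancel(r_in, s_in, delay_samples, erl_db):
    """Subtract the modelled echo from the signal arriving at Sin."""
    return s_in - synthetic_echo(r_in, delay_samples, erl_db)

fs = 8000                                  # 8 kHz voice sampling
delay = int(0.005 * fs)                    # 5 ms tail delay = 40 samples
rng = np.random.default_rng(0)
r_in = rng.standard_normal(400)            # stand-in for far-end speech into Rin
s_in = synthetic_echo(r_in, delay, 10.0)   # tail circuit returns only echo here
residual = cancel(r_in, s_in, delay, erl_db=10.0)
print(np.max(np.abs(residual)))            # ~0: echo fully cancelled
```

In this idealized case the model matches the echo exactly, so the residual is zero; in practice the imperfect match leaves the residual echo that the NLP then removes.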

6.4.1 Nonlinear Processor


The function of the nonlinear processor is to take a signal in from the echo subtractor, and, if it is of sufficiently small value, stop it from passing completely. The method of implementing this function is up to the manufacturer of the echo canceller.


However, one method described in G.165 is known as a center clipper, and is shown in Figure 6.7. Signals that are larger than the clipping level are let through without change, while signals that are smaller are reduced to the zero level. The net effect is to virtually eliminate echo.
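A minimal version of the center clipper described here might look like the following; the clip level is an arbitrary illustrative value.

```python
def center_clip(samples, clip_level):
    """Pass samples whose magnitude exceeds clip_level unchanged;
    force smaller ones (the residual echo) to zero, as in the
    G.165 center clipper description."""
    return [s if abs(s) > clip_level else 0.0 for s in samples]

# Low-level residual echo is removed; louder speech samples pass intact.
print(center_clip([0.02, -0.01, 0.8, -0.5, 0.005], clip_level=0.05))
# → [0.0, 0.0, 0.8, -0.5, 0.0]
```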

6.4.2 Tail Circuit Considerations


Figure 6.7 The Nonlinear Processor

The tail circuit capability (or round-trip delay) of an echo canceller is primarily dictated by the length of time that it can store the synthetic echo. In a digital echo canceller, this capability is linked to the amount of memory used, and is typically in the region of 8 to 128 ms. If the actual tail circuit delay exceeds the capability of the canceller, the canceller will be unable to store the synthesized echo signal for long enough to cancel the echo. This is a common cause of poor echo canceller performance. For example, Figure 6.8 shows an international network where echo cancellers are used to overcome the echo caused by long distance transmission delay. A call from Telephone A to Telephone B may be echo-free, yet a call from Telephone A to Telephone C results in echo at Telephone A. This echo is caused by the extra delay on the call to Telephone C, which exceeds the tail circuit capability of the echo canceller. To overcome this problem, the tail circuit capability of the echo canceller could be increased by adding additional memory (assuming this is possible), or another echo canceller may be installed into the circuit closer to Telephone C.
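The link between tail delay and canceller memory can be made concrete: at the 8000 samples per second of standard PCM voice, each millisecond of tail capability needs 8 stored samples. The helpers below are an illustrative sketch with assumed names.

```python
SAMPLE_RATE_HZ = 8000   # standard PCM voice sampling rate

def tail_memory_samples(tail_capability_ms):
    """Samples the canceller must buffer to model a given tail delay."""
    return tail_capability_ms * SAMPLE_RATE_HZ // 1000

def within_capability(actual_tail_ms, capability_ms):
    """True if the canceller can store the synthetic echo long enough."""
    return actual_tail_ms <= capability_ms

print(tail_memory_samples(128))      # 1024 samples for a 128 ms tail
print(within_capability(150, 128))   # False: echo will be heard
```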

Figure 6.8 Tail Circuit Considerations



6.4.3 Types of Echo Cancellers


Up until now, the focus of this document has been to illustrate how echo cancellers operate on individual speech channels. In practice, there are many different types of echo cancellers operating on either analog or digital signals. G.165 describes three types: Types A, C, and D, as follows:

Type A - An echo canceller that operates on one analog speech circuit using analog techniques for echo cancellation.

Type C - An echo canceller that operates on a 64 Kbps digital signal, and uses digital signal processing techniques for echo cancellation. In today's telephony environments, devices called bulk echo cancellers are quite common. These devices combine 24 or 30 digital echo cancellers for 1.544 Mbps and 2.048 Mbps digital voice trunks. An example of the logical operation of such a canceller is shown in Figure 6.9.

Type D - An echo canceller that operates on one analog speech circuit, yet converts the signal to digital PCM for cancellation and then re-converts it back to analog. This type of echo canceller is common in environments where single analog circuits need echo control.


Figure 6.9 1.544Mbit/s, 24 Channel Bulk Echo Canceller


6.4.4 Tone Disabling of Echo Cancellers and Echo Suppressors


Under certain circumstances, it is important that echo suppressors and echo cancellers are automatically disabled. One such instance is in environments where voice-band data traffic from modems and fax machines is present. The operation differs for suppressors and cancellers, as follows:

Echo Suppressors - Echo suppressors are designed to be automatically disabled in order to prevent the introduction of suppression and receive loss when data or other specified tone signals are transmitted. Most modems and fax machines transmit a 2100 Hz tone when transmission begins. Upon detecting this signal, the echo suppressor will automatically disable itself. The echo suppressor will then stay disabled throughout the data phase until the circuit goes quiet, at which point it is automatically re-enabled.

Echo Cancellers - The operation is slightly different for echo cancellers, since for low-speed modems (<9600 bps) echo cancellation can improve the effective signal-to-noise ratio and hence the bit error ratio achieved. This is also the case for Group 3 fax machines (9600 bps, V.29). It is preferable that echo cancellers do not disable in the presence of these low-speed devices. However, for modems faster than 9600 bps (V.32, V.32 bis, V.34, etc.) there is no improvement in the signal-to-noise ratio, and it is better for the echo canceller to be disabled; high-speed modems typically feature built-in echo cancellation. For an echo canceller to differentiate between the types of modems in use, V.32 and faster modems start up with a 2100 Hz tone that includes a series of phase reversals which the echo canceller can recognize (the phase reversals are defined in ITU-T V.25). This allows the echo canceller to disable itself for 2100 Hz tones with phase reversals, but not for simple 2100 Hz tones.
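The disabling rules above reduce to a small decision table. The sketch below encodes only the logic described in the text (detecting the 2100 Hz tone and its phase reversals is outside its scope), and the function names are assumptions.

```python
def suppressor_disabled(tone_2100hz):
    """G.164 echo suppressors disable on any 2100 Hz disabling tone."""
    return tone_2100hz

def canceller_disabled(tone_2100hz, phase_reversals):
    """Cancellers disable only for 2100 Hz with V.25 phase reversals,
    i.e. V.32-and-faster modems that do their own echo cancellation.
    They stay active for plain 2100 Hz (low-speed modems, Group 3 fax),
    where cancellation improves the signal-to-noise ratio."""
    return tone_2100hz and phase_reversals

print(canceller_disabled(True, False))   # False: V.29 fax, keep cancelling
print(canceller_disabled(True, True))    # True: V.32+ modem, disable
```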

6.4.5 G.168 (Improved Echo Canceller)


G.168 defines the performance of what has become known as the Improved Echo Canceller. G.165 has long been the benchmark recommendation for echo cancellers. However, networking technology has moved on at such a rate that an echo canceller performing to G.165 alone is often not capable of dealing with many of today's real-life voice networking issues. As a result, G.168 has been produced to address these issues and ensure that an echo canceller will perform adequately under wider network conditions than those specified in G.165.


Some of the key areas addressed are as follows:

Use of Composite Source Signal - In determining the performance of echo cancellers, many of the tests defined in G.168 use a new test signal known as the Composite Source Signal (CSS). The CSS has been chosen because it produces test results that are similar to those that real speech would produce. This is in contrast to the white noise source used in G.165, which does not produce very realistic results.

Operation for Higher Level Signals - G.165 does not define requirements for handling signal levels in excess of -10 dBm0. In the past, this has not been a problem, since the average levels in a public voice network have been in the region of -16 dBm0 to -28 dBm0. However, in today's digital networks, particularly private networks, the average levels have increased to between -10 dBm0 and -22 dBm0. This can lead to error-prone performance of echo cancellers. G.168 compensates for this situation by being designed for signal levels up to 0 dBm0.

Improved Convergence Time - G.165 used a white noise source to assess the convergence performance of an echo canceller. In practice this was found to be ineffective, resulting in echo cancellers that would frequently take a significant amount of time to converge on real speech echoes. G.168 uses the CSS, and tightens up the test requirements for convergence time. Three tests are included, designed for situations where the Non-Linear Processor (NLP) is enabled and disabled, and where echo exists in the presence of noise.

Injection of Comfort Noise - Although the use of a nonlinear processor (NLP) is not required in an echo canceller, without it the echo cancellation performance is limited. The main problem with the NLP is that during operation, all signals, including low-level background noise, are removed. Normally, a user would not even notice background noise at this level. However, when background noise is cutting in and out due to the operation of the NLP, the effect can be quite irritating. A G.168 canceller recognizes this problem by allowing "comfort noise" to be injected into the transmit path (towards the network and the far-end talker) of the echo canceller during the time that the NLP is activated.

Proper Operation During Facsimile Transmission - Due to significant levels of facsimile traffic in today's networks, a number of tests have been defined in G.168 to ensure that an echo canceller will quickly converge on signals from facsimile machines.

Please note that the list given above does not include all the tests that are given in G.168. For more detailed information, please consult the G.168 Recommendation.


7 Introduction to Signaling Systems


The previous sections have established many of the basic building blocks of telephony. This section looks at the wide variety of signaling techniques used for call control in several situations, including between telephone exchanges and, most notably, between dealer boards used in financial transaction systems. The section begins with a look at some of the analog signaling systems still in use today. This includes PBX interfaces, as well as interfaces used in financial transaction environments, where analog interfaces are still used extensively. The section concludes with a look at digital signaling, including Channel Associated Signaling (CAS), Common Channel Signaling (CCS), and examples of both types of systems.

7.1 Analog Signaling Systems


Before digital techniques were developed, signaling systems were analog-based. Today, most voice communication systems employ digital telephone exchanges, but there are still a large number of analog interfaces in use. Except for locations where ISDN is installed, all domestic telephones still use analog signaling as described in Section 2.2. Some even use Loop Disconnect dialing as opposed to DTMF. Many organizations, especially financial trading companies that require dedicated point-to-point voice connections, still use a large number of analog interfaces. In today's high-tech world of fast computers and high-speed digital communications, it is somewhat surprising that so much analog technology is still in use, and will continue to be used for many years to come. The most common PBX trunk signaling methods available are Loop Start/Loop Disconnect (see Section 2.2), Ground/Earth Start, E&M, and AC15.

7.1.1 Ground Start: 2 Way PBX to Public Exchange Trunk Circuit


A Ground Start trunk from a public exchange can be used to connect to either a telephone set or a PBX system, although the latter is more common today. Ground Start is a modification of Loop Start, and is used to limit the likelihood of call collision, or "glare". This occurs when both ends of a connection seize (begin to use) the circuit simultaneously. On a Loop Start circuit, this is most likely to occur when the exchange seizes the circuit at the start of the quiet interval in the ring cycle. This is not a major problem with a simple telephone line, since there is a good chance that the incoming call is directed to the person attempting to make the outgoing call. However, on a shared trunk used with a PBX phone system, this could result in two people being connected, neither of whom want to speak to the other.

Section 7 - Introduction to Signaling Systems


Ground Start signaling minimizes the chance of glare by adding additional signals, including the application of a ground to one of the leads to indicate an incoming call. The PBX system uses this approach, rather than intermittent ringing, to identify an incoming call. Once detected, the PBX will prevent call collision on the same circuit. The following describes the signaling sequences between a public exchange and a PBX Ground Start trunk for calls in both directions:

Outgoing Call From PBX:
1) The PBX applies a ground (earth) to the Ring/B lead to seize the trunk.
2) The exchange busies the trunk and acknowledges the seizure by applying a ground (earth) to the Tip/A lead. Dial tone may be returned.
3) The PBX removes the ground (earth) condition from the Ring/B lead and applies a DC loop between Tip/A and Ring/B.
4) The PBX dials the outgoing number by pulsing the loop on and off or by transmitting DTMF tones.

Incoming Call to PBX:
1) The exchange grounds the Tip/A lead to seize the trunk.
2) The exchange applies ringing to the Ring/B lead.
3) The call arrives at the PBX operator/attendant.
4) To answer the incoming call, the PBX applies a DC loop between Tip/A and Ring/B.

Note that in both cases, the call-in-progress state is the same as for Loop Start. This is important, since if the PBX fails, for example due to power loss, it is still possible to make outgoing emergency calls. In this case, the PBX should automatically switch the trunk to a telephone set with Earth Recall facility. Pressing the Earth Recall button on the telephone will apply a ground to the Ring/B lead just as if the PBX were making an outgoing call. The telephone set would then provide a holding loop and send the digits using pulsing, or DTMF.
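The glare-avoidance idea can be sketched as a simple check: before seizing an outgoing trunk, the PBX looks for the exchange's Tip/A ground that signals a pending incoming call. This is an illustrative model with an assumed function name, not a description of any particular PBX.

```python
def pbx_may_seize(tip_grounded):
    """On a Ground Start trunk the PBX seizes an outgoing trunk only if
    the exchange has not already grounded Tip/A (i.e. no incoming call
    is pending) -- the check that lets Ground Start avoid glare."""
    return not tip_grounded

print(pbx_may_seize(tip_grounded=False))  # True: trunk free, PBX may seize
print(pbx_may_seize(tip_grounded=True))   # False: incoming call pending
```

Loop Start offers no equivalent early indication (ringing is intermittent), which is why glare is far more likely on shared Loop Start trunks.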

7.1.2 E&M Trunk


Most PBXs, as well as other types of equipment such as financial transaction systems (dealer boards), offer analog E&M signaling interfaces. E&M interfaces come in two forms: two-wire E&M and four-wire E&M. Two-wire E&M normally uses four physical connections, including two for the two-wire voice circuit plus the E&M signaling leads. Four-wire E&M normally uses six physical connections, including two pairs for the four-wire voice circuit and the two signaling leads. Exceptions do exist, such as with the less


common Type II E&M, where additional leads associated with the E&M leads themselves are used to overcome problems of differences in earth potential and/or noise.

Figure 7.1 E&M Connections via a Signalling Circuit

Figure 7.1 shows a typical arrangement for connecting two PBX phone systems or dealer boards via an E&M signaling circuit. It shows examples of both two- and four-wire E&M and the connection configuration. The labeling convention for the connections is done with respect to the customer equipment - the M (Mouth) lead is the signaling output, and the E (Ear) lead is the signaling input. On the four-wire interface, T(ip) and R(ing) are the PBX voice output and T1/R1 are the PBX voice input. The signaling circuit should be labeled with the names of the connections to be made to the PBX, with the M connection as the input from the M on the customer equipment. However, not all equipment that forms part of the signaling circuit follows this convention. Technicians configuring E&M interfaces should be aware of the correct meaning of E/M, T/R, and T1/R1, and which are inputs and which are outputs on their particular system.

Note - In Figure 7.1 and others, we have referred to T(ip)/R(ing) as the speech pair for a two-wire interface and T/R and T1/R1 as the transmit and receive speech pairs for a four-wire interface. These may also be referred to as A/B for a two-wire interface and A/B transmit and A/B receive for a four-wire interface.


Figure 7.2 E&M Type IB/V Arrangement


There are many different physical types of E&M. They vary in terms of how the Ground and Battery (typically -48 VDC) are sourced, and how many wires are used to convey the signaling information. In North America there are seven different types: IA, IB, IC, ID, II, III, and V. The international standard protocol is similar to IB and has national variants in some countries. For example, in the UK Signaling System DC5 (SSDC5) is used. This is a variant of type IB and also similar to type V. Figure 7.2 shows the arrangement for a Type IB/V signaling interface where the speech path, whether two or four wire, is assumed but not shown. A typical sequence of events in a call is as follows:

Idle - M from both PBXs = open circuit.
Seize - The calling PBX applies a ground to its M lead to seize the trunk.
Seize Acknowledge - An optional signal, often referred to as a "wink", indicates acknowledgement of the seize signal and tells the originating PBX to start sending dialing information. It is applied as a brief grounding of the reverse signaling path. NOTE: Depending on how the interface is configured, a wink can also act as a Delay Dial signal (an indication to the calling end to hold off sending digits), which must be followed by another wink as a Proceed To Send signal (an indication to proceed with dialing).
Dialing - The forward M lead is pulsed at 10 PPS or 20 PPS. Alternatively, DTMF tones may be transmitted on the speech circuit.
Answer - Application of a steady-state ground to the reverse signaling path.
Release - The end clearing the call removes the ground from its M lead.

Release Acknowledge - Removal of the ground from the M lead in the opposite direction to the release.

The Signaling Circuit (see Figure 7.1) - The signaling circuit may be provisioned in many ways. It may be part of a private multiplexer network, or it may be provided by a carrier as a stand-alone circuit. If it is a stand-alone circuit, it is normal for the carrier to convert the E&M signals into some other form of signaling, such as digital or AC (Alternating Current) signaling, before transportation over the wide area network.

7.1.3 AC Signaling Schemes


Prior to the more extensive use of digital transmission systems, AC signaling schemes were used (and still are in many instances) in preference to DC signaling (e.g. E&M) for transmission of inter-PBX analog connections over long distances. When both the speech and signaling information are carried on the same path, a simple amplifier can be used to regenerate, or boost, both types of information. In the case of a DC signaling system, a complete end-to-end DC path is required, and additional regenerating equipment is needed for the DC signaling as well as the speech path. Where interfaces such as E&M needed to be transmitted over long distances, they were first converted to AC for transmission. AC15 has several variations, all of which make use of a "tone-on-idle" and utilize an in-band tone of 2280 Hz for transmission over a four-wire speech path:

AC15-A: Designed for use between PBX phone systems within a private network. Provides call control (setup, cleardown, etc.) using the 2280 Hz tone. Address information (dialing) is carried either by pulsing the 2280 Hz tone on and off, or by use of Multi Frequency (MF) tones.

AC15-B: Designed for use between PBX phone systems within a private network. Provides call control using the 2280 Hz tone. Address information and access to supplementary services (additional features such as call transfer, conferencing, etc.) are supported using MF tones.

AC15-C: Designed to support long external extensions from a PBX. Provides call control using the 2280 Hz tone. Address information is carried either by pulsing the 2280 Hz tone on and off, or by use of MF tones. Signaling converters can be used to convert between telephone signaling and AC15-C.

AC15-D: Designed for use between PBX phone systems in different countries. Provides call control using the 2280 Hz tone. Address information is carried either by pulsing the 2280 Hz tone on and off, or by use of MF tones.


An example of an AC signaling system as used on an inter-PBX trunk is shown in Figure 7.3. This demonstrates AC15-A, and shows the signals used for simple call setup. Other signals exist, but cannot be covered within the space restrictions of this booklet.

Figure 7.3 AC15 Example


The example shows two PBX phone systems connected together via some form of transmission facility. This could be a simple analog leased line, or some form of multiplexing equipment as discussed in Section 8. A call is placed from a telephone on PBX 1 to a telephone on PBX 2. PBX 1 interprets the destination number and chooses to route the call via the AC15 connection.

Idle - Both PBXs apply low-level 2280 Hz tones (-20 dBm0).
Seize - The seizing PBX removes the 2280 Hz tone directed towards the other PBX.
Digit Signals - 10 PPS pulses are transmitted by the calling PBX as pulses of tone-on during the break period. Optionally, DTMF tones may be used.
Answer - Removal of tone by the called PBX.
Clear Forward - Re-application of tone by the calling PBX.
Clear Back - Re-application of tone by the called PBX.

Note that the clear down signals begin with tone-on at a high level (-10 dBm0) to ensure proper clear down operation in the presence of noise, followed by a low level (-20 dBm0) for the idle state.

7.1.4 Manual Signaling


Manual signaling is a form of signaling that carries no dialing or routing information. Commonly used on older PBX systems, particularly Private Manual Branch Exchanges


(PMBXs), its most common current application is for signaling between dealer boards in financial trading companies. Outside of this environment, these signaling systems are unlikely to be encountered today. There are two types of manual signaling, commonly referred to as Manual Ringdown and Auto Ringdown.

Manual Ringdown - Used in point-to-point applications where the caller has to take a specific action, such as pressing a key or button, to alert the called party. Some common interfaces supporting Manual Ringdown include:
Gen/Gen (sometimes known as "Figure 2")
Balanced Battery/Balanced Battery (sometimes known as "Figure 5")
B-wire Earth/Gen
More details of each are given below.

Auto Ringdown - Used in point-to-point applications where the caller seizes the line by picking up a handset, which automatically alerts the distant end to an incoming call. Some common interfaces supporting Auto Ringdown include:
Loop/Gen (sometimes known as "Figure 1")
Gen/Loop (sometimes known as "Figure 6")
More details of each are given below.

Gen/Gen - Sometimes referred to as Ring/Ring, Magneto, or Local Battery signaling. Gen/Gen manual ringdown is where the customer's equipment sends a ringing voltage (Gen) to the transmission line when making a call, and expects to see ringing from the line to indicate an incoming call. It typically uses a 17, 25, or 50 Hz AC signal of 70 to 80 V applied to the A and B wires of a two-wire circuit, or to the phantoms of a four-wire circuit.

Balanced Battery/Balanced Battery - Sometimes referred to as Balanced Battery Bothways. The customer's equipment sends a balanced battery signal to the transmission line when making a call, and expects to receive a balanced battery signal from the

The term phantom refers to the ability of two circuits to perform the function of three. In the case of a four-wire speech circuit this can be achieved by passing a DC voltage or even ringing voltage as shown in the diagram.


line to indicate an incoming call. It is effected by applying the same DC voltage, typically -24 V or -50 V, to the A and B wires of a two-wire circuit, or to each wire of the phantoms of a four-wire circuit.

B-Wire Earth/Gen - As the name suggests, the customer's equipment puts a ground on the B lead of the line when making a call, and expects to see ringing from the line to indicate an incoming call. This type of signaling is generally used to connect a telephone to the line, but requires the earth recall button on the telephone set to be pressed to signal to the far end. Simply picking up the handset, thus applying a loop, is not sufficient.

Loop/Gen and Gen/Loop - Loop/Gen describes a customer equipment interface which provides a DC loop to indicate an outgoing call, and expects to see "generator" or ringing from the line to indicate an incoming call. This is synonymous with the operation of a basic telephone set, except in this case no dialing information is sent. The term Gen/Loop refers to the interface on the signaling circuit itself. It expects to see a loop as an incoming signal from the customer equipment, and applies ringing as an outgoing signal to the customer equipment.

7.2 Digital Signaling Systems


7.2.1 Channel Associated Signaling (CAS)
CAS is a method of carrying signaling information for the many voice timeslots of a digital trunk. Specific bits in the frame are allocated to, or associated with, specific voice timeslots. This process is described in Section 4.3.3 for an E1 trunk and Section 4.4.4 for a DS-1 trunk. In practice, which of the available A, B, C, and D bits are used varies considerably from one signaling scheme to another. Examples of CAS include A, AB, and ABCD bit signaling. A-bit signaling means that only the A-bit is used, while the other bits (B, C, and D) are set to an idle pattern, normally B=1, C=0, and D=1 (as recommended in G.704). AB signaling means that only the A and B bits are used, and so on. Figure 7.4 shows how simple A-bit CAS can be used to set up a call across a digital trunk between two PBXs. Telephone A places a call to Telephone B. PBX 1 routes the call via the digital trunk by seizing a voice timeslot and changing the associated A-bit to a zero. PBX 1 then sends dialing information by alternating the state of the A-bit between "0" and "1", with "1" representing a break and "0" a make. This method is equivalent to Loop Disconnect. An alternative method of dialing is to use DTMF tones on the voice timeslot. Once the call is answered, an answer signal is passed back across the


trunk by changing the state of the reverse A-bit to "0." At the end of the call, one end of the link will clear down by changing its A-bit back to "1." This call has no bearing on the other voice timeslots in the same digital trunk or their associated signaling bits. It is quite possible, and in fact normal, that calls are being set up, are in progress, or being cleared down simultaneously on other timeslots.
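The A-bit seizure and pulse-dialing sequence described above can be sketched in a few lines of Python. This is purely an illustration of the bit convention given in the text (idle = 1, seize = 0, then break = 1 / make = 0 dial pulses); the function name is hypothetical and does not correspond to any real trunk interface.

```python
def abit_dial_sequence(digit):
    """A-bit states for seizing a timeslot and pulse-dialing one digit.

    Convention from the text: idle = 1, seize = 0, then dial pulses
    alternate 1 (break) / 0 (make), as in Loop Disconnect dialing."""
    pulses = 10 if digit == 0 else digit   # dialing "0" sends ten pulses
    bits = [1, 0]                          # idle line, then seizure
    for _ in range(pulses):
        bits += [1, 0]                     # break, then make
    return bits

# Seize the trunk and dial the digit 3:
print(abit_dial_sequence(3))   # [1, 0, 1, 0, 1, 0, 1, 0]
```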

7.2.2 Common Channel Signaling (CCS)


In the early 1980s, the ITU-T (then called the CCITT) began intensive development of the ISDN series of standards. Among these is a set of global ISDN signaling standards, now known as DSS1, that govern access to public ISDN services from a private network. These standards have since become the basis of many CCS signaling schemes in use today.

Figure 7.4 A-bit CAS Example

However, throughout the 1980s, and in the absence of defined global standards, individual countries and PBX manufacturers introduced national, or proprietary, standards for PBX-to-public network and PBX-to-PBX signaling. As a consequence, equipment manufacturers and users suffered from trade barriers caused by technical incompatibilities, which hindered the cost-effective deployment of international corporate networks. The following section is an overview of some of the signaling systems that have been developed. Some are the result of international standards and agreements, while others are of a proprietary nature.

Section 7 - Introduction to Signaling Systems


7.2.2.1 Private-to-Public Networking Protocols


These protocols are designed to provide the signaling interface between a PBX phone system and a public network.

ISDN - Based upon ITU-T Recommendations as described in ITU-T I.110, "Preamble and General Structure of the I-Series Recommendations for the Integrated Services Digital Network (ISDN)" [8]. ISDN is sometimes referred to as "Q.931", the User-Network Interface Layer 3 specification for basic call control. It is also referred to as Digital Subscriber Signaling System Number 1 (DSS1).

Euro-ISDN - In order to overcome the previous incompatibilities between different countries, Euro-ISDN is designed to be a common ISDN implementation to which all European countries conform. A Memorandum of Understanding (MoU) has been signed by most network operators in Europe to show their commitment to the implementation of particular features and services as defined in a series of ETSI specifications. The relevant ETSI specifications are based upon ITU-T Recommendations. Euro-ISDN is also referred to as EDSS1 (European DSS1).

Other Country-Specific Systems - Some countries also have their own country-specific signaling systems. For example, Digital Access Signaling System No. 2 (DASS2) is UK-specific, 1TR6 is German-specific, and Version Numeris 2 (VN2) and VN3 are unique to France. In practice, these proprietary systems are being phased out and replaced by Euro-ISDN. The rate at which this occurs depends on the specific country, with some countries being clear leaders in the deployment of Euro-ISDN.

7.2.2.2 Public-to-Public Networking Protocol


Signaling System No. 7 (SS7) - A set of signaling protocols in extensive use worldwide to provide signaling between public exchanges. Defined in a range of ITU-T Recommendations as summarized in ITU-T Q.700 "Introduction to CCITT Signaling System Number 7" [9], and also referred to as "CCITT Number 7," "SS7" and "CCS7."

7.2.2.3 Private Networking Protocols


Private networking protocols offer inter-PBX signaling, including basic call setup and supplementary services, to provide features accessible on a network-wide basis.

NIS A211-1 interface - Primarily used in North America, this interface is based on the CCITT ISDN I-Series and Q-Series Recommendations, ISDN standards established by ANSI/ECSA-T1, and the National ISDN-1 basic call and supplementary services requirements. The specification describes the distribution of services across a network, highlighting the connectivity between a central office (CO) and a user (for example, a PBX


system). This provides the basis for the definition of ISDN call control for PBX-to-CO applications and the signaling protocol requirements for Primary Rate Interface (PRI).

Digital Private Network Signaling System (DPNSS) - A set of signaling protocols originated by British Telecom, which developed DPNSS in conjunction with many equipment manufacturers for interworking between PBXs. It is now one of the most powerful CCS private networking systems available, and has been adopted as a de facto standard in many countries other than the UK, including parts of North America. DPNSS is defined in British Telecom Network Requirement 188 [10].

QSIG - Also known as Private Signaling System Number 1 (PSS1). It is an internationally recognized signaling system defined in European Telecommunications Standards Institute (ETSI), European Computer Manufacturers Association (ECMA), and International Organization for Standardization (ISO) specifications. As with Euro-ISDN, QSIG is based on ITU-T Recommendations, ensuring compatibility between public ISDN and private ISDN networks. In practice, the move towards QSIG in many countries is very slow, due to its lack of features when compared to systems such as MCDN and other proprietary systems. However, the development of new features and facilities continues, and as time goes on, more and more users will migrate to QSIG. For further details on QSIG, please refer to reference [11].

Other Proprietary Systems - Many PBX vendors have designed their own proprietary systems, including the Nortel Networks Meridian Customer Defined Networking (MCDN) system, British Telecom's DPNSS, Siemens' CORNET, and Alcatel's ABC. This list is not comprehensive, and does not include all proprietary PBX systems.

7.2.2.4 How Does CCS Work?


In contrast to CAS signaling, CCS does not define specific signaling bits to be associated with traffic timeslots. Instead, it uses data protocols to transfer messages related to the traffic timeslots, such as call setup, cleardown, transfer, and so on. Being message-based, there is no limit to the number of functions that can be performed, although the extent and number of functions is limited by standardization. CCS protocols are commonly used with digital trunk interfaces, although some systems also allow them to be used with analog trunks. In this case, a digital path is required for signaling, although the actual trunk circuits may carry analog speech. This booklet will limit its coverage to Primary Rate digital (PRI) interfaces.


CCS is sometimes referred to as an out-of-band signaling system, because signaling does not necessarily follow the same path as the voice channels that it supports. Normally the signaling channel is carried in timeslot 24 for a DS-1 trunk, or in timeslot 16 for an E1 trunk, and supports the traffic timeslots in the same trunk. In some systems, the CCS signaling channel in one trunk supports the traffic channels in other trunks. This leaves timeslot 24/16 available as an additional traffic channel in the trunks not carrying the signaling channel.

Most CCS protocols are based upon the layering structure given by the lower three layers of the OSI model. A description of each layer follows:

Layer 1 - Describes the physical connection of the interface to the line, together with electrical characteristics, line coding, and framing. Layer 1 also defines the timeslot to be used for signaling: timeslot 24 for 1.544 Mbps DS-1, and timeslot 16 for 2.048 Mbps E1.

Layer 2 - Describes the functions used to ensure the reliable transfer of messages over a single physical link; these are typically High-level Data Link Control (HDLC)-based. Reliable transfer is maintained through the use of sequence numbers and a Cyclic Redundancy Check (CRC) on each frame. Retransmission is requested for errored or missing frames. For example, Link Access Procedures on the D Channel (LAPD) is an HDLC-based system used with NIS, "Q.931 ISDN", Euro-ISDN, and QSIG.

Layer 3 - Describes the actual messages used for call control. This is the layer that actually characterizes the different protocols.
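As a concrete illustration of the Layer 2 error check mentioned above, the following Python sketch computes the 16-bit FCS that HDLC-based protocols such as LAPD append to each frame. It uses the standard CRC-16/X.25 parameters in a simple bit-at-a-time form; it is a teaching sketch, not production code.

```python
def hdlc_fcs(data: bytes) -> int:
    """16-bit Frame Check Sequence as used by HDLC-based protocols such
    as LAPD (CRC-16/X.25: polynomial 0x1021, bit-reflected, initial
    value 0xFFFF, final XOR 0xFFFF)."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # 0x8408 is the bit-reflected form of the 0x1021 polynomial
            crc = (crc >> 1) ^ 0x8408 if crc & 1 else crc >> 1
    return crc ^ 0xFFFF

# The published check value for this CRC over b"123456789" is 0x906E:
print(hex(hdlc_fcs(b"123456789")))
```

The receiver runs the same computation over the received frame and requests retransmission when the result does not match the FCS field.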

Figure 7.5 Layer 2 and Layer 3 Message Format



Figure 7.5 shows an example of the Layer 2 and Layer 3 message formats. The numbers across the top of Layer 2 represent the order in which the data bits are actually transmitted across the link, starting with 1, then 2, and so on.

Layer 2:
Start Flag - A bit pattern of 01111110 used to identify the start of a message.
Address Bytes - Used to identify the receiver or transmitter of a frame.
Control Bytes - Used for various functions, including the carrying of sequence numbers. Byte 2 is not always used.
Information - This is the actual Layer 3 message.
FCS - Frame Check Sequence, used for error detection.
End Flag - A bit pattern of 01111110 used to identify the end of the message.

Layer 3:
Protocol Discriminator - Specifies the protocol that should be used to interpret the information within a particular message.
Call Reference - Uniquely identifies the call that a message relates to.
Message Type - Identifies the type of message, e.g. Setup, Connect, Disconnect, etc.
Other Information Elements - Provide additional information depending upon the type of message being sent, including Called Number, Calling Number, type of call (voice, voiceband data, data, etc.), Channel Identification (timeslot), and so on.

In order to demonstrate the operation of CCS, consider the example of a basic call setup using QSIG between two PBX phone systems. Figure 7.6 shows the Layer 3 messages transferred across the link. It does not show the underlying Layer 2 messages that are used to ensure reliable, error-free communication across the link. To initiate a call, the user of Telephone A picks up the handset and dials a number that corresponds to Telephone B. PBX 1 identifies that Telephone B can be reached via the digital trunk. The following Layer 3 messages are then passed across the trunk between the two PBXs:

Setup - Used to initiate call establishment.
It contains various Information Elements that define parameters such as the destination number, the calling number, the type of call, the timeslot on which the call will be carried, and others.
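A minimal parser for the fixed part of such a Layer 3 message can make the structure concrete. The message type codes below are the standard ITU-T Q.931 values for the basic call control messages discussed in this section; the sketch deliberately ignores the Information Elements that follow the message type.

```python
# Q.931 message type codes for basic call control (ITU-T Q.931 values).
MESSAGE_TYPES = {
    0x01: "Alerting",
    0x02: "Call Proceeding",
    0x05: "Setup",
    0x07: "Connect",
    0x0F: "Connect Acknowledge",
    0x45: "Disconnect",
    0x4D: "Release",
    0x5A: "Release Complete",
}

def parse_l3_header(frame: bytes):
    """Parse the fixed part of a Q.931-style Layer 3 message:
    protocol discriminator, call reference, and message type.
    Any Information Elements that follow are ignored in this sketch."""
    discriminator = frame[0]              # 0x08 identifies Q.931 call control
    ref_len = frame[1] & 0x0F             # length of the call reference value
    call_ref = int.from_bytes(frame[2:2 + ref_len], "big")
    msg_type = frame[2 + ref_len]
    return discriminator, call_ref, MESSAGE_TYPES.get(msg_type, "Unknown")

# A minimal Setup message: discriminator 0x08, 1-byte call reference 1:
print(parse_l3_header(bytes([0x08, 0x01, 0x01, 0x05])))  # (8, 1, 'Setup')
```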


Figure 7.6 Message flows for basic call setup with QSIG
Call Proceeding - Sent to the calling PBX to indicate that the requested call establishment has been initiated, and no more call establishment information will be accepted Alerting - Informs the calling PBX that the called user is being alerted. The PBX will, for example, apply ring-back tone to the calling user. Connect - Indicates that the called user has answered the call, and that the calling party can be connected to the agreed traffic channel. Connect Acknowledge - Used to acknowledge receipt of the Connect message. Disconnect - Used to request that the connection be cleared. Release - Indicates that PBX 2 has disconnected the channel, and intends to release the channel and the call reference on receipt of the Release Complete message. Release Complete - Indicates that PBX 1 has released the channel and call reference.


8 Voice Within the Enterprise Network


The successful operation of enterprise networks often translates into the success of a corporation's own business. More and more tasks are being integrated into the enterprise network, including voice, data and video communications requirements. For successful transmission of voice within the enterprise network, it is important to understand how voice is integrated and the possible problems that may be encountered. This section looks at a number of technologies used for the support of voice in the enterprise network, including Time Division Multiplexing (TDM), Asynchronous Transfer Mode (ATM), and frame relay. Also touched on is the subject of voice over IP. TDM systems have been in existence for a considerable period of time, and are covered in Section 8.2. Voice support over frame relay is still an emerging technology, although standards are being produced. Due to the importance of ATM and frame relay in the process of creating Unified Networks, these technologies are covered in their own sections: Section 9 for voice over ATM, and Section 10 for voice over frame relay. Voice over IP is also an emerging technology, and is covered in Section 11.3.

8.1 What is an Enterprise Network?


The term "enterprise" has many meanings, but in the context of this booklet it means a company, government agency, or any organization that is not a carrier or service provider. An enterprise network typically means that the network is being used to provide private voice and data communications for an organization. The enterprise network often comprises a number of different physical networks, including a PBX network for voice communications, a packet-switched data network for data requirements, a router network for LAN communications, and a multiplexer network to consolidate traffic onto the same transmission links between sites. A general view of an enterprise network is given in Figure 8.1. The technology used for the backbone could be TDM, ATM, or frame relay, with each one imposing its own characteristics upon the transmission of the various communication types. As far as voice is concerned, most of today's requirements are for connecting together PBXs at different sites. The connections are usually made via digital interfaces, except where limited numbers of voice channels are required, in which case analog interfaces may be used. It is also common to transport voice signals via a multiplexing network. This allows voice traffic to be integrated with data or video communication traffic, reducing network operating costs and improving resilience to network failures. In addition, further bandwidth savings can be realized through voice compression and, in the case of ATM and frame relay networks, silence suppression.


Figure 8.1 General view of an enterprise network

A number of different interface types used for voice communications on telephone exchanges have already been discussed. Common interfaces available on multiplexing networks include:

Analog Voice:
2-wire/4-wire with E&M signaling
FXS for connection to a telephone set
FXO for connection to a PBX extension interface

Digital Voice:
1.544 Mbps DS-1: 23/24 voice channels, CAS or CCS
2.048 Mbps E1: 30 voice channels, CAS or CCS

8.1.1 Different Types of Enterprise Networks


Only certain types of networks are suited to the transport of voice. Some of the different network technologies and their voice support capabilities are discussed below:

Time Division Multiplexer - TDM techniques were developed back in the 1950s and 1960s, but it was not until the early 1980s that their capability for true multisite networking of voice and data brought them increased acceptance. TDM networks have always been well-suited to voice, offering low, predictable levels of delay together with speech compression capabilities. Since the early 1980s, a great deal of research and development has been directed at creating new interface types and improved speech compression techniques.


The biggest drawback of TDM networks is that once bandwidth is allocated for a channel, it continues to be allocated whether or not the channel is in use. This has resulted in manufacturers and users looking at other ways of supporting voice and data, such as Asynchronous Transfer Mode (ATM) and frame relay networks.

Voice over Asynchronous Transfer Mode - ATM was chosen by the ITU-T as the communications technology upon which Broadband ISDN would be based. It was chosen because of its ability to support all communications requirements including voice, video, and data, and because it is scalable from speeds as low as a few Mbps up to many Gigabits per second (Gbps). ATM will be discussed in more depth in Section 9. ATM provides the low delay levels and delay variations that are required for voice transmission.

Voice over Frame Relay - As a technology, frame relay was originally designed as a high-speed version of X.25, suitable for efficient transfer of data across a Wide Area Network (WAN). Since its introduction in the early 1990s, the use of frame relay has grown at an almost exponential rate. Originally it was widely believed that frame relay was not a suitable technology to support speech, due to factors such as unpredictable delay levels and the possibility of frames being discarded in the network. Early frame relay networks based on medium-speed backbone links were not particularly well suited to speech. However, today's frame relay networks are often formed using a frame relay interface and a cell-based backbone. These backbones are typically high-speed networks based on technologies such as ATM, which provide fast, predictable delivery of time-sensitive information. Today, speech support over frame relay is becoming more and more popular. Section 10 of this booklet discusses voice over frame relay.
Voice over X.25 - X.25 packet switching was first standardized in 1976 as a means to transport data reliably, via modems, across inherently noisy analog lines. It incorporates techniques, or protocols, which ensure that data carried by the network will reach its destination error-free. Even if the data becomes corrupted, the network will automatically re-transmit it without additional effort required from the user. This capability adds significant overhead to transmissions by introducing large delays and, in cases where re-transmissions occur, variable delays. For these reasons, X.25 is not a suitable technology for voice support.

Voice over IP (VoIP) - Today, Internet Protocol (IP) networks have practically become the corporate standard for LAN and WAN data traffic, with users ranging from small


home offices to large multinational organizations. As IP networks have grown in popularity, their capabilities have increased to support a wide range of applications and services. Up until now, most of these have been data-oriented; however, other real-time requirements are emerging, including the support of voice and video. Section 11 looks at VoIP, how it operates, and some of the issues involved.

8.2 Time Division Multiplexers (TDM)


TDM technology has existed since the development of channel banks, where a number of analog voice channels are digitized and each one is allocated a specific timeslot on the link between two channel banks. This is the essence of the framed E1 and DS-1 digital voice trunks shown in Figures 4.5 and 4.8/4.9, and illustrates time division multiplexing in its simplest form. However, when referring to TDMs, we usually mean multiplexers which provide communication between many locations served by the multiplexer network, as shown in Figure 8.1. The method used for communication between one multiplexer and the next usually conforms to a proprietary format from the product manufacturer. This allows a wide range of channel speeds to be supported, rather than being oriented around the 64 Kbps speed as with DS-1 and E1. The connection between two multiplexers is often called the aggregate link, and typically covers a wide range of speeds, from as low as 9.6 Kbps up to multiple Mbps, depending upon the specific system. Multiple links may be used to support bandwidth demands and/or for resilience.

Figure 8.2 Typical TDM Frame

The term TDM comes from the method by which channel information is multiplexed onto the aggregate links. A simple view is shown in Figure 8.2.


As with G.704 and D4/ESF framing for digital PBX connections, a TDM link uses data frames which consist of a series of timeslots. However, the frame length, number of timeslots, and frame repetition rate are usually proprietary to the equipment manufacturer. Figure 8.2 shows how data from a number of channels may share the bandwidth on an aggregate link. The timeslots within the frame provide a particular bandwidth, depending on the frame repetition rate and the number of timeslots per frame. For example, if each timeslot represents a bandwidth of 100 bits per second (bps), the 9.6 Kbps channel would require 96 timeslots per frame, the 19.2 Kbps channel would use 192 timeslots, and the 64 Kbps PCM channel would require 640 timeslots per frame.

In addition to the actual data, other information needs to be carried in the frame, including the following:
Frame synchronization bits.
Supervisory communications data between the multiplexers to carry alarms, status, maintenance, and configuration information.
Signaling for the channels, including E&M for voice, RTS/CTS for data, etc.
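The timeslot arithmetic in the example above is simple enough to express directly. The 100 bps-per-timeslot figure is the illustrative value from the text, not a property of any particular product.

```python
def timeslots_per_frame(channel_bps, slot_bps=100):
    """Timeslots a channel needs in each frame when every timeslot
    carries slot_bps of bandwidth (100 bps in the text's example)."""
    return channel_bps // slot_bps

for rate in (9_600, 19_200, 64_000):
    print(f"{rate} bps channel -> {timeslots_per_frame(rate)} timeslots/frame")
```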

The actual layout of the frame will be determined by a number of factors, including the number and speed of the channels, the amount of supervisory overhead, the speed of the aggregate, and so on. Once a frame is built for a particular configuration, it will remain unchanged until one or more of the parameters change. As a result, one of the drawbacks of TDM is that if a device stops sending data into a particular channel, its timeslots remain allocated, wasting bandwidth, unless the multiplexer is reconfigured.

Figure 8.3 PBX & TDM Synchronisation

8.2.1 TDM Synchronization


In Section 4.5 we saw that PBX systems linked via digital connections need to be synchronized, otherwise errors will occur which usually result in noise or clicks on the voice signal. The same is true for TDM multiplexers. They are also synchronous devices, and often use clock fallback lists in the same way that PBX systems do. In fact, synchronization issues are one of the biggest contributors to TDM network problems today. We can take this concept one step further by considering the operation of a network of PBXs connected, using digital trunks, via a TDM network. In this case all the PBXs and multiplexers need to be synchronized to the same network clock. This is not difficult, since it is typical to have at least one digital link from a PBX into the public carrier network for PSTN access. These links provide the PBX systems with highly accurate Stratum 1 (1 part in 10^11) clock sources, each derived from the same source. The PBX can then provide this clock to the TDM multiplexers for synchronization. This approach is highlighted in Figure 8.3.
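To put the Stratum 1 figure into perspective, here is a short worked example. It is a sketch under a common simplifying assumption: a controlled frame slip occurs each time two free-running clocks drift apart by one 125 microsecond frame.

```python
def days_between_slips(freq_offset, frame_seconds=125e-6):
    """Days between controlled frame slips when two clocks differ by
    freq_offset (fractional frequency difference).  A slip is assumed
    each time the accumulated phase error reaches one 125 us frame."""
    seconds = frame_seconds / freq_offset
    return seconds / 86_400   # seconds per day

# Two free-running Stratum 1 clocks (each accurate to 1 part in 10^11)
# can differ by up to 2e-11, giving roughly one slip every 72 days:
print(round(days_between_slips(2e-11), 1))
```

With poorer clocks the slip rate rises dramatically, which is why a common traceable clock source, rather than free-running accuracy, is what keeps a network of PBXs and multiplexers error-free.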


9 Voice Over Asynchronous Transfer Mode (ATM)


ATM is now attaining a significant share of the networking market, and one of the reasons is its ability to support voice. This section looks at how voice transmission can be achieved, using some of the existing mechanisms designed for the support of various traffic types over ATM. Before looking at how voice is supported, a brief introduction to ATM will be presented.

9.1 Introduction to ATM


ATM is the technology that was originally chosen by the ITU-T to support Broadband ISDN (B-ISDN). Broadband is defined by the ITU-T in Recommendation I.113 ("Broadband Aspects of ISDN" [12]) as "a service or system requiring transmission channels capable of supporting rates greater than the primary rate" (1.544 Mbps or 2.048 Mbps). ATM was chosen because of its ability to handle all types of traffic, including voice, video, image, and data, within a single network.

In 1991, the ATM Forum was formed with the objective of accelerating the use of ATM products and services through a rapid convergence of interoperability specifications. ATM Forum specifications refer extensively to ITU-T Recommendations, and many references are made to these Recommendations in this booklet. For example, it is the ATM Forum that is developing the standards for voice over ATM, known as Voice and Telephony Over ATM (VTOA), although those standards refer extensively to various ITU-T Recommendations.

ATM is a compromise between traditional circuit switching and packet switching. It offers the asynchronous nature of packet switching, where data is only sent when it needs to be, while requiring connections to be set up before any data is actually sent, as with circuit switching.

9.1.1 The ATM Cell


ATM makes use of fixed-length packets, called cells, for transportation of all traffic over the network. The ATM standards define a 53-byte cell with a 5-byte header and a 48-byte payload. This provides a balance between shorter cells for delay-sensitive traffic such as voice and video, and longer cells for delay-tolerant traffic such as inter-LAN data. Two cell formats are defined: one for an interface between two ATM networks (Network-to-Network Interface or Network Node Interface (NNI)), and one for the interface between an ATM network and a user or user network (User Network Interface (UNI)). Both of these formats are shown in Figure 9.1.


Generic Flow Control (GFC) - Available for flow control at the UNI.
Virtual Path Identifier (VPI) - Used as part of the addressing function.
Virtual Channel Identifier (VCI) - Used as part of the addressing function.
Payload Type Indicator (PTI) - Identifies the type of cell, plus other functions.
Cell Loss Priority (CLP) - Indicates whether the cell may be discarded during congestion.
Header Error Control (HEC) - CRC check to detect errors in the header.

Figure 9.1 ATM NNI and UNI Cell Formats
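The UNI header layout in Figure 9.1 can be unpacked programmatically. The sketch below follows the standard bit widths (GFC 4, VPI 8, VCI 16, PTI 3, CLP 1, HEC 8); the function name and example values are illustrative.

```python
def parse_uni_header(header: bytes) -> dict:
    """Unpack the 5-byte ATM UNI cell header into its fields.
    Bit layout (most significant first): GFC(4) VPI(8) VCI(16)
    PTI(3) CLP(1), followed by the 8-bit HEC in the fifth byte."""
    assert len(header) == 5
    word = int.from_bytes(header[:4], "big")   # first 32 header bits
    return {
        "GFC": word >> 28,
        "VPI": (word >> 20) & 0xFF,
        "VCI": (word >> 4) & 0xFFFF,
        "PTI": (word >> 1) & 0x7,
        "CLP": word & 0x1,
        "HEC": header[4],
    }

# A cell on VPI 1, VCI 32, user data (PTI 0), CLP bit clear:
print(parse_uni_header(bytes([0x00, 0x10, 0x02, 0x00, 0x00])))
```

At the NNI the 4 GFC bits are given over to the VPI instead, widening it to 12 bits; the remaining fields are unchanged.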


These fixed length cells deliver several benefits, including the ability to switch data at extremely fast speeds using hardware, statistical multiplexing of data from many sources, and a very wide range of transmission speeds. Transmission speeds can range from as low as 1.544 Mbps up to 2.488 Gbps and beyond.

9.1.2 The ATM Adaptation Layers


The process of inserting actual user traffic into the ATM cell and extracting it at the destination is accomplished by the ATM Adaptation Layer (AAL) as defined in ITU-T Recommendation I.363 [13]. This function resides in the end ATM devices, and is not a function of the network itself; the network deals purely with switching ATM cells. A number of different AALs have been defined to deal with the many different types of user traffic. The choice of AAL depends upon the nature of the traffic: whether it is variable or constant bit rate, connection-oriented or connectionless, and whether a timing relationship is required between the source and the destination. Currently four AALs have been defined: AAL1, AAL2, AAL3/4, and AAL5. To demonstrate the function of AALs, Figure 9.2 shows a simple point-to-point ATM network with connections to a PBX and to a LAN. Traffic from each interface is converted to a form that will fit into ATM cells through the use of the ATM Adaptation Layer (AAL). The cells are then passed to the trunk interface of the transmitting device and sent to the destination device. Due to the different nature of the traffic originating from the PBX and from the LAN, two different AALs are used, specifically AAL1 and AAL5.


Figure 9.2 A Simple ATM Network


AAL1 - AAL1 was devised to support circuit emulation, and provides a service to the attached equipment equivalent to a leased line. In ATM terminology, this is known as a Constant Bit Rate (CBR) service, and is illustrated in Figure 9.3. In effect, AAL1 is a continuous process where cells are filled from the interface and transmitted at regular intervals into the ATM network. To support these regularly transmitted cells, a fixed amount of bandwidth needs to be allocated within the ATM network.

Figure 9.3 Voice Transport via AAL1 Circuit Emulation

The 1.544 Mbps (DS-1) or 2.048 Mbps (E1) bit stream from the PBX is converted into blocks of 47 bytes, and a 1-byte AAL header is added. This header contains a sequence number that is used by the destination device to detect missing or misinserted cells. The combination of 48 bytes is then inserted as the payload into the ATM cell, and transmitted to the destination device. The next 47 bytes are then handled in a similar manner. At the destination device, the 48-byte payload is extracted from the ATM cell and the ATM header discarded. The 1-byte AAL header is inspected to ensure that no error has occurred, and the 47-byte data is transmitted to the PBX interface. If a cell is found to be missing, then it needs to be replaced with a dummy cell to ensure timing integrity of


the interface connected to the PBX. The content of the dummy cell will depend upon the actual application; however, for circuit emulation of 1.544 Mbps and 2.048 Mbps circuits, the contents should be all "1"s. This description covers what is known as an unstructured AAL1 service. AAL1 also provides a structured service, where additional information is carried to allow the delineation of a structure within the payload. Please refer to ITU-T I.363 for further details.

AAL2 - A more sophisticated circuit emulation approach has been developed which overcomes the need to preconfigure bandwidth that is inherent in the AAL1 approach. Using AAL2, multiple channels can be interleaved on a single virtual circuit. Furthermore, AAL2 permits variable bit rate (VBR) coding of voice with standard compression, silence detection and suppression, and dynamic encoding rates. Taking advantage of the "bursty" characteristics of voice communications, AAL2 increases networking efficiency and improves performance for data traffic, making all available bandwidth available for data transmissions when not required to support voice traffic. Another benefit is the ability to engineer delay, which is critical to achieving this gain without sacrificing voice quality. AAL2 is not well-suited to switching because of the complexity of implementing the additional layer and switching mechanism in hardware. AAL2 trunking will typically be deployed in applications that require transport of multiple channels in point-to-point trunk scenarios. Examples are digital circuit multiplex equipment and service provider access applications.
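The 47-bytes-plus-header segmentation described for AAL1 can be sketched as follows. The header in this sketch carries only a 3-bit sequence number; the real AAL1 header also protects the sequence number with a CRC and parity bit, which is omitted here for brevity.

```python
def aal1_payloads(stream: bytes):
    """Split a constant bit rate byte stream into 48-byte AAL1 cell
    payloads: a 1-byte AAL header (simplified to a bare 3-bit sequence
    number here) followed by 47 bytes of data."""
    payloads = []
    for seq, i in enumerate(range(0, len(stream) - 46, 47)):
        header = seq & 0x07                  # sequence number wraps at 8
        payloads.append(bytes([header]) + stream[i:i + 47])
    return payloads

cells = aal1_payloads(bytes(470))            # ten full 47-byte blocks
print(len(cells), len(cells[0]))             # 10 payloads of 48 bytes each
```

The receiver checks that the sequence numbers increase by one (modulo 8); a gap reveals a missing cell, which is then replaced by a dummy cell as described above.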

Figure 9.4 ATM Adaptation Layer Example - AAL5


AAL5 - In the case of traffic from a LAN, it is not appropriate to use AAL1. This type of information arrives at irregular intervals, and is in the form of a block of data known as a frame. AAL5 is designed to perform the process of adapting LAN frames to fit into ATM cells. Figure 9.4 demonstrates the operation of AAL5. AAL5 takes user data, in this example a 218-byte LAN frame, and adds overhead, which includes padding data and an AAL5 trailer. This produces a new frame that is a multiple of 48 bytes. It then segments the frame into blocks of 48 bytes each, and inserts them into the payloads of ATM cells. An ATM header of 5 bytes is added to each, and the cells are transmitted into the network and on to their destination.

Please note that, in contrast to AAL1, AAL5 is not a continuous process. Cells are only transmitted when there is a frame to send. When there are no frames, no cells are generated and no bandwidth is consumed in the ATM network.
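The pad-and-segment step can be sketched directly. The trailer below follows the standard AAL5 layout (UU, CPI, 2-byte length, 4-byte CRC-32), but the UU, CPI and CRC fields are left as zero placeholders in this illustration.

```python
def aal5_cells(frame: bytes):
    """Pad a frame so that data + pad + 8-byte AAL5 trailer is an exact
    multiple of 48 bytes, then segment it into 48-byte cell payloads.
    Trailer fields other than the length are zero placeholders here."""
    pad_len = (-(len(frame) + 8)) % 48       # bytes of padding needed
    trailer = bytes(2) + len(frame).to_bytes(2, "big") + bytes(4)
    padded = frame + bytes(pad_len) + trailer
    return [padded[i:i + 48] for i in range(0, len(padded), 48)]

cells = aal5_cells(bytes(218))    # the 218-byte LAN frame from Figure 9.4
print(len(cells))                 # 218 + 14 pad + 8 trailer = 240 -> 5 cells
```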

9.1.3 ATM Service Categories


Within its Traffic Management Specification (currently version 4.0) [28], the ATM Forum defines five service categories available for use to transport different types of traffic across an ATM network. These include:

Constant Bit Rate (CBR)
Real-Time Variable Bit Rate (rt-VBR)
Non Real-Time Variable Bit Rate (nrt-VBR)
Unspecified Bit Rate (UBR)
Available Bit Rate (ABR)

The reason that multiple categories exist is to support the different characteristics and Quality of Service (QoS) requirements associated with various traffic types. For example, speech has a real-time characteristic and requires guaranteed bandwidth and minimal delay, while most data types are non real-time, do not necessarily require guaranteed bandwidth, and can often tolerate significant levels of delay. Detailed descriptions of each category are not provided in this booklet; the ATM Forum Traffic Management Specification [28] can be consulted for additional information. Voice traffic can be handled in a number of ways. The flow of speech using AAL1 as described in Section 9.1.2 represents a CBR service, since there is a constant flow of information that requires particular QoS parameters. However, voice can also be treated as rt-VBR by taking advantage of the fact that a typical conversation contains significant gaps (pauses between words, and periods while the other person is talking). This can be seen graphically in Figure 9.5, where the left-hand graph shows a constant flow of speech samples, including samples representing silence (CBR). The right-hand graph


Figure 9.5 Constant Bit Rate vs. Variable Bit Rate


shows bandwidth only being used while the person is actually talking (rt-VBR). In practice, the actual amount of bandwidth used will vary depending upon what is being spoken, hence the term VBR. Section 9.2.2 looks at a feature known as Speech Activity Detection (SAD) that exploits the VBR nature of speech.

9.1.4 Statistical Multiplexing with ATM


One of the key benefits of using ATM in preference to other technologies is its ability to statistically multiplex all types of traffic onto the same data link. Figure 9.6 shows a comparison between bandwidth utilization on TDM and ATM links, both of which are carrying voice, LAN, and SNA channels. In the case of TDM, there is a finite amount of bandwidth allocated to each channel. The channel cannot exceed its allocated bandwidth. Even if it is not being used at a particular time, the resulting unused bandwidth cannot be used by any other channel.

Figure 9.6 Statistical Multiplexing with ATM


This is in contrast to ATM where, at any time, the overall bandwidth used is the sum of the bandwidth currently being used by each of the channels. Any spare bandwidth is available for any channel to use. The difference can be seen by simply looking at the amount of total bandwidth necessary to support the traffic being sent. Both diagrams show the same amount of bandwidth for each traffic type. In the case of TDM, a larger amount of total bandwidth is needed to support all three traffic types, and the spare bandwidth (shown in white) is not available for use. In the case of ATM, the overall amount of bandwidth required is smaller, and the spare bandwidth is available for other channels to use.
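The difference can be illustrated with some back-of-the-envelope arithmetic. The per-channel peak and average rates below are hypothetical, chosen only to show the principle: a TDM link must reserve every channel's peak rate, while a statistically multiplexed link can be sized closer to the sum of the averages.

```python
# Hypothetical channels: (peak kbps, long-term average kbps).
channels = {
    "voice": (64, 22),
    "lan":   (512, 128),
    "sna":   (56, 14),
}

# TDM: every channel's peak bandwidth is permanently reserved.
tdm_link = sum(peak for peak, _ in channels.values())

# Statistical multiplexing: size the link for the combined average
# load, with an arbitrary 20% headroom for bursts.
atm_link = sum(avg for _, avg in channels.values()) * 1.2

# TDM needs 632 kbps here, versus roughly 197 kbps for the same
# offered load on the statistically multiplexed link.
```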

9.2 Voice and Telephony Over ATM (VTOA)


There is a lot of activity within the ATM Forum on the development of standards for Voice and Telephony Over ATM (VTOA). A high-level view of voice support over ATM has been provided above; this section looks at some of the issues involved and provides a summary of the key standards. Techniques for VTOA include the support of a complete structured PBX data stream (over DS1 or E1) as well as the support of single or multiple voice channels (sometimes referred to as trunking) over a single Virtual Channel Connection (VCC). In the former case, circuit emulation techniques are typically used. In the latter case, voice samples, whether they be PCM or compressed speech samples, need to be inserted into ATM cells using an ATM Adaptation Layer. The choice of AAL depends upon the specific application in use, and a number of different AALs are proposed, including AAL1, AAL5 and AAL2. A summary of the currently defined standards is as follows:

AF-SAA-0032.000 - "Circuit Emulation Service Interoperability Specification": Supports structured (fractional) and unstructured DS-1 and E1 CBR services over AAL1.

AF-VTOA-0078.000 - "Circuit Emulation Service Interoperability Specification Version 2.0": Defines circuit emulation services using AAL1 for both structured and unstructured services. The structured services are modeled on fractional DS1/E1/J2 circuits, and allow one or multiple (as needed) voice channels to be carried on a single VCC. This is in contrast to unstructured services, which define transparent transmission of the data streams and are modeled on regular leased lines.

AF-VTOA-0083.000 - "Voice and Telephony Over ATM to the Desktop": This specification defines the use of AAL5 for voice support over ATM to the desktop. It provides the ability to support switched connections with one 64 Kbps CBR voice channel per


VCC. It also defines interworking support with Primary Rate Interface (PRI) and Basic Rate Interface (BRI) services and interworking with private and public network interfaces.

AF-VTOA-0085.000 - "Dynamic Bandwidth Utilization in 64 Kbps Timeslot Trunking Over ATM Using CES": Defines how voice channels from a digital voice interface can be carried only when they are active, thus providing dynamic bandwidth utilization. The method for detecting voice activity is beyond the scope of this booklet.

AF-VTOA-0089.000 - "ATM Trunking Using AAL1 for Narrowband Services Version 1.0": Effectively an extension to the capabilities of AF-VTOA-0078.000, this specification provides additional flexibility in how voice channels can be carried across an ATM network. Additional capabilities include call-by-call routing, bandwidth-on-demand, bandwidth sharing, and the support of DS1 and E1 with signaling.

In addition to these specifications, AAL2 is defined for the support of low and variable bit-rate trunking applications where single or multiple voice channels are carried in a single VCC (see Section 9.1.2).

9.2.1 Voice Sample Cellification


To carry voice over ATM requires a block of voice samples to be inserted into the payload of the ATM cell.

Figure 9.7 Voice Sample Cellification


Figure 9.7 shows the principle behind this cellification process. Before a cell is transmitted onto the network, it first needs to be filled with voice samples. If these are PCM samples (i.e. 8,000 samples per second) from a single voice channel, then they will arrive from PBX 1 every 125 µs. If the cell payload is 47 bytes long (see the AAL1 example in Section 9.1.2), it will take 47 x 125 µs (5.875 ms) before the cell is full and can be transmitted into the network. This demonstrates one of the penalties to be paid when using


voice over ATM (or indeed any cell/frame based system), namely delay. However, 5.875 ms of delay is small, and not significant as far as voice is concerned. At the exit point of the network, the cell enters the destination ATM multiplexer, where the contents of the cell are emptied and passed on to the interface for onward transmission to PBX 2. Due to the nature of a cell-based network, a certain amount of delay may be imposed on the cells transmitted from one end of the network to the other. Furthermore, due to variations in the loading on the various cell switches within the network, this delay may vary. This is known as Cell Delay Variation (CDV). To overcome the potential problem of running out of cells at the destination multiplexer due to CDV, a buffer must be introduced at the receive end of the connection (see Figure 9.7). The introduction of this buffer creates additional delay. Section 6 discussed saving bandwidth in enterprise networks through speech compression. The general effect of compression is to reduce the number of bits available to be placed into cells at the sending end, increasing the cellification delay. For example, if speech were compressed from 64 Kbps to 16 Kbps ADPCM, the cellification delay would rise significantly, from 5.875 ms to 23.5 ms. There are two methods that can be used to reduce this delay. One approach is to partially fill the cells before transmission, although this counteracts the bandwidth savings gained by compressing the speech. The other technique is to multiplex multiple voice channels within one VCC.
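The delays quoted above follow directly from the coding rate and the payload size; a small helper makes the arithmetic explicit:

```python
def cellification_delay_ms(payload_bytes: int, rate_kbps: float) -> float:
    """Time to fill one cell payload at the given voice coding rate."""
    bytes_per_ms = rate_kbps * 1000 / 8 / 1000   # coding rate in bytes per millisecond
    return payload_bytes / bytes_per_ms

pcm = cellification_delay_ms(47, 64)     # 64 Kbps PCM fills 47 bytes in 5.875 ms
adpcm = cellification_delay_ms(47, 16)   # 16 Kbps ADPCM takes 23.5 ms for the same payload
```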

9.2.2 Speech Activity Detection


Speech Activity Detection (SAD), also known as silence suppression, is an important function that is performed by some cell-based ATM voice systems. This approach allows valuable bandwidth to be conserved by not transmitting cells during periods of silence. Bandwidth savings can be significant, since in a typical conversation silence accounts for up to 60% of the speech traffic in each direction. However, bandwidth savings should not come at the expense of voice quality. If SAD is not performed properly, it can be a potential source of voice signal impairment. There are four significant aspects of SAD that can affect speech quality:

Attack time - This is the time taken to detect the start of a speech burst after a period of silence. A very long attack time causes the front end of the speech burst to be clipped.


Therefore, the SAD implementation must respond very quickly to the start of a speech burst.

Speech threshold - This is the level above which the signal is considered to be speech, and below which the signal is considered to be silence. Traditional silence detection compares the energy of the speech signal to a pre-selected energy threshold level. Typically, speech is detected if the signal stays above the threshold for more than a predetermined period of time; silence is detected if the signal stays below the threshold for more than a predetermined period of time (hangtime). Such an approach does not take into account variations in the level of background noise. Consequently, it could classify real speech as silence when the background noise is low, and real silence as speech when the background noise is high. A more accurate speech detection method adapts the threshold relative to the background noise to prevent premature speech clipping.

Hangtime - This is the time taken to detect the end of a speech burst (the start of a period of silence). If the hangtime is set to a very short duration in an attempt to maximize bandwidth savings, inter-syllable pauses can be misconstrued as silence periods. If the hangtime is set to a very long duration, entire periods of silence can be missed. The optimal hangtime is one which strikes a balance between speech quality and bandwidth savings.

Comfort noise - A potential drawback with SAD is that during periods of silence, when no cells are being transmitted, the receiving end will hear absolute silence, not even background noise. Many systems overcome this by injecting what is known as "comfort noise" into the voice path at the receiving end during these periods. The level of noise injected is set to be similar to the background noise at the sending end.
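The adaptive-threshold and hangtime behavior described above can be sketched as a toy per-frame energy classifier. The energy values, smoothing factor, and margin below are invented for illustration; a real detector works on the sampled signal, not a ready-made energy list.

```python
def sad(energies, noise_floor=100.0, margin=4.0, hang_frames=5):
    """Classify per-frame energies as speech (True) or silence (False).

    The threshold adapts to the background: it sits `margin` times above
    a running noise-floor estimate, and a hangover counter keeps short
    inter-syllable pauses inside the speech burst.
    """
    out, hang = [], 0
    for energy in energies:
        if energy > noise_floor * margin:
            out.append(True)
            hang = hang_frames             # restart the hangover window
        elif hang > 0:
            out.append(True)               # bridge a short pause
            hang -= 1
        else:
            out.append(False)
            # track slowly varying background noise during silence
            noise_floor = 0.95 * noise_floor + 0.05 * energy
    return out

# Two silent frames, a speech burst, a short pause (bridged by the
# hangover), more speech, then the hangover expires into silence.
flags = sad([90, 80, 900, 850, 120, 110, 880, 70, 60, 50, 40, 30, 20, 10])
```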

9.3 PBX Synchronization Across an ATM Network


When connecting PBXs with digital interfaces, whether directly, via a TDM network, or via an ATM network, the PBXs need to be synchronized to one another. By their very nature, ATM networks may or may not operate in a synchronous manner internally, and may not be synchronized with the attached PBXs. Therefore, a method is required to maintain synchronization on an end-to-end basis. The recommended method for PBX synchronization is to derive timing for the PBXs at each end of a link from a common primary reference source. When this is not possible, ITU-T I.363 suggests two methods for providing end-to-end synchronization when implementing circuit emulation with AAL1. The first of these approaches is known as


Synchronous Residual Time Stamp (SRTS), and the second Adaptive Clock Recovery (ACR).

9.3.1 Synchronous Residual Time Stamp (SRTS)


In principle, the SRTS method conveys information from one end of the connection to the other about the frequency difference between a common reference clock, such as one derived from SDH/SONET, and the service clock received from the PBX. For this method to work, it is a requirement that the same derived network clock be available at both ends. This would be the case with transmission via SDH or SONET, or if the ATM multiplexers were connected via a master/slave clocking relationship.

Figure 9.8 Synchronous Residual Time Stamp (SRTS)


The principle of clock transfer is simple, and is demonstrated in Figure 9.8. The service clock frequency from PBX 1 is divided by a fixed value to produce a signal with a period T. The number of cycles of the common reference clock occurring in period T is counted, and this number is transmitted to the destination as the Residual Time Stamp (RTS) using the CSI bit in the AAL1 header. The receiver can then reconstruct the service clock from the RTS and the common reference clock, and use this to drive the bit stream into PBX 2. A more detailed description of SRTS can be found in ITU-T I.363.
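A simplified simulation of the clock transfer follows. The real mechanism transmits only the low-order residual bits of the count via the CSI bit, not the whole count; the divider of 3008 and the clock rates below are illustrative values assumed for this sketch.

```python
def srts_count(service_hz: float, ref_hz: float, divider: int) -> int:
    """Sender side: count reference-clock cycles occurring in one
    period T of the divided-down service clock (T = divider / service_hz)."""
    period_t = divider / service_hz
    return round(ref_hz * period_t)

def srts_recover(count: int, ref_hz: float, divider: int) -> float:
    """Receiver side: rebuild the service clock frequency from the
    transmitted count and the locally available common reference clock."""
    period_t = count / ref_hz
    return divider / period_t

# Illustrative T1 service clock against an assumed 155.52 MHz reference.
rts = srts_count(1_544_000, 155_520_000, 3008)
recovered = srts_recover(rts, 155_520_000, 3008)
# `recovered` lands within a fraction of a hertz of the 1.544 MHz service clock.
```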

9.3.2 Adaptive Clock Recovery (ACR)


A less common approach to end-to-end synchronization is Adaptive Clock Recovery (ACR). Figure 9.9 shows the principles of operation of ACR. At the destination end of a connection over an ATM network, the received information is written into a buffer whenever a cell arrives. Due to the asynchronous nature of ATM and Cell Delay


Figure 9.9 Adaptive Clock Recovery


Variation (CDV) in the network, the buffer fill rate will not be consistent. However, the bit stream that is transmitted to the attached PBX is controlled by a local clock. To synchronize this clock to the speed of the PBX interface at the far end of the network, the local clock frequency is controlled by the fill level of the buffer. If the buffer starts to empty, the clock is running too fast and a phase-locked loop mechanism slows the clock down until the buffer can maintain its mid-point fill level. Conversely, if the buffer starts to fill, the clock is running too slowly and is sped up. If one or more of the cells is lost en route from source to destination, the interface can identify this through the use of the sequence number, or simply by use of a timer. Since a synchronized clock still needs to be generated out to the interface, a dummy cell is inserted into the buffer.
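The buffer-fill feedback loop of Figure 9.9 can be caricatured in a few lines. The gain constant is arbitrary; a real implementation uses a phase-locked loop rather than this direct proportional adjustment.

```python
def acr_adjust(clock_hz: float, fill: int, midpoint: int, gain: float = 1e-4) -> float:
    """Nudge the local playout clock toward the rate that holds the
    receive buffer at its mid-point fill level.

    A positive error (buffer filling) means the local clock is draining
    too slowly and must speed up; a negative error (buffer emptying)
    means it is too fast and must slow down.
    """
    error = fill - midpoint
    return clock_hz * (1 + gain * error)

clock = 1_544_000.0                                  # free-running local clock (Hz)
clock = acr_adjust(clock, fill=120, midpoint=100)    # buffer filling: speed up
clock = acr_adjust(clock, fill=85, midpoint=100)     # buffer draining: slow down
```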

9.3.3 Independent Timing


Another alternative method of providing timing for each end of an ATM connection is to derive it from an independent clock source at each end of the connection. This is currently considered a feasible mechanism for delivering voice to the desktop in environments where it is likely that a telephone will connect into a personal computer (PC) to provide the service. The independent clock source could be the PC's internal clock, or another source located on the telephone interface card. One potential problem is that these clock sources are inherently inaccurate and clock slips can occur. However, there are several methods that can be used to minimize the effect on voice quality. For example, for speech transmission systems with silence suppression, intelligent buffer management can be used at the network egress point to ensure that any clipping or clicks will not be heard.


10 Voice Over Frame Relay


Over the last few years, the use of frame relay has grown at an explosive rate and continues to do so today. This section begins with a brief introduction to frame relay, followed by a look at how voice is supported along with some of the issues involved.

10.1 Introduction to Frame Relay


Frame relay is a communications standard that was originally defined for use over ISDN, although it has since found wide acceptance for data transport, including LAN traffic, in both private and public networks. In 1991, the Frame Relay Forum was established in order to promote the interoperability of products and services from different vendors. The Frame Relay Forum defines implementation agreements that represent an agreement by all members of the frame relay community as to the specific manner in which standards will be applied. As with ATM Forum standards, many references are made to existing ITU-T Recommendations. Frame relay developed from X.25 but offers much greater efficiency and speed of operation, albeit at the expense of error protection. However, frame relay is assumed to operate over low error rate digital lines, removing the need for error protection. If frames are corrupted in the network they are discarded, and it is the responsibility of the end devices to identify the loss and initiate recovery by re-transmitting the frame (if possible). Frame relay operates by taking information from the source, such as LAN data, and packaging it into a frame along with a few bytes of overhead. The structure commonly used is defined in ITU-T Recommendation Q.922 Annex A, and is shown in Figure 10.1.

Data Link Connection Identifier (DLCI) - Used for the link addressing function.
Command/Response (C/R) - Indicates whether the frame is a command or a response.
Forward Explicit Congestion Notification (FECN) - Indicates congestion to the receiver.
Backward Explicit Congestion Notification (BECN) - Indicates congestion to the sender.
Discard Eligibility (DE) - Identifies frames to be discarded in preference to others.
Frame Check Sequence (FCS) - Used to identify errors in the frame.

Figure 10.1 Frame Relay Frame
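A decode of the default 2-byte address field can be sketched as follows. The bit positions follow the Q.922 Annex A layout; the sample octets are invented for the example.

```python
def parse_q922_address(octets: bytes) -> dict:
    """Decode the default 2-byte Q.922 address field.

    Octet 1: DLCI high 6 bits | C/R | EA=0
    Octet 2: DLCI low 4 bits | FECN | BECN | DE | EA=1
    """
    hi, lo = octets[0], octets[1]
    return {
        "dlci": ((hi >> 2) << 4) | (lo >> 4),
        "cr":   bool(hi & 0x02),
        "fecn": bool(lo & 0x08),
        "becn": bool(lo & 0x04),
        "de":   bool(lo & 0x02),
    }

# Invented example: DLCI 102 with both congestion bits set.
fields = parse_q922_address(bytes([0x18, 0x6D]))
```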


Two other formats using 3- and 4-byte addresses are also defined in Q.922, but they are not commonly used. The Frame Relay Forum defines an information field size of at least 1600 bytes, which is sufficient to cope with most LAN requirements. Larger or smaller information fields can be agreed to between the network and the user. In practice, information field sizes of up to 4096 bytes are not uncommon. Frame relay is a connection-oriented protocol where a path must be set up through a network from source to destination before data frames can be sent. Both Permanent Virtual Circuits (PVCs) and Switched Virtual Circuits (SVCs) are supported, allowing multiple Virtual Circuits (VCs) to be carried on the same physical link. The DLCI field is then used by the end devices on a link to differentiate between the different VCs, and route the frames to the appropriate destination. Frame relay is a very efficient method of communication over a range of speeds up to and including E3 (34.368 Mbps) and DS-3 (44.736 Mbps). The primary benefit of frame relay derives from its ability to statistically multiplex traffic from different sources. In other words, bandwidth on a frame relay link is only used by a source when that source has traffic to send. If a particular source has no traffic to send, then the bandwidth on the link is potentially free for other sources to use. The rapid growth in frame relay usage over the last few years has been primarily driven by the interconnection of LANs (Local Area Networks). A typical arrangement is shown in Figure 10.2, where multiple LAN routers need to be connected together. Only one

Figure 10.2 Typical Application for Frame Relay


physical link is needed between each router and the network, with different DLCIs being used to differentiate one VC from another. It should be noted that the number used for a particular DLCI only has local significance on a particular link.

10.2 Voice Over Frame Relay (VoFR)


LAN interconnection has been the dominant application for frame relay. However, users are now becoming interested in transmitting other types of traffic, including voice, over frame relay networks. Historically, voice support over frame relay was not thought to be practical due to the potential delays that the network could impose. However, frame relay networks have tended to evolve utilizing high-speed switching technologies such as ATM, which readily support delay-sensitive applications such as voice. As a result, Voice over Frame Relay (VoFR) has become a reality, and the Frame Relay Forum has ratified standards for its support (i.e. FRF.11 and FRF.12). Meanwhile, a large number of vendors have already developed Frame Relay Access Devices (FRADs) for VoFR, using proprietary methods of carrying voice samples within a frame relay frame. A typical view of a network using FRADs is shown in Figure 10.3.

Figure 10.3 Typical Application using FRADs

In this case, the FRADs are taking inputs from a number of different voice and data devices and multiplexing them across the links into the frame relay network, where a number of Virtual Circuits (VCs) are set up between FRADs. Please note that FRAD products often provide a wider range of interfaces than is shown here.


10.2.1 Delay on Frame Relay Networks


One key aspect of supporting voice transmission across a network is that voice requires a fixed, minimal level of delay. While today's frame relay networks offer fairly low delay levels, delay and delay variation (jitter) from one frame to the next continue to be a problem. Voice requires a constant flow of frames, so a mechanism is needed to overcome jitter. This is usually handled by a de-jitter buffer at the receiving end, which stores incoming frames and releases them at a steady rate, smoothing out variations in their arrival times. Frame relay is designed to multiplex data from many sources onto a single link, with frames from different sources being multiplexed on a first-come-first-served basis. In the case of voice frames, this is not sufficient due to the possible delay and delay variation caused by buffering frames before transmission across the network. As a result, voice frames are given a higher priority than frames carrying less delay-sensitive traffic such as LAN data. For example, if a LAN frame is queued for transmission at the output of a FRAD, and a voice frame then arrives from the voice interface, the voice frame will "jump the queue" and be transmitted first. However, if the LAN frame is already being transmitted when the voice frame arrives, the voice frame will have to wait until the LAN frame has been completely sent. If the LAN frame is large, which is quite possible, then the voice frame could be delayed for quite a long time. To alleviate this problem, many of the existing FRAD products fragment large frames into a number of smaller ones, thus keeping delay to a minimum. This approach keeps delay in the range of 5 to 10 ms, which is acceptable for voice transmission.
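The benefit of fragmentation is easy to quantify: serialization delay is simply frame size over link rate. The 64 Kbps access rate and 10 ms target below are illustrative values.

```python
def serialization_delay_ms(frame_bytes: int, link_kbps: float) -> float:
    """Time for a frame to clear the link once transmission has started."""
    return frame_bytes * 8 / link_kbps

def max_fragment_bytes(target_ms: float, link_kbps: float) -> int:
    """Largest fragment whose serialization delay stays within the target."""
    return int(target_ms * link_kbps / 8)

# A queued 1500-byte LAN frame on a 64 Kbps access link blocks a waiting
# voice frame for 187.5 ms; fragmenting to 80 bytes bounds that at 10 ms.
full = serialization_delay_ms(1500, 64)
frag = max_fragment_bytes(10, 64)
```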

10.2.2 VoFR Standards


As highlighted earlier, most products supporting VoFR today do so in a proprietary manner, and will only support interoperation between two similar devices. The Frame Relay Forum has approved a new Implementation Agreement (IA), FRF.11 [29], for the standardized support of voice over frame relay (VoFR). FRF.11 describes methods for the following:

The transport of compressed voice within the payload of a frame relay frame
The support of a diverse set of voice compression algorithms
The effective utilization of low bit-rate frame relay connections
The multiplexing of up to 255 sub-channels on a single frame relay DLCI
The support of multiple voice payloads on the same or different sub-channels within a single frame
The support of data sub-channels on a multiplexed frame relay DLCI


Figure 10.4 gives an example that demonstrates how FRF.11 describes the support of voice over frame relay. In this example, a single frame is shown carrying two separate voice channels. The frame relay header carries the normal header information (DLCI etc.) as shown in Figure 10.1. LI is the Length Indicator bit, and is used to indicate whether or not a one-octet length byte is present. It is clear (set to 0) if the channel is the only channel or the last channel in the frame, in which case the length byte is absent. The LI bits of all sub-frames preceding the last one are set to "1". EI identifies the length of the channel ID as being either 6 or 8 bits, allowing either 63 or 255 channels to be supported in the frame. If set to "0", it also implies that the payload will be either Embedded ADPCM (G.727) or CS-ACELP (G.729) speech, depending upon whether Class 1 or Class 2 compliance is required. Class 1 refers to support capabilities for high bit-rate frame relay interfaces, and Class 2 refers to low bit-rate interfaces.

Figure 10.4 VoFR Frame


The payload type identifies what is actually being carried in the payload, including speech, signaling, fax relay data, or a silence information descriptor. FRF.11 provides support for a wide range of speech coding schemes, as summarized in Figure 10.5. Included are details of the coding rates, the number of bytes carried for one voice channel in each sub-frame, and the associated packetization delay. Many of the coding types allow for multiple blocks to be carried in the same sub-frame. M is referred to as the packing factor and can be set over a range, depending on the coding scheme that is being used.
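A sketch of parsing the LI and EI bits described above follows. The extension-octet layout shown (two extra channel-ID bits plus a 4-bit payload type) is a simplification assumed for illustration; consult FRF.11 for the exact bit positions.

```python
def parse_vofr_subframe(data: bytes) -> dict:
    """Simplified FRF.11 sub-frame header parse.

    Octet 1: EI (bit 8) | LI (bit 7) | 6-bit sub-channel ID.
    If EI is set, an extension octet follows, assumed here to carry two
    more channel-ID bits and the payload type; if LI is set, a one-octet
    length field follows.
    """
    first = data[0]
    ei, li = bool(first & 0x80), bool(first & 0x40)
    cid = first & 0x3F
    idx = 1
    payload_type = None
    if ei:
        ext = data[idx]
        cid = (cid << 2) | (ext >> 6)     # hypothetical extension layout
        payload_type = ext & 0x0F
        idx += 1
    length = data[idx] if li else None
    return {"cid": cid, "ei": ei, "li": li,
            "payload_type": payload_type, "length": length}

# Invented header: EI and LI set, extended channel ID 22, 10-byte payload.
sub = parse_vofr_subframe(bytes([0xC5, 0x83, 0x0A]))
```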


Figure 10.5 VoFR Frame Details


In addition to defining support for a wide range of speech compression techniques, FRF.11 also covers other voice-related services, including signaling information and encoded fax traffic. It describes how signaling can be carried either as Channel Associated Signaling (CAS), Common Channel Signaling (CCS), or as in-band dialed digits.

10.3 Benefits and Issues of VoFR


10.3.1 Benefits
Frame relay networks continue to grow at an explosive rate, and the ability to integrate voice into traditionally data-only environments will deliver great benefits to network managers by minimizing the costs of consolidating voice and data. As with ATM, a range of other benefits also exist. Frame relay offers statistical multiplexing of traffic, making optimum use of the bandwidth available. Speech compression further reduces the bandwidth needed on links (see Section 5), although care needs to be taken when supporting non-voice traffic such as modems or fax machines. To further increase bandwidth utilization, frame relay is well-suited to techniques such as silence suppression, which can result in 50% savings of network bandwidth (see Section 9.2.2 on SAD over ATM).


10.3.2 Issues
Probably the most significant issue in supporting voice over frame relay is delay, as discussed in Section 10.2.1. This is a particularly significant issue in networks where multiple router hops are involved. The level of delay incurred on even a single hop is likely to be high, necessitating the use of echo cancellation techniques as discussed earlier in this booklet. Many products used within frame-relay backbone networks support voice through a prioritization process similar to that described in Section 10.2.1. However, many of these products are cell-based, and expect voice to be carried in cells rather than large frames. This allows cells to be prioritized without significantly affecting the frames that are being held back as lower priority. If a voice frame is as large as the standard allows, giving it a high priority may adversely affect the lower-priority frames. As a result, frames should be kept relatively small, ideally below 100 bytes. Although not discussed in the context of frame relay, wherever voice is supported on digital interfaces, methods for enabling synchronization between PBXs will need to be provided. This is an area which still requires standardization. Speech compression, which is likely to be used extensively in applications of voice over frame relay, creates its own issues, which are discussed in Section 5.7.


11 Voice Over IP
Internet Protocol (IP) networks have become the corporate standard for LAN and WAN data traffic, with users ranging from small home offices to large multinational organizations. In fact, growth in the use of IP has been so great over the last few years that a new version of IP, known as IP version 6 (IPv6) or IP Next Generation (IPng) [22], has been defined to deal with a number of issues in the present version, IP version 4. As a result of its popularity, IP has developed to become probably the most flexible networking protocol in use today. IP networks run under the most widely used network operating systems, are highly scalable, and work well with routers, making them ideal for transmitting data over a wide area network. The demand for new applications is growing, including voice and fax over IP, both in corporate IP environments as well as for Small Office Home Office (SOHO) applications. Probably the best known IP network in existence today is the Internet. While voice over the Internet has attracted a certain amount of attention, performance issues including large transmission delays and poor voice quality have prevented the Internet from being used extensively for corporate voice traffic. Part of the problem has been the inability to support guaranteed levels of performance. However, corporate and other private IP networks can be more effectively managed, allowing real-time network traffic such as voice to be supported through detailed network engineering that allows performance levels to be defined.

11.1 What is IP?


IP is a networking protocol which was originally designed primarily to support data. Today, IP version 4 is the most common version, as described in RFC 791 [23]. Other protocols commonly used with IP include the Transmission Control Protocol (TCP) [24] and the User Datagram Protocol (UDP) [25]. TCP is designed to provide a guaranteed delivery service for data packets over IP, while UDP provides a non-guaranteed delivery service. However, neither of these protocols guarantees support of real-time applications such as voice. In order to support real-time applications, other protocols have been developed to run over IP networks. These include the Real-time Transport Protocol (RTP) [26] and the Resource reSerVation Protocol (RSVP).


Real-time Transport Protocol (RTP) - Defined in RFC 1889, RTP provides an end-to-end delivery service for real-time traffic applications, including voice and video, over unicast (point-to-point) or multicast (point-to-multipoint) connections. It provides services including payload type identification, sequence numbering, time-stamping, and delivery monitoring. The companion RFC 1890 describes how audio and video may be carried within RTP. RTP was developed to operate independently of the underlying transport and network layers; while it is common to run RTP on top of UDP and IP, it can also be used with other protocols such as ATM or IPX. Integral to RTP is the Real-time Transport Control Protocol (RTCP), which is used to provide feedback on the quality of the received data on a connection.

Resource Reservation Protocol (RSVP) - The basic idea of RSVP is to reserve network bandwidth for a flow of information so that a specified Quality of Service (QoS) can be given to that flow. The general process behind RSVP is to use a "flow descriptor" to request a particular QoS from the network. When the information is transmitted, it is then given the necessary resources in the network to guarantee a certain level of performance. This process is similar to the process used for defining a traffic contract on an ATM network, and turns the connectionless (best-effort) nature of IP into a connection-oriented (reliable) service.

H.323 - The H.323 standard provides a foundation for audio, video, and data communications across IP-based networks, including the Internet. By complying with H.323, multimedia products and applications from multiple vendors can interoperate, allowing users to communicate without concern for compatibility. H.323 is an umbrella recommendation from the International Telecommunication Union (ITU) that sets standards for multimedia communications over Local Area Networks (LANs) that do not provide a guaranteed Quality of Service (QoS). These networks dominate today's corporate desktops, and include packet-switched TCP/IP and IPX over Ethernet, Fast Ethernet, and Token Ring network technologies. The H.323 specification was approved in 1996 by ITU Study Group 16; version 2 was approved in January 1998. The standard is broad in scope and includes both stand-alone devices and embedded personal computer technology, as well as point-to-point and multipoint conferences. H.323 also addresses call control, multimedia management, and bandwidth management, as well as interfaces between LANs and other networks.
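To make the RTP services listed above concrete, here is a minimal sketch of packing and parsing the 12-byte fixed RTP header from RFC 1889 (payload type, sequence number, timestamp, and SSRC); the field values used are arbitrary examples:

```python
import struct

def build_rtp_header(payload_type, seq, timestamp, ssrc):
    """Pack the 12-byte fixed RTP header (RFC 1889): version 2,
    no padding, no extension, no CSRC list, marker bit clear."""
    byte0 = 2 << 6                   # V=2, P=0, X=0, CC=0
    byte1 = payload_type & 0x7F      # M=0, 7-bit payload type
    return struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                       timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)

def parse_rtp_header(packet):
    byte0, byte1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return {"version": byte0 >> 6, "payload_type": byte1 & 0x7F,
            "sequence": seq, "timestamp": ts, "ssrc": ssrc}

# G.711 mu-law audio is static payload type 0; 160 samples = 20 ms at 8 kHz
header = build_rtp_header(payload_type=0, seq=1, timestamp=160, ssrc=0x1234)
print(parse_rtp_header(header))
```

The receiver derives the RTCP quality feedback (packet loss, jitter) largely from the sequence number and timestamp fields shown here.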


Section 11 - Voice Over IP


H.323 is part of a larger series of communications standards that enable videoconferencing across a range of networks. Known as H.32x, this series includes H.320 and H.324, which address ISDN and PSTN communications, respectively.

11.1.1 How Voice Over IP Works


Figure 11.4 shows a possible configuration of a network supporting voice over IP. The actual implementation will be vendor-specific; this diagram shows one possible solution. To place a call from one telephone to another, a user dials the number corresponding to the destination phone. The local PBX interprets the digits and routes the call to the locally attached Voice over IP gateway on either an analog or a digital interface card. The gateway maps the destination telephone number to the IP address of the gateway at the location of the destination phone. The source gateway then establishes a route to the destination. If a protocol such as RSVP is available, it is used to request allocation of bandwidth in the network. The call is connected and proceeds, with IP packets carrying digitized voice samples between the two ends.
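The gateway's number-to-address mapping can be pictured as a longest-prefix lookup on the dialed digits. The sketch below is illustrative only; the prefixes and IP addresses are purely hypothetical:

```python
# Hypothetical dial plan: the longest matching prefix wins, so a more
# specific entry can override a broader one for the same region.
DIAL_PLAN = {
    "44171": "10.1.1.1",   # example: central London gateway
    "4417":  "10.1.1.2",   # example: wider London area gateway
    "1613":  "10.2.1.1",   # example: Ottawa gateway
}

def resolve_gateway(dialed_digits):
    """Return the IP address of the remote gateway for a dialed number."""
    for length in range(len(dialed_digits), 0, -1):
        gateway = DIAL_PLAN.get(dialed_digits[:length])
        if gateway:
            return gateway
    raise LookupError("no route for " + dialed_digits)

print(resolve_gateway("4417155501"))   # matches "44171"
print(resolve_gateway("4417955501"))   # falls back to "4417"
```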

11.1.2 Benefits and Issues of Voice Over IP


As with voice integration into other packet-, frame-, or cell-based enterprise networks, significant bandwidth savings can be achieved by using techniques such as speech compression and silence suppression.

Figure 11.4 Voice Over IP


Voice over IP is an emerging technology, and many issues need to be understood. The main concern is delay, which includes voice coding delay, packetization delay, transit delay across the wide area network, and de-packetization and de-jitter delay. As discussed earlier, IP does not provide quality of service guarantees, and other mechanisms are required to provide them. RSVP has been defined so that bandwidth can be allocated to a particular connection across an IP wide area network. However, most routers do not yet support RSVP, and where it is supported the protocol places a considerable processing burden on the router. Meanwhile, networks such as the Internet can impose delays of up to several seconds.
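To make the delay components concrete, the sketch below adds up a one-way delay budget using typical, assumed figures (not measurements from any particular network) for a compressed-voice call; ITU-T G.114 suggests keeping one-way delay within about 150 ms:

```python
# Illustrative one-way delay budget for a compressed-voice call.
# All figures are assumptions chosen for illustration.
delays_ms = {
    "voice coding (incl. look-ahead)": 15.0,
    "packetization (two 10 ms frames)": 20.0,
    "WAN transit": 70.0,
    "de-jitter buffer": 40.0,
}

total = sum(delays_ms.values())
print(f"one-way delay: {total:.0f} ms")
# ITU-T G.114 recommends about 150 ms one-way for most applications
assert total <= 150, "budget exceeds the G.114 guideline"
```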

11.1.3 Standards
Standards for Voice over IP are set by the Voice over IP Forum, a group of Internet telephony vendors organized under the auspices of the International Multimedia Teleconferencing Consortium (IMTC). VoIP standards are based on ITU-T Recommendation H.323, "Visual Telephone Systems and Equipment for Local Area Networks which Provide a Non-Guaranteed Quality of Service". This recommendation is based on the Real-time Transport Protocol and its control protocol (RTP/RTCP), described briefly at the start of this section. H.323 defines support for five encoding algorithms: G.711, G.722, G.728, G.723.1, and G.729, with G.723.1 having been chosen as the default for Internet telephony applications.


12 Voice Switching in the Enterprise Network


Today, organizations are under mounting pressure to provide increased levels of connectivity and greater bandwidth availability to geographically dispersed user groups throughout the business. At the same time, they must consider network consolidation strategies to control and even reduce overall networking costs. Although voice traffic predominates on today's networks, sporadic data traffic, on-demand video, and image traffic are making the networking environment increasingly complex. Consequently, users are becoming more conscious of the bandwidth consumption of their applications, and are looking for ways to reduce bandwidth costs. As a result, two objectives have risen to the forefront of network purchasing decisions:

1. Realizing overall network cost savings.
2. Maintaining or enhancing current levels of user-perceived quality (especially with regard to voice services).

Cell switching delivers on the promise of supporting a variety of network applications - such as voice, video, data, and image - that efficiently share expensive bandwidth resources, while simultaneously delivering service quality that meets or exceeds user requirements.

Cell switching is of significant value to voice transport. Section 9.1 demonstrated that by using Speech Activity Detection (SAD), a cell-based system can selectively transmit cells only when they contain useful digitized voice information. When combined with speech compression, this provides significant bandwidth savings, allowing more channels to share the costly inter-site trunk circuits. Figure 12.1 provides a guide to the number of voice circuits that can simultaneously share a 2 Mbps trunk, both with and without SAD, assuming that voice traffic occupies about 80% of the 2 Mbps link capacity. Furthermore, when a voice circuit is inactive, its allocated bandwidth can be made available for other traffic.

Figure 12.1 Number of Voice Circuits on a 2Mbit/s Link With and Without SAD
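Figures of the kind plotted in Figure 12.1 can be approximated with a little arithmetic. In the sketch below, the 45% speech-activity factor used for the SAD case is an assumption chosen for illustration:

```python
import math

def voice_circuits(link_kbps=2048, voice_share=0.8,
                   rate_kbps=64.0, activity=1.0):
    """Circuits that fit on a trunk when each circuit consumes
    rate_kbps * activity on average (activity < 1 models SAD)."""
    return math.floor(link_kbps * voice_share / (rate_kbps * activity))

print(voice_circuits())                              # 64 kbps PCM, no SAD
print(voice_circuits(activity=0.45))                 # with SAD (assumed 45% activity)
print(voice_circuits(rate_kbps=16, activity=0.45))   # 16 kbps compression + SAD
```

With 80% of a 2.048 Mbps link reserved for voice, this gives 25 circuits for plain 64 kbps PCM, rising to 56 with SAD alone and 227 when 16 kbps compression is added.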

12.1 The Evolution of Voice Networks


In the 1980s, intelligent corporate PBX networks were built using digital T1/E1 leased line connections as shown in Figure 3.2. Since then, TDM multiplexer solutions have been adopted by many corporations to reduce bandwidth costs by integrating voice traffic with data and video traffic. However, the designs of PBX networks have remained largely unchanged, with PBX systems typically connected via TDM networks. Figures 12.2 and 12.3 show two network scenarios used for connecting PBXs via wide area networks.

Figure 12.2 PBX Network Overlaid Onto Multiplexers Using CAS


Figure 12.2 shows the configuration used when running Channel Associated Signaling (see Section 7.2 for information on CAS) on the PBXs. In this case, each PBX is connected to a multiplexer via one or more 1.544 Mbps links, the number being dictated by the total number of voice circuits required. By using CAS, the multiplexer is able to interpret the signaling bits for each channel and route them on a per-channel basis along with the channel itself. This enables one PBX digital trunk to serve multiple destinations. In addition, should a channel connection fail within the network, the multiplexers can indicate this back to the relevant PBXs with a Backward Busy signal so that they do not use that particular channel. The multiplexers also offer other benefits, including the ability to compress speech, support automatic channel re-routing in the event of failure, and so on.

There are also a number of drawbacks with this approach, all centered around the fact that switching is conducted within the PBXs. For example, a call placed from Telephone A to Telephone B will need to be routed from PBX 1, via PBX 4, to PBX 5. If speech compression is used, multiple compressions and decompressions will occur, resulting in greater distortion of the speech. In addition, two timeslots are used on the link into PBX 4, which is wasteful in terms of bandwidth.

Figure 12.3 shows the configuration used when running Common Channel Signaling (see Section 7.2.2 for information on CCS) on the PBXs. With CCS, it is far more complicated for the multiplexer to interpret the signaling from the PBX than with CAS. Therefore, to create paths between PBX 1 and PBXs 2 and 4, separate interfaces are required on PBX 1. The number of links required at PBX 1 is dictated by the number of other PBXs connected to it, and is independent of the number of channels that are actually in use.

Figure 12.3 PBX Network Overlaid Onto Multiplexers Using CCS
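The per-channel ABCD signaling bits that a multiplexer interprets under CAS can be sketched for an E1 trunk, where timeslot 16 of each frame in the 16-frame multiframe carries the ABCD nibbles for one pair of channels:

```python
def cas_abcd(frame_number, ts16_byte):
    """Decode the two ABCD nibbles carried in timeslot 16 of one frame
    of an E1 CAS multiframe. Frames 1..15 each carry signaling for a
    channel pair: channel f in the upper nibble, channel f+15 in the
    lower nibble. Frame 0 carries the multiframe alignment word."""
    if not 1 <= frame_number <= 15:
        raise ValueError("frame 0 carries multiframe alignment, not ABCD")
    return {frame_number: (ts16_byte >> 4) & 0xF,
            frame_number + 15: ts16_byte & 0xF}

# e.g. a TS16 byte of 0xD5 in frame 3 gives ABCD=1101 for channel 3
# and ABCD=0101 for channel 18
print(cas_abcd(3, 0xD5))
```

By reading these nibbles per channel, the multiplexer can route each voice channel (and its signaling state) independently, which is what allows one PBX digital trunk to serve multiple destinations.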


Again, the benefits of this approach include the ability to compress speech and support automatic channel re-routing, plus the added benefits of using CCS itself. The drawbacks are the same as with CAS, with two notable additions. First, the number of interfaces on the PBXs and the multiplexers is increased compared with CAS. Second, if a channel connection should fail within the multiplexer network, the multiplexers have no way of signaling this to the PBXs on a per-channel basis, because they would need to be able to generate CCS messages themselves. Instead, the multiplexer has to raise a Remote Alarm Indication (yellow alarm) to the PBXs on the interfaces that carry the failed channel. The PBX sees this as a trunk-level alarm and busies out the whole trunk.

12.2 What is Voice Switching?


The current view of voice networks suggests that all PBX phone systems should be able to connect to the corporate enterprise network, and that all PBXs will be able to establish a direct connection to any other PBX. Put more simply, the network would act like a large transit (or tandem) PBX (see Figure 12.4).

Figure 12.4 Enterprise Network Voice Switching


This approach requires that the network can interpret the signaling from the PBX, including such information as the called number, and then, based on this information, make an intelligent routing decision. For example, if a call is placed from Telephone A to Telephone B as shown in Figure 12.4, PBX 1 routes the call into the enterprise network. With voice switching, however, the enterprise network inspects the dialed number and routes the connection to the appropriate destination. The call is therefore routed directly to PBX 5 rather than needing to be sent via a tandem PBX.

Voice switching can be supported with CAS or CCS signaling, although a higher level of functionality can be achieved with CCS due to the additional information carried in the call setup messages. This requires the network to be able to handle the specific protocols used, such as NIS, ETSI, or QSIG.

It is envisaged that PBX systems will still need to connect to other types of networks, such as the PSTN or Virtual Private Networks, so that calls can be routed over a number of different paths. This allows the network to be more resilient to backbone failures, as well as allowing overflow of peak-period traffic onto cheaper networks.
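A voice-switching node's routing decision, including overflow to the PSTN when backbone bandwidth is unavailable, might be sketched as follows; the route table, per-call bandwidth figure, and numbering are hypothetical:

```python
# Hypothetical decision logic at the ingress network node: inspect the
# called number, prefer a direct backbone path to the destination PBX,
# and overflow to the PSTN when bandwidth is unavailable.
ROUTES = {
    "5": ("direct to PBX 5", 16),   # leading digit -> (path, kbps per call)
    "2": ("direct to PBX 2", 16),
}

def route_call(called_number, free_backbone_kbps):
    route = ROUTES.get(called_number[:1])
    if route and free_backbone_kbps >= route[1]:
        return route[0]
    return "overflow to PSTN"

print(route_call("5123", 64))   # enough bandwidth: direct backbone path
print(route_call("5123", 8))    # congested backbone: PSTN overflow
```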

12.3 Why Perform Voice Switching?


Voice switching in the enterprise network brings many benefits:

Reduction in PBX hardware - Investment in PBX and backbone network hardware can be reduced if the transit capability is undertaken by the wide area network. The number of digital PBX interfaces required is based on traffic volumes, rather than on the number of neighboring PBXs.

Maintenance of end-to-end voice call quality - Many users of PBX networks have had bad experiences with speech compression in their networks. The obvious bandwidth savings achieved through compression can be offset by the problems of trying to maintain voice quality in a multi-hop PBX network. The large majority of corporations using multiplexers to support their PBX networks either use no compression, or use standard 32 Kbps ADPCM, which is designed specifically to operate in a multi-hop PBX environment. By following this and other criteria, such as limiting end-to-end delay, voice quality can be maintained across the network. The use of more exotic (proprietary) compression schemes, with high levels of delay, has resulted in poor or inconsistent end-to-end voice quality when used in anything other than point-to-point voice environments. The ability to limit speech compression/decompression to a single cycle has become one of the prime drivers for voice switching in an enterprise network. Since the wide area network acts like a tandem node in a star network, only one compression/decompression cycle is encountered per call. This is also a regulatory requirement in some countries, such as Germany, for calls breaking out to the PSTN.

Individual channel busy-back - Most implementations of TDM multiplexers can only indicate routing problems to a PBX by busying out all channels on the E1/T1 primary rate interface via the RAI (yellow alarm). Because the multiplexing equipment does not provide intelligent signaling on the D-channel, it must rely on physical-level attributes to provide feedback to the PBX. This approach is inefficient in congestion or failure scenarios, as all subsequent calls must then be re-routed even if only a single PBX trunk (timeslot) cannot be routed by the network. For the network to be effective as a transit PBX, it must work intelligently with the D-channel call control signaling. For example, if a call cannot be routed due to unavailable bandwidth, then the local network node needs to signal the originating PBX on the D-channel. The PBX must be informed that the individual call cannot be routed by the network and should use an alternate route, such as the PSTN.

Voice/data discrimination - Currently, for voice and data to operate in an integrated PBX network across multiplexers that support compression, the data channels must be segregated from the voice in order to avoid compression. With CCS, the type of call is commonly identified in the call setup message (see Section 7.2.2). Since the network interprets this signaling, voice and data can be distinguished and compression used only where appropriate.

Bandwidth savings - In traditional PBX networks operating over multiplexer networks, calls that route through tandem PBXs consume more network bandwidth than those that do not. This is because bandwidth is consumed at both the input and the output of the tandem PBX, and such routing often demands less efficient network paths. In extreme cases, the network design can allow several multiples of the original call bandwidth to be consumed by a multi-hop call. Often this use of bandwidth is unavoidable, but some network managers do not fully appreciate the cost implications. With voice switching in the enterprise network, each call is routed over the most direct path available at the time the call is made, resulting in large savings in bandwidth.
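The bandwidth-multiplication effect of tandem routing is simple to quantify: a call occupies its full bandwidth on every WAN link it crosses. A minimal sketch, with an assumed 16 kbps per-call rate:

```python
def wan_bandwidth_kbps(per_call_kbps, wan_hops):
    """Total trunk bandwidth a single call consumes across the WAN:
    each tandem hop re-occupies the call's bandwidth on another link."""
    return per_call_kbps * wan_hops

print(wan_bandwidth_kbps(16, 1))   # direct (voice-switched) path
print(wan_bandwidth_kbps(16, 3))   # via two tandem PBXs: three WAN links
```

A call crossing two tandem PBXs here consumes three times the bandwidth of a directly switched call, which is exactly the multiple that voice switching avoids.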


APPENDIX A - Company and Product Overview


Nortel Networks - A Brief History
Having patented the telephone in the USA in 1876, Alexander Graham Bell did the same in Canada in 1877, where 75 percent of the Canadian patent was assigned to his father, Melville. Within two years, Melville Bell had sold his share to National Bell, the predecessor of AT&T. In 1880, the Bell Telephone Company of Canada was formed. The manufacturing branch flourished, and in 1895 was incorporated as a separate company called Northern Electric and Manufacturing Company Limited. In 1899 the Bell Telephone Company of Canada purchased a Montreal wire and cable factory, which would later be called the Imperial Wire and Cable Company Limited. In 1914 the two companies amalgamated to form The Northern Electric Company Limited. Forty-four percent was owned by Western Electric, the manufacturing subsidiary of the AT&T Company of the United States, and the remainder by Bell Canada. The Western Electric involvement continued until the Consent Decree of 1956, when Western Electric terminated its patent and licensing relationship with Northern Electric.

Northern Electric began to achieve technical independence in 1957 by creating its own research and development facilities in Belleville, Ontario, and in 1959 established the Northern Electric Research and Development Laboratories in Ottawa, Ontario. At the same time, Bell Canada began stepping up its own development activities. In 1971, Northern Electric and Bell Canada merged their R&D activities to form Bell Northern Research (BNR).

In the early seventies BNR began developing what it called the "E-thing", an electronic switching system. By late 1972, Northern Electric had its first electronic switch on the market, the SG1 (also known as PULSE) private branch exchange (PBX). Within three years, some 6,000 SG1 systems had been sold. The real story behind the E-thing, however, is its evolution to an all-digital switch called the SL-1.
The SL-1 design was so successful that it was logical to extend the new, powerful digital technology to the central offices (COs), or public exchanges. BNR developed the DMS-10 for small COs in 1977, followed two years later by the DMS-100 digital switch for large central offices.

In 1976 Northern Electric changed its name to Northern Telecom Limited. Since then, Northern Telecom has gone from strength to strength, developing new products and breaking into new markets, as well as making existing systems more reliable, powerful, and cost-effective. More recently, the Enterprise Networking portfolio of products has been developed to address the emerging and rapidly growing broadband multimedia markets.

In 1995, Northern Telecom unveiled a new corporate logo to be used globally as the company's unique corporate signature. The name Nortel, which the corporation's subsidiaries and joint ventures around the world already used, is a shortened form of Northern Telecom. The legal name of the corporation, however, remained Northern Telecom Limited.

In 1998, Nortel acquired Bay Networks, Inc. to form Nortel Networks, the leading provider of Unified Networks capable of delivering converged data/voice networking solutions. By adding world-class, IP-based data communications capabilities to complement and expand Nortel's acknowledged strengths, a powerful new business entity has been created to carry the success of Northern Telecom into the 21st century.

Nortel Networks
As a leading global provider of digital networks, Nortel Networks delivers solutions for wireless, broadband, enterprise, and carrier networks. Nortel Networks works with customers worldwide to design, build, and integrate digital networks for communications, information, entertainment, education, and business. According to Dataquest (Wide Area Network Equipment Market Share and Forecast Report), Nortel Networks is a leading provider of WAN equipment worldwide, and is the leading provider of enterprise ATM and frame relay solutions. The latest Yankee Group report (Market Analysis Report on Data Communications Hardware) positions Nortel Networks as the global market leader in ATM wide area network equipment.

The Nortel Networks worldwide research capabilities include a network of research and development facilities, affiliated joint ventures, and other collaborations fostering innovative product development and advanced design research. With some 38 research and development sites in 16 countries and territories, R&D spending in 1998 totaled approximately US $2.45 billion. Nortel Networks reported 1998 revenues of US $17.58 billion. The company operates in more than 150 countries and territories, and has more than 80,000 employees worldwide.

Unified Networks
Unified Networks create greater value for customers worldwide through network solutions that integrate data networking and telephony. The Unified Networks strategy extends to solutions, products, and services, delivering new economics in networking by reducing costs, introducing higher-revenue services, and delivering new value derived through networking.

The emergence of the World Wide Web and increasing market deregulation have created a strong demand for networks that provide increased profitability and higher service levels for organizations of all types. Unified Networks from Nortel Networks deliver cutting-edge solutions to reach these new levels of economics. To meet increased needs, Nortel Networks is delivering a new class of customer relationships. Extranets, intranets, Web access, e-mail, call centers, and old-fashioned personal attention are combined to help customers deal with a wide range of new, challenging, and potentially confusing issues. Whether a customer is at the enterprise level, a service provider, or a small business, Nortel Networks delivers Unified Networks to meet their unique business challenges.

Passport Solutions from Nortel Networks


The Passport enterprise network switch (ENS) supports multiple network services on a high-performance, high-capacity networking platform. Passport solutions have gained increasing recognition for their ability to fulfill a number of networking roles in enterprise and service provider networks. In enterprise networks, Passport delivers a cost-effective network consolidation switching solution, offering lower costs, greater functionality, and readiness to offer new multimedia applications to users.

Passport consolidates voice, video, and data services onto a single network using a variety of trunking options. Corporations can use Passport to build private networks over leased lines, evolving existing TDM networking equipment to ATM to support new multimedia applications at a lower cost. Passport can also be connected to a service provider's ATM network, enabling enterprises to take advantage of the performance and cost advantages of public ATM services. In addition, Passport enterprise switches can operate across a hybrid network of private lines and public ATM networks.

Passport's scalability offers cost-effective network connectivity for a range of enterprise network site requirements - from the largest corporate headquarters to the smallest branch office. The platform is available in 16-, 5-, and 3-slot versions in the 6400 series. Nortel Networks also offers the Passport 4400 series of multiservice access devices for branch office locations.


As part of the Nortel Networks product portfolio, Passport enables enterprise customers to deploy a network architecture that is optimized to their requirements. Enterprise network customers benefit from the ability to rapidly deploy new high-performance multimedia services and applications, creating a competitive advantage with the business network. Significant tangible benefits can also be realized by consolidating traffic from all of the enterprise's networks and applications onto a single platform. Passport also offers functionality covering legacy data, IP/LAN through multimedia, and ATM, which can address access through high-capacity backbone requirements. Nortel Networks network management is an open solution based on industry standards. It addresses a wide range of equipment from Nortel Networks, as well as from other vendors. The features and capabilities of Passport services for enterprises are:

- ATM
- voice services
- interLAN switching
- IBM networking
- transparent data services
- managed network services

Passport's flexibility to support a range of services positions it as a best-in-class consolidation switch for enterprises. Passport also supports smooth growth and evolution from traditional data, to frame relay, to ATM services for enterprises.

Management Systems
Nortel Networks management systems provide a comprehensive set of solutions designed to address the operational requirements of enterprise network managers. Unified Management from Nortel Networks delivers the following business and functional values:

- increased operational efficiency
- reduced operational costs
- competitive advantage
- management of the LAN, WAN, and telephony network as a system
- pragmatic directory-based policy management
- monitoring and enforcement of end-to-end application service levels
- operational simplicity


Unified Management is delivered by the Optivity suite of management applications. Optivity is the leading LAN management solution and is being integrated with leading WAN and telephony management tools. Optivity is an easy-to-use, pragmatic, and comprehensive management solution for network, policy, and service management.

LAN Solutions
Nortel Networks features an integrated Ethernet solution that spans the extended enterprise, consisting of innovative, market-leading LAN and WAN products that not only drive price/performance but also offer an array of best-in-class features to meet the demanding needs of business-critical networks. This end-to-end solution delivers innovative priority, security, and resiliency capabilities at a price unmatched in the industry. Until now, Ethernet networks could not provide guaranteed delivery for business-critical applications, such as SAP, Oracle Financials, or voice and video. Now, a breakthrough integrated Ethernet solution from Nortel Networks delivers guaranteed priority and resilience at a fraction of the cost of other Ethernet solutions. The powerful synergy between BayStack enterprise-level stackable switches, market-leading Accelar routing switches, Centillion LAN-ATM switches, Centillion 1000 multiservice switches, and Passport 6400/4400 series multiservice switches delivers a fail-safe networking solution that eliminates downtime and provides guaranteed levels of performance for business-critical applications. This synergistic solution begins in the wiring closet, and extends throughout the wide area network.

Digital Switching
Nortel is a digital switching leader, providing a full line of enterprise network switching systems, including the Meridian 1, Meridian SL-100, Multi-Media Carrier Switch (MMCS), Mercator and Norstar.

Meridian 1
The Nortel Networks Meridian 1 is a digital Private Branch Exchange (PBX) designed to support between 30 and 10,000 lines. It evolved from the highly successful SL-1 system, and was the first PBX designed for use in the global marketplace. The system offers customers a wide range of options to cope with different language needs, as well as corporate networking requirements. Meridian 1 is the best-selling telephone system of its type in the world, with a global market share of 14 per cent. Since its launch in 1990, over 120,000 systems have been sold, carrying more than 31 million lines in over 100 countries. In Europe alone, 7 million lines have been sold in over 24 countries, giving the Meridian 1 a 15 per cent share of the addressable market in this region.


Flexibility and future-proofing - The Meridian 1 is an example of the Nortel Networks "Evergreen" philosophy, which allows customers to upgrade systems as their organizations evolve and grow. This is achieved through the system's modular architecture, which enables customers to add telephone sets, additional lines and extensions, and feature options. The system is designed to keep pace with new and emerging technologies, including videoconferencing and broadband transmission, simplifying upgrades and eliminating the need for a total upgrade or change of equipment. The different telephone sets - Meridian M3110, M3310, M3820, and M2216 - offer customers access to the range of features and facilities they have programmed into the system, or purchased as optional extras. These range from simple autodial and call forwarding to more sophisticated ACD and conferencing capabilities.

Meridian SL-100
Based on a carrier-grade platform, the Meridian SL-100 is the largest Nortel Networks PBX in the digital switching portfolio, providing a solution to multimedia communications in large commercial enterprises and government environments. It has the flexibility to address current and future capacity and applications needs with its 60,000 digital voice or data line capacity.

Multi-Media Carrier Switch (MMCS)


MMCS is the leading entry-level carrier-grade switch. Designed to provide the bridge between carrier and enterprise networks, the switch delivers advanced carrier functionality and intelligence through such enhanced services as calling card support and VPN.

Mercator
Mercator is a state-of-the-art, voice and data, multi-site communications system that provides sophisticated PBX features to smaller organizations or locations. The system supports the applications used by Meridian 1, as well as providing full interworking with Meridian 1 networks.

Norstar
Norstar is the best-selling small business communications system around the world. Norstar systems are designed and built for integrated communications and are futureready, adapting to changing business needs.


Multimedia Communications
The Nortel Networks suite of Symposium applications is designed to harness today's sophisticated media, creating an integrated portfolio of products and services to improve productivity and enhance customer relationships. Symposium is based on industry standards and commercially available hardware. Open-platform flexibility translates to rapid application development for teleconferencing, electronic commerce, one-to-one marketing, and other customized solutions.

Mobility
Companion provides an in-building wireless communication system to ensure that key individuals can be reached when they are away from their desks. The system consists of centralized controller units, radio base stations, and portable telephones that provide facility-wide coverage. The system can be easily integrated with the existing telephone exchange, enabling users to make use of all regular features while moving about the building. Because there are no air-time charges, Companion substantially lowers the cost of mobile communication in the workplace.


APPENDIX B - Introduction to Decibels


Power levels used in telecommunication systems can vary significantly from one place to another, resulting in ratios that can involve very large numbers. Decibels are therefore used instead of absolute powers, because taking the logarithm of the power ratio yields more manageable numbers. Another factor favoring logarithmic measures is that the ear is logarithmic in sensitivity. For example, a doubling in power delivered to the eardrum is just enough for most people to notice; to double the perceived loudness requires the power to be increased by a factor of about ten. The ear also has a very large dynamic range: the signal levels that can be perceived as sound extend from power levels at the eardrum of about 0.000000000000001 watt (-120 dBm) up to about 0.001 watt (0 dBm) - see below for a description of dBm. Because of these facts, telecommunication networks use logarithmic measures of power and ratios to describe signal levels, as follows:

The decibel (dB) - The decibel is a logarithmic measure of the difference in power level between two points. For example, it can be used to express the gain or loss between two ports of a telephone exchange, or the gain or loss of a transmission line. If the powers to be related are P1 and P2, then:

    Ratio in dB = 10 log10 (P2 / P1)

The sign of the dB value (+ or -) identifies which power is greater: if positive, P2 is greater than P1; if negative, P2 is less than P1.

dBm - dB values refer to ratios of power; they do not refer to an absolute power level. To express an absolute level the term dBm is used, which is the power expressed as the number of dB relative to 1 milliwatt (mW). For example, -6 dBm means a power level of -6 dB (a ratio of about one quarter) relative to 1 mW, and 0 dBm means a power level of 0 dB (a ratio of 1) relative to 1 mW.
dBr - dBr is a term used to express the difference in decibels between any point in a voice network and a reference point chosen as a zero relative level point. dBr is used to express relative transmission levels. It is normal in a digital network to select the digital path as the 0 dBr point.


dBm0 - A value of power given in dBm0 gives the actual power level of a signal with reference to a point of 0 dBr. It can be used to express the levels of signaling tones, noise, average speech levels, etc. For example, with reference to Figure B.1, consider an analog interface on a digital PBX with a transmit relative level of +6 dBr and a receive relative level of -8 dBr. An actual signal of -9 dBm from the telephone is equivalent to a reference power of -15 dBm0, because -9 dBm into the interface results in a power of -15 dBm at the 0 dBr point. Similarly, in the opposite direction, an actual power of -8 dBm measured at the telephone set results from a power of 0 dBm at the 0 dBr point, and is therefore equivalent to 0 dBm0.

Figure B.1 dBr reference levels
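The dBm0 arithmetic in the example above reduces to a single subtraction: the level in dBm0 is the measured level in dBm minus the relative level in dBr of the point where it was measured. A sketch in Python (illustrative only, not from the booklet):

```python
def to_dbm0(signal_dbm: float, relative_level_dbr: float) -> float:
    """Refer an absolute level measured at a point of the given
    relative level (dBr) back to the 0 dBr reference point."""
    return signal_dbm - relative_level_dbr

# Transmit side: -9 dBm into a +6 dBr interface is -15 dBm0.
print(to_dbm0(-9, +6))   # → -15
# Receive side: -8 dBm at a -8 dBr point is 0 dBm0.
print(to_dbm0(-8, -8))   # → 0
```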


APPENDIX C - Glossary of Terms


A-law - International version of 64 Kbps PCM digital speech. Defined in G.711.

AAL - ATM Adaptation Layer. Used within an ATM network to convert the data from an end-user application into a form that fits into ATM cells.

ABR - Available Bit Rate (ATM service category).

AC - Alternating Current.

ACR - Adaptive Clock Recovery. A method of achieving end-to-end PBX synchronization over an ATM network.

AC15 - Signaling system where signaling information is carried as in-band tones on a 4-wire circuit. Defined in BTNR (British Telecom Network Requirement) 181 but also adopted in some other European countries.

ADPCM - Adaptive Differential Pulse Code Modulation. Speech compression technique offering speeds of 16, 24, 32 and 40 Kbps. Defined in G.726.

AIS - Alarm Indication Signal. A sequence of all 1s on a transmission system. Sometimes referred to as Blue Alarm.

AMI - Alternate Mark Inversion. Physical level encoding technique.

ATM - Asynchronous Transfer Mode. A communications technology that transports all data types in fixed-size cells of 53 bytes.

B-ISDN - Broadband-Integrated Services Digital Network. A service or system requiring transmission channels capable of supporting rates greater than the primary rate (1.544 Mbps or 2.048 Mbps).

B8ZS - Binary Eight Zeroes Substitution. Typically used on DS-1 digital PBX interfaces to overcome a low density of binary "1"s in the digital stream.

BNR - Bell Northern Research. Previous name for the Nortel Networks Research and Development organization. The organization now comes under the name Nortel Networks Technology.

BRI - Basic Rate Interface.

CAS - Channel Associated Signaling. Used on digital interfaces for signaling.

CBR - Constant Bit Rate. Refers to a bit stream that has a fixed bit rate.

CCITT - The International Telegraph and Telephone Consultative Committee. Now renamed the ITU-T.


CCS - Common Channel Signaling. A data channel is used to carry signaling messages between telephone exchanges, typically timeslot 16 on an E1, and timeslot 24 on a DS-1.

CDV - Cell Delay Variation. In ATM, the variation in delay that the cells in a particular cell stream may experience across the network.

CELP - Code Excited Linear Prediction. A form of speech compression with many implementations, some proprietary and some standard.

CO - Central Office. North American term for the PTO local exchange.

CPE - Customer Premises Equipment.

CRC - Cyclic Redundancy Check.

CS-ACELP - Conjugate Structured-Algebraic Code Excited Linear Prediction. Version of CELP operating at 8 Kbps and defined in G.729.

CSS - Composite Source Signal.

CSU - Customer Service Unit. A device which terminates the T1 connection at the customer premises.

CTS - Clear To Send.

CVSD - Continuously Variable Slope Delta. Speech compression offering rates from 16 Kbps to 64 Kbps depending upon sampling rate.

D4 - Framing type used on 1.544 Mbps primary rate digital interfaces.

dB - Decibel. A logarithmic measure used to describe power ratios.

dBm - An absolute measure of power. Equal to the number of decibels relative to 1 milliwatt. 0 dBm = 1 mW.

dBm0 - The absolute power at a point in a network relative to the 0 dBr point.

dBr - Decibels relative. Used to express relative transmission levels.

DC - Direct Current.

DCME - Digital Circuit Multiplication Equipment.

DDI - Direct Dial In. A facility that allows users the ability to dial from a public network into a PBX, and to a person's extension, without going via the operator.

DL - Data Link. Another name used for the FDL on ESF framed T1 links.

DPCM - Differential Pulse Code Modulation. A speech compression technique. The basis of ADPCM.

DPNSS - Digital Private Network Signaling System.

DS-1 - Digital interface speed of 1.544 Mbps.

DSI - Digital Speech Interpolation. A technique which makes use of the gaps in speech to reduce the amount of bandwidth required for transmission.


DSS1 - Digital Signaling System 1. Generic term for the ITU-T defined ISDN signaling system.

DTMF - Dual Tone Multi Frequency.

E&M - Ear & Mouth. The signaling leads used in E&M analog trunk interfaces.

E1 - Refers to a transmission link that operates at a speed of 2.048 Mbps.

ECMA - European Computer Manufacturers Association.

EDSS1 - European-DSS1. Another term used to describe Euro-ISDN.

EOC - Embedded Operations Channel. Another name used for the FDL on ESF framed T1 links.

ER - Earth Recall. A facility on some telephone sets to provide a momentary grounding of one of the interface leads.

ERL - Echo Return Loss. The amount of signal reflected back to the four-wire path from a hybrid circuit.

ESF - Extended Superframe. Framing type used on 1.544 Mbps primary rate digital interfaces. Offers enhanced capabilities over D4 framing.

ETSI - European Telecommunications Standards Institute.

Euro-ISDN - Europe-wide implementation of ISDN. Based on ETSI specifications which are themselves based on ITU-T Recommendations.

FDL - Facility Data Link. A data channel carried within the ESF framing structure on a T1 link.

FRAD - Frame Relay Access Device. A device developed to carry voice samples within a frame relay frame.

FX - Foreign Exchange. A term used in North America to describe a telephone service from a CO that is remote from (foreign to) the subscriber's local exchange area.

FXO - Typically an interface on a multiplexer system that connects to a line card on an exchange to extend that interface out to a remote office.

FXS - Typically an interface on a multiplexer system that connects to a telephone set which is remote from the exchange that the telephone is connected to.

HDB3 - High Density Bipolar 3. Used on E1 digital trunks. Defined in G.703.

HDLC - High Level Data Link Control.

Hz - Hertz. A measure of frequency in cycles per second.

IP - Internet Protocol.

ISDN - Integrated Services Digital Network.


ISO - International Standards Organization.

ITU-T - Telecommunication Standardization Sector of the International Telecommunication Union. Formerly the CCITT.

kHz - Thousands of Hertz.

LAN - Local Area Network.

LAPD - Link Access Procedures on the D channel. Layer 2 protocol used with various Common Channel Signaling protocols.

LD-CELP - Low Delay-Code Excited Linear Prediction. Low delay version of CELP operating at 16 Kbps and defined in G.728.

Loop Disconnect - A technique used for dialing digits with a basic 2-wire telephone set.

µ-law - PCM type commonly used in North America and Japan. Defined in G.711.

MCDN - Meridian Customer Defined Networking. High functionality CCS protocol used between Meridian 1 PBXs.

MFAS - Multi-Frame Alignment Signal.

MF - Multi Frequency.

MIPS - Million Instructions per Second.

MOS - Mean Opinion Score. In our context it is a subjective measure of the quality of speech transmission systems such as with speech compression.

NLP - Non Linear Processor. A function in an echo canceller used to remove residual echo.

NNI - Network-to-Network Interface.

Nrt-VBR - Non Real Time Variable Bit Rate (ATM service category).

PABX - Private Automated Branch Exchange. Usually shortened to PBX.

PAM - Pulse Amplitude Modulation. Used in the process of converting from an analog signal to digital PCM.

PBX - Private Branch Exchange.

PCM - Pulse Code Modulation. Standardized method of producing digital speech. Defined in G.711.

PVC - Permanent Virtual Circuit.

PMBX - Private Manual Branch Exchange.

POTS - Plain Old Telephone Service. Basic telephone service on a standard 2-wire telephone. No added features.

PPS - Pulses Per Second.

PRI - Primary Rate Interface.

PSS1 - Private Signaling System 1. Another term used to describe QSIG.

PSTN - Public Switched Telephone Network.

Pulse Dial - See Loop Disconnect.
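The µ-law and A-law entries refer to the companding curves standardized in G.711, which give small signals proportionally more quantization resolution than large ones. As a rough illustration, here is the continuous µ-law compression formula (a sketch only; real G.711 codecs use a segmented 8-bit approximation of this curve):

```python
import math

MU = 255  # µ-law companding constant used in North American PCM

def mu_law_compress(x: float) -> float:
    """Map a normalized sample in [-1, 1] through the continuous
    µ-law curve: sgn(x) * ln(1 + µ|x|) / ln(1 + µ)."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

# A full-scale sample maps to full scale:
print(mu_law_compress(1.0))            # → 1.0
# A quiet sample (1% of full scale) is boosted to roughly 23%:
print(round(mu_law_compress(0.01), 3))
```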


QD - Quantization Distortion.

QDU - Quantization Distortion Unit.

QoS - Quality of Service.

QSIG - Internationally defined inter-PBX signaling standard. Defined in ETSI specifications.

R&D - Research & Development.

RAI - Remote Alarm Indication. Also referred to as Yellow Alarm.

RPE-LTP - Regular Pulse Excitation with Long Term Prediction. Speech compression technique used on the GSM mobile telephone system.

RSVP - Resource Reservation Protocol. A method of reserving network bandwidth, enabling a specified QoS to be assigned to selected data flows.

RTS - Ready To Send. Data interface signal.

rt-VBR - Real Time Variable Bit Rate (ATM service category).

Rx - Received signal.

SAD - Speech Activity Detection. Often called silence suppression. A technique used to stop the transmission of bits during periods of silence. See DSI and SAF.

SAF - Speech Activity Factor. The proportion of time that a person is actually making sound during a conversation. Typically 40% of the time.

SEAL - Simple Efficient Adaptation Layer. The same as AAL5 in ATM.

SOHO - Small Office Home Office.

SRTS - Synchronous Residual Time Stamp. A method of providing end-to-end synchronization when initiating circuit emulation with AAL1.

SS7 - Signaling System number 7. Internationally defined inter-public telephone exchange signaling standard. Also known as CCS7, CCITT No. 7.

SVC - Switched Virtual Circuit.

T1 - Refers to a transmission link that operates at a rate of 1.544 Mbps.

TBR - Timed Break Recall.

TDM - Time Division Multiplex.

Toll Quality - The same quality speech as heard over the public telephone network.

Tie Trunk - The name sometimes used to describe a trunk connecting (or tying) together PBXs in a private network.

Tip/Ring - The names sometimes used for the connections of a telephone circuit. The terms are derived from the plug used on manual exchanges. Tip was connected to the tip of the plug and Ring to the slip ring around the plug.


Tx - Transmit signal.

UBR - Unspecified Bit Rate (ATM service category).

UNI - User Network Interface.

VBR - Variable Bit Rate. Refers to a bit stream that has a variable bit rate, e.g. speech with Speech Activity Detection operating.

VC - Virtual Circuit.

VoFR - Voice over Frame Relay.

VTOA - Voice and Telephony Over ATM.

WAN - Wide Area Network.

X.25 - Internationally accepted standard for packet switching networks.


APPENDIX D - References
[1] ITU-T Recommendation Q.23 - Technical features of push-button telephone sets.
[2] ITU-T Recommendation I.420 - ISDN Basic user-network interface.
[3] ITU-T Recommendation G.711 - Pulse code modulation of voice frequencies.
[4] ITU-T Recommendation G.732 - Characteristics of primary PCM multiplexer equipment operating at 2048 Kbps.
[5] ITU-T Recommendation G.703 - Physical/electrical characteristics of hierarchical digital interfaces.
[6] ITU-T Recommendation G.704 - Synchronous frame structures used at primary and secondary hierarchical levels.
[7] British Telecom Network Requirement BTNR 181 - Signaling System AC15.
[8] ITU-T Recommendation I.110 - Preamble and general structure of the I-series recommendations for the Integrated Services Digital Network (ISDN).
[9] ITU-T Recommendation Q.700 - Introduction to CCITT Signaling System Number 7.
[10] British Telecom Network Requirement BTNR 188 - Digital Private Network Signaling System Number 1 (DPNSS).
[11] QSIG - The Handbook for Communication Managers, Interconnect Communications Ltd.
[12] ITU-T Recommendation I.113 - Vocabulary of terms for broadband aspects of ISDN.
[13] ITU-T Recommendation I.363 - B-ISDN ATM adaptation layer (AAL) specification.
[14] ITU-T Recommendation G.726 - 40, 32, 24, 16 Kbps adaptive differential pulse code modulation (ADPCM).
[15] ITU-T Recommendation G.728 - Coding of speech at 16 Kbps using low-delay code excited linear prediction (LD-CELP).
[16] ITU-T Recommendation G.729 - Coding of speech at 8 Kbps using conjugate structure algebraic code excited linear prediction.
[17] ITU-T Recommendation P.80 - Methods for subjective determination of transmission quality.
[18] ITU-T Recommendation G.113 - Transmission impairments.
[19] ITU-T Recommendation P.83 - Subjective performance assessment of telephone-band and wideband digital codecs.
[20] ITU-T Recommendation G.164 - Echo suppressors.
[21] ITU-T Recommendation G.165 - Echo cancellers.
[22] RFC 1883 - Internet Protocol, Version 6 (IPv6) Specification.
[23] RFC 791 - Internet Protocol.


[24] RFC 793 - Transmission Control Protocol.
[25] RFC 768 - User Datagram Protocol.
[26] RFC 1889 - RTP: A Transport Protocol for Real-Time Applications.
[27] RFC 1890 - RTP Profile for Audio and Video Conferences with Minimal Control.
[28] ATM Forum Traffic Management Specification Version 4.0 (af-tm-0056.000).
[29] Frame Relay Forum Voice over Frame Relay Implementation Agreement FRF.11.


For any queries relating to this booklet or Nortel Networks products, please contact Nortel Networks as indicated below:

Canada
Nortel Networks
8200 Dixie Road
Brampton, ON L6T 5P6
Canada
T. 1-800-4-NORTEL (1-800-466-7835)

United States
Nortel Networks
4401 Great America Parkway
Santa Clara, CA 95054
T. 1-800-822-9638

Europe, Middle East, and Africa
Nortel Networks EMEA, S.A.
Les Cyclades-Immeuble Naxos
25 Allee Pierre Ziller
Valbonne, 06560
France
T. +33-4-92-96-69-66

Asia Pacific
Nortel Networks Asia South Pacific
151 Lorong Chaun
02-01 New Tech Park
Singapore 556741
T. +65-287-2877

Caribbean and Latin America
Nortel Networks CALA Inc.
1500 Concord Terrace
Sunrise, FL 33323-2815
T. 954-851-8886

Acknowledgments:
Thanks are due to the ITU-T for their permission to make use of diagrams and information from various ITU-T Recommendations as listed in Appendix D.

All rights reserved. No part of this booklet may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior written permission of Nortel.

Copyright 1999 Northern Telecom Limited. All rights reserved.

NORTEL, NORTEL NETWORKS, the NORTEL NETWORKS corporate logo, the globemark design, HOW THE WORLD SHARES IDEAS, Passport, Meridian 1, Meridian SL-100 and Norstar are trademarks of Northern Telecom Limited. Bay Networks, BN, BLN, BCN, and Optivity are registered trademarks, and Accelar, BayStack, Centillion, Optivity Enterprise, and System 5000 are trademarks of Bay Networks, Inc. All other brand and product names are trademarks or registered trademarks of their respective holders.

Information in this document is subject to change without notice. Northern Telecom Limited assumes no responsibility for any errors that may appear in this document.

Part Number: GD505-3406EC-A
