
A NEW SPEECH PROCESSING SCHEME FOR ATM SWITCHING SYSTEMS

Takao SUZUKI, Osamu NOGUCHI, Kiyoshi YOKOTA and Yasuo SHOJI

Digital Communications Laboratories, Oki Electric Industry Co., Ltd. 4-10-3, Shibaura, Minato-ku, Tokyo 108 Japan
ABSTRACT

From the viewpoint of speech communication services for Asynchronous Transfer Mode (ATM) networks, and in order to establish the necessary conditions for speech processing over an ATM network, we have developed a new speech processing scheme applied at the end of the ATM network. Speech signals are processed basically by two techniques: silence deletion for speech compression, and low-bit-rate speech coding with 32-kbit/s Adaptive Differential PCM (ADPCM). In order to reduce the speech quality degradation caused by ATM cells lost under network congestion, we have proposed a new cell reconstruction algorithm that uses waveform substitution for ADPCM-coded speech based on a pitch estimation method. In addition, to maintain good speech quality we applied some new algorithms for speech processing. We confirmed through subjective evaluation tests that the proposed speech processing scheme for the ATM network can provide speech quality adequate for public communications up to a cell loss rate of about 3%. We have developed two kinds of custom LSIs to implement these speech processing algorithms.
1. INTRODUCTION

For future multi-media communication systems, integrated packet switching systems using Asynchronous Transfer Mode (ATM) technology are the most likely leading candidates, and many studies have been carried out in this field. A broadband architecture for ATM network applications has been put forward within CCITT Study Group XVIII by the Broadband Task Group (BBTG) [1]. ATM systems appear to have the flexibility needed for future communication services that are not yet clearly defined. However, it is important to assure the feasibility of speech communications, which may well become the earliest application as a major communication service in the ATM network. ATM is a transport multiplexing technique for Broadband ISDN (BISDN) in which information is packed into fixed-size "ATM cells", which are relatively short packets. For speech communications over an ATM network, in order to reduce the speech quality degradation caused by ATM cells lost during network congestion, we have proposed a new cell reconstruction method using waveform substitution for ADPCM-coded speech based on a pitch estimation method. In addition, to maintain good speech quality we have applied some new algorithms for speech processing. This paper describes the development and evaluation of the Speech Processing Unit (SPU) at the end of the ATM network, which provides efficient use of the network and maintains good speech quality. We have carried out this development for experimental equipment that is combined with a conventional analog 2-wire subscriber interface. The SPU can be used in common when applying speech terminals to an ISDN I-interface and to a future ATM interface.

In section 2, functions for the ATM interface and speech processing algorithms are proposed. The custom LSIs used to implement these speech processing algorithms are described in section 3. Performance evaluations by segmental S/N measurements and subjective tests are presented in section 4. Features of ADPCM cell reconstruction and feasibility for speech communications are discussed in section 5.

2. FUNCTIONS AND ALGORITHMS OF SPEECH PROCESSING UNIT

2.1 Functions for ATM Interface

Fig. 1 shows the functional block configuration of an ATM interface, which is composed of a Speech Processing Unit (SPU) and a Cell Assembler/Disassembler (CAD). The functions of the SPU transmitter are speech detection, silence detection and ADPCM encoding. Those of the SPU receiver are ADPCM cell reconstruction, ADPCM decoding, silence reconstruction and noise injection. In the ATM cell assembler of the CAD, speech samples are collected until they fill an ATM cell. The ATM cell disassembler of the CAD performs cell disassembly, delay jitter compensation and cell loss detection. So that the SPU and CAD functions of the ATM interface support circuit-mode speech services as adaptation layer functions over an ATM network, we have chosen a short cell information field size of 32 octets for speech samples, which corresponds to ADPCM speech packetization blocks of 8 ms (32 kbit/s × 8 ms = 256 bits = 32 octets).

2.2 Speech Detection and Silence Detection

On the SPU transmitter, the key technique of speech detection is precise estimation of the background noise power during pauses or silence [2]. From the calculated silence noise power, an adaptive power threshold for speech detection and a noise level code for silence reconstruction are obtained. The signal power is estimated with an 8 ms moving window containing 64 PCM-coded speech samples, so the power variation of the speech waveform can be detected precisely. The short-time power $P_n$, obtained by a low-pass operation [5] from the previous power $P_{n-1}$ and the input signal $X_n$, is calculated as:

CH2655-9/89/0000-1515 $1.00 © 1989 IEEE

$$P_n = \left(1 - \frac{1}{64}\right) P_{n-1} + \frac{1}{64} X_n^2 \qquad (1)$$

The speech/silence decision V, made by comparing the power $P_n$ with the adaptive power threshold $P_{TH}$, is given by:

$$V = \begin{cases} 1 \ (\text{speech}) & \text{if } P_n \ge P_{TH} \\ 0 \ (\text{silence}) & \text{if } P_n < P_{TH} \end{cases} \qquad (2)$$

The final decision on the talkspurt, or active speech duration, $t_A$ combines the speech detection time $t_V$ and the hangover time $t_H$ as follows:

$$t_A = t_V + t_H \qquad (3)$$
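The detection chain of Eqs. (1) to (3) can be illustrated with a minimal sketch. It assumes 8 kHz sampling (64 samples per 8 ms, as stated above), a fixed threshold passed in by the caller instead of the adaptive one, an initial power of zero, and a hangover expressed in samples. The function name `detect_speech` and its parameters are hypothetical and are not taken from the paper.

```python
from typing import Iterable, List


def detect_speech(samples: Iterable[float], p_th: float, hangover: int) -> List[int]:
    """Return one flag per sample: 1 = active speech, 0 = silence.

    p_th     -- power threshold P_TH (fixed here; adaptive in the paper)
    hangover -- hangover time t_H in samples (assumption: 8 kHz sampling)
    """
    p = 0.0          # short-time power P_n, assumed to start at zero
    remaining = 0    # hangover samples still to be marked as active
    flags: List[int] = []
    for x in samples:
        # Eq. (1): first-order low-pass of the squared input sample
        p = (1.0 - 1.0 / 64.0) * p + (x * x) / 64.0
        if p >= p_th:
            # Eq. (2): V = 1, and the hangover counter is re-armed
            remaining = hangover
            flags.append(1)
        elif remaining > 0:
            # Eq. (3): the talkspurt is extended by the hangover time
            remaining -= 1
            flags.append(1)
        else:
            flags.append(0)
    return flags
```

With a hangover of 160 samples (20 ms at 8 kHz, the lower end of the range the paper reports), a talkspurt is simply extended by 160 samples after the power falls below the threshold.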

The speech signal is delayed by 8 ms relative to the speech decision to minimize front-end clipping. This 8 ms delay matches the ADPCM speech packetization for an ATM cell information field of 32 octets. For the adaptation of $P_{TH}$, the silence noise power $P_S$ is produced in two ways. The average power $P_N$ is calculated over 256 ms, and as long as the silence duration continues the minimum value of $P_N$ is renewed; this renewed $P_N$ becomes $P_S$. If the speech duration continues for up to 5 s, the minimum value of $P_N$ within those 5 s becomes $P_S$. Using the $P_S$ obtained by these operations, $P_{TH}$ is calculated by:

$$P_{TH} = a \cdot P_S \qquad (4)$$

where the coefficient a = 10.6 and $P_{TH}$ ranges from -60 dBm0 to -30 dBm0. To maintain speech quality, we applied a longer hangover time in proportion to the increase in the power threshold [3]. The adaptive hangover time is therefore controlled within a variable range from 20 ms to 160 ms. The silence noise power $P_S$ is quantized to 16 levels and encoded as a 4-bit silence code for silence reconstruction at the receiver.

2.3 Silence Reconstruction and Noise Injection

On the SPU receiver, the silence reconstruction algorithm generates pseudo-noise which has the Hoth spectral characteristics of typical background noise [2]. The 16-level pseudo-noise is decoded from the 4-bit silence code sent by the transmitter. The generated noise is injected into the silence duration in order to keep speech communications natural.

2.4 ADPCM Encoding and ADPCM Decoding

The 32-kbit/s ADPCM coding method is used for low-bit-rate speech coding. Synchronization between ADPCM encoding and ADPCM decoding requires initialization at the front-end cell of each active speech duration, so the receiver side must pinpoint the front-end cell for ADPCM synchronization. The 32-kbit/s ADPCM algorithms selected for the SPU are the G.721 ADPCM and Advanced ADPCM algorithms, which guarantee the speech quality of public communications.

2.5 Cell Reconstruction for ADPCM-Coded Speech

The speech information in an ATM cell, coded with 32-kbit/s ADPCM, is switched and transmitted over the ATM network. When the network is congested with communication traffic, ATM cells are statistically discarded. The speech information loss caused by discarded ATM cells degrades speech quality. In order to reduce this degradation, the speech information must be reconstructed [4]. In the predictive residual waveform of 32-kbit/s ADPCM, the voiced part of speech consists of periodic pulse waveforms corresponding to the pitch of the speech source. We confirmed the periodicity of the residual waveform of 32-kbit/s ADPCM by observing its autocorrelation. To reconstruct the waveform, it is important to maintain the continuity of the pulse waveform in the voiced part of ADPCM-coded speech. Therefore, we propose a new cell reconstruction algorithm that estimates the pitch period of the residual waveform just before a cell is discarded. We use the modified autocorrelation function $R_i(k)$ at time i [5] for pitch extraction from the residual waveform of ADPCM-coded speech, as follows:

$$R_i(k) = \sum_{m=0}^{N-1} X(i-m)\, X(i-m-k), \qquad k_L \le k \le k_H \qquad (5)$$

where X(i) is the ADPCM decoded value, and $k_L$ and $k_H$ are the minimum and maximum numbers of samples over which the pitch period is searched, respectively. At time j of each ADPCM code sample, the autocorrelation function is given by:

$$R(j,k) = R(j-1,k) + X(j)\, X(j-k), \qquad 1 \le j \le N \qquad (6)$$

where R(0, k) = 0 and R(N, k) = $R_i(k)$. $R_i(k)$ is calculated from Eqs. (5) and (6) for the N samples just before the discarded cell, and the k that maximizes $R_i(k)$ becomes the pitch period estimate T. During the period of the discarded cell, the ADPCM decoder input sample $I_r(n)$ at time n is replaced by the ADPCM code sample I(n-T) from T samples earlier, as follows:

$$I_r(n) = I(n-T) \qquad (7)$$

According to our experimental study of the pitch of ADPCM-coded speech, we have chosen the following practical values for calculating $R_i(k)$ in Eq. (5): N = 32, $k_L$ = 24, $k_H$ = 128.

3. DEVELOPMENT OF SPEECH PROCESSING UNIT LSIs

3.1 Hardware Configuration

The ATM interface with the adaptation layer functions is placed at the end of the ATM network. It comprises the Speech Processing Unit (SPU) and the ATM Cell Assembler and Disassembler (CAD). The SPU is composed of the Advanced ADPCM CODEC LSI [6] and two kinds of signal processing LSIs. As shown in Fig. 1, LSI-1 receives the PCM signal from the Subscriber Line Interface Circuit (SLIC) through the Echo Canceller (EC) and carries out speech detection, then passes the active speech information to the ADPCM encoder. After the PCM signal is converted to the ADPCM code, the cell information is input to the ATM cell assembler. On the other hand, the cell information from the ATM cell disassembler is received by LSI-2, where the lost cell information is reconstructed and then input to the ADPCM decoder. After the ADPCM code is converted to the PCM signal, the PCM signal is again input to LSI-2. LSI-2 then adds the pseudo-noise, produced from white noise, to the silence duration. The continuous PCM signal is then transferred to the SLIC through the EC.

3.2 LSI Architecture

speech samples from 2 males and 2 females. In order to evaluate the improvement in speech quality accurately, we applied 5-point grades as used with the relative opinion score [7]. To obtain reference criteria for normal telephone use, noise and naturalness of speech were evaluated as follows:

Score   Impairment Scale
4       Imperceptible
3       Perceptible but not Annoying
2       Slightly Annoying
1       Annoying
0       Very Annoying

LSI-1 and LSI-2 have the same LSI architecture but run different micro-programs on the digital signal processor (DSP). The architecture provides high performance for autocorrelation and power calculations, sufficient memory capacity for cell reconstruction, and direct interfaces to the ADPCM CODEC, EC and CAD. For example, the autocorrelation calculation R(j, k) in Eq. (6) is processed in two cycles using a pipeline technique. This makes real-time, no-delay operation possible for cell loss compensation.
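The cell reconstruction that this architecture accelerates (Eqs. (5) to (7) of section 2.5) can be sketched in software as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the autocorrelation is computed with a plain double loop rather than the pipelined hardware recursion, a cell is taken to hold 64 four-bit ADPCM codes (32 octets), and when T is shorter than the cell the substituted codes are themselves reused. The constants N = 32, kL = 24 and kH = 128 are the values reported in the paper.

```python
from typing import List, Sequence

N, K_L, K_H = 32, 24, 128   # values the paper reports for Eq. (5)


def estimate_pitch(x: Sequence[float], n: int = N,
                   k_lo: int = K_L, k_hi: int = K_H) -> int:
    """Return the lag T in [k_lo, k_hi] that maximizes R_i(k) of Eq. (5).

    x holds the decoded samples received before the lost cell; x[-1] is
    the newest one (time i).
    """
    if len(x) < n + k_hi:
        raise ValueError("need at least n + k_hi samples of history")
    i = len(x) - 1
    best_k, best_r = k_lo, float("-inf")
    for k in range(k_lo, k_hi + 1):
        r = 0.0
        # Eq. (6): one product is accumulated per sample, N products in total
        for m in range(n):
            r += x[i - m] * x[i - m - k]
        if r > best_r:          # strict '>' keeps the shortest lag on ties
            best_k, best_r = k, r
    return best_k


def substitute_cell(codes: Sequence[int], t: int, cell_len: int = 64) -> List[int]:
    """Eq. (7): build a replacement cell with I_r(n) = I(n - T).

    codes    -- ADPCM codes received before the lost cell (len(codes) >= t)
    cell_len -- codes per cell (assumption: 32 octets x 2 codes per octet)
    """
    if t < 1 or len(codes) < t:
        raise ValueError("need at least t previous codes")
    out = list(codes)
    for _ in range(cell_len):
        out.append(out[-t])     # the code from T samples earlier
    return out[len(codes):]
```

A caller would run `estimate_pitch` on the decoded samples preceding a detected cell loss and pass the resulting T to `substitute_cell` to obtain the codes fed to the ADPCM decoder in place of the lost cell.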

Fig. 2 shows the signal processing LSIs developed for the SPU. The LSI is produced using 2-µm CMOS technology with a chip size of 8.1 × 7.2 mm². The power consumption is 90 mW and the cycle time is 180 ns.
4. PERFORMANCE EVALUATIONS FOR SPEECH PROCESSING UNIT

Fig. 6 shows the relative opinion score as a function of the cell loss rate for the proposed pitch estimation method using Advanced ADPCM-coded speech. The proposed pitch estimation method shows a remarkable improvement in speech quality when cells are discarded. Good speech quality can be maintained up to a cell loss rate of 3% using a cell information field of 32 octets.

5. DISCUSSION

5.1 Features of ADPCM Cell Reconstruction

4.1 Segmental S/N Measurements

In a computer simulation, the ADPCM-coded cell information is randomly discarded, and the segmental S/N is measured to confirm the effect of cell reconstruction. The segmental S/N (SNSEG) is given by:
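$$\mathrm{SNSEG} = \frac{1}{P} \sum_{p=1}^{P} 10 \log_{10} \frac{\sum_{i=0}^{63} S(n-i)^2}{\sum_{i=0}^{63} \left[ S(n-i) - S_r(n-i) \right]^2} \quad \text{(dB)}$$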
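As an illustration of this measurement, the sketch below computes a segmental S/N in the usual way: the S/N in dB is evaluated per block of 64 samples (one 8 ms ADPCM cell) and averaged over the blocks. The function name `segmental_snr` is hypothetical, and skipping blocks whose signal or error energy is zero is an assumption of this sketch, not something the paper states.

```python
import math
from typing import Sequence


def segmental_snr(s: Sequence[float], s_r: Sequence[float], frame: int = 64) -> float:
    """Average per-block S/N in dB between the input s and the output s_r.

    frame -- samples per block (64 samples = one 8 ms cell at 8 kHz)
    """
    if len(s) != len(s_r):
        raise ValueError("signals must have the same length")
    ratios = []
    for start in range(0, len(s) - frame + 1, frame):
        ref = s[start:start + frame]
        out = s_r[start:start + frame]
        sig = sum(v * v for v in ref)
        err = sum((a - b) ** 2 for a, b in zip(ref, out))
        if sig == 0.0 or err == 0.0:
            continue    # assumption: blocks with undefined S/N are skipped
        ratios.append(10.0 * math.log10(sig / err))
    if not ratios:
        raise ValueError("no block with a finite S/N")
    return sum(ratios) / len(ratios)
```

For example, if the second of two cells is lost and replaced by zeros, that block contributes 0 dB, which pulls the average down sharply; this is the behaviour the zero substitution curves are compared against.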

When embedded ADPCM coding [8] is applied to the ATM network, the ADPCM-coded speech must, for example, be divided into high-priority and low-priority ATM cells. The low-priority cells are discarded under extreme congestion at a network node. Hence the processing delay for reconstructing ATM cells may be increased by the priority processing. The proposed ADPCM cell reconstruction, which substitutes ADPCM codes for lost cells on the basis of the speech pitch estimation method, has the following features:

• No priority processing is required in the network.
• The processing delay for reconstructing speech is short.
• Good speech quality is obtained at a cell loss rate of a few percent.

5.2 Feasibility for Speech Communications

where S(n-i) is the input signal of the ADPCM encoder, $S_r(n-i)$ is the reconstructed output signal of the ADPCM decoder and P is the number of cells in the active speech duration. Fig. 3 shows the segmental S/N as a function of the cell loss rate for G.721 ADPCM. Fig. 4 shows the same relationship for Advanced ADPCM. In both cases, the pitch estimation method improves the segmental S/N over the full range of cell loss rates compared with the zero substitution method.

4.2 Subjective Evaluation Tests

The real-time evaluation equipment shown in Fig. 5, which provides a pseudo-configuration of the ATM network, comprises two SPUs. Using this equipment, we have evaluated the total speech quality for speech communications in an ATM network. The speech terminal is an analog telephone, and the speech signal is transmitted to the SPU through the 2-wire analog line, the SLIC and the EC. The speech information is randomly discarded in the Digital Impairment (DI) block, which corresponds to the pseudo-network. The informal subjective evaluation test with the relative opinion score was carried out by 20 listeners using 5 s

The proposed speech processing scheme does not require any particular priority processing at each network node in the ATM network. At an ATM node composed of transmission and switching systems, ATM cells must be transferred at high speed; this speech processing scheme may therefore be preferable in terms of the end-to-end delay over the ATM network. For the case of no priority processing at a network node, we regard the ATM network as a simple ten-node tandem network of queuing systems, each an independent process. Assuming that ATM cells are statistically discarded at the output buffer of a node and that the buffer size K is sufficiently small, we apply the M/D/1/K queue model [9]. Fig. 7 shows the probability of cell loss as a function of the offered load using the M/D/1/K model. Supposing that a cell loss rate of up to 3% (3 × 10⁻²) can be accepted for speech communications with the proposed speech processing scheme, we obtain high offered loads: ρ = 0.81 when K = 10 and ρ = 0.92 when K = 20.

5.3 Aspects of PCM Speech Services

We covered 32-kbit/s ADPCM-coded speech for ATM cells in this paper. However, the application of regular 64-kbit/s PCM speech was also considered. We studied and developed PCM cell reconstruction as speech processing in the same custom LSI. For PCM speech processing we applied a new cell loss reconstruction algorithm based on a waveform matching method. We confirmed that this PCM cell reconstruction could provide good speech quality up to a cell loss rate of about 8% using speech information of 32 octets, which corresponds to PCM speech blocks of 4 ms.
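For readers who want to experiment with the M/D/1/K model used in section 5.2, the following sketch computes the loss probability of a single M/D/1/K queue from its embedded Markov chain at departure instants, using the standard M/G/1/K relation P_loss = 1 - 1/(π0 + ρ). It is not necessarily the computation behind Fig. 7: the function name `mdk_loss` is hypothetical, K is assumed to count every place in the system including the cell in service, and the paper does not state whether its 3% figure applies per node or end to end over the ten nodes.

```python
import math


def mdk_loss(rho: float, k: int) -> float:
    """Loss probability of one M/D/1/K queue at offered load rho.

    k counts all places in the system, including the cell being served
    (an assumption of this sketch).
    """
    if rho <= 0.0 or k < 1:
        raise ValueError("rho must be positive and k at least 1")
    # a[j]: probability of j Poisson arrivals during one fixed service time
    a = [math.exp(-rho) * rho ** j / math.factorial(j) for j in range(k)]
    # Unnormalized queue-length distribution left behind by departing
    # cells (states 0 .. k-1), obtained from the balance equations
    pi = [1.0]
    for i in range(k - 1):
        acc = pi[i] - pi[0] * a[i]
        for j in range(1, i + 1):
            acc -= pi[j] * a[i - j + 1]
        pi.append(acc / a[0])
    pi0 = pi[0] / sum(pi)
    # Standard M/G/1/K result linking pi0 to the blocking probability
    return 1.0 - 1.0 / (pi0 + rho)
```

Sweeping `rho` for fixed `k` gives a loss-versus-load curve of the kind plotted in Fig. 7; for k = 1 the function reduces to ρ/(1 + ρ), which is a convenient sanity check.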
6. CONCLUSION

In this paper we have proposed a new speech processing scheme for an ATM network. Through the subjective evaluation tests, we have confirmed that good speech quality for speech communications is obtained up to a cell loss rate of about 3% for ADPCM speech and about 8% for PCM speech. The features of ADPCM cell reconstruction and the feasibility for speech communications were discussed.

ACKNOWLEDGEMENT

The authors would like to thank Dr. Atsushi Fukasawa, General Manager of Digital Communications Laboratories, for his encouragement and support. We would like to thank Mr. Shosaku Tsukagoshi, General Manager of System VLSI R&D Dep. B, for his help in the development of the LSIs. Thanks are also due to our colleagues for their cooperation during the subjective evaluation tests.

REFERENCES

[1] CCITT Recommendation I.121, "Broadband Aspects of ISDN", Temporary Document 140, June 1988.
[2] J.F. Lynch Jr., J.G. Josenhans and R.E. Crochiere, "Speech/Silence Segmentation for Real-Time Coding via Rule Based Adaptive Endpoint Detection", IEEE ICASSP '87 Proceedings, pp. 1348-1351, 1987.
[3] Y. Shoji, O. Noguchi and T. Suzuki, "Development of High Performance DCMS with 3-bit and 4-bit Coding ADPCM", IEEE ICC '88 Proceedings, pp. 1598-1602, 1988.
[4] D.J. Goodman, O.G. Jaffe, G.B. Lockhart and W.C. Wong, "Waveform Substitution Techniques for Recovering Missing Speech Segments in Packet Voice Communications", IEEE ICASSP '86 Proceedings, pp. 105-108, 1986.
[5] L.R. Rabiner and R.W. Schafer, "Digital Processing of Speech Signals", Prentice-Hall, 1978.
[6] H. Ando, O. Noguchi, R. Miyamoto, S. Tsukagoshi and N. Yonekura, "DSP LSI Configuration to Implement an Advanced ADPCM Scheme", IEEE ISCAS '87 Proceedings, pp. 919-922, 1987.
[7] N.S. Jayant and P. Noll, "Digital Coding of Waveforms", Prentice-Hall, pp. 658-665, 1984.
[8] D.O. Bowker and C.A. Dvorak, "Speech Transmission Quality of Wideband Packet Technology", IEEE GLOBECOM '87 Proceedings, pp. 1887-1889, 1987.
[9] K. Sriram and W. Whitt, "Characterizing Superposition Arrival Processes in Packet Multiplexers for Voice and Data", IEEE J. SAC, SAC-4, 6, pp. 833-846, 1986.

Fig. 1 Functional Block Configuration for ATM Interface
(Blocks shown: analog telephone, analog 2-wire line, SLIC, EC, ATM interface (adaptation layer) containing the Speech Processing Unit (SPU), the ATM cell assembler (ATM CELL ASM) and the ATM cell disassembler (ATM CELL DASM), ATM-SW, ATM network. SLIC: Subscriber Line Interface Circuit; EC: Echo Canceller.)


Fig. 2 Signal Processing LSI (LSI-1, LSI-2 for SPU)

Fig. 3 Segmental S/N vs. Cell Loss Rate (G.721 ADPCM)
(Horizontal axis: cell loss rate (%). Curves: pitch estimation, zero substitution.)

Fig. 4 Segmental S/N vs. Cell Loss Rate (Advanced ADPCM)
(Horizontal axis: cell loss rate (%). Curves: pitch estimation, zero substitution.)

Fig. 5 Evaluation Equipment (Hardware Simulator for ATM Interface)

Fig. 6 Relative Opinion Score vs. Cell Loss Rate (Advanced ADPCM)
(Horizontal axis: cell loss rate (%).)

Fig. 7 Probability of Cell Loss vs. Offered Load (M/D/1/K Model)
(Horizontal axis: offered load.)
