DCT Versus DFT

Speech Communication 24 (1998) 249–257

Noisy speech enhancement using discrete cosine transform^1

Ing Yann Soon^{a,*}, Soo Ngee Koh^a, Chai Kiat Yeo^b

^a School of Electrical and Electronic Engineering, Nanyang Technological University, Block S2, Nanyang Avenue,
Singapore 639798, Singapore
^b School of Applied Science, Nanyang Technological University, Nanyang Avenue, Singapore 639798, Singapore

Received 19 June 1996; received in revised form 3 November 1997; accepted 30 March 1998
Abstract
This paper illustrates the advantages of using the Discrete Cosine Transform (DCT) as compared to the standard
Discrete Fourier Transform (DFT) for the purpose of removing noise embedded in a speech signal. The derivation
of the Minimum Mean Square Error (MMSE) filter based on the statistical modelling of the DCT coefficients is shown.
Also shown is the derivation of an over-attenuation factor based on the fact that speech energy is not always present
in the noisy signal at all times or in all coefficients. This over-attenuation factor is useful in suppressing any
musical residual noise which may be present. The proposed methods are evaluated against the noise reduction filter
proposed by Y. Ephraim and D. Malah (1984), using both Gaussian distributed white noise as well as recorded fan
noise, with favourable results. © 1998 Elsevier Science B.V. All rights reserved.
Résumé
This article illustrates the advantages of using the Discrete Cosine Transform (DCT) over the standard Discrete
Fourier Transform (DFT) for denoising noisy speech. It shows how to derive an MMSE filter from the statistical
modelling of the DCT coefficients. It also shows how to derive an over-attenuation factor based on the fact that, in
noisy signals, speech energy is not always present at every instant or in every coefficient. This over-attenuation
factor is useful for suppressing any musical residual noise. The proposed methods were evaluated favourably against
the noise reduction filter proposed by Ephraim and Malah (1984), using both Gaussian white noise and recorded fan
noise. © 1998 Elsevier Science B.V. All rights reserved.
Keywords: Speech enhancement; MMSE amplitude estimation; Noise removal; Discrete cosine transform (DCT)
* Corresponding author. E-mail: eiysoon@ntu.edu.sg.
1 Speech files available. See http://www.elsevier.nl/locate/specom.
0167-6393/98/$19.00 © 1998 Elsevier Science B.V. All rights reserved.
PII: S0167-6393(98)00019-3

1. Introduction

It is often necessary to perform speech enhancement through noise removal in speech processing systems
operating in noisy environments. As the presence of noise degrades the performance of speech coders and voice
recognition systems [1,2], it is common to incorporate speech enhancement as a preprocessing step in these
systems. The other important application of speech enhancement is to improve the perceptual quality of speech
in order to reduce listener fatigue. The additive noise may be due to the noisy environment in which the
speaker is speaking, or it may arise from noise in the transmission media. The latter problem is especially
apparent in analogue mobile communications.

The topic of speech enhancement is widely researched and many speech enhancement algorithms [1–7] make use of
the Discrete Fourier Transform (DFT) to make it easier to remove noise embedded in the noisy speech signal.
This is often done as it is easier to separate the speech energy and the noise energy in the transform domain.
For example, the energy of white noise is uniformly spread throughout the entire spectrum, but the energy of
speech, especially voiced speech, is concentrated in certain frequencies. It will be shown in this paper that
the Discrete Cosine Transform (DCT) outperforms the DFT in terms of speech energy compaction.

Furthermore, most of these algorithms only attempt to modify the spectral amplitudes of the noise corrupted
speech signal in order to reduce the effect of the noise component, while leaving the noise corrupted phase
information intact. It is of interest to note that in [3], the best estimate of the phase of the speech
component was shown to be the phase of the corrupted signal itself. Hence the advantage of using a real
transform, such as the DCT considered in this paper, is that the problem of not correcting for the phase will
result in less severe consequences. More discussion on this can be found in Section 2.

In this paper, the amplitude estimator for the DCT is obtained based on the assumption that both the noise and
the original speech signal amplitudes can be modelled by zero mean Gaussian distributed random variables in the
transform domain. This assumption is supported by the Central Limit Theorem, as each transform coefficient is
just a weighted sum of the speech samples. An improved amplitude estimator is also obtained by taking into
account that speech is not always present in the noisy signal. This is obtained by incorporating a self
adaptive estimator of the probability of speech presence based on the proposed statistical model. This
improvement helps to reduce the residual noise present, which is musical in nature.

2. DCT versus DFT

The DCT [8] is widely used in image compression because of its excellent energy compaction property. This is a
useful feature for noise removal too. If the speech energy can be concentrated predominantly into a few
coefficients while the noise energy remains white, reduction of noise can be achieved easily. As was shown in
previous work on speech coding [9], the DCT provides significantly higher energy compaction as compared to the
DFT. In fact, its performance is very close to the optimum Karhunen Loeve Transform (KLT). The same result is
also obtained for autoregressive models of speech signals. Although the KLT is optimum in energy compaction
and has also been applied successfully to suppress noise in [10], it is not popularly used. This is because
there are no fast transform methods possible for the KLT, leading to high computational requirements.
Therefore, the DCT, which is a very good approximation to the KLT for speech signals, is used instead.

A simple experiment is devised to illustrate the superior energy compaction of the DCT versus the DFT. A clean
speech signal is first divided into frames with 50% overlap, and the transform is performed. The transformed
coefficients are then sorted according to their magnitude and the n coefficients with the lowest energy are set
to zero. The speech is then reconstructed using the weighted overlap add technique [11]. The Mean Square Error
(MSE) is computed and plotted against n for both the DFT and the DCT in Fig. 1. It can be clearly seen that the
DCT provides better energy compaction. Also, the DCT has the added advantage of higher spectral resolution than
the DFT for the same window size. For a window size of N, the DCT has N independent spectral components while
the DFT only produces N/2 + 1 independent spectral components, as the other components are just complex
conjugates. This is yet another point in favour of the use of the DCT.

Fig. 1. Comparison of energy compaction.

Lastly, as mentioned earlier in the introduction, techniques using the DFT only attempt to correct the noisy
amplitude but not the phase component. This actually results in an upper bound on the maximum improvement in
SNR possible. If the DCT is used, a higher upper bound is possible.
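The energy compaction comparison described above can be illustrated in miniature. The following is only a sketch under stated assumptions, not the authors' experiment: it uses a single synthetic frame (a ramp) instead of overlapped speech frames, evaluates both transforms directly from their definitions, and measures the truncation error in the transform domain, which equals the time-domain error for orthonormal transforms (Parseval). For the DFT it ignores the bookkeeping needed to zero conjugate pairs together. The helper names `dct`, `dft` and `truncation_mse` are hypothetical.

```python
import cmath
import math


def dct(x):
    """Orthonormal DCT-II of a real sequence (direct O(N^2) evaluation)."""
    N = len(x)
    coeffs = []
    for k in range(N):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / N)
        acc = sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N)) for n in range(N))
        coeffs.append(scale * acc)
    return coeffs


def dft(x):
    """Unitary DFT (scaled by 1/sqrt(N)), so energy is preserved as for the DCT."""
    N = len(x)
    scale = 1.0 / math.sqrt(N)
    return [scale * sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]


def truncation_mse(coeffs, n_zeroed):
    """Mean square error caused by zeroing the n_zeroed lowest-energy coefficients."""
    energies = sorted(abs(c) ** 2 for c in coeffs)
    return sum(energies[:n_zeroed]) / len(coeffs)


# Synthetic ramp used as a stand-in for one speech frame (an assumption of this sketch).
frame = [float(n) for n in range(16)]
for n_zeroed in (4, 8, 12):
    print(n_zeroed, truncation_mse(dct(frame), n_zeroed), truncation_mse(dft(frame), n_zeroed))
```

For this frame the DCT error stays well below the DFT error at each truncation level, because the DFT implicitly treats the frame as periodic and spends energy on the jump between its two ends. Real speech frames will give different numbers.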
The effect of non-corrected phase on the speech is discussed in some detail in [12]. In [12], it was noted that
if the phase is out by more than $\pi/8$, the speech becomes rough. If the phase is replaced by random noise
uniformly distributed between $-\pi$ and $\pi$, a rough and completely unvoiced speech is obtained. On the other
hand, if the phase is replaced by zero, the reconstructed speech sounds completely voiced and monotonous.
Therefore it is not correct to view the phase as totally unimportant and, especially for high levels of
additive noise, the reconstructed speech quality will be affected. For the DCT, the coefficients are real and
can be considered to have a binary phase value. The phase depends only on the sign of the coefficient. This
provides a better noise margin since, unless the added noise changes the sign of the coefficient, the phase is
unchanged. Therefore if strong speech energy is present in a particular coefficient, it is unlikely that the
phase will be corrupted. If the noise energy is much higher than the speech energy in a particular coefficient,
resulting in an erroneous phase, the coefficient will be highly attenuated, thus minimizing the effect of the
erroneous phase. It is therefore likely that the DCT would perform better than the DFT.

To determine the upper bound of short-time amplitude estimation using the DFT as compared to using the DCT,
another experiment is carried out. In this experiment, Gaussian distributed white noise is added to a clean
speech signal, which is then divided into 50% overlapping frames. The DFT is performed on the frames and the
magnitude of the noisy frequency component is replaced by the minimum of the magnitudes of the clean frequency
component and the noisy frequency component. The reason behind this is that all transform based noise
suppression methods are basically attenuation schemes. If the presence of noise results in a frequency
component having a lower magnitude, it is unlikely that the noise suppression filter can increase the
magnitude. The best estimate possible would be the noisy magnitude. However, if noise causes the amplitude of
the frequency component to be higher, an ideal filter should lower the amplitude to the original speech
amplitude. On the other hand, the noisy phase components are left unchanged. The ideally filtered magnitude is
then combined with the noisy phase components to obtain the ideal filtered frequency component. The optimal
speech signal is then reconstructed using the weighted overlap add technique [11]. The process is repeated for
different levels of added noise.

For the DCT, a similar process is carried out. Instead of separating the frequency component into magnitude and
phase as in the case of the DFT, the frequency component is separated into sign and magnitude. The sign portion
will be either positive or negative. The magnitude of the noisy frequency component is also replaced by the
minimum of the magnitudes of the clean and noisy frequency components. Then the magnitude is recombined with
the sign portion to obtain the optimal reconstructed speech using the same scheme as above.

The results can be seen in Fig. 2, which shows the global SNR of the reconstructed speech using the DCT and the
DFT versus the global SNR of the noisy speech. The plots are approximately linear and clearly show that using
the DCT results in a higher upper bound for speech enhancement than using the DFT.

Fig. 2. Upper limits of enhancement using DCT and DFT.

The one dimensional DCT is given as follows. The forward transform of a sequence $\{x(n);\ 0 \le n \le N-1\}$
is given by

$$X(k) = \alpha(k) \sum_{n=0}^{N-1} x(n) \cos\left[\frac{\pi(2n+1)k}{2N}\right], \quad 0 \le k \le N-1, \tag{1}$$

where

$$\alpha(0) = \sqrt{\frac{1}{N}}, \qquad \alpha(k) = \sqrt{\frac{2}{N}} \quad \text{for } 1 \le k \le N-1. \tag{2}$$

The inverse transformation is given by

$$x(n) = \sum_{k=0}^{N-1} \alpha(k) X(k) \cos\left[\frac{\pi(2n+1)k}{2N}\right], \quad 0 \le n \le N-1. \tag{3}$$

More details about the DCT and its fast implementation can be found in [8,13,14].

3. Minimum mean square error filter for DCT

Let the clean speech signal, noisy speech signal and the noise signal be denoted by $x(t)$, $y(t)$ and $n(t)$,
respectively, and $y(t)$ be given as follows:

$$y(t) = x(t) + n(t). \tag{4}$$
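The upper-bound experiment of Section 2 can be sketched on a single frame, with the noisy frame formed as clean signal plus white Gaussian noise. This is a minimal sketch under stated assumptions rather than the authors' setup: the clean frame is a synthetic decaying sinusoid, there is no framing or overlap-add, and the SNR is computed in the transform domain, which matches the time-domain SNR because both transforms are taken in orthonormal form. The names `clip_dct`, `clip_dft` and `snr_db` are hypothetical.

```python
import cmath
import math
import random


def dct(x):
    """Orthonormal DCT-II, written directly from the forward transform definition."""
    N = len(x)
    return [math.sqrt((1.0 if k == 0 else 2.0) / N)
            * sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N)) for n in range(N))
            for k in range(N)]


def dft(x):
    """Unitary DFT (scaled by 1/sqrt(N))."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N)) / math.sqrt(N)
            for k in range(N)]


def clip_dct(X, Y):
    """Keep the noisy sign, limit the magnitude to min(|clean|, |noisy|)."""
    return [math.copysign(min(abs(xc), abs(yc)), yc) for xc, yc in zip(X, Y)]


def clip_dft(X, Y):
    """Keep the noisy phase, limit the magnitude to min(|clean|, |noisy|)."""
    return [min(abs(xc), abs(yc)) * cmath.exp(1j * cmath.phase(yc)) for xc, yc in zip(X, Y)]


def snr_db(reference, estimate):
    """Global SNR in dB of an estimate with respect to a reference sequence."""
    signal = sum(abs(r) ** 2 for r in reference)
    error = sum(abs(r - e) ** 2 for r, e in zip(reference, estimate))
    return 10.0 * math.log10(signal / error)


random.seed(0)
N = 32
clean = [math.sin(2 * math.pi * 3.3 * n / N) * math.exp(-n / 20.0) for n in range(N)]
noisy = [c + random.gauss(0.0, 0.3) for c in clean]  # clean signal plus white Gaussian noise

Xc, Yc = dct(clean), dct(noisy)
Xf, Yf = dft(clean), dft(noisy)
print("input SNR (dB):      ", snr_db(clean, noisy))
print("DCT upper bound (dB):", snr_db(Xc, clip_dct(Xc, Yc)))
print("DFT upper bound (dB):", snr_db(Xf, clip_dft(Xf, Yf)))
```

In both domains the clipped estimate can never be worse than the noisy input, since each coefficient moves towards the clean value or stays where it is. Which transform gives the higher bound on a particular frame depends on the signal and the noise realisation.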

Also, let the transformed signals of the clean speech, noisy speech and noise be denoted by $X(k)$, $Y(k)$ and
$N(k)$, respectively, where $k$ denotes the position of the coefficient in the transform domain. With the
assumption that the DCT transformed coefficients are statistically independent, the Minimum Mean Square Error
(MMSE) estimated amplitude $\hat{X}(k)$ can be obtained from $Y(k)$ as follows:

$$\hat{X}(k) = E\{X(k) \mid Y(k)\}, \tag{5}$$

where $E\{\cdot\}$ denotes the expectation operator. Eq. (5) can be rewritten, using Bayes' theorem, as

$$\hat{X}(k) = \frac{\int_{-\infty}^{\infty} a_k\, p\{Y(k) \mid a_k\}\, p\{a_k\}\, da_k}
{\int_{-\infty}^{\infty} p\{Y(k) \mid a_k\}\, p\{a_k\}\, da_k}, \tag{6}$$

where $p\{\cdot\}$ denotes the probability density function (PDF), and $a_k$ is a dummy variable representing
all possible values of $X(k)$.

Under the Gaussian distribution assumptions, $p\{Y(k) \mid a_k\}$ and $p\{a_k\}$ are given by the following
equations:

$$p\{Y(k) \mid a_k\} = \frac{1}{\sqrt{2\pi\lambda_n(k)}} \exp\left\{-\frac{(Y(k) - a_k)^2}{2\lambda_n(k)}\right\}, \tag{7}$$

$$p\{a_k\} = \frac{1}{\sqrt{2\pi\lambda_x(k)}} \exp\left\{\frac{-a_k^2}{2\lambda_x(k)}\right\}, \tag{8}$$

where $\lambda_x(k) = E\{|X(k)|^2\}$ and $\lambda_n(k) = E\{|N(k)|^2\}$. Substituting Eqs. (7) and (8) into
Eq. (6), $\hat{X}(k)$ can be easily shown to be given by

$$\hat{X}(k) = \frac{\xi(k)}{\xi(k) + 1}\, Y(k), \tag{9}$$

where

$$\xi(k) = \frac{\lambda_x(k)}{\lambda_n(k)}. \tag{10}$$

$\xi(k)$ is known as the a priori SNR by some authors. The above derivation shows that the Wiener filter is the
MMSE amplitude estimator for the real transform case.

To use the above formula, both $\lambda_n$ and $\lambda_x$ have to be known. The value of $\lambda_n$ is assumed
to be known in this paper. Methods for estimating $\lambda_n$ are covered in some detail in [4,5]. However, an
accurate estimation of $\lambda_x$ is more difficult to achieve. This paper uses the approach known as Decision
Directed Estimation developed by Ephraim and Malah [3] to estimate $\lambda_x$. The superiority of this
estimator is covered in some detail in [15]. The estimate $\hat{\lambda}_x$ for $\lambda_x$ is given by the
following equation:

$$\hat{\lambda}_x(k) = \alpha\, \hat{\lambda}_x(k)_p + (1 - \alpha) \max\{Y(k)^2 - \lambda_n(k),\, 0\}, \tag{11}$$

where $\max\{\cdot\}$ is the maximum function used to ensure that a non-negative value is obtained as an
estimate. $\hat{\lambda}_x(k)_p$ is the estimated value of $\lambda_x$ in the previous frame, while $\alpha$ is
a constant which can be adjusted to achieve the best result.

The value of $\alpha$ is set to 0.98 in the computer simulations of the filters. Smaller values of $\alpha$
(e.g. 0.8) are found to result in a higher level of musical tone in the residual noise. On the other hand, if
$\alpha$ is set to 1, severe distortions in the speech signals are heard. This observation agrees with that in
[3]. The effect of varying $\alpha$ is discussed in detail in [16], which states that the value of $\alpha$ has
to be greater than 0.9 in order to counter the musical noise effect, and 0.98 is considered a reasonable value
for $\alpha$. The same value of $\alpha$ is used in [15].

4. Further reduction of residual noise using uncertainty of signal presence

The derivation of the MMSE filter in Section 3 is based on the assumption that speech signal energy is always
present in the sampled speech data. However, it should be noted that even in the presence of speech, the signal
energy is unlikely to be significant for all the transform coefficients. Insignificant signal energy can be
treated as absence of speech. Furthermore, actual speech data also consist of periods of silence. Both [3] and
[5] have taken note of this and modified their filters accordingly. It was emphasized in [5] that most noise
filtering algorithms are inadequate when speech is absent, hence additional attenuation should be applied
during periods in which speech is absent. The residual noise of commonly used
spectral subtraction, power subtraction and other algorithms tends to be musical in nature, and is considered
to be very annoying to some users [6]. Using further attenuation as suggested by this section helps to reduce
the residual noise.

One means of doing so is to scale the filter output down by the conditional probability of speech presence
given the received spectral amplitude, $Y(k)$. Let the input be represented by two states, $H_0$ and $H_1$,
where

    H_0: speech absent;
    H_1: speech present.

The modified filter output, $A(k)$, taking into account the conditional probability of speech presence given
$Y(k)$, is then given by

$$A(k) = P(H_1 \mid Y(k))\, \hat{X}(k). \tag{12}$$

The approach is logical: when the value of $P(H_1 \mid Y(k))$ approaches one, the filter reverts to the original
filter, while when $P(H_1 \mid Y(k))$ approaches zero, the filter produces a zero output. Using Bayes' theorem,
the conditional probability is given as follows:

$$P(H_1 \mid Y(k)) = \frac{P(H_1)\, P(Y(k) \mid H_1)}{P(H_1)\, P(Y(k) \mid H_1) + P(H_0)\, P(Y(k) \mid H_0)}. \tag{13}$$

The conditional probabilities $P(Y(k) \mid H_1)$ and $P(Y(k) \mid H_0)$ can be obtained from the Gaussian
statistical model:

$$P(Y(k) \mid H_0) = \frac{1}{\sqrt{2\pi\lambda_n(k)}} \exp\left(-\frac{Y(k)^2}{2\lambda_n(k)}\right), \tag{14}$$

$$P(Y(k) \mid H_1) = \frac{1}{\sqrt{2\pi(\lambda_x(k) + \lambda_n(k))}}
\exp\left(-\frac{Y(k)^2}{2(\lambda_n(k) + \lambda_x(k))}\right). \tag{15}$$

If an assumption is made that the probabilities of $H_0$ and $H_1$ are approximately equal, Eq. (13) can be
simplified to the following:

$$P(H_1 \mid Y(k)) = \frac{1}{1 + \sqrt{1 + \xi(k)}\,
\exp\left(\dfrac{-\xi(k)\, Y(k)^2}{2(\lambda_n(k) + \lambda_x(k))}\right)}. \tag{16}$$

The conditional probability derived above is used to further attenuate the estimated amplitude.

5. Results and discussions

A total of eight sets of speech data taken from the TIMIT database are used in our evaluation. Four of the
sentences are spoken by female speakers while the remaining sentences are by male speakers. The duration of the
sentences ranges from 5 to 10 s. The speech data used are sampled at 8 kHz and quantized to 16 bits.

The proposed enhancement algorithms are tested on the speech data corrupted by two different types of additive
noise. The first type of noise is the widely used Gaussian white noise. It was reported that this type of noise
is more difficult to remove than any other noise source. Attempts at removing white noise usually produce an
irritating residual noise. The second type of noise added to the speech is recorded fan noise.

The noisy speech data are then divided into frames, each of which consists of 256 samples with an overlap of
192 samples with the neighbouring frame. Hanning windowing is then performed on each frame before it is
enhanced individually. The final enhanced speech is reconstructed from the enhanced frames using the weighted
overlap add technique [11]. The overall block diagram of the filter is shown in Fig. 3.

Fig. 3. Overall block diagram of noise reduction filter.

The segmental signal to noise ratio (SEGSNR) is used as an objective test method in the evaluation of the
speech enhancement schemes. SEGSNR is chosen over other objective measures such as Signal to Noise Ratio (SNR)
and MSE, because it seems to have better agreement with listening tests. The SNR and MSE measures are normally
dominated by the higher energy voiced portion. On the other hand, the use of SEGSNR, which is the average of
the SNR over all non-overlapping segments, gives the unvoiced portion its proper weighting.

$$\mathrm{SEGSNR} = \frac{1}{n} \sum_{f=1}^{n} 10 \log_{10} \frac{\sigma_{f,x}^{2}}{\sigma_{f,(x-\hat{x})}^{2}}, \tag{17}$$
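The segmental SNR of Eq. (17) can be sketched as follows. This is an illustrative reading, not the authors' code: it assumes that the per-frame quantities in Eq. (17) are the energies of the clean signal and of the estimation error within each non-overlapping frame, and that the manual speech/silence classification is supplied as a list of booleans. The function name `segsnr_db` and its parameters are hypothetical.

```python
import math


def segsnr_db(clean, enhanced, frame_len, is_speech=None):
    """Average of per-frame SNRs (dB) over non-overlapping frames.

    Frames flagged False in is_speech (e.g. silence) are skipped.
    Frames with zero signal or zero error energy are not handled in this sketch.
    """
    per_frame = []
    for f in range(len(clean) // frame_len):
        if is_speech is not None and not is_speech[f]:
            continue
        seg = slice(f * frame_len, (f + 1) * frame_len)
        signal = sum(c * c for c in clean[seg])
        error = sum((c - e) ** 2 for c, e in zip(clean[seg], enhanced[seg]))
        per_frame.append(10.0 * math.log10(signal / error))
    return sum(per_frame) / len(per_frame)


# A quiet frame followed by a loud frame, both with the same absolute error.
clean = [1.0] * 4 + [10.0] * 4
enhanced = [c + 0.1 for c in clean]
print(segsnr_db(clean, enhanced, frame_len=4))
print(segsnr_db(clean, enhanced, frame_len=4, is_speech=[True, False]))
```

In this toy case the quiet frame has an SNR of about 20 dB and the loud frame about 40 dB, so the segmental value is about 30 dB, whereas a global SNR over both frames would be dominated by the loud frame.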
In Eq. (17), n is the total number of non-silence frames. The classification of the frames is performed
manually using the clean speech.

Both the DCT based speech enhancement filters described in Sections 3 and 4 are implemented and their results
are compared with those of the Ephraim and Malah filter (EMF) [3]. The DCT based Wiener filter of Section 3
will hereafter be known as DCTF, while that of Section 4 will be known as DCTF2. The results are given in
Tables 1 and 2.

A study of the results shows that DCTF outperforms EMF in the objective test for all except one particular
speech sample with low added noise power. The improvement in SEGSNR is significant, especially for input SNR
smaller than 0 dB. The superior performance is also more noticeable for actual recorded fan noise than for
white Gaussian noise. This is to be expected since the superior energy compaction property applies to both
noise and speech. If the noise is also concentrated in fewer coefficients, it is also easier to suppress.

Table 1
Segmental SNR tests for white noise corrupted speech (segmental SNR in dB)

Speech   Unprocessed   EMF       DCTF     DCTF2
fd          6.271      11.927    11.817   11.270
fb          2.897       8.614     9.102    8.739
fa          1.772       7.708     8.077    7.52
fc          1.182       8.06      9.407    9.483
mb          3.744       7.78      7.824    7.357
ma         -2.735       3.28      4.032    3.849
mc         -5.02        1.874     2.438    2.241
md        -10.166      -0.0686    1.929    2.087

Table 2
Segmental SNR tests for fan noise corrupted speech (segmental SNR in dB)

Speech   Unprocessed   EMF       DCTF     DCTF2
fa         -1.045      11.342    13.688   13.318
fd         -2.782      10.337    12.930    8.739
mc        -13.76       -0.5062    3.373    3.592
md        -15.309      -2.737     1.941    2.449
fc        -15.631      -1.057     5.103    6.188
mb        -19.176      -4.681     0.867    1.728
fb        -20.08       -5.101    -1.566    2.749
ma        -21.994      -6.99     -0.04     0.95
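The two filters compared in Tables 1 and 2 differ only in the speech presence factor of Eq. (12). A per-frame sketch of both, following Eqs. (9)–(11) and (16), is given below. It is a sketch under stated assumptions rather than the authors' implementation: the noise variances `lam_n` are taken as known per coefficient, the starting value of the previous-frame estimate `lam_x_prev` is left to the caller, and framing, windowing and overlap-add are omitted. The function name `enhance_frame` and its parameters are hypothetical.

```python
import math


def enhance_frame(Y, lam_n, lam_x_prev, alpha=0.98, use_presence=False):
    """Enhance one frame of DCT coefficients.

    Y            noisy DCT coefficients of the frame
    lam_n        noise variance per coefficient (assumed known)
    lam_x_prev   speech variance estimates from the previous frame
    use_presence False gives the Wiener gain only (DCTF),
                 True also applies the speech presence factor (DCTF2)
    Returns the enhanced coefficients and the updated speech variance estimates.
    """
    enhanced, lam_x_new = [], []
    for y, ln, lp in zip(Y, lam_n, lam_x_prev):
        lx = alpha * lp + (1.0 - alpha) * max(y * y - ln, 0.0)  # Eq. (11)
        xi = lx / ln                                            # Eq. (10)
        estimate = xi / (xi + 1.0) * y                          # Eq. (9)
        if use_presence:
            p_h1 = 1.0 / (1.0 + math.sqrt(1.0 + xi)
                          * math.exp(-xi * y * y / (2.0 * (ln + lx))))  # Eq. (16)
            estimate *= p_h1                                    # Eq. (12)
        enhanced.append(estimate)
        lam_x_new.append(lx)
    return enhanced, lam_x_new


Y = [3.0, 0.5, -2.0]
lam_n = [1.0, 1.0, 1.0]
lam_x_prev = [4.0, 0.0, 1.0]
dctf, lam_x = enhance_frame(Y, lam_n, lam_x_prev)
dctf2, _ = enhance_frame(Y, lam_n, lam_x_prev, use_presence=True)
print(dctf)
print(dctf2)
```

Since the presence probability never exceeds one, the DCTF2 output is never larger in magnitude than the DCTF output for the same coefficient, which matches the description of Section 4 as an additional attenuation.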
The results based on SEGSNR show that both the DCTF and DCTF2 outperform EMF, especially for the case of fan
noise. However, for cleaner speech, DCTF is better, while for very noisy speech, the additional attenuation
provided by DCTF2 gives a better result. EMF does not produce any musical tones in the residual noise, which
is approximately white. However, the remaining background noise is still significant. The output from the DCTF
has significantly less residual noise than that produced by the EMF. This is especially the case for high
noise situations. However, the residual noise using the DCTF sounds slightly less uniform. DCTF2 introduces
slightly more distortion for high SNR speech, but as the noise power increases, it becomes superior to the
DCTF. Generally, speech processed using DCTF2 sounds much cleaner for higher degrees of white noise, as
compared to DCTF or EMF. However, differences between DCTF and DCTF2 are less noticeable for the added fan
noise.

Fig. 4 illustrates the original clean speech segment from the file fb. Fig. 5 shows the same speech segment
with fan noise added. Figs. 6–8 show the results of enhancement using the different algorithms (EMF, DCTF,
DCTF2). The y axis of Figs. 4–8 is the amplitude of the signal, while the x axis is time in seconds. The
audio files^2 are made available for listening. The audio files have been converted to 8 bit mu-law Sun format
from the original 16 bit raw form.

Fig. 4. Clean speech fb.
Fig. 5. Fan noise corrupted speech.
Fig. 6. EMF restored speech.
Fig. 7. DCTF restored speech.
Fig. 8. DCTF2 restored speech.

2 See http://www.elsevier.nl/locate/specom.
6. Conclusions

This paper shows the feasibility of using the DCT for noise reduction. The use of DCT based filters generates
better results as compared to a more complex technique by Ephraim and Malah. The improvement is also more
noticeable for fan noise than for white noise. Although the algorithm is simulated with the DCT, it should also
be applicable to other types of real transform such as the wavelet transform.

References

[1] M.R. Sambur, N.S. Jayant, LPC analysis/synthesis from speech inputs containing white noise or additive
    white noise, IEEE Trans. Acoust. Speech Signal Process. ASSP-24 (1976) 484–494.
[2] B.H. Juang, Recent developments in speech recognition under adverse conditions, Proceedings of the
    International Conference on Spoken Language Processing, November 1990, Kobe, Japan, pp. 1113–1116.
[3] Y. Ephraim, D. Malah, Speech enhancement using a minimum mean-square error short-time spectral amplitude
    estimator, IEEE Trans. Acoust. Speech Signal Process. ASSP-32 (1984) 1109–1121.
[4] S.F. Boll, Suppression of acoustic noise in speech using spectral subtraction, IEEE Trans. Acoust. Speech
    Signal Process. ASSP-27 (1979) 113–120.
[5] R.J. McAulay, M.L. Malpass, Speech enhancement using a soft-decision noise suppression filter, IEEE Trans.
    Acoust. Speech Signal Process. ASSP-28 (1980) 137–145.
[6] M. Berouti, R. Schwartz, J. Makhoul, Enhancement of speech corrupted by acoustic noise, IEEE Proc. ICASSP,
    Vol. 1, April 1979, pp. 208–211.
[7] E. Munday, Noise reduction using frequency-domain non-linear processing for the enhancement of speech, Br.
    Telecom. Technol. J. 6 (2) (1988) 71–83.
[8] N. Ahmed, T. Natarajan, K.R. Rao, Discrete cosine transform, IEEE Trans. Comput. C-23 (1974) 90–93.
[9] R. Zelinski, P. Noll, Adaptive transform coding of speech signals, IEEE Trans. Acoust. Speech Signal
    Process. ASSP-25 (1977) 299–309.
[10] Y. Ephraim, D. Malah, A signal subspace approach for speech enhancement, IEEE Trans. Speech and Audio
    Process. 3 (1995) 251–266.
[11] R.E. Crochiere, A weighted overlap-add method of short time Fourier analysis/synthesis, IEEE Trans.
    Acoust. Speech Signal Process. ASSP-28 (1980) 99–102.
[12] P. Vary, Noise suppression by spectral magnitude estimation: Mechanism and theoretical limits, Signal
    Processing 8 (1985) 387–400.
[13] W.H. Chen, C.H. Smith, S.C. Fralick, A fast computational algorithm for the discrete cosine transform,
    IEEE Trans. Commun. COM-25 (1977) 1004–1009.
[14] M.J. Narasimha, A.M. Peterson, On the computation of the discrete cosine transform, IEEE Trans. Commun.
    COM-26 (6) (1978) 934–936.
[15] P. Scalart, J. Vieira Filho, Speech enhancement based on a priori signal to noise estimation, Proceedings
    ICASSP, Vol. 2, 1996, pp. 629–632.
[16] O. Cappe, Elimination of the musical noise phenomenon with the Ephraim and Malah noise suppressor, IEEE
    Trans. Speech and Audio Process. 2 (2) (1994) 345–349.
