
Annual Report on

VLSI IMPLEMENTATION OF TEMPORAL AND TRANSFORM
DOMAIN SPEECH ENHANCEMENT ALGORITHMS

Suman Samui
(Roll no: 13AT91R03)
UNDER THE GUIDANCE OF
Dr. Indrajit Chakrabarti
(Dept of E&ECE)
&
Prof. Soumya Kanti Ghosh
(School of Information Technology)

Advanced Technology Development Centre


IIT Kharagpur
July 2014

Brief Outline of the work done during the year 2013-14


I joined the Advanced Technology Development Centre as an Institute Research Scholar on 11th July 2013 under the guidance of Dr. Indrajit Chakrabarti and Prof. Soumya Kanti Ghosh. My Ph.D. research work is on the VLSI implementation of various temporal and transform domain speech enhancement algorithms.

COURSE WORK: I have successfully completed my course work (as decided by the honourable DSC). The subjects credited during my course work are listed below.

Subject Code    Subject                                   Grade Obtained
IT60116         Advanced Topics in Speech Processing      EX
IT60108         Soft Computing Applications
CS60052         Architectural Design of ICs
HS63002         English For Technical Writing

Objective of the work:

The main objectives of this work are: (i) to attenuate the noise component of noisy speech using transform-domain filtering algorithms in order to enhance the quality of the processed speech; (ii) to carry out a comprehensive evaluation and comparison of the performance of these algorithms; and (iii) to design novel VLSI architectures for the functional blocks of these speech enhancement algorithms, targeting portable devices such as digital hearing aids.

Summary of the Report


This report presents a literature review of various speech enhancement algorithms in both the temporal and transform domains. The strengths and weaknesses of different transforms, such as the discrete Fourier transform (DFT), discrete cosine transform (DCT), Karhunen-Loeve transform (KLT) and wavelet transform (WT), are discussed both intuitively and axiomatically. The rest of the report discusses our proposed multi-band complex spectral subtraction technique for non-stationary noise reduction in digital hearing aid systems.
The performance of a digital hearing aid depends to a great extent on the efficacy of the noise reduction mechanism adopted. One of the major problems of modern digital hearing aids is that when the signal-to-noise ratio (SNR) of the input signal becomes very low, the perceived signal quality degrades severely, with an additional annoying artefact known as musical noise. Although the spectral subtraction method has gained popularity due to its simplicity and ease of implementation, an important limitation of conventional spectral subtraction is its stationary-noise assumption. Real-life noise, however, is non-stationary and does not affect the speech spectrum uniformly. Moreover, in most spectral subtraction algorithms, the phase of the estimated speech is assumed to be the same as that of the noisy signal. Nevertheless, phase plays a significant role in signal reconstruction, particularly when the SNR is low. In this report, we propose a multi-band complex spectral subtraction algorithm that assumes a non-stationary noisy environment. In the proposed algorithm, which can dynamically adapt itself to varying levels of noise, the phase of the clean speech signal is estimated from the magnitude spectra of noise and speech extracted by a generalised spectral subtraction algorithm. Experimental results show that our algorithm yields better performance in terms of several objective and subjective measures, such as perceptual evaluation of speech quality (PESQ), log-likelihood ratio (LLR) and Itakura-Saito (IS) distance, when compared to previously proposed spectral subtraction methods.
Keywords: Digital hearing aid, multi-band complex spectral subtraction, PESQ, LLR, IS, musical noise

CONTENTS
CHAPTER 1 INTRODUCTION
1.1 Motivation
1.2 Objectives
1.3 Literature Survey
CHAPTER 2 MULTI-BAND COMPLEX SPECTRAL SUBTRACTION
2.1 Introduction
2.2 Problem Statement and Proposed Algorithm
2.3 Implementation
2.4 Experimental Results
2.5 Future Work
REFERENCES

CHAPTER 1
INTRODUCTION
1.1 Motivation
Hearing impairment is undoubtedly a serious issue. Throughout the world, the number of people suffering from hearing loss continues to increase rapidly. Over the last few decades, the design of an effective hearing aid has turned out to be a challenging problem [1]. Despite the tremendous endeavour in hearing aid technology towards smaller size, higher processing speed and lower power consumption, the most distressing fact is that only 60-70 percent of active hearing aid users are satisfied with their devices [2].
The performance of a hearing aid degrades severely in the presence of acoustic background noise. Noise reduction algorithms must be incorporated in the hearing aid device in order to enhance the quality of the perceived speech signal without compromising its intelligibility. During the long history of speech enhancement, numerous schemes have been proposed. A broad classification divides the approaches into time-domain and transform-domain approaches. Filtering performed directly on the time sequence includes techniques such as LPC-based digital filtering [3], Kalman filtering [4] and hidden Markov model (HMM) based filtering [5]. In the transform-domain approach, noise attenuation is performed on the transform coefficients. The transforms used can be the discrete Fourier transform (DFT), discrete cosine transform (DCT) [6], Karhunen-Loeve transform (KLT) or even the wavelet transform (WT). Though transform-domain approaches suffer from an annoying artefact called residual musical tones, these methods seem to be more popular amongst researchers.

1.2 Objectives

While current hearing aids offer intelligibility improvements in some environments, in complex noisy situations, where the potential benefit to users is the greatest, that potential is largely unrealized. Feedback control and noise reduction have been identified as two performance bottlenecks. Hearing aids have recently made the shift from analog to digital devices; this has enabled the introduction of digital signal processing algorithms for feedback and noise control. However, the traditional speech processing algorithms employed are largely designed for telephony applications, and the assumptions invoked in their development do not reflect the reality of current and future hearing aids. Our aim is therefore to improve the state of the art of speech enhancement for hearing aids by developing algorithms that take into account the challenges faced in real environments.

Speech enhancement is a continuously evolving research field with a wide variety of existing techniques.
Moreover, no current speech enhancement algorithm is universally recognized as the "best" solution in every
aspect and every context.
Practically speaking, each method has its strengths and weaknesses, and in general enhancement algorithms can be characterized by:
The amount of noise reduction
The amount of distortion (or "damage") inflicted on the speech at the output of the enhancer
The effect of the algorithm on intelligibility
The amount and the nature of artefacts introduced in the enhanced speech
The flexibility of the method (i.e., for what range of noises or speakers will it perform as intended, and conversely, when will it fail?)
The computational complexity of the enhancement system.
In general, one algorithm cannot excel simultaneously in all of the above categories; in particular, it is well known that one of the central issues in speech enhancement is the trade-off between the amount of noise reduction and the naturalness of the output.
Our objective is to design a speech enhancement algorithm that handles real-world situations. To be more precise, real-world situations mean realistic noisy speech conditions, which are characterized by the following:

Non-stationary noise
Low signal-to-noise ratio
Reduced intelligibility, i.e., the nature and/or level of the speech or noise adversely affects the decipherability of spoken words or sentences.

1.3 Literature Survey


1.3.1 Speech Enhancement for Hearing Aids
The movement from analog to digital signal processing, combined with the increasing speed and decreasing power consumption of processors, has enabled rapid improvements in hearing aid technology [7]. Rather than providing simple amplification, the device performs multi-band amplification including dynamic range compression to compensate for the "recruitment phenomenon" associated with sensory-neural hearing loss [8]. The recruitment phenomenon results in the perceived loudness curve of a hearing-impaired listener not being a linear shift of that of a normal listener. While soft sounds may sound attenuated or even fall below the impaired listener's perception threshold, loud sounds are perceived at the same volume as by a normal-hearing listener. Simple amplification without considering the recruitment phenomenon can lead to loud sounds being pushed above the uncomfortable level. The use of digital hearing aids also enables sophisticated noise processing algorithms and more robust identification and compensation of feedback. Furthermore, signal and environment classification and control logic can be used to determine appropriate settings and modes of operation; for example, a user listening to music would wish to have noise reduction disabled to prevent any possible distortion.

Fig. 1.1: The basic block diagram of a typical filter-bank based digital hearing aid.
The basic block diagram of a typical filter-bank based digital hearing aid is shown in Fig. 1.1 [9]. The continuous audio signal, collected through the microphone, is converted to a digital signal by the ADC and then passes through the acoustic feedback cancellation (AFC) block to reduce the effect of the feedback signal on the current input samples. The feedback-free signal is then decomposed into a specific number of frequency sub-bands and passed through the noise reduction (NR) block. This NR block attenuates the background noise and improves the quality of speech.

1.3.2 Time Domain Methods versus Frequency Domain Methods


Speech enhancement can be performed in both the time domain and the frequency domain. Time-domain techniques include FIR filtering, IIR filtering, LPC filtering, LP residual enhancement, Kalman filtering, HMM-based filtering, etc. Transform-domain methods are techniques in which a transformation is first performed on the noisy speech before filtering, followed by the corresponding inverse transformation in order to restore the speech. The main advantage of performing noise filtering or reduction in the transform domain lies in the relative ease of distinguishing and removing noise from speech: since the energy of the speech signal is concentrated in certain frequency bins, noise removal can to some extent be achieved easily. Ideally, the transform coefficients should be fully de-correlated and independent of each other. The transform should also be fast and computationally inexpensive for real-time applications. Last but not least, the transform must be reversible.
Here, mainly the transform-domain speech enhancement techniques are illustrated briefly.

1.3.3 Noise Reduction


In order to remove noise from a speech signal, the speech and noise must be distinguishable in some way. Various combinations of time, frequency and statistical properties of speech and noise signals can be exploited to differentiate the signals. In the time domain, speech is spontaneous and discontinuous, with many pauses of varying length, while many noise sources have a more continuous power envelope. Many noises also have different frequency content than speech, or the time-frequency variations of the noise signal differ from those of speech. Speech and noise signals may also exhibit different time-correlation properties; noise is frequently considered random and non-predictable.
The most commonly exploited information is the time-frequency content of the signals, motivated by the frequency-domain transformation performed by the cochlea in the inner ear. In speech enhancement applications, the frequency division is generally achieved using fast Fourier transform (FFT) techniques to implement the discrete Fourier transform (DFT). To account for time variations of the signal and to allow for real-time processing, the DFT is computed over overlapping windowed frames, where the frame length is sufficiently short that the speech can be considered statistically stationary for the frame duration. This short-time Fourier transform (STFT) approach gives a time-frequency division of the input signal. The N-point STFT of a signal x at time n, windowed with a function w, is computed as:

X(k, n) = \sum_{m=0}^{N-1} w[m] \, x[n+m] \, e^{-j 2\pi k m / N}        (1.1)
Simple speech enhancement approaches work in the STFT domain by attenuating frequency bins with low SNRs, leaving bins with high SNRs unmodified. Fig. 1.2 presents a block diagram of an STFT speech enhancement system. The clean speech signal x[n] is mixed with additive measurement noise v[n] to give the measurement signal z[n]. The noisy signal is enhanced in the transform domain using an estimate of the noise spectrum, and the clean signal estimate is reconstructed from the inverse-transformed frames using overlap-add (OLA) synthesis. STFT-based enhancement approaches exploit the fact that additive, statistically independent noise in the time domain remains additive, statistically independent noise in the frequency domain; and while speech and noise overlap in time, they may not overlap in all time-frequency bins. STFT speech enhancement algorithms are commonly used because they offer consistently high noise attenuation, and the availability of the FFT makes them computationally cheap to implement.
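As a concrete illustration of this analysis-modify-synthesis structure, the following minimal Python sketch frames a signal, applies an arbitrary spectral-domain enhancement rule, and resynthesises with overlap-add. The Hann analysis window, the 50% overlap and the `enhance()` hook are illustrative assumptions, not parameters taken from this report.

```python
import numpy as np

def stft_frames(x, frame_len=256, hop=128):
    """Split x into overlapping windowed frames and take the DFT of each."""
    w = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.stack([np.fft.rfft(w * x[i * hop : i * hop + frame_len])
                     for i in range(n_frames)])

def ola_synthesis(frames, frame_len=256, hop=128):
    """Overlap-add the inverse DFTs of the (possibly modified) frames."""
    y = np.zeros(hop * (len(frames) - 1) + frame_len)
    for i, F in enumerate(frames):
        y[i * hop : i * hop + frame_len] += np.fft.irfft(F, n=frame_len)
    return y

# z: noisy signal; enhance() stands in for any spectral-domain gain rule.
# frames = stft_frames(z)
# x_hat = ola_synthesis(enhance(frames))
```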

Fig. 1.2: The block diagram of an STFT speech enhancement system.



1.3.4 Spectral Subtraction


Spectral subtraction [10] is one of the earliest speech enhancement algorithms. It is also one of the simplest,
both conceptually and computationally. Spectral subtraction attempts to recover the clean signal by
subtracting an estimate of the noise signal in the short-time Fourier transform domain:

Writing the noisy observation as Z(k) = X(k) + V(k) and taking squared magnitudes gives

|Z(k)|^2 = |X(k)|^2 + |V(k)|^2 + 2 |X(k)| |V(k)| \cos(\angle X(k) - \angle V(k))        (1.2)

If the signal and noise are in phase, then \cos(\angle X(k) - \angle V(k)) = 1 and

|Z(k)| = |X(k)| + |V(k)|        (1.3)

whereas if the cross term in (1.2) is neglected, as is justified on average for uncorrelated speech and noise, |Z(k)|^2 \approx |X(k)|^2 + |V(k)|^2.

In the first case, the clean signal magnitude can be recovered by subtracting an estimate of the noise magnitude |V(k)|. In the second case, the clean signal power spectrum can be recovered by subtracting an estimate of the noise power |V(k)|^2. These two approaches are known as magnitude and power spectral subtraction respectively. Since the subtraction can produce negative spectral components, half-wave rectification is used to ensure a valid spectrum.
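A minimal sketch of these two subtraction rules applied to a single STFT frame, assuming the noisy phase is reused for reconstruction (the function and argument names here are ours, chosen for illustration):

```python
import numpy as np

def spectral_subtraction(Z, noise_mag, power=False):
    """Basic spectral subtraction on one complex STFT frame Z.

    noise_mag is an estimate of |V(k)|. power=False gives magnitude
    subtraction, power=True gives power subtraction. Negative results
    are half-wave rectified to zero; the noisy phase is reused.
    """
    if power:
        mag2 = np.maximum(np.abs(Z) ** 2 - noise_mag ** 2, 0.0)
        mag = np.sqrt(mag2)
    else:
        mag = np.maximum(np.abs(Z) - noise_mag, 0.0)
    return mag * np.exp(1j * np.angle(Z))
```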
Spectral subtraction is appealing because it is both conceptually and computationally simple, requiring only a subtraction, and because it can offer high levels of noise reduction. However, the performance of spectral subtraction relies heavily upon the accuracy of the noise estimate |V(k)|. The original implementation in [10] used a voice activity detector (VAD) to estimate the noise spectrum during speech pauses, invoking the assumption of quasi-stationary, slowly varying noise. Since the noise estimate is a smoothed average, there will generally be a mismatch between it and the true spectrum. In addition, ignoring the cross term in the governing equation (1.2) introduces additional error even if the noise magnitude estimate is exact. This leads to under- or over-subtraction of the noise, resulting in speech distortion or residual noise.
The main drawback of the spectral subtraction algorithm is the nature of the residual noise. In time-frequency regions where the noisy signal spectral amplitude is close to the estimated noise amplitude, successive over- and under-estimation of the true noise spectrum leads to fluctuating, narrowband residual noise components in the enhanced speech known as musical noise or musical tones. Musical noise is problematic because it occurs in time-frequency regions where speech energy is low, so it is not masked, and its unnatural quality makes it disturbing to listeners, so much so that the original noisy speech is often preferable to enhanced speech with musical noise [11].
In [12], a number of modifications were proposed to reduce the level of musical noise introduced by the basic spectral subtraction algorithm. An over-estimate of the noise is used to attenuate the spurious spectral peaks that lead to musical noise, and a spectral floor is introduced to mask any residual peaks. A generalized exponent is also considered, complementing the magnitude and power spectral subtraction algorithms. Combined, these modifications give the generalized spectral subtraction estimate of the clean speech FFT coefficient as:

|\hat{X}(k)|^{\gamma} = \max\left( |Z(k)|^{\gamma} - \alpha \, |\hat{V}(k)|^{\gamma}, \; \beta \, |\hat{V}(k)|^{\gamma} \right)        (1.4)

where \alpha is the over-subtraction factor, which is generally SNR dependent, and \beta is the spectral floor, which is commonly frequency dependent. Due to the variance of the noise magnitude about its mean value, \alpha has to be large to fully prevent the musical noise phenomenon; values of 3 to 6 are used in [12], corresponding to over-subtraction of up to 8 dB for power subtraction. However, this same noise spectral variance means that when the instantaneous noise spectrum is below its expected value, the desired speech will be severely distorted, and low-energy speech can be removed completely. The over-subtraction factor can therefore be used to trade off between speech distortion, musical noise artifacts and residual noise level, and different algorithms have been proposed to control the over-subtraction and spectral floor parameters so as to control this trade-off. Despite these attempts, while spectral subtraction can reduce the perceptual impact of noise, the modifications required to prevent musical noise result in a processed speech signal which does not improve, and can actually reduce, intelligibility compared to unprocessed speech [11].
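The following sketch illustrates equation (1.4) for one frame, with a Berouti-style SNR-dependent over-subtraction factor; the specific alpha0, slope, clamping range and beta values are illustrative assumptions, not values prescribed in [12]:

```python
import numpy as np

def berouti_alpha(snr_db, alpha0=4.0, slope=3.0 / 20.0):
    """SNR-dependent over-subtraction factor in the style of [12]
    (alpha0, slope and the clip limits are illustrative values)."""
    return np.clip(alpha0 - slope * snr_db, 1.0, 5.0)

def generalized_subtraction(Z, noise_mag, gamma=2.0, beta=0.02):
    """Generalized spectral subtraction (1.4) on one STFT frame Z.

    gamma=2 gives power subtraction, gamma=1 magnitude subtraction.
    beta sets the spectral floor relative to the noise estimate.
    """
    snr_db = 10.0 * np.log10(np.sum(np.abs(Z) ** 2)
                             / max(np.sum(noise_mag ** 2), 1e-12))
    alpha = berouti_alpha(snr_db)
    sub = np.abs(Z) ** gamma - alpha * noise_mag ** gamma
    floor = beta * noise_mag ** gamma
    mag = np.maximum(sub, floor) ** (1.0 / gamma)
    return mag * np.exp(1j * np.angle(Z))  # reuse the noisy phase
```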
Spectral subtraction algorithms enhance only the speech magnitude or power spectrum; the noisy signal phase spectrum is used to reconstruct the clean signal phase. This is often justified by noting that improving the phase estimate has a relatively small impact on overall speech quality compared to improving the magnitude spectrum estimate, and that the perceptual impact of phase distortion is lower than that of magnitude distortion [13]. However, the presence of the cross term in equation (1.2) means that a perfect estimate of the clean signal spectrum cannot be obtained by subtracting the noise spectrum from the noisy signal spectrum, even if a perfect estimate of the noise spectrum is available.

CHAPTER 2
Multi-band Complex Spectral Subtraction Technique
2.1 INTRODUCTION
The performance of a hearing aid degrades severely in the presence of acoustic background noise. Noise reduction algorithms must be incorporated in the hearing aid device in order to enhance the quality of the perceived speech signal without compromising its intelligibility. The spectral subtraction method is an extensively used noise-reduction approach, though it suffers from an artefact called musical noise [11][14]. To address this problem, several other versions of spectral subtraction have been proposed, starting from Boll's original work [10]. By introducing over-estimation and spectral floor factors into the subtraction process, Berouti et al. [12] suggested a method to reduce the musical noise by subtracting an over-estimate of the noise spectrum from the noisy speech spectrum, while preventing the resultant spectral components from going below a pre-set minimum value. A frequency-adaptive subtraction factor based approach was proposed in [15][16], based on the assumption that, in general, noise may not affect the speech signal uniformly over the whole spectrum: some frequencies are affected more severely than others, depending on the spectral characteristics of the noise. Lockwood and Boudy [16] proposed the non-linear spectral subtraction (NSS) method, in which the over-subtraction factor is frequency dependent in each frame of speech: larger estimated noise values are subtracted at frequencies with low SNR levels, and smaller values at frequencies with high SNR levels. Kamath and Loizou [15] extended this concept and developed a multi-band spectral subtraction method that divides the speech spectrum into N non-overlapping bands, with the over-subtraction factor for each band calculated independently. In [17], a method is proposed to reduce the musical noise in silence and unvoiced regions by dividing each silence and unvoiced frame of spectrally subtracted speech into several sub-frames and randomizing the phases of each sub-frame over a uniform interval. Hasan et al. [18] introduced a procedure based on a self-adaptive averaging factor to estimate the a priori SNR, which is applied to the conventional spectral subtraction algorithm.

However, the performance of the above methods has not been satisfactory in adverse environments, particularly when the SNR is very low. The reason is that, in very low SNR conditions, it is very difficult to suppress noise without degrading intelligibility and introducing residual noise and speech distortion. Moreover, in most of the research work reported so far, only the estimated magnitude or power of the noise is subtracted from the noisy speech signal in the spectral domain, while the phase of the signal remains unaltered, assuming that human perception is phase-deaf, i.e. that the phase of the speech signal has only a minor effect on human sound perception [19]. A few recent experimental results [20] have shown that phase plays a significant role in the speech perception mechanism, particularly at low SNR. Therefore, intelligibility and listening fatigue may be improved further if phase information is incorporated in the algorithm.
Here, we propose a multi-band complex spectral subtraction method that incorporates the phase in the subtraction process, based on the following assumptions:

Real-life noise is additive and uncorrelated with the speech signal.
Noise is non-stationary, i.e. it affects the speech spectrum differently at different frequencies; hence, a dynamic noise model has to be used.
Speech and noise signals are not co-linear, i.e. they are not in the same phase.

Although the phase of the original speech signal cannot be recovered from the noisy audio signal, incorporating phase estimation in the spectral subtraction procedure is immensely important for accurately estimating the magnitude of the clean speech signal, particularly under low SNR conditions. Using the proposed algorithm, an average improvement of 0.87% in PESQ has been observed over Kamath's multi-band spectral subtraction technique [15], and of almost 10.03% over Berouti's spectral subtraction method [12], for 0 dB non-stationary noise.

2.2 PROBLEM STATEMENT AND PROPOSED ALGORITHM


The noise reduction (NR) block is essential in order to make the hearing aid system more robust against noise. In most state-of-the-art speech enhancement algorithms [11], noise is assumed to be Gaussian. However, real-world noise does not have a flat spectrum, i.e. the noise signal does not affect the speech spectrum uniformly. In the present work, noise has been considered non-stationary and colored. The phase difference between the clean speech signal and the noise has also been incorporated in the subtraction process to further enhance the quality and intelligibility of the speech signal.
If we assume that the noise is additive and uncorrelated with the clean speech signal, then the corrupted noisy signal can be expressed as:

Y(n) = S(n) + N(n)        (2.1)

where Y(n), S(n) and N(n) represent the corrupted speech samples, the clean speech samples and the noise samples respectively. In the spectral domain, the noisy speech vector can be expressed as the resultant of the noise and clean speech vectors:

Y(k, j) = S(k, j) + N(k, j)        (2.2)
where k denotes the frequency bin and j the frame index. Using the vector magnitude property, it can be written that:

|Y(k, j)|^2 = |S(k, j)|^2 + |N(k, j)|^2 + 2 |S(k, j)| |N(k, j)| \cos(\theta(k, j))        (2.3)

where \theta(k, j) denotes the phase difference between the clean speech spectrum and the noise spectrum. If the clean speech and noise are in the same phase, then \theta(k, j) = 0 and

|Y(k, j)| = |S(k, j)| + |N(k, j)|        (2.4)

Hence, the clean speech signal can be estimated from the noisy speech signal if we estimate the background noise:

|\hat{S}(k, j)| = |Y(k, j)| - |\hat{N}(k, j)|        (2.5)
In a real-world scenario, the clean speech signal and noise are not co-linear, and some error E is introduced due to the \cos(\theta) factor in equation (2.3). This is best illustrated in Fig. 2.1, which shows that there is a significant difference between the projected |S(k,j)| and the estimated |S(k,j)|, which in turn produces the error E. This error increases further as the SNR decreases. In Fig. 2.2, using a single-frequency phasor diagram, it is demonstrated that if the clean speech magnitude and \theta are kept constant and the noise level increases, then the error changes from E1 to E2, where E2 > E1. Hence, the computational error due to the phase difference between S(k) and N(k) increases for an input speech signal having a low SNR value.

Fig. 2.1: Phase difference between clean speech and noise spectrum introduces magnitude error in
estimating the clean speech magnitude spectra from noisy signal spectrum.

Fig. 2.2: Phasor diagram showing that the estimation error increases with decreasing SNR.

To overcome this problem, in this work a multi-band complex spectral subtraction technique has been proposed to estimate the clean speech spectrum. For each band it can be expressed as:

\hat{S}_i(k, j) = Y_i(k, j) - \alpha_i \, \delta_i \, |\hat{N}_i(k, j)| \, e^{j \hat{\theta}_n(k, j)}, \quad b_i \le k \le e_i        (2.6)

where k indicates a frequency bin of the i-th frequency band of the j-th frame, and b_i and e_i are the beginning and ending frequency bins of the i-th band respectively. |\hat{N}_i(k, j)| and \hat{\theta}_n are the estimated magnitude and phase spectrum of the noise respectively. The parameter \alpha_i is the over-subtraction factor of the i-th band and is a function of the segmental SNR. Moreover, \delta_i is an additional band-subtraction factor (tweaking factor) [15], which provides an additional degree of control within each frequency band.
The band-specific segmental SNR is given by:

\mathrm{SNR}_i(j) = 10 \log_{10} \left( \sum_{k=b_i}^{e_i} |Y_i(k, j)|^2 \Big/ \sum_{k=b_i}^{e_i} |\hat{N}_i(k, j)|^2 \right)        (2.7)

from which, following [12] and [15], the over-subtraction factor of each band is computed as

\alpha_i = \begin{cases} 4.75, & \mathrm{SNR}_i < -5 \\ 4 - \frac{3}{20} \mathrm{SNR}_i, & -5 \le \mathrm{SNR}_i \le 20 \\ 1, & \mathrm{SNR}_i > 20 \end{cases}        (2.8)

while the band-subtraction factor is assigned per band as in [15]:

\delta_i = \begin{cases} 1, & f_i \le 1\,\mathrm{kHz} \\ 2.5, & 1\,\mathrm{kHz} < f_i \le \frac{F_s}{2} - 2\,\mathrm{kHz} \\ 1.5, & f_i > \frac{F_s}{2} - 2\,\mathrm{kHz} \end{cases}        (2.9)

where f_i denotes the upper frequency edge of the i-th band and F_s the sampling frequency.

Fig. 2.3: (a) Time-domain representation of the input noisy signal; (b) variation of the over-subtraction factor alpha with frame number; (c) variation of delta with frame number.

Fig. 2.4: The Segmental SNR of four linearly spaced frequency bands of speech corrupted by noise.
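To make the per-band processing concrete, here is a minimal per-frame sketch of the complex subtraction core described by equations (2.6)-(2.8); the band edges, delta values, the alpha clamping range and the small spectral floor are illustrative choices, not the exact parameters of the proposed implementation:

```python
import numpy as np

def mbcss_frame(Y, noise_mag, noise_phase, band_edges, delta):
    """One frame of multi-band complex spectral subtraction.

    Y           : complex STFT frame of the noisy speech
    noise_mag   : estimated noise magnitude spectrum |N_hat(k)|
    noise_phase : estimated noise phase spectrum theta_n(k)
    band_edges  : list of (b_i, e_i) bin ranges for the N bands
    delta       : per-band tweaking factors delta_i
    """
    S_hat = np.zeros_like(Y)
    for (b, e), d in zip(band_edges, delta):
        Yb, Nb = Y[b:e + 1], noise_mag[b:e + 1]
        # Band-specific segmental SNR, eq. (2.7)
        snr = 10.0 * np.log10(np.sum(np.abs(Yb) ** 2)
                              / max(np.sum(Nb ** 2), 1e-12))
        # SNR-dependent over-subtraction factor, eq. (2.8)
        alpha = np.clip(4.0 - (3.0 / 20.0) * snr, 1.0, 4.75)
        # Complex subtraction of the scaled noise estimate, eq. (2.6)
        S_hat[b:e + 1] = Yb - alpha * d * Nb * np.exp(1j * noise_phase[b:e + 1])
    # Keep a small spectral floor to avoid vanishing magnitudes
    mag = np.maximum(np.abs(S_hat), 0.002 * noise_mag)
    return mag * np.exp(1j * np.angle(S_hat))
```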

2.3 IMPLEMENTATION
The block diagram of the proposed multi-band complex spectral subtraction (MBCSS) technique is shown in Fig. 2.5. The signal, acquired from the AFC, is first pre-emphasised and Hamming-windowed using a 25-ms window with 50% overlap. The magnitude spectrum is estimated using a 256-point FFT block. The speech spectrum is then divided into N non-overlapping bands. In each band, the segmental SNR (SSNR) is evaluated and, based on its value, the over-subtraction factor and all other empirical factors involved in the subtraction process explained in Section 2.2 are generated for each band. The phase is also estimated using inputs from the noise spectrum estimator. After updating the noise spectrum and obtaining the phase spectra of the noise and the noisy speech, the complex subtraction process starts. Finally, the processed bands are recombined and the signal is reconstructed using the modified magnitude and phase spectra. The synthesis procedure is carried out with the overlap-add method [12].
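A short sketch of this front end under the stated parameters (25-ms Hamming window, 50% overlap, 256-point FFT at the 8 kHz NOIZEUS rate); the pre-emphasis coefficient 0.97 is an assumed, commonly used value not specified in the report:

```python
import numpy as np

FS = 8000                  # NOIZEUS sampling rate (Hz)
FRAME = int(0.025 * FS)    # 25-ms window -> 200 samples
HOP = FRAME // 2           # 50% overlap
NFFT = 256                 # FFT length stated in Section 2.3

def preemphasize(x, coeff=0.97):
    """First-order pre-emphasis filter; 0.97 is an assumed typical value."""
    return np.append(x[0], x[1:] - coeff * x[:-1])

def analysis_frames(x):
    """Hamming-windowed, zero-padded 256-point spectra of each frame."""
    w = np.hamming(FRAME)
    n_frames = 1 + (len(x) - FRAME) // HOP
    return np.stack([np.fft.rfft(w * x[i * HOP : i * HOP + FRAME], n=NFFT)
                     for i in range(n_frames)])
```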

Fig. 2.5: Block Diagram of Proposed MBCSS.

2.4 EXPERIMENTAL RESULTS


The NOIZEUS speech corpus [27] is used for our experiments. This database contains 30 sentences recorded as .wav files at a sampling rate of 8 kHz and corrupted by different types of real-world noise (train noise, car noise, babble noise, restaurant noise, etc.). For evaluation, we have performed different types of subjective and objective measures to determine the quality and intelligibility of the enhanced speech signals. It is well known that different types of artifacts are introduced in the enhanced speech signal. Signal distortion, background noise distortion and overall quality are also evaluated with the composite quality and intelligibility test proposed in [22]. We also present the results of comparing different objective measures (LLR, PESQ, SegSNR, WSS, IS) between the proposed method (MBCSS) and previously proposed methods. High speech quality is indicated by high values of SegSNR and PESQ and low values of LLR. Fig. 2.6 shows that the proposed method provides a better PESQ value for the enhanced speech signal, particularly at low SNR.
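For reference, a minimal sketch of the frame-based segmental SNR measure used in such comparisons; the 25-ms frame length and the conventional clamping of per-frame values to [-10, 35] dB are assumptions, not values stated in this report:

```python
import numpy as np

def segmental_snr(clean, enhanced, frame=200, lo=-10.0, hi=35.0):
    """Frame-averaged segmental SNR in dB between clean and enhanced
    signals; per-frame values are clamped to [lo, hi] as is conventional."""
    n = min(len(clean), len(enhanced))
    snrs = []
    for i in range(0, n - frame + 1, frame):
        x = clean[i:i + frame]
        err = x - enhanced[i:i + frame]
        snr = 10.0 * np.log10(np.sum(x ** 2) / max(np.sum(err ** 2), 1e-12))
        snrs.append(np.clip(snr, lo, hi))
    return float(np.mean(snrs))
```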

Fig. 2.6: Comparison of PESQ for various SNR levels.

Fig. 2.7: Time-domain clean speech signal along with its spectrogram.

Fig. 2.8: Time-domain noisy speech signal (SNR = 0 dB) along with its spectrogram.

Fig. 2.9: Time-domain enhanced speech signal along with its spectrogram.

2.5 Future work


In this report, a novel spectral subtraction method has been presented for enhancing speech corrupted by real-world noise. It is based on three important design blocks, namely a dynamic noise-magnitude estimator, a phase estimator and the multi-band complex subtraction (MBCSS) block. This block may be used as the noise-reduction block in modern digital hearing aids, as the algorithm outperforms conventional spectral subtraction in terms of different objective and subjective measures. Future work includes the complete hardware implementation and architecture design of the proposed algorithm in FPGA and ASIC form.
Table 2.1: Comparison of objective measures

REFERENCES
[1] Kochkin, S., "MarkeTrak VIII: Consumer satisfaction with hearing aids is slowly increasing," Hearing Journal, vol. 63, no. 1, pp. 4-48, Jan. 2010.
[2] Kochkin, S., "MarkeTrak VII: Hearing loss population tops 31 million," Hearing Review, vol. 12, no. 7, pp. 16-29, 2005.
[3] Lim, J. S.; Oppenheim, A. V., "All-pole modeling of degraded speech," IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 26, no. 3, pp. 197-210, Jun. 1978.
[4] Paliwal, K. K.; Basu, A., "A speech enhancement method based on Kalman filtering," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '87), 1987.
[5] Zhao, D. Y.; Kleijn, W. B., "HMM-based gain modelling for enhancement of speech in noise," IEEE Transactions on Audio, Speech, and Language Processing, vol. 15, no. 3, pp. 882-892, Mar. 2007.
[6] Ding, H.; Soon, I. Y.; Yeo, C. K., "A DCT-based speech enhancement system with pitch synchronous analysis," IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 8, pp. 2614-2623, Nov. 2011.
[7] Hamacher, V.; Fischer, E.; Kornagel, U.; Puder, H., "Applications of adaptive signal processing methods in high-end hearing aids," in Topics in Acoustic Echo and Noise Control: Selected Methods for the Cancellation of Acoustical Echoes, the Reduction of Background Noise, and Speech Processing (E. Hansler and G. Schmidt, eds.), Springer, 2006.
[8] Hamacher, V.; Chalupper, J.; Eggers, J.; Fischer, E.; Kornagel, U.; Puder, H.; Rass, U., "Signal processing in high-end hearing aids: state of the art, challenges, and future trends," EURASIP Journal on Applied Signal Processing, vol. 2005, pp. 2915-2929, 2005.
[9] Chen, Y. J.; Wei, C. W.; FanChiang, Y.; Meng, Y. L.; Huang, Y.; Jou, S., "Neuromorphic pitch based noise reduction for monosyllable hearing aid system application," IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 61, no. 2, pp. 463-475, Feb. 2014.
[10] Boll, S., "Suppression of acoustic noise in speech using spectral subtraction," IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 27, no. 2, pp. 113-120, Apr. 1979.
[11] Loizou, P. C., Speech Enhancement - Theory and Practice. Boca Raton, FL, USA: CRC Press, Taylor and Francis Group, 2007.
[12] Berouti, M.; Schwartz, R.; Makhoul, J., "Enhancement of speech corrupted by acoustic noise," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '79), vol. 4, pp. 208-211, Apr. 1979.
[13] Vary, P., "Noise suppression by spectral magnitude estimation - mechanism and theoretical limits," Signal Processing, vol. 8, pp. 387-400, Jul. 1985.
[14] Takahashi, Y.; Miyazaki, R.; Saruwatari, H.; Kondo, K., "Theoretical analysis of musical noise in nonlinear noise reduction based on higher-order statistics," in Proc. Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), pp. 1-10, Dec. 2012.
[15] Kamath, S.; Loizou, P., "A multi-band spectral subtraction method for enhancing speech corrupted by colored noise," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 4, p. IV-4164, May 2002.
[16] Lockwood, P.; Boudy, J., "Experiments with a nonlinear spectral subtractor (NSS), hidden Markov models and the projection, for robust speech recognition in cars," Speech Communication, vol. 11, no. 2-3, pp. 215-228, 1992.
[17] Seok, J. W.; Bae, K. S., "Reduction of musical noise in spectral subtraction method using subframe phase randomisation," Electronics Letters, vol. 35, pp. 123-125, Jan. 1999.
[18] Hasan, M.; Salahuddin, S.; Khan, M., "A modified a priori SNR for speech enhancement using spectral subtraction rules," IEEE Signal Processing Letters, vol. 11, pp. 450-453, Apr. 2004.
[19] Gerkmann, T.; Krawczyk, M.; Rehr, R., "Phase estimation in speech enhancement - unimportant, important, or impossible?," in Proc. IEEE 27th Convention of Electrical and Electronics Engineers in Israel (IEEEI), Nov. 2012.
[20] Paliwal, K.; Wojcicki, K.; Shannon, B., "The importance of phase in speech enhancement," Speech Communication, vol. 53, no. 4, pp. 465-494, Apr. 2011.
[21] Stetzler, T.; Magotra, N.; Gelabert, P.; Kasthuri, P.; Bangalore, S., "Low power real-time programmable DSP development platform for digital hearing aids," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '99), vol. 4, pp. 2339-2342, Mar. 1999.
[22] Hu, Y.; Loizou, P. C., "Evaluation of objective quality measures for speech enhancement," IEEE Transactions on Audio, Speech, and Language Processing, vol. 16, no. 1, pp. 229-238, Jan. 2008.
[24] Mowlaee, P.; Martin, R., "On phase importance in parameter estimation for single-channel source separation," in Proc. International Workshop on Acoustic Signal Enhancement (IWAENC 2012), pp. 1-4, Sept. 2012.
[25] Yegnanarayana, B.; Murthy, H. A., "Significance of group delay functions in spectrum estimation," IEEE Transactions on Signal Processing, vol. 40, no. 9, pp. 2281-2289, Sep. 1992.
[26] Rangachari, S.; Loizou, P. C.; Hu, Y., "A noise estimation algorithm with rapid adaptation for highly nonstationary environments," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '04), vol. 1, pp. 305-308, May 2004.
[27] Ma, J.; Hu, Y.; Loizou, P., "Objective measures for predicting speech intelligibility in noisy conditions based on new band-importance functions," Journal of the Acoustical Society of America, vol. 125, no. 5, pp. 3387-3405, 2009.
