
FILTER BANKS AND WAVELETS EEN698

A Wavelet-based Pitch-shifting Method


Alexander G. Sklar

Abstract— In audio engineering, pitch shifting is a term describing the process of changing the pitch of an audio signal (which is related to the logarithm of the frequency); but, as opposed to resampling, pitch shifting retains the duration of the original signal. This process is very useful, for example, in correcting slightly out-of-tune singers or instruments. The dual problem is changing the signal duration without affecting its pitch; this is called time stretching. One of the most well-known techniques for performing pitch shifting is called Phase Vocoding, but this method presents some artifacts. This project aims to overcome some of the deficiencies of the Phase Vocoder by using a wavelet-based approach.

Index Terms— Pitch shift, vocoder, phase vocoder, wavelet, Morlet

(Author e-mail: asklar@gmail.com)

I. INTRODUCTION

Pitch shifting describes the process of changing the pitch of an audio signal (which is related to the logarithm of the frequency); but, as opposed to resampling, pitch shifting retains the duration of the original signal. This process is very useful, for example, in correcting slightly out-of-tune singers or instruments. The dual problem is changing the signal duration without affecting its pitch; this is called time stretching. One of the most well-known techniques for performing pitch shifting is called Phase Vocoding. This is a frequency-based technique, whose main idea is to chop the input signal into very short segments, perform a frequency-domain analysis, change the frequency content of each portion, and convert back to the time domain. Basically, this is summarized as follows:

Algorithm 1 Typical frequency-domain algorithm

function freq_algorithm(in: x, out: y)
  yb = [];
  xb = buffer(x, N, M);
  foreach segment in xb do
    seg = fft(segment);
    seg = modify(seg);
    yb = [yb; ifft(seg)];
  end
  y = unbuffer(yb, N, M);

We split the input signal into blocks of length N (possibly overlapping by M samples), transform, modify and transform back. This is a Short-Time Fourier Transform (STFT) approach.

A. FT, STFT and the Wavelet Transform

For stationary signals, the Fourier transform contains all the necessary frequency information, since all the present frequencies are present all the time (this is, after all, a stationary signal). On the other hand, stationary signals do not occur in practice; consider the signal in figure 1, containing two frequencies occurring at different times, and its Fourier Transform magnitude spectrum in figure 2.

Fig. 1. A non-stationary signal
Fig. 2. A non-stationary signal spectrum according to the FT

We note that the FT does not possess any time-localization information: frequencies occur at all instants in time. To overcome this lack of time information, the STFT was developed. In figure 3 we see a spectrogram of the same signal. We see that the frequencies are well localized in time as well as in frequency.

One of the difficulties the Phase vocoder encounters is the "smearing" effect that affects transient components (such as a
bass drum or snare drum). This is due to the fact that, despite the STFT being "localized" both in time and frequency, it is not localized enough. Consider the effect of computing the spectrogram with a longer and a shorter window width in figures 4 and 5.

Fig. 3. A non-stationary signal's spectrogram
Fig. 4. A non-stationary signal with unresolved time components
Fig. 5. A non-stationary signal with unresolved frequency components

We see that in figure 4 the precise location of the beginnings of the tones is uncertain. Likewise, the precise frequency in figure 5 isn't obvious. This is because the frequency resolution is inversely proportional to the analysis window length. This is known as the Heisenberg uncertainty principle.

In summation, the STFT provides constant-bandwidth analysis, i.e. high relative resolution in the high frequencies and low relative resolution in the low frequencies.

To overcome these problems, the wavelet transform was devised. Just as the STFT, the Wavelet transform is a two-dimensional representation of a one-dimensional signal. The parameters are the scale and the translation (analogous to frequency¹ and time). The main idea behind the use of wavelets is to decompose a signal into a (possibly infinite) sum of functions which are both time- and frequency-localized. This is very different from the FT or STFT, where the basis functions are sinusoids (which are frequency-localized but not time-localized, since they extend from −∞ to +∞).

B. The Morlet Wavelet

The Morlet wavelet is named after Jean Morlet², although it was originally formulated by Goupillaud, Grossmann and Morlet in 1984. It is a function of time t with shape parameter σ:

  Ψσ(t) = cσ π^(−1/4) e^(−t²/2) (e^(iσt) − κσ)    (1)

where κσ = e^(−σ²/2) is defined by the admissibility criterion³, and the normalization constant cσ is:

  cσ = (1 + e^(−σ²) − 2 e^(−(3/4)σ²))^(−1/2)

The Fourier transform of the Morlet wavelet is:

  Ψ̂σ(ω) = cσ π^(−1/4) (e^(σω) − 1) e^(−(σ² + ω²)/2)    (2)

The "central frequency" ωΨ is the position of the global maximum of Ψ̂σ(ω), which, in this case, is given by the solution of the equation:

  (ωΨ − σ)² − 1 = (ωΨ² − 1) e^(−σωΨ)

The parameter σ in the Morlet wavelet allows a trade-off between time and frequency resolution. Conventionally, the restriction σ > 5 is used to avoid problems with the Morlet wavelet at low σ (high temporal resolution).

¹ Actually, 1/frequency.
² Jean Morlet is a French geophysicist who did pioneering work in the field of wavelet analysis in collaboration with Alex Grossmann. Morlet invented the term "wavelet" (ondelette) to describe equations similar to those that had been around since the 1930s.
³ This criterion states that ∫ℝ Ψσ(t) dt = 0. It is fundamental for the existence of an inverse wavelet transform as well as for the applicability of Parseval's identity.
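The constants in equations (1) and (2) are easy to sanity-check numerically. Below is a small illustrative sketch (in Python with NumPy, an assumption on our part, since the project itself is implemented in MATLAB): it verifies that cσ normalizes Ψσ to unit energy, that the admissibility integral ∫ Ψσ(t) dt vanishes, and that κσ drops below 10⁻⁵ once σ > 5. The grid and tolerances are arbitrary choices.

```python
import numpy as np

def morlet(t, sigma):
    """Morlet wavelet of equation (1)."""
    kappa = np.exp(-0.5 * sigma**2)          # admissibility term kappa_sigma
    c = (1 + np.exp(-sigma**2) - 2 * np.exp(-0.75 * sigma**2)) ** -0.5
    return c * np.pi**-0.25 * np.exp(-0.5 * t**2) * (np.exp(1j * sigma * t) - kappa)

sigma = 6.0
t = np.linspace(-10, 10, 20001)
dt = t[1] - t[0]
w = morlet(t, sigma)

energy = np.sum(np.abs(w) ** 2) * dt         # unit energy: should be ~1
mean = np.sum(w) * dt                        # admissibility: should be ~0
kappa5 = np.exp(-0.5 * 5.0**2)               # kappa at sigma = 5: ~3.7e-6 < 1e-5
print(energy, abs(mean), kappa5)
```

The same bound fails for smaller σ (e.g. σ = 4 gives κσ ≈ 3×10⁻⁴), which is one way to see why the σ > 5 convention is used.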
For signals containing only slowly varying frequency and amplitude modulations (audio, for example), it is not necessary to use small values of σ. In this case, κσ becomes very small (e.g. σ > 5 ⇒ κσ < 10⁻⁵) and is, therefore, often neglected. Under the restriction σ > 5, the frequency of the Morlet wavelet is conventionally taken to be ωΨ ≈ σ.

C. Discrete vs. Continuous Wavelet Transform

To be orthonormal, a set of wavelet functions must be complete (i.e. there have to be enough of them), and each must be orthogonal to the others (i.e. no overlap or projection). In addition, each wavelet function is normalized to have unit total energy.
For the Morlet wavelet, there are two possibilities:
1) The continuous wavelet transform: here the wavelet is slid along the time axis. Therefore, each wavelet overlaps the ones next to it, which gives a great deal of redundant information. This is not orthogonal.
2) The discrete wavelet transform: we skip along the time axis, computing the wavelet transform only where the wavelet no longer overlaps the previous ones.
• For the Morlet wavelet, this is approximately orthogonal, but not exactly. Due to the extended tails of the Gaussian, it is not possible to construct a truly orthogonal set for the Morlet.
• For other wavelets, such as the Daubechies family, it is possible to construct an exactly orthogonal set.
The moral is: if we want to compute the continuous wavelet transform, then we need not worry about orthogonality and can use the Morlet; if we want to compute the discrete wavelet transform, we should not use the Morlet.

D. A Wavelet Approach to Pitch Shifting

De Gersem et al. suggest an algorithmic framework for pitch shifting using the Complex Morlet wavelet. First, we take the CWT of the original signal to obtain a time-frequency representation of it. The second step is to scale the frequency axis of this "grid" by a constant factor c. Since we keep the same time scale, when we convert this grid back to the time domain, the result is a pitch-shifted version of the original. The algorithm is as follows:

Algorithm 2 Pitch-shifting algorithm

coefs=cwt(f,scales);
absc=abs(coefs);
phac=angle(coefs);
phac_unwrap=unwrap(phac);
coefs_shifted=absc.*exp(i*phac_unwrap*c);
scales_shifted=scales/c;
f_shifted=icwt(coefs_shifted,scales_shifted);

II. PREVIOUS WORK

A good pitch-shifting algorithm should be able to transpose an audio signal a tone up or down without any change in the duration of the signal. It should also preserve the transient components, and formants should be transposed accordingly. We analyze the effects of formant shifting in a later section.
As we have already stated, there are many algorithms nowadays, both public-domain and commercial, which carry out pitch shifting with varying degrees of success. To mention a few, the Phase vocoder is a very well-known technique, which is also very well suited for real-time processing since it can be implemented quite efficiently.

A. The Phase Vocoder

A Phase Vocoder is often implemented using Fast Fourier Transform (FFT) analysis. The FFT of a signal produces its frequency spectrum, with the limitation that the transform does not preserve any temporal information about the signal; i.e. we can gain information about a signal's frequency content, but not about when in time those frequencies occurred. To overcome this limitation, the technique of the Short-Time Fourier Transform (STFT) was developed. The STFT is a series of windowed Fourier Transforms which slightly overlap in time. By segmenting the signal and then performing individual FFTs on each block, some of the time information can be preserved (with a resolution equal to the window's length) while still obtaining the spectrum of the signal. Windowing is needed to avoid the frequency leakage caused by the signal truncation ([4], [5]).
As we have already stated, the output can be viewed as a 2D discrete grid with time and frequency as the axes. Once the processing has been done to create the 2D analysis grid, the grid values can be changed to produce any desired effect. The original time-domain signal can be reconstructed by a series of IFFTs followed by an overlap-add of each individual block. Pitch shifting, in particular, involves scaling the phase information of the individual vocoder components while maintaining the magnitude components ([1]).

B. The Continuous Wavelet Transform (CWT)

The CWT offers just another way to generate a time-frequency representation of a signal. The CWT of a function f(t) is defined as

  F(a, b) = ∫_{−∞}^{∞} f(t) ψ*_{a,b}(t) dt    (3)

where

  ψ_{a,b}(t) = (1/√|a|) ψ((t − b)/a)

and ψ(t) is the "mother" wavelet.
The inverse operation is defined as:

  f(t) = (1/Cψ) ∫_0^{∞} (da/a²) ∫_{−∞}^{∞} F(a, b) (1/√|a|) ψ((b − t)/a) db    (4)

where

  Cψ = ∫_{−∞}^{∞} (|ψ̂(ω)|² / |ω|) dω
One can wonder: does any function constitute a wavelet? The answer is, unfortunately, no. The selection of the wavelet function ψ(t) is not arbitrary; it must meet two main restrictions.
1) The wavelet function must be absolutely integrable and square integrable (ψ ∈ L¹ ∩ L²):

  ∫_{−∞}^{∞} |ψ(t)| dt < ∞

and

  ∫_{−∞}^{∞} |ψ(t)|² dt < ∞

2) The wavelet must have zero mean value (Admissibility Condition):

  ∫_{−∞}^{∞} ψ(t) dt = 0

Besides these restrictions, the wavelet function must be appropriate for the type of application. The CWT projects the signal onto a series of time-shifted and dilated versions of the wavelet function. To capture a desired aspect of a signal, the wavelet should possess similar properties. For the case of musical signals, the energy distribution and phase behavior are very important characteristics. Thus the Complex Morlet wavelet has been selected by [2] as the most appropriate for audio representations.

C. The Complex Morlet Wavelet

The Complex Morlet wavelet is defined in a similar way to the Morlet wavelet:

  ψ(t) = (1/√(π fb)) e^(j2π fc t) e^(−t²/fb)

where fb is the bandwidth parameter and fc is the wavelet center frequency.

Fig. 6. The Complex Morlet wavelet
Fig. 7. The Complex Morlet wavelet frequency response (note that since this is a complex signal, the magnitude is not symmetric and the phase is not antisymmetric).

For our purposes, we will take

  fb = 2,  fc = 5/(2π)

so that we are left with

  ψ(t) = c e^(j5t) e^(−t²/2)

with

  c = 1/√(2π)

In our calculations we will use an unnormalized version of the wavelet function, so that c = 1:

  ψ(t) = e^(j5t) e^(−t²/2)    (5)

D. The Phase Vocoder and the CWT

As we have already stated, both the CWT and the STFT represent a signal on a time-frequency (or translation-scale) grid. However, the STFT is an equispaced time-frequency lattice whose time resolution is the window size, and whose frequency resolution is inversely proportional to it. Since we are concerned with music applications, we are interested not only in the steady-state evolution, but also in the transients. Some experiments have shown the fundamental importance of transients in audio perception. For example, we know that if we take two instruments producing the same note and exchange the two notes' onsets, there is a great deal of cross-confusion, i.e. given two instruments I1 and I2 producing notes n1 ∈ I1 and n2 ∈ I2, people will classify ni → Ij for i = j as well as for i ≠ j in equal measure. Besides instruments, transients are crucial in speech, for example for fricative discernment. In music applications, transients also appear as a product of percussive instruments such as drum hits. The STFT phase
vocoder uses too fine a resolution in the low frequencies and too coarse a resolution in the high frequencies. This results in lost timing as well as the addition of beating or warbling effects. The term smearing is used to describe the undesirable side effects of Phase Vocoder processing. This is one of the motivations for using the CWT.
The CWT, by definition, provides a grid that is not a discrete one as in the STFT, but a continuous one, as the name implies. The grid is, therefore, a variable-scale one. The scale is given by the dilation of the wavelet function. The time-frequency representation of the signal can have resolutions tailored to the areas of interest. By using the CWT, one can create a grid with finer time resolution at increasing frequencies, which is better suited to music. Unlike the Phase Vocoder's, the analysis grid created by the CWT is not a time-frequency grid in the strict sense. Since the CWT performs multiple projections onto scaled wavelet functions, the output actually represents the similarity (i.e. the correlation) between the input signal and the wavelet function at each scale. A frequency is often assigned to each scale value, so that one can achieve an approximate representation of the frequency content. This frequency is the dominant spectral component of the wavelet function. Depending on the wavelet used, this frequency can be a very good or a very poor estimate of the frequency content; viz., if the wavelet has a very spread-out response, then this "central" frequency is of little use, just as happened in the spectrograms with the STFT. This is a major reason why the Complex Morlet wavelet is used for audio signals: it has a very narrowband frequency spectrum, which offers a good representation of the frequency content.

III. METHODOLOGY

A. Reproducing Kernel Hilbert Subspace

A reproducing kernel Hilbert space is a function space in which pointwise evaluation is a continuous linear functional. Equivalently, these are spaces that can be defined by reproducing kernels. The subject was originally and simultaneously developed by Nachman Aronszajn (1907-1980) and Stefan Bergman (1895-1977) in 1950.
Let X be an arbitrary set and H a Hilbert space of complex-valued functions on X. H is a reproducing kernel Hilbert space iff the linear map f ↦ f(x) from H to the complex numbers is continuous for every x in X. By the Riesz representation theorem, this implies that for a given x there exists an element Kx of H with the property that:

  f(x) = ⟨Kx, f⟩  ∀f ∈ H    (∗)

The function K(x, y) := Kx(y) is called a reproducing kernel for the Hilbert space. In fact, K is uniquely determined by condition (∗).
For example, when X is finite and H consists of all complex-valued functions on X, an element of H can be represented as an array of complex numbers. If the usual inner product is used, then Kx is the function whose value is 1 at x and 0 everywhere else. In other contexts, (∗) amounts to saying f(x) = ∫_X K(x, y) f(y) dy.
The bottom line is that the RKHS condition means there exists some unique signal whose transform is the signal we modified. Put differently, if we begin with a signal, transform it to a different domain through a linear functional such as the FT or the CWT, and modify this transformed signal in some way, we have no guarantee that the modified transformed signal corresponds to any signal in the original domain! The condition that guarantees the existence of a time-domain inverse is the RKHS condition, which depends on the kind of modification we perform in the dual domain.

B. Mathematical Foundations of the Pitch-Shifting Algorithm

As we have already stated, the classical approach to pitch shifting is to first compute the STFT time-frequency representation of a signal. Afterwards, the grid spacing and coefficient phases are scaled to create a synthesis grid. The inverse is then computed to reconstruct the signal. It is summarized as follows:
1) Compute the STFT representation of the signal.
2) Convert the coefficients into polar form.
3) Unwrap the phase and divide by the scaling factor c.
4) Reconstruct the signal using the new synthesis scale.
The crucial part here is the division of the phase by the scaling factor c. Consider the following (scaling) property of the Fourier Transform:

  f(at) ⟺ (1/|a|) F(ω/a)

We want to exploit this property in our approach to scale the pitch of an audio signal. Any complex number z can be written in polar notation as z = |z| e^(j∠z). In particular,

  F(ω) = |F(ω)| e^(j∠F(ω))

Therefore, F(ω/a) = |F(ω/a)| exp(j∠F(ω/a)).
Recall the definition of group delay:

  τ(ω) = −d∠F(ω)/dω

We then get

  τ(ω/a) = −a d∠F(ω/a)/dω = a τ′(ω)

where

  τ′(ω) = −d∠F(ω/a)/dω

Hence

  F(ω/a) = |F(ω/a)| exp(−j ∫ τ(ω/a) dω)
         = |F(ω/a)| exp(−j a ∫ τ′(ω) dω)

  f(at) ⟺ (1/|a|) |F(ω/a)| exp(−j a ∫ τ′(ω) dω)

  f(at) ⟺ (1/|a|) |F(Ω)| exp(j a ∠F(Ω))

with Ω = ω/a. The magnitude is scaled by a constant factor, so we won't take this into account, since it corresponds only to a difference in volume. The phase portion of the FT is related to
the group delay of the signal's frequency content. To maintain the same time duration when changing the frequencies, the slope of the phase needs to be scaled, to either slow down or speed up the group delay and so compensate for the time-scaling side effect.
The hardest part in terms of computation is the transformation of the input signal into the 2D time-frequency representation. The quality of this transform will directly affect the output pitch shift. It was stated earlier how the CWT results in a similar 2D signal representation. More care has to be taken when modifying coefficients of the CWT, however, as the reconstruction conditions are much more complex than those of the STFT Phase Vocoder. Any modifications in the dual domain must be made while satisfying the Reproducing Kernel Hilbert Subspace (RKHS) property. Changing the phase components of the Complex Morlet CWT maintains this property. Thus the algorithm used in the Phase Vocoder implementation can be used with the Complex Morlet CWT.

C. The Pitch-Shifting Algorithm

The pseudo-code for the entire CWT-based pitch-shift algorithm is as follows:

function pitch_shift(in c: real, in x: real[], out y: real[])
  coefficients=cwt(x,scales);
  magnitude=abs(coefficients);
  phase=unwrap(angle(coefficients));
  coefs_shifted=magnitude.*exp(j*phase*c);
  scales_shifted=scales/c;
  y=icwt(coefs_shifted,scales_shifted);

The main component of the algorithm is the development of the cwt() and icwt() functions for the Complex Morlet wavelet. To implement these continuous-time operations on a computer, some level of discretization and approximation must be made. The CWT can be implemented efficiently as a series of convolutions carried out in the frequency domain.

D. Computation of the CWT

Recall the definition of the CWT from Equation 3 and the Complex Morlet definition from Equation 5. By substitution of the Complex Morlet wavelet into the CWT formula:

  F(a, b) = (1/√|a|) ∫_{−∞}^{∞} f(t) [exp(j5 (t−b)/a − ((t−b)/a)²/2)]* dt
          = (1/√|a|) ∫_{−∞}^{∞} f(t) exp(−j5 (t−b)/a − ((t−b)/a)²/2) dt
          = (1/√|a|) ∫_{−∞}^{∞} f(t) exp(j5 (b−t)/a − ((b−t)/a)²/2) dt
          = (1/√|a|) ∫_{−∞}^{∞} f(t) ψ((b − t)/a) dt

We rewrite this as

  F(a, b) = (1/√|a|) f(b) ⊗ ψ(b/a)    (6)

where ⊗ denotes convolution.
The final result of Equation 6 can be quickly found in the frequency domain by taking the inverse Fourier transform of:

  F̃(a, ω) = √|a| f̃(ω) ψ̃(aω)

E. Computation of the Inverse CWT

The inverse CWT is defined by Equation 4. We see that it involves a double integral, which can be performed numerically through some integration technique such as the trapezoidal rule, quadrature integration, etc. The trapezoidal rule approximates the area under a curve by fitting a trapezoid between every pair of adjacent sample points. There is also an approximate reconstruction formula, given by [2], which simplifies to a single integration over the dilation scales:

  f(t) ≈ Kψ ∫_{−∞}^{∞} F(a, t) (1/a^(2/3)) da    (7)

IV. RESULTS

A. Algorithm Testing

The algorithm was implemented in MATLAB, and several tests were carried out, which we describe below.
Example 1: We pitch-shift a middle-C tone (256 Hz) by factors of 2 and 1/2 (an octave up and down, respectively). The spectrograms of the original and the two shifted signals are shown in figures 8, 9 and 10, respectively, and the FT of the relevant frequency region is shown in figure 11.

Fig. 8. Spectrogram of a middle-C tone

We see that the algorithm performs well: the frequencies are shifted correctly up and down, and the duration of the clip is preserved.
Example 2: In the next test case, we wanted to see how the algorithm performed in preserving the temporal characteristics of a signal. To do this, a middle-C tone burst was used as the test signal, and shifted an octave higher.
Fig. 9. Spectrogram of a higher C tone
Fig. 10. Spectrogram of a lower C tone
Fig. 11. Fourier Transform of the three C tones
Fig. 12. C tone burst

The signal is shown in figure 12, and the spectrograms of the original and shifted signals in figures 13 and 14. We notice there is some smearing of the frequency content but, considering the abrupt frequency changes, the smearing is not too large and there is negligible distortion upon playback.
Example 3: Harmonic relations. In this test we verify that a shifted C major scale preserves the harmonic relations between successive notes. Note that if we were to shift frequencies (as opposed to pitch), we would be shifting the frequency content by an additive (or subtractive, for that matter) value. Instead, we are changing the frequency content by a multiplicative factor, thus changing the pitch by a constant, additive term.
The spectrograms of the original and shifted signals are shown in figures 15 and 16, respectively. By listening to the resultant audio files and from inspection of the spectrograms, we see that the spacing (exponential spacing, that is) between tones is definitely maintained. However, some attenuation can be noticed near the end of the signal.
These results show that our algorithm certainly works. A final test was performed involving speech and music signals. For both of these classes of signals, the algorithm performs fairly well. In music signals, there is a very small distortion artifact in between notes, which corresponds to a discontinuity in the time-frequency grid. Voice signals are very susceptible to distortion when pitch shifting. Smearing can result in a loss of speech coherency, which is almost unnoticeable for instrumentals but is very noticeable for voice. Despite this, even in the extreme case of 12-semitone shifts, much of what the speaker is saying is still clear. The major distortion that can be noticed is that the timbre changes as the pitch is altered: the voice of the singer transforms from beast-like to chipmunk-like as the pitch is changed. Unfortunately, this is one thing the algorithm does not account for.
1) Formants: The algorithm shifts each and every frequency. This can cause the unique characteristics of a signal to change, namely the formants. These are created by a particular person's vocal tract and are what makes a certain voice distinguishable from someone else's. They correspond
to the resonant frequencies associated with the normal modes inside a person's vocal tract, and are of course different for every possible vowel/consonant.

Fig. 13. C tone burst spectrogram
Fig. 14. Shifted C tone burst spectrogram
Fig. 15. C major scale spectrogram
Fig. 16. Shifted C major scale spectrogram

To maintain these identifiable characteristics while pitch shifting, certain frequencies should not be emphasized as much as others. This is a difficult task, as the CWT method treats all frequencies equally. The term formant preservation is used to describe pitch shifting while maintaining the character or timbre of a signal, but it is a complex topic involving psychoacoustics.

V. CONCLUSIONS

There are several possible sources of distortion for the pitch-shifting algorithm. First, we are computing the CWT only on a discrete lattice (since that's all we can represent in a computer). A finer sampling lattice consisting of more projections per octave may provide a more accurate time-frequency representation of the signal. Another major source of errors may be in the ICWT calculations. The approximate formula given in Equation 7 was used instead of the traditional double-integral inverse formula. This approximation introduces error; in addition, the integral was computed using a very crude trapezoidal approximation, further contributing to errors in the result.
The CWT has been shown to be an alternative to the Phase Vocoder in representing a signal in a time-frequency format. Our method still has some areas that need improvement. Mainly, the ICWT algorithm has to be refined, and the underlying numerical approximation technique should be investigated. Further work on this pitch shifter may also include research into the use of alternate wavelets. Work can also be done on finding a method of preserving vocal formants, thus avoiding the chipmunk effect.
APPENDIX
The Heisenberg Uncertainty Principle for L¹ and L² Signals

a) L¹(ℝ) functions: Assume f(t) and its Fourier transform F(ω) are absolutely integrable. Let t̄ and ω̄ be two values such that f(t̄) ≠ 0 and F(ω̄) ≠ 0. The corresponding effective duration Δt and effective bandwidth Δω are the values satisfying

  ∫_{−∞}^{∞} |f(t)| dt = |f(t̄)| Δt

and

  ∫_{−∞}^{∞} |F(ω)| dω = |F(ω̄)| Δω

with

  t̄ = ∫_{−∞}^{∞} t |f(t)| dt / ∫_{−∞}^{∞} |f(t)| dt

and

  ω̄ = ∫_{−∞}^{∞} ω |F(ω)| dω / ∫_{−∞}^{∞} |F(ω)| dω

Then

  |F(ω̄)| ≤ ∫_{−∞}^{∞} |f(t)| dt = |f(t̄)| Δt

  |f(t̄)| ≤ (1/2π) ∫_{−∞}^{∞} |F(ω)| dω = (1/2π) |F(ω̄)| Δω

Multiplying the two inequalities,

  |F(ω̄)| |f(t̄)| ≤ (1/2π) Δt Δω |f(t̄)| |F(ω̄)|

and therefore

  Δt Δω ≥ 2π

b) L²(ℝ) functions: Define

  E = ∫_{−∞}^{∞} |f(t)|² dt = (1/2π) ∫_{−∞}^{∞} |F(ω)|² dω

  (Δt)² = ∫_{−∞}^{∞} t² |f(t)|² dt / E

  (Δω)² = ∫_{−∞}^{∞} ω² |F(ω)|² dω / (2πE)

Claim 1: If lim_{t→±∞} t |f(t)|² = 0, then Δt Δω ≥ 1/2.
We recall the Cauchy-Schwarz inequality for the L² Hilbert space: for any L² functions z(x) and w(x) defined on the interval [a, b],

  |∫_a^b z(x) w(x) dx|² ≤ ∫_a^b |z(x)|² dx · ∫_a^b |w(x)|² dx    (8)

which in vector notation becomes

  |⟨Z, W⟩|² ≤ ⟨Z, Z⟩ ⟨W, W⟩

or, equivalently,

  |⟨Z, W⟩| ≤ ‖Z‖ ‖W‖

We now prove the following
Theorem 1: (Heisenberg inequality for L² signals)
Proof: Equation 8 implies that

  |∫ t f(t) (df(t)/dt) dt|² ≤ ∫ t² f(t)² dt · ∫ |df(t)/dt|² dt

Let

  A = ∫ℝ t f(t) (df(t)/dt) dt = ∫ℝ t (d(f(t)²/2)/dt) dt = [t f(t)²/2]_{−∞}^{∞} − ∫ℝ (f(t)²/2) dt

By hypothesis, t f(t)² → 0, so A = −E/2.
We now recall that df(t)/dt ⟺ jωF(ω); using Parseval's identity,

  ∫ℝ |df(t)/dt|² dt = (1/2π) ∫ℝ ω² |F(ω)|² dω

Therefore,

  E²/4 = |∫ t f(t) (df(t)/dt) dt|²
       ≤ ∫ℝ t² f(t)² dt · (1/2π) ∫ℝ ω² |F(ω)|² dω
       = E (Δt)² · E (Δω)²

so that

  (Δt)² (Δω)² ≥ 1/4,  i.e.  Δt Δω ≥ 1/2

QED.
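The L² bound above is tight: a Gaussian attains ΔtΔω = 1/2 exactly. The definitions of E, Δt and Δω can be checked numerically on the unit-energy Gaussian f(t) = π^(−1/4) e^(−t²/2), whose Fourier transform F(ω) = π^(−1/4) √(2π) e^(−ω²/2) is known in closed form. A small sketch (Python/NumPy assumed; the grids are arbitrary, chosen wide enough that the truncated tails are negligible):

```python
import numpy as np

t = np.linspace(-15, 15, 200001)
dt = t[1] - t[0]
f = np.pi**-0.25 * np.exp(-0.5 * t**2)        # unit-energy Gaussian

w = np.linspace(-15, 15, 200001)
dw = w[1] - w[0]
F = np.pi**-0.25 * np.sqrt(2 * np.pi) * np.exp(-0.5 * w**2)   # its FT (closed form)

E = np.sum(f**2) * dt                         # total energy, ~1
dt2 = np.sum(t**2 * f**2) * dt / E            # (Delta t)^2, analytically 1/2
dw2 = np.sum(w**2 * np.abs(F)**2) * dw / (2 * np.pi * E)      # (Delta omega)^2, 1/2
print(np.sqrt(dt2 * dw2))                     # ~0.5: the bound is attained
```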

Alexander Sklar was born in Tel-Aviv, Israel, on January 19th, 1981. He graduated from both the Electrical Engineering and Computer Science programs at the Universidad de la República in Montevideo, Uruguay, in May 2005. His interests include software design and engineering, music and DSP, robotics and control.

REFERENCES
[1] P. De Gersem, B. De Moor, M. Moonen, "Applications of the continuous wavelet transform in the processing of musical signals," in Proc. of the 13th International Conference on Digital Signal Processing (DSP97), Santorini, Greece, Jul. 1997, pp. 563-566.
[2] P. De Gersem, B. De Moor, M. Moonen, "Applications of wavelets in audio and music," in Record of the KVIV Study-day on Wavelet analysis: a new tool in signal and image processing, Antwerp, Belgium, Dec. 1996, 14 p.
[3] S. Goldenstein, J. Gomes, "Time Warping of Audio Signals," in Proc. Computer Graphics International 1999 (CGI'99), 1999, p. 52.
[4] S. K. Mitra, Digital Signal Processing, 2nd Edition, McGraw-Hill, 2001.
[5] A. Oppenheim, R. Schafer, Discrete-time Signal Processing, Prentice-Hall Signal Processing Series, Prentice-Hall, Upper Saddle River, NJ, 1999.
[6] U. Zölzer, DAFX: Digital Audio Effects, West Sussex, England: Wiley, 2002, pp. 201-282.
[7] R. Kronland-Martinet, "The wavelet transform for analysis, synthesis and processing of speech and music sounds," Computer Music Journal, vol. 12, Winter 1988, pp. 11-20.
[8] J.-P. Antoine, Two-dimensional wavelets and image processing, Institut de Physique Théorique, Université Catholique de Louvain.
[9] J. R. Beltrán, F. Beltrán, "Additive synthesis based on the continuous wavelet transform: a sinusoidal plus transient model," in Proc. of the 6th Int. Conference on Digital Audio Effects (DAFx-03), London, UK, September 8-11, 2003.
[10] A. M. Reza, Spire Lab, UWM, From Fourier Transform to Wavelet Transform, white paper, 1999.
[11] J. L. Flanagan, R. M. Golden, "Phase Vocoder," Bell Syst. Tech. J., vol. 45, pp. 1493-1509, 1966.
[12] F. Hammer, Time-scale Modification using the Phase Vocoder: An approach based on deterministic/stochastic component separation in the frequency domain, Diploma Thesis, Institute for Electronic Music and Acoustics (IEM), Graz University of Music and Dramatic Arts, Graz, Austria.
[13] P. Bastien, Pitch shifting and voice transformation techniques, TC-Helicon.
[14] Wikipedia, http://www.wikipedia.com
