Introduction (lecture 4)
Pitch estimation is essential to many music signal applications:
Genre classification
Music tutor: detection of playing faults
Music style analysis
Automatic transcription: audio signal to music score
Techniques in pitch extraction
Time-domain approaches:
(1) ACF (autocorrelation function) and MACF (modified autocorrelation function)
(2) NCCF (normalized cross-correlation function)
(3) AMDF (average magnitude difference function)
Definition of pitch
What is the pitch of a tone?
Answer: the perceived frequency of a sound (Wikipedia).
Method 1: ACF (autocorrelation function)
By definition, the autocorrelation is
$$R(m) = \lim_{N\to\infty} \frac{1}{2N+1} \sum_{n=-N}^{N} x(n)\,x(n+m), \quad 0 \le m \le M_0$$
R(m) is symmetric for positive and negative lags (R(-m) = R(m)), so only m ≥ 0 is used. For a frame of N samples:
$$R(m) = \frac{1}{N} \sum_{n=0}^{N-1-m} x(n)\,x(n+m), \quad 0 \le m \le M_0$$
What is autocorrelation, R(m)?
Example:
$$R(m) = \frac{1}{N} \sum_{n=0}^{N-1-m} x(n)\,x(n+m), \quad 0 \le m \le M_0$$
It is easier if you ignore the averaging factor 1/N:
$$R(m) = \sum_{n=0}^{N-1-m} x(n)\,x(n+m)$$
x = [1 5 7 1 4], N = 5
R(0) = x(0)*x(0) + x(1)*x(1) + x(2)*x(2) + x(3)*x(3) + x(4)*x(4) = 1 + 25 + 49 + 1 + 16 = 92
R(1) = x(0)*x(1) + x(1)*x(2) + x(2)*x(3) + x(3)*x(4)
That is, multiply x = [1 5 7 1 4] by itself shifted by one sample and add the overlapping products:
R(1) = 5 + 35 + 7 + 4 = 51
And so on, giving
R = [92 51 40 21 4]
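The hand calculation above can be checked with a short sketch. It is written in Python (standard library only) rather than the slides' MATLAB. The function name `acf` is hypothetical, and, as in the worked example, the 1/N factor is left out.

```python
def acf(x, max_lag=None):
    """Unnormalized autocorrelation R(m) = sum_{n=0}^{N-1-m} x(n) x(n+m).

    Returns [R(0), R(1), ..., R(max_lag)]; the 1/N factor is ignored.
    """
    n_samples = len(x)
    if max_lag is None:
        max_lag = n_samples - 1
    # For lag m only the N - m overlapping products exist (n = 0 .. N-1-m).
    return [sum(x[n] * x[n + m] for n in range(n_samples - m))
            for m in range(max_lag + 1)]


R = acf([1, 5, 7, 1, 4])
print(R)  # [92, 51, 40, 21, 4]
```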
Exercise 4.1
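First, what is autocorrelation?
% MATLAB code
x = [1 5 7 1 4 8 6 2 4 9 3]'
auto_corr_x = xcorr(x)   % autocorrelation of x, lags -(N-1) ... (N-1)
figure(1), clf
subplot(2,1,1), plot(x), title('x[t]'), xlabel('t')
grid on, grid(gca,'minor'), hold on
subplot(2,1,2), plot(auto_corr_x), title('Autocorrelation of x[t]')
grid on, grid(gca,'minor')
We only look at the non-negative lags (the right half of the lower plot), because the autocorrelation is symmetric.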
Exercise:
The gap between two peaks is 4, so the period of x is around 4 samples.
Show the steps of the calculation.
Ans: ??
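One way to check the exercise is a Python sketch that computes R(m) for the exercise signal and returns the lag of the first local maximum after lag 0. The "first local maximum" rule and the function names `acf` and `first_peak_lag` are assumptions made for illustration. They are not the slides' method.

```python
def acf(x, max_lag):
    """Unnormalized autocorrelation R(0..max_lag), 1/N factor ignored."""
    n_samples = len(x)
    return [sum(x[n] * x[n + m] for n in range(n_samples - m))
            for m in range(max_lag + 1)]


def first_peak_lag(r):
    """Lag of the first local maximum after lag 0 (j1 = 0), or None."""
    for m in range(1, len(r) - 1):
        if r[m - 1] < r[m] >= r[m + 1]:
            return m
    return None


x = [1, 5, 7, 1, 4, 8, 6, 2, 4, 9, 3]
R = acf(x, max_lag=len(x) // 2)   # lags 0..5
print(R)                  # [302, 214, 142, 183, 194, 116]
print(first_peak_lag(R))  # 4, i.e. a period of about 4 samples
```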
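[Figure: autocorrelation $R(m) = \sum_{n=0}^{N-1-m} x(n)\,x(n+m)$ plotted against lag time j (in samples). The largest value, R_the_max, is at j1 = 0; the second-largest peak, R_second_max, is at j2.]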
Ch4. pitch, v3.c
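Pitch frequency from the two peaks:
$$\text{Lag\_time\_in\_samples} = j_2 - j_1, \qquad f_0 = \frac{1}{\text{Lag\_time\_in\_samples} \cdot \frac{1}{\text{sampling\_frequency}}} = \frac{\text{sampling\_frequency}}{j_2} \quad (j_1 = 0)$$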
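The formula above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: a sampling frequency of 8000 Hz, a synthetic 200 Hz sine as the test signal, and a search for j2 restricted to lags between fs/400 and fs/50 (that is, 50 to 400 Hz). The function names are hypothetical.

```python
import math


def acf(x, max_lag):
    """Unnormalized autocorrelation R(0..max_lag)."""
    n_samples = len(x)
    return [sum(x[n] * x[n + m] for n in range(n_samples - m))
            for m in range(max_lag + 1)]


def pitch_from_acf(x, fs, f_min=50.0, f_max=400.0):
    """Return (j2, f0) with f0 = fs / (j2 - j1) and j1 = 0.

    j2 is the lag of the largest R(m) inside the assumed search range
    fs/f_max .. fs/f_min (R(m) itself is always largest at lag 0).
    """
    lo = int(fs / f_max)   # shortest period considered, in samples
    hi = int(fs / f_min)   # longest period considered, in samples
    r = acf(x, hi)
    j1 = 0
    j2 = max(range(lo, hi + 1), key=lambda m: r[m])
    return j2, fs / (j2 - j1)


fs = 8000
x = [math.sin(2 * math.pi * 200 * n / fs) for n in range(400)]
print(pitch_from_acf(x, fs))   # (40, 200.0)
```

The lag range stops the search from returning lag 0 itself. It also rules out lags too short or too long to be a plausible pitch period.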
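Center clipping:
$$y(n) = \mathrm{clc}[x(n)] = \begin{cases} x(n) - C_L, & x(n) \ge C_L \\ 0, & |x(n)| < C_L \\ x(n) + C_L, & x(n) \le -C_L \end{cases}$$
Center clipping gives a more accurate result because the peaks of the autocorrelation become easier to pick. The modified autocorrelation (MACF) is the autocorrelation of the clipped signal:
$$R'(m) = \sum_{n=0}^{N-1-m} y(n)\,y(n+m), \quad 0 \le m \le M_0$$
[Figure: x(n) with clipping levels +C_L and -C_L; the middle part between them is cut (removed), giving y(n) = clc(x).]
A typical C_L is 1/4 of the peak-to-peak amplitude of x.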
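The clipping rule above translates almost directly into code. The Python sketch below uses a hypothetical function name, `center_clip`. If no level is given, it assumes the slide's typical C_L of one quarter of the peak-to-peak amplitude.

```python
def center_clip(x, c_l=None):
    """Center clipping y(n) = clc[x(n)].

    x(n) - C_L  if x(n) >= C_L
    0           if |x(n)| < C_L
    x(n) + C_L  if x(n) <= -C_L
    If c_l is None, C_L defaults to 1/4 of the peak-to-peak amplitude of x.
    """
    if c_l is None:
        c_l = (max(x) - min(x)) / 4.0
    y = []
    for v in x:
        if v >= c_l:
            y.append(v - c_l)
        elif v <= -c_l:
            y.append(v + c_l)
        else:
            y.append(0.0)   # the middle part is removed
    return y


print(center_clip([0.0, 2.0, -2.0, 0.5, 1.5]))  # [0.0, 1.0, -1.0, 0.0, 0.5]
```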
Finding pitch by center clipping
[Figure: x(n) and the center-clipped signal y(n).]
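In R(m), the autocorrelation of x(n), it is not easy to pick the peaks.
In R'(m), the autocorrelation of the clipped signal y(n) = clc{x(n)}, the peaks are easy to pick.
If T1, T2 and T3 are the spacings between successive peaks, the period is T = mean(T1, T2, T3), and Period = 1/(pitch_frequency), so pitch_frequency = 1/T.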
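The steps above can be combined into one Python sketch: center-clip the frame, compute R'(m), collect the peak lags, and average the spacings T1, T2, ... between successive peaks. Several choices here are assumptions rather than rules from the slides. These include the 200 Hz test sine, fs = 8000 Hz, counting a peak only if it exceeds 0.3·R'(0), searching lags only up to N/2, and the helper names.

```python
import math


def center_clip(x, c_l):
    """y(n) = x(n) - C_L, 0, or x(n) + C_L (see the clipping rule above)."""
    return [v - c_l if v >= c_l else v + c_l if v <= -c_l else 0.0 for v in x]


def acf(x, max_lag):
    """Unnormalized autocorrelation R(0..max_lag)."""
    n_samples = len(x)
    return [sum(x[n] * x[n + m] for n in range(n_samples - m))
            for m in range(max_lag + 1)]


def period_from_peaks(r, rel_threshold=0.3):
    """Mean spacing T between successive autocorrelation peaks (lag 0 included)."""
    peaks = [0]
    for m in range(1, len(r) - 1):
        if r[m - 1] < r[m] >= r[m + 1] and r[m] > rel_threshold * r[0]:
            peaks.append(m)
    spacings = [b - a for a, b in zip(peaks, peaks[1:])]   # T1, T2, T3, ...
    return sum(spacings) / len(spacings)


fs = 8000
x = [math.sin(2 * math.pi * 200 * n / fs) for n in range(400)]
c_l = (max(x) - min(x)) / 4              # typical C_L: 1/4 peak-to-peak
y = center_clip(x, c_l)
T = period_from_peaks(acf(y, len(y) // 2))   # period in samples
print(T, fs / T)                         # 40.0 200.0
```

Here T is in samples, so the pitch frequency is fs / T rather than 1/T.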
Example
For each frame, find a pitch. Plotting the pitch against time (blue) shows the pitch profile.
[Figure: signal X(n) against time, and pitch (frequency) against time n (frame).]
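A frame-by-frame pitch profile can be sketched in Python by applying an ACF pitch estimate to each frame in turn. This is only an illustration. The test signal (200 Hz for 0.1 s, then 250 Hz for 0.1 s), the non-overlapping 400-sample frames, the 50 to 400 Hz search range and the function names are all assumptions.

```python
import math


def acf(x, max_lag):
    """Unnormalized autocorrelation R(0..max_lag)."""
    n_samples = len(x)
    return [sum(x[n] * x[n + m] for n in range(n_samples - m))
            for m in range(max_lag + 1)]


def frame_pitch(frame, fs, f_min=50.0, f_max=400.0):
    """f0 = fs / j2, where j2 is the lag of the largest R(m) in the search range."""
    lo, hi = int(fs / f_max), int(fs / f_min)
    r = acf(frame, hi)
    j2 = max(range(lo, hi + 1), key=lambda m: r[m])
    return fs / j2


def pitch_profile(x, fs, frame_len=400, hop=400):
    """One pitch value per frame: the pitch profile against time n (frame)."""
    return [frame_pitch(x[start:start + frame_len], fs)
            for start in range(0, len(x) - frame_len + 1, hop)]


fs = 8000
# Hypothetical test signal: 200 Hz for the first 800 samples, then 250 Hz.
x = [math.sin(2 * math.pi * (200 if n < 800 else 250) * n / fs)
     for n in range(1600)]
print(pitch_profile(x, fs))   # [200.0, 200.0, 250.0, 250.0]
```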
Method 2: Normalized cross-correlation function (NCCF) method [Verteletskaya 2009]
$$\mathrm{NCCF}(m) = \frac{\sum_{n=0}^{N-m-1} x(n)\,x(n+m)}{\sqrt{\sum_{n=0}^{N-m-1} x^2(n) \; \sum_{n=0}^{N-m-1} x^2(n+m)}}, \quad 0 \le m \le M_0$$
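A Python sketch of the NCCF formula follows, with a simple peak picker. The NCCF is close to 1 at every multiple of the period, so taking the plain maximum is ambiguous. The picker below therefore returns the smallest local maximum that reaches 90 % of the best value. That rule, the 200 Hz test sine, fs = 8000 Hz and the function names are assumptions, not details from [Verteletskaya 2009].

```python
import math


def nccf(x, m):
    """NCCF(m) = sum x(n)x(n+m) / sqrt(sum x(n)^2 * sum x(n+m)^2), n = 0..N-m-1."""
    n_terms = len(x) - m
    num = sum(x[n] * x[n + m] for n in range(n_terms))
    e0 = sum(x[n] ** 2 for n in range(n_terms))
    em = sum(x[n + m] ** 2 for n in range(n_terms))
    return num / math.sqrt(e0 * em) if e0 > 0 and em > 0 else 0.0


def nccf_pitch_lag(x, lo, hi, ratio=0.9):
    """Smallest local maximum of NCCF in [lo, hi] reaching ratio * max (lo >= 1)."""
    values = {m: nccf(x, m) for m in range(lo - 1, hi + 2)}
    best = max(values[m] for m in range(lo, hi + 1))
    for m in range(lo, hi + 1):
        if values[m - 1] < values[m] >= values[m + 1] and values[m] >= ratio * best:
            return m
    return None


fs = 8000
x = [math.sin(2 * math.pi * 200 * n / fs) for n in range(400)]
lag = nccf_pitch_lag(x, lo=20, hi=160)
print(lag, fs / lag, round(nccf(x, lag), 6))   # 40 200.0 1.0
```

The normalization keeps NCCF(m) between -1 and 1, so one threshold works regardless of the signal level.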
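Method 3: Average magnitude difference function (AMDF) method [Verteletskaya 2009]
$$D_x(m) = \frac{1}{N} \sum_{n=0}^{N-1-m} \left| x(n) - x(n+m) \right|, \quad 0 \le m \le M_0$$
Unlike the ACF, the AMDF marks the pitch period with a deep minimum (valley) rather than a peak.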
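A Python sketch of D_x(m) follows, with a valley picker. D_x(m) is close to 0 at every multiple of the period, so the picker returns the smallest local minimum that falls below 10 % of the largest value in the range. That rule, the 200 Hz test sine, fs = 8000 Hz and the function names are illustrative assumptions.

```python
import math


def amdf(x, m):
    """D_x(m) = (1/N) * sum_{n=0}^{N-1-m} |x(n) - x(n+m)|."""
    n_samples = len(x)
    return sum(abs(x[n] - x[n + m]) for n in range(n_samples - m)) / n_samples


def amdf_pitch_lag(x, lo, hi, ratio=0.1):
    """Smallest local minimum of D_x in [lo, hi] below ratio * max (lo >= 1)."""
    d = {m: amdf(x, m) for m in range(lo - 1, hi + 2)}
    worst = max(d[m] for m in range(lo, hi + 1))
    for m in range(lo, hi + 1):
        if d[m - 1] > d[m] <= d[m + 1] and d[m] <= ratio * worst:
            return m
    return None


fs = 8000
x = [math.sin(2 * math.pi * 200 * n / fs) for n in range(400)]
lag = amdf_pitch_lag(x, lo=20, hi=160)
print(lag, fs / lag)   # 40 200.0
```

The AMDF needs only subtractions and absolute values, with no multiplications. This is a common reason for choosing it over the ACF.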
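Method 4: Cepstrum pitch determination (CPD) [Verteletskaya 2009]
A speech signal s(n) is the excitation e(n) convolved (*) with the vocal tract impulse response h(n):
$$s(n) = e(n) * h(n)$$
$$S(\omega) = E(\omega)\,H(\omega)$$
$$\mathcal{F}^{-1}\{\log|S(\omega)|\} = \mathcal{F}^{-1}\{\log|E(\omega)|\} + \mathcal{F}^{-1}\{\log|H(\omega)|\}$$
$$C(m) = \frac{1}{N} \sum_{k=0}^{N-1} \log|S(k)|\, e^{j 2\pi mk/N}, \qquad S(k) = \sum_{n=0}^{N-1} s(n)\, e^{-j 2\pi nk/N}$$
The problem: for the human voice, the peak may be the result of glottal excitation.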
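The key step above is that the logarithm turns the product E·H into a sum, so the cepstra add. The Python sketch below checks this numerically with the C(m) formula. For the DFT product to hold exactly, it uses circular convolution. The short "excitation" (a pulse plus an echo) and "vocal tract" filter are made-up example signals, and the helper names are hypothetical.

```python
import cmath
import math


def dft(x):
    """S(k) = sum_{n=0}^{N-1} x(n) e^{-j 2 pi n k / N} (naive O(N^2) DFT)."""
    n_pts = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * n * k / n_pts) for n in range(n_pts))
            for k in range(n_pts)]


def real_cepstrum(x):
    """C(m) = (1/N) sum_k log|S(k)| e^{j 2 pi m k / N} (real part kept)."""
    n_pts = len(x)
    log_mag = [math.log(abs(s)) for s in dft(x)]
    return [sum(log_mag[k] * cmath.exp(2j * math.pi * m * k / n_pts)
                for k in range(n_pts)).real / n_pts
            for m in range(n_pts)]


def circular_convolve(e, h):
    """s(n) = sum_k e(k) h((n - k) mod N), so that S(k) = E(k) H(k) exactly."""
    n_pts = len(e)
    return [sum(e[k] * h[(n - k) % n_pts] for k in range(n_pts)) for n in range(n_pts)]


N = 16
e = [0.0] * N
e[0], e[5] = 1.0, 0.5      # hypothetical excitation: pulse plus an echo at lag 5
h = [0.0] * N
h[0], h[1] = 1.0, 0.4      # hypothetical short vocal-tract filter
s = circular_convolve(e, h)

c_s, c_e, c_h = real_cepstrum(s), real_cepstrum(e), real_cepstrum(h)
err = max(abs(c_s[m] - (c_e[m] + c_h[m])) for m in range(N))
print(err < 1e-9)   # True: cepstrum(e * h) = cepstrum(e) + cepstrum(h)
```

Both example spectra are bounded away from zero (|E(k)| ≥ 0.5 and |H(k)| ≥ 0.6), so log|S(k)| is always defined.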
Cepstrum of speech
"Cepstrum" is a new word formed by reversing the first four letters of "spectrum".
It is the spectrum of the spectrum of a signal.
Why do we need this?
Answer: to remove the ripples in the spectrum caused by glottal excitation.
[Figure: speech signal x, its Fourier transform, and the spectrum of x, which has too many ripples. Source: http://isdl.ee.washington.edu/people/stevenschimmel/sphsc503/files/notes10.pdf]
Cepstrum:
C(n) = |fft(log|fft(x(n))|)|
Quefrency is in the time domain (in seconds), so a higher quefrency corresponds to a lower frequency: a quefrency of q seconds corresponds to a frequency of 1/q Hz.
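[Figure: spectrum and cepstrum of speech, with the cepstrum plotted against quefrency (sample index). The glottal excitation produces the fine ripples in the spectrum and the high-quefrency part (C_high) of the cepstrum. The vocal tract filter produces the smooth spectral envelope and the low-quefrency part of the cepstrum.]
The peak in the high-quefrency part may be the pitch period.
Separating out the smoothed vocal tract spectrum (low quefrency) leaves this excitation peak, which can be used to find the pitch.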
For more information, see:
http://isdl.ee.washington.edu/people/stevenschimmel/sphsc503/files/notes10.pdf
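Example: A4_Oboe
[Figure: spectrogram of an oboe playing A4.]
Input: oboe A4, x(n)
Fourier transform: X(ω) = fft(x)
Cepstrum: C(n) = |fft(log|fft(x(n))|)|
[Figure: cepstrum C(n) over the whole range (from around 30 to … Hz), and zoomed to the range 200 Hz to 900 Hz.]
The 900 Hz limit corresponds to a quefrency of 1/900 = 1.11x10^-3 s, and the 200 Hz limit to 1/200 = 5x10^-3 s.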
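The search above can be sketched in Python by computing the cepstrum exactly as on the slide, C(n) = |fft(log|fft(x(n))|)|, and then taking the largest value between quefrencies 1/900 s and 1/200 s. This is not the oboe recording. The input is a hypothetical harmonic test tone (250 Hz fundamental, fs = 8000 Hz, 512 samples). The small constant `eps` that avoids log(0), the hand-written radix-2 FFT and the function names are all assumptions.

```python
import cmath
import math


def fft(a):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return [complex(a[0])]
    even, odd = fft(a[0::2]), fft(a[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def cepstrum(x, eps=1e-6):
    """C(n) = |fft(log|fft(x(n))|)|; eps keeps log() away from zero bins."""
    log_mag = [math.log(abs(v) + eps) for v in fft(x)]
    return [abs(v) for v in fft(log_mag)]


def cepstral_pitch(x, fs, f_low=200.0, f_high=900.0):
    """Return (quefrency in samples, f0) of the largest cepstral value in range."""
    c = cepstrum(x)
    q_min = math.ceil(fs / f_high)    # 1/900 s expressed in samples
    q_max = math.floor(fs / f_low)    # 1/200 s expressed in samples
    q = max(range(q_min, q_max + 1), key=lambda i: c[i])
    return q, fs / q


fs, N = 8000, 512
# Hypothetical test tone: 250 Hz fundamental plus harmonics up to 3750 Hz.
x = [sum(math.sin(2 * math.pi * 250 * h * n / fs) for h in range(1, 16))
     for n in range(N)]
print(cepstral_pitch(x, fs))   # (32, 250.0)
```

For a real input, log|fft(x)| is real and symmetric. Its fft is therefore N times the inverse FFT that defines the usual real cepstrum, so the slide's form locates the same peak.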
Summary
Methods of pitch extraction have been studied.
The cepstrum and its use for pitch extraction have been discussed.
References
[Naotoshi Seo 2007] Naotoshi Seo, "Project: Pitch Detection", http://note.sonots.com/SciSoftware/Pitch.html#ke283f3a
[Verteletskaya 2009] E. Verteletskaya, B. Šimák, "Performance Evaluation of Pitch Detection Algorithms", http://access.feld.cvut.cz/view.php?cisloclanku=2009060001
[Rabiner 1976] L. Rabiner, M. Cheng, A. Rosenberg, C. McGonegal, "A comparative performance study of several pitch detection algorithms", IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 24, no. 5, pp. 399-418, Oct. 1976.
Appendix
% source : http://www.angelfire.com/in2/yala/t4scales.htm
Autocorrelation
In signal processing, given a signal f(t), the continuous autocorrelation R_f(τ) is the continuous cross-correlation of f(t) with itself at lag τ, and is defined as:
$$R_f(\tau) = f^*(-\tau) * f(\tau) = \int_{-\infty}^{\infty} f(t+\tau)\, f^*(t)\, dt = \int_{-\infty}^{\infty} f(t)\, f^*(t-\tau)\, dt$$
For a discrete signal x_n with mean $\bar{x}$:
$$R(j) = \sum_n (x_n - \bar{x})(x_{n-j} - \bar{x})$$
Answer to Exercise 4.1: R(m), summed over n = 0, ..., N-1-m, gives R = [… 70 24 0]
This answer uses MACF. You can also use ACF; the pitch found is the same for this example.
Matlab