
Chapter 4: Pitch estimation for music signal processing
KH Wong

Ch4. pitch, v3.c

Introduction (lecture 4)
Pitch estimation is essential to many music signal applications:
Genre classification
Music tutor: detection of playing faults
Music style analysis
Automatic transcription: audio signal to music score

Techniques in pitch extraction
Time domain approaches
(1) ACF (Autocorrelation function) and MACF (Modified Autocorrelation function)
(2) NCCF (Normalized cross correlation function)
(3) AMDF (Average magnitude difference function)
Frequency domain approaches
(4) CPD (Cepstrum Pitch Determination)

Definition of pitch
What is the pitch of a tone?
Answer: the perceived frequency of a sound (Wikipedia).

Method 1:
ACF (Autocorrelation function)
By definition, the autocorrelation is

$$R(m) = \lim_{N \to \infty} \frac{1}{2N+1} \sum_{n=-N}^{N} x(n)\, x(n+m), \quad 0 \le m \le M_0$$

R is symmetrical on the positive and negative sides, so only n ≥ 0 is used. For a frame of N samples:

$$R(m) = \frac{1}{N} \sum_{n=0}^{N-1-m} x(n)\, x(n+m), \quad 0 \le m \le M_0$$

[Figure: the signal x against n, and its autocorrelation R against m; R is symmetrical on both sides.]

What is Autocorrelation, R(m)?
E.g.

$$R(m) = \frac{1}{N} \sum_{n=0}^{N-1-m} x(n)\, x(n+m), \quad 0 \le m \le M_0$$

It is easier if you ignore the mean (1/N) term:

$$R(m) = \sum_{n=0}^{N-1-m} x(n)\, x(n+m), \quad 0 \le m \le M_0$$

x=[1 5 7 1 4], N=5
R(0)=x(0)*x(0)+x(1)*x(1)+x(2)*x(2)+x(3)*x(3)+x(4)*x(4)
R(0)=1+25+49+1+16=92
R(1)=x(0)*x(1)+x(1)*x(2)+x(2)*x(3)+x(3)*x(4)
x            =[1 5 7 1 4]
x shifted by 1 =[5 7 1 4]
R(1)=5+35+7+4=51
And so on:
R=[92 51 40 21 4]
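The hand calculation above can be checked with a few lines of code. This is a minimal sketch in Python (the chapter's own listings are MATLAB); the function name `autocorr` is an illustrative choice, and the 1/N factor is omitted as in the worked example.

```python
def autocorr(x):
    """Unnormalised ACF: R(m) = sum over n = 0 .. N-1-m of x(n) * x(n+m)."""
    n_samples = len(x)
    return [
        sum(x[n] * x[n + m] for n in range(n_samples - m))
        for m in range(n_samples)
    ]


x = [1, 5, 7, 1, 4]   # the example signal used above
R = autocorr(x)
print(R)              # [92, 51, 40, 21, 4]
```

Each entry R[m] multiplies the signal by a copy of itself shifted by m samples and sums the overlapping products, which is why the number of terms shrinks as m grows.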


Exercise 4.1
First, what is auto-correlation?

%matlab code
x=[1 5 7 1 4 8 6 2 4 9 3]'
auto_corr_x=xcorr(x) %auto-correlation
figure(1), clf
subplot(2,1,1),plot(x)
grid on, grid(gca,'minor'), hold on
subplot(2,1,2),plot(auto_corr_x)
grid on, grid(gca,'minor')

[Figure: top panel x[t] against t; bottom panel Auto_correlation(x[t]).]
We only look at the positive side of the autocorrelation.
The gap between the two peaks is 4, so the period of x is around 4.
Exercise: show the steps of the calculation, using

$$R(m) = \sum_{n=0}^{N-1-m} x(n)\, x(n+m), \quad 0 \le m \le M_0$$

Ans: ?? (see Answer 4.1 in the appendix)

Autocorrelation
When a segment of a signal is correlated with itself, the distance (= Lag_time_in_samples) between the positions of the maximum and the second maximum correlation is defined as the fundamental period (pitch period) of the signal.
[Figure: autocorrelation R(j) against lag time j in samples; the maximum R_the_max is at j1 = 0 and the second maximum R_second_max is at j2.]

Then the fundamental frequency can be calculated as:

$$f_0 = \frac{1}{\text{Lag\_time\_in\_samples} \times \text{sampling\_period}} = \frac{\text{sampling\_frequency}}{j_2 - j_1}$$

where Lag_time_in_samples = j2 - j1.
Usually j1 = 0, because R_the_max is at j = 0, so

$$f_0 = \frac{\text{sampling\_frequency}}{j_2}$$
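The formula can be tried on the signal of Exercise 4.1. The following is a minimal sketch in Python, not the author's implementation; the helper names are hypothetical, and "second maximum" is interpreted here as the largest local maximum of R after lag 0, which is an assumption about how the peak is picked.

```python
def autocorr(x):
    """Unnormalised ACF: R(m) = sum over n = 0 .. N-1-m of x(n) * x(n+m)."""
    n_samples = len(x)
    return [
        sum(x[n] * x[n + m] for n in range(n_samples - m))
        for m in range(n_samples)
    ]


def second_maximum_lag(R):
    """Lag j2 of the largest local maximum of R after lag 0 (None if none)."""
    peaks = [m for m in range(1, len(R) - 1) if R[m - 1] < R[m] >= R[m + 1]]
    if not peaks:
        return None
    return max(peaks, key=lambda m: R[m])


def estimate_f0(x, sampling_frequency):
    """f0 = sampling_frequency / j2, taking j1 = 0."""
    j2 = second_maximum_lag(autocorr(x))
    if j2 is None:
        return None
    return sampling_frequency / j2


x = [1, 5, 7, 1, 4, 8, 6, 2, 4, 9, 3]   # signal of Exercise 4.1
print(estimate_f0(x, 1.0))      # 0.25
print(estimate_f0(x, 8000.0))   # 2000.0
```

The lag j2 = 4 is the same for both calls; only the sampling frequency changes the result, which is the point of the formula.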

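Modified Auto-Correlation Method:
Auto-Correlation Method enhanced by center clipping

$$y(n) = \mathrm{clc}[x(n)] = \begin{cases} x(n) - C_L, & x(n) \ge C_L \\ 0, & |x(n)| < C_L \\ x(n) + C_L, & x(n) \le -C_L \end{cases}$$

It gives a more accurate result because higher-frequency signals will not interfere with the result.

$$R'(m) = \sum_{n=0}^{N-1-m} y(n)\, y(n+m), \quad 0 \le m \le M_0$$

[Figure: x(n) against n with the clipping levels +C_L and -C_L; the middle part between the two levels is cut (removed), giving y(n) = clc(x).]
Typical C_L = 1/4 of the peak-to-peak value of x.
Note: the MATLAB listing at the end of this chapter uses a simpler variant that sets the samples inside the clipping region to 0 and leaves the other samples unchanged.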
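Center clipping followed by autocorrelation can be sketched as below. This is a minimal Python sketch under stated assumptions: it follows the simpler clipping variant used in the MATLAB listing at the end of this chapter (samples inside the clipping region are set to 0, the others are kept unchanged), it uses that listing's test data, and the function names are illustrative.

```python
def center_clip(x, cl):
    """Set samples with -cl < x(n) < cl to 0; keep the others unchanged."""
    return [0.0 if -cl < v < cl else v for v in x]


def autocorr(y):
    """Unnormalised ACF: R'(m) = sum over n = 0 .. N-1-m of y(n) * y(n+m)."""
    n_samples = len(y)
    return [
        sum(y[n] * y[n + m] for n in range(n_samples - m))
        for m in range(n_samples)
    ]


orginal_x = [1, 3, 7, 2, 1, 9, 3, 1, 8]      # test data of the MATLAB listing
mean_x = sum(orginal_x) / len(orginal_x)
x = [v - mean_x for v in orginal_x]          # centered wave
cl = (max(x) - min(x)) / 4                   # 1/4 of the peak-to-peak span
y = center_clip(x, cl)
R = autocorr(y)

# Largest local maximum of R after lag 0 gives the period in samples.
peaks = [m for m in range(1, len(R) - 1) if R[m - 1] < R[m] >= R[m + 1]]
period = max(peaks, key=lambda m: R[m])

print([round(v, 4) for v in y])
print(period)   # 3
```

With a sampling frequency of 1 Hz, a period of 3 samples corresponds to a pitch of 1/3 Hz.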


Finding pitch by center clipping
[Figure: x(n); the center-clipped signal y(n); the autocorrelation R(m) of x(n); and the autocorrelation R'(m) of y(n), with peak spacings T1, T2, T3.]
In R(m), the autocorrelation of x(n), it is not easy to pick peaks.
In R'(m), the autocorrelation of the clipped signal y(n)=clc{x(n)}, peaks are easy to pick.
T=mean(T1,T2,T3)=Period=1/(pitch_frequency)

The MACF (Modified Autocorrelation function) algorithm

Example
For each frame, find a pitch. Plot pitch against time (blue) and you can see the pitch profile.
[Figure: top panel X(n) against time; bottom panel Pitch(n) (frequency) against time n (frame).]
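Frame-by-frame pitch tracking can be sketched as follows. This is a minimal Python sketch, not the author's program: the synthetic two-tone signal, the frame length of 240 samples, the search range of 100 Hz to 800 Hz, and all function names are assumptions made for illustration, and the plain ACF is used in each frame.

```python
import math


def frame_pitch(frame, fs, f_min, f_max):
    """Pitch of one frame: the lag with the largest ACF inside the search range."""
    n = len(frame)
    lag_min = int(fs / f_max)
    lag_max = min(int(fs / f_min), n - 1)

    def acf(m):
        return sum(frame[i] * frame[i + m] for i in range(n - m))

    best_lag = max(range(lag_min, lag_max + 1), key=acf)
    return fs / best_lag


def pitch_track(x, fs, frame_len, f_min=100.0, f_max=800.0):
    """Split x into non-overlapping frames and estimate one pitch per frame."""
    return [
        frame_pitch(x[start:start + frame_len], fs, f_min, f_max)
        for start in range(0, len(x) - frame_len + 1, frame_len)
    ]


fs = 8000
# Synthetic test signal: 960 samples of a 200 Hz sine, then 960 samples at 400 Hz.
x = [
    math.sin(2 * math.pi * (200.0 if n < 960 else 400.0) * n / fs)
    for n in range(1920)
]
track = pitch_track(x, fs, frame_len=240)
print(track)   # [200.0, 200.0, 200.0, 200.0, 400.0, 400.0, 400.0, 400.0]
```

Plotting `track` against the frame index gives the kind of pitch profile described above. Restricting the lag range keeps the search away from lag 0 and from very long lags where few samples overlap.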

Class exercise 4.2
x=[1 3 7 2 1 9 3 1 8], with sampling frequency Fs = 1 Hz.
(a) Find the pitch of this signal x using ACF (Autocorrelation function).
(b) Repeat the above if Fs = 8 kHz.

Method 2:
Normalized cross correlation function (NCCF) method [Verteletskaya 2009]

$$\mathrm{NCCF}(m) = \frac{\sum_{n=0}^{N-m-1} x(n)\, x(n+m)}{\sqrt{\sum_{n=0}^{N-m-1} x^2(n) \cdot \sum_{n=0}^{N-m-1} x^2(n+m)}}, \quad 0 \le m \le M_0$$
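The NCCF can be sketched directly from its definition. This is a minimal Python sketch; removing the mean before the calculation, the choice M_0 = 6, and the function name `nccf` are assumptions made for this illustration, and the input is the signal of Class exercise 4.2.

```python
import math


def nccf(x, max_lag):
    """NCCF(m) for m = 0 .. max_lag; each value lies between -1 and 1."""
    n_samples = len(x)
    values = []
    for m in range(max_lag + 1):
        span = n_samples - m
        num = sum(x[n] * x[n + m] for n in range(span))
        energy_0 = sum(x[n] ** 2 for n in range(span))
        energy_m = sum(x[n + m] ** 2 for n in range(span))
        denom = math.sqrt(energy_0 * energy_m)
        values.append(num / denom if denom > 0 else 0.0)
    return values


orginal_x = [1, 3, 7, 2, 1, 9, 3, 1, 8]      # signal of Class exercise 4.2
mean_x = sum(orginal_x) / len(orginal_x)
x = [v - mean_x for v in orginal_x]          # assumption: remove the mean first

values = nccf(x, max_lag=6)
best_lag = max(range(1, len(values)), key=lambda m: values[m])
print(best_lag)   # 3
```

Because each lag is divided by the energy of the two overlapping segments, the peak height does not shrink with m the way the plain ACF does, so peaks at different lags can be compared on the same scale.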


Method 3:
Average Magnitude Difference Function (AMDF) method [Verteletskaya 2009]
An intuitive method: just pick the peaks and find the period.

$$D_x(m) = \frac{1}{N} \sum_{n=0}^{N-1-m} |x(n) - x(n+m)|, \quad 0 \le m \le M_0$$

Find the peaks in D; the estimated period is the average gap between two neighboring -ve peaks (valleys).
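The AMDF can be sketched in the same way. This is a minimal Python sketch under stated assumptions: the signal of Class exercise 4.2 is used as input, M_0 = 5 is an arbitrary choice, and instead of averaging the gaps between several valleys the sketch simply takes the deepest valley after lag 0, which is a simplification.

```python
def amdf(x, max_lag):
    """D(m) = (1/N) * sum over n = 0 .. N-1-m of |x(n) - x(n+m)|."""
    n_samples = len(x)
    return [
        sum(abs(x[n] - x[n + m]) for n in range(n_samples - m)) / n_samples
        for m in range(max_lag + 1)
    ]


def deepest_valley_lag(D):
    """Lag of the deepest local minimum of D after lag 0 (None if none)."""
    valleys = [m for m in range(1, len(D) - 1) if D[m - 1] > D[m] <= D[m + 1]]
    if not valleys:
        return None
    return min(valleys, key=lambda m: D[m])


x = [1, 3, 7, 2, 1, 9, 3, 1, 8]   # signal of Class exercise 4.2
D = amdf(x, max_lag=5)
print(deepest_valley_lag(D))      # 3
```

Where the ACF has a peak at the period, the AMDF has a valley, because a signal shifted by one period differs least from itself.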

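Method 4:
Cepstrum Pitch Determination (CPD) [Verteletskaya 2009]
The signal s(n) is modelled as an excitation e(n) convolved with a filter response h(n):

$$s(n) = e(n) * h(n)$$
$$S(\omega) = E(\omega)\, H(\omega)$$
$$F^{-1}\{\log|S(\omega)|\} = F^{-1}\{\log|E(\omega)|\} + F^{-1}\{\log|H(\omega)|\}$$

In discrete form the cepstrum is

$$C(m) = \frac{1}{N} \sum_{k=0}^{N-1} \log|S(k)|\, e^{j 2\pi m k / N}$$

or, using the forward transform,

$$C(k) = \sum_{n=0}^{N-1} \log|S(n)|\, e^{-j 2\pi n k / N}$$

For a real signal, log|S| is real and symmetric, so the two forms differ only by the factor 1/N.
[Figure: cepstrum with a peak at Q.]
Peak at Q: pitch = 1/0.006 s ≈ 166.7 Hz.
The problem: for human voice, the peak may be the result of glottal excitation.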


For human voice pitch detection (or recognition)
We must study the structure of the vocal system and find out how to get an accurate answer.
The vocal system has 2 elements:
Glottal excitation (no use for pitch measurement)
Vocal tract filter
Use liftering to remove the glottal excitation before we use the spectrum of the vocal tract filter for pitch extraction.

Cepstrum of speech
A new word made by reversing the first 4 letters of spectrum: cepstrum.
It is the spectrum of a spectrum of a signal.
Why do we need this?
Answer: to remove the ripples of the spectrum caused by glottal excitation.
There are too many ripples in the spectrum caused by vocal cord vibrations, but we are more interested in the speech envelope for recognition and reproduction.
[Figure: speech signal x, its Fourier transform, and the spectrum of x.]
http://isdl.ee.washington.edu/people/stevenschimmel/sphsc503/files/notes10.pdf

Liftering method: Select the higher and lower samples
Signal X(n)
Cepstrum: C(n)=|fft(log|fft(x(n))|)|
Quefrency is in the time domain (in seconds), so higher quefrency corresponds to lower frequency.
High-time liftering: select C_high (lower frequency): glottal excitation.
Low-time liftering: select C_low (higher frequency): vocal tract filter response.
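The cepstrum C(n)=|fft(log|fft(x(n))|)| and a peak search in the high-quefrency part can be sketched as below. This is a minimal Python sketch under stated assumptions: a plain O(N^2) DFT stands in for the FFT so that no library is needed, a small floor of 1e-6 is added before the logarithm to avoid log(0), the test signal is a synthetic decaying pulse repeated every 40 samples, and the search range of 150 Hz to 500 Hz and all names are illustrative.

```python
import cmath
import math


def dft(values):
    """Plain discrete Fourier transform (slow, but dependency-free)."""
    n = len(values)
    return [
        sum(values[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def cepstrum(x, floor=1e-6):
    """C(n) = |DFT(log|DFT(x)|)|; the floor keeps the logarithm finite."""
    log_magnitude = [math.log(abs(s) + floor) for s in dft(x)]
    return [abs(c) for c in dft(log_magnitude)]


def cepstral_pitch(x, fs, f_min, f_max):
    """Pitch from the largest cepstral value between quefrencies 1/f_max and 1/f_min."""
    c = cepstrum(x)
    q_min = int(fs / f_max)      # shortest period considered, in samples
    q_max = int(fs / f_min)      # longest period considered, in samples
    q_peak = max(range(q_min, q_max + 1), key=lambda q: c[q])
    return fs / q_peak


fs = 8000
period = 40                                   # samples, i.e. 200 Hz at fs = 8000
pulse = [0.9 ** n for n in range(period)]     # one decaying pulse
x = pulse * 10                                # 10 periods, 400 samples

pitch = cepstral_pitch(x, fs, f_min=150.0, f_max=500.0)
print(pitch)   # 200.0
```

Limiting the search to quefrencies above 1/f_max is a crude form of high-time liftering: it skips the low-quefrency part of the cepstrum, where the slowly varying spectral envelope lives.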


Recover glottal excitation and vocal tract spectrum
[Figure: the cepstrum against quefrency (sample index), split into C_high (cepstrum of glottal excitation) and C_low (cepstrum of vocal tract); each part is transformed back to give the spectrum of glottal excitation and the spectrum of the vocal tract filter against frequency.]
A peak in the cepstrum may be the pitch period.
The smoothed vocal tract spectrum can be used to find pitch.
For more information see:
http://isdl.ee.washington.edu/people/stevenschimmel/sphsc503/files/notes10.pdf

Measure pitch of musical instruments
Example: Find pitch of Oboe A4 sound
http://www.cse.cuhk.edu.hk/%7Ekhwong/www2/cmsc5707/A4_oboe.wav
[Figure: spectrogram of A4_Oboe.]

Example: Find pitch of Oboe A4 sound
http://www.cse.cuhk.edu.hk/%7Ekhwong/www2/cmsc5707/A4_oboe.wav
http://www.cse.cuhk.edu.hk/%7Ekhwong/www2/cmsc5707/demo_ceps_note_v3.zip
Input: Oboe A4, X(n)
Fourier Transform: X(w)=fft(x)
Cepstrum: C(n)=|fft(log|fft(x(n))|)|
[Figure: the cepstrum C(n) over the full range (from around 30 Hz), and a zoomed view for the range 200 Hz to 900 Hz. The quefrency axis is in units of 10^-3 s: 900 Hz is at 1/900 = 1.11x10^-3 s and 200 Hz is at 1/200 = 5x10^-3 s.]
The first peak of the cepstrum (in quefrency) is at time = 0.002268 s; 1/time = F1 = 440.91 Hz is the pitch, and it has the strongest energy.
The second peak is at time = 0.004535 s; 1/time = F2 = 220.507 Hz.
Two peaks are found, at about 440 Hz and 220 Hz.

Summary
Methods of pitch extraction have been studied.
The cepstrum and its use for pitch extraction have been discussed.

References
[Naotoshi Seo 2007] Naotoshi Seo, "Project: Pitch Detection", http://note.sonots.com/SciSoftware/Pitch.html#ke283f3a
[Verteletskaya 2009] E. Verteletskaya, B. Šimák, "Performance Evaluation of Pitch Detection Algorithms", http://access.feld.cvut.cz/view.php?cisloclanku=2009060001
[Rabiner 1976] L. Rabiner, M. Cheng, A. Rosenberg, C. McGonegal, "A comparative performance study of several pitch detection algorithms", IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 24, no. 5, pp. 399-418, Oct 1976

Appendix

Music frequency table
http://wc.pima.edu/~manelson/MUS%20102/MIDI%20tunings%20per%20note.jpg

Music frequency table
Source: http://www.angelfire.com/in2/yala/t4scales.htm

Autocorrelation
In signal processing, given a signal f(t), the continuous autocorrelation is the continuous cross-correlation of f(t) with itself, at lag τ, and is defined as:

$$R_f(\tau) = f^*(-\tau) \ast f(\tau) = \int_{-\infty}^{\infty} f(t+\tau)\, f^*(t)\, dt = \int_{-\infty}^{\infty} f(t)\, f^*(t-\tau)\, dt$$

where $f^*$ is the complex conjugate of f and $\ast$ denotes convolution.
In a discrete system, the autocorrelation R at lag j for a signal $x_n$ is defined as:

$$R(j) = \sum_{n} x_n\, x_{n-j}$$

Answer 4.1: Exercise 4.1
First, what is auto-correlation?

%matlab code
x=[1 5 7 1 4 8 6 2 4 9 3]'
auto_corr_x=xcorr(x) %auto-correlation
figure(1), clf
subplot(2,1,1),plot(x)
grid on, grid(gca,'minor'), hold on
subplot(2,1,2),plot(auto_corr_x)
grid on, grid(gca,'minor')

[Figure: top panel x[t] against t; bottom panel Auto_correlation(x[t]).]
We only look at the positive side of the autocorrelation.
The gap between the two peaks is 4, so the period of x is around 4.
Show the steps of the calculation, using

$$R(m) = \sum_{n=0}^{N-1-m} x(n)\, x(n+m), \quad 0 \le m \le M_0$$

Ans: R = [302 214 142 183 194 116 65 88 70 24 3]

Answer 4.2 for exercise 4.2
This answer uses MACF; you can also use ACF, and the pitch found is the same for this example.
Question: x=[1 3 7 2 1 9 3 1 8], sampled at 1 Hz. Find the pitch of this signal x using MACF (Modified Autocorrelation function).
Answer:
orginal_x =
1 3 7 2 1 9 3 1 8
x = centered_wave = orginal_x-mean_x =
-2.8889 -0.8889 3.1111 -1.8889 -2.8889 5.1111 -0.8889 -2.8889 4.1111
cl = center clipped range =
2
y = center clipped signal =
-2.8889 0 3.1111 0 -2.8889 5.1111 0 -2.8889 4.1111
(a) If the sampling frequency Fs = 1 Hz
Answer: from the autocorrelation result of y in the figure, we can see that the distance between 2 peaks is 3 samples, so the pitch is 1/3 Hz, since the sampling frequency is 1 Hz.

Answer 4.2: Class exercise 4.2 (continued)
R=[24.3333, 9.6667, 8.2222, 16.3333, 6.5556, 4.5556, 6.8889, 2.7778, 0.8889]
(These values are the autocorrelation of orginal_x including the 1/N factor.)
2nd diagram: R (+ve side only); pick 2 peaks, the period is 3 samples, frequency = 1/3 Hz.
(b) If Fs = 8 kHz
Answer: if the sampling frequency is Fs = 8 kHz, the sampling period is dt = 1/Fs = (1/8) ms. The period of x is 3 samples, therefore the actual period is 3*dt = 3*(1/8) ms. The frequency of x is 1/(3*dt) = (8/3) kHz.

Matlab
%Ver2, MACF (Modified Autocorrelation function) using center clipping
clear
%select one of the followings
%real_data=1 %1 or 0
real_data=0
if real_data==1
    %use real sound; audioread replaces wavread, which recent MATLAB releases no longer provide
    %[orginal_x,fs]=audioread('d:\0music\sounds\violin3.wav');
    [orginal_x,fs]=audioread('violin3.wav');
    orginal_x=orginal_x(10000:11000); %keep a short segment of the recording
else
    %use test data
    %orginal_x=[1 2 5 6 7 6 1 0 4 3 4 8 6 7 3 2 4 9 3 ]
    orginal_x=[1 3 7 2 1 9 3 1 8 ]
    fs=1 %assume sampling frequency is 1Hz
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% test
x=orginal_x-mean(orginal_x) %remove the mean (center the wave)
n=length(x)
maxx=max(x)
minx=min(x)
dd=maxx-minx %peak-to-peak span
figure(1)
clf
plot(x)
%pause
%center clipping algo for pitch extraction
if real_data==1
    cl=dd/4000
else
    cl=dd/4 %center clipped "cl" length is 1/4 of total peak-to-peak span
    %pause
end
%assume the signal x is voltage against time
%center clip means set those signals with levels within the clipped regions to 0
%center = mean voltage level of the whole signal
%positive peak = maximum of the signal voltage
%negative peak = minimum of the signal voltage
%center clip regions are: (i) from center to 1/2 of center_to_positive peak
%                         (ii) from center to -1/2 from center_to_negative peak
y=zeros(size(x)); %preallocate the clipped signal
for t=1:n
    if x(t)<cl && x(t)>-1*cl %those within center clipped region set to 0
        y(t)=0;
    else
        y(t)=x(t);
    end
end
auto_corr_y=xcorr(y) %auto correlation
figure(2)
clf
subplot(3,1,1),plot(x)
ylabel('x=centered wave')
subplot(3,1,2),plot(y)
ylabel('y=center clipped wave')
hold on
subplot(3,1,3),plot(auto_corr_y)
ylabel('auto correlation of y')
xlabel('time ')
max_list=max(y)
fs
'orginal_x ' , orginal_x
'x =centered_wave =orginal_x-mean_x ' , x
'cl=center clipped range', cl
'y =center clipped signal' , y
