Вы находитесь на странице: 1из 18

Pitch detection 1

In this chapter will we describe a pitch detector or pitch extractor. The goal of a pitch de-
tector is to find the fundamental frequency f0 of a speech signal. There are several different
techniques for finding this fundamental frequency. In this chapter a pitch detector will be
chosen, based on test of some different pitch detections techniques.

1.1 Introduction

A pitch detector is used to detect the fundamental frequency ( f0 ), called the pitch, in a sig-
nal. In the project "Bandwidth expansion of narrowband signals" a pitch detector would be
useful to find f0 of speech signals, due to the fact that telephone signals are limited in the
frequency interval 300-3400 Hz. When extending the bandwidth of a signal from a sam-
pling frequency of 8000Hz to 16000Hz, it would also be useful to create the lower frequen-
cies of a phone-signal, say 50-300 Hz. Because the human pitch lies in the interval 80-350
Hz(table 1.1.1 on page II), where the pitch for men is normally around 150 Hz, for women
around 250 Hz and children even higher frequencies, the pitch is needed to construct this
part of the speech signal.

Some of the most used detectors are; Energy based, Cepstral based, zero crossing, pitch
in the difference function and autocorrelation based. In this chapter there will be tested
three different pitch detectors, The "On-The-Fly", a Cepstral based and an auto correlation
based.

"On-The-Fly" [1] is time based an algorithm which is developed to estimate the pitch of
a signal, divided into frames. This paper is a description of this algorithm, how its imple-
mented and an overall test of its ability to estimate a pitch of a speech signal.

The Cepstral pitch detector is based on the log of the fourier transformation. This algo-
rithm is only shortly described and tested in this paper.

The Auto correlation pitch detector [5] is described and tested in this chapter.

I
CHAPTER 1. PITCH DETECTION

1.1.1 The test signal

The pitch detector must be able to estimate a pitch of signals in a limited frequency range.
The speech signal used throughout this paper, is spoken by a man with a pitch around 150Hz
20Hz, tested with a pitch analyzer program1 . If not mentioned otherwise figures illustrate
a single frame (20ms) and always the same frame, so that the figures can be compared. The
signal is sampled at 16000Hz and passed though a telefilter, which is a 10the order band pass
filter working at 300-3400Hz. All figures in this paper are illustrated with the original signal
and the band passed signal. In this way it is possible to compare the difference between
these.

f0 min(Hz) f0 max(Hz)
men 80 200
woman 150 350

Table 1.1: Range of human speech

Also notice that the implemented algorithms not only have been tested with the above de-
scribed signal. This signal is only used to gain understanding of the algorithms.

1.2 On-The-Fly pitch estimation

1.2.1 Common time domain pitch detectors

Many time-domain pitch detectors use the autocorrelation (equation 1.1) method to estimate
the pitch, which is also among the oldest methods. Another method to find the pitch is the
difference function (equation 1.2), which has been shown by Yin [2] and Chowdhury [3] to
be more effective than the autocorrelation method. The algorithm "On-The-Fly", is a pitch-
detector is based on the difference function (dt ).
W
rt () = x j x j+ (1.1)
j=1

W
dt () = (x j x j+r )2 (1.2)
j=1

, where W is the length of the window. t indicates this is a time-domain calculation.

1.2.2 Unifying the difference function

The difference function is an effective algorithm to detect the f0 of a signal. Here f0 should
be at the global minimum of the difference function. This minimum is although difficult to
1 Pitch analyzer v1.1 - written by Franz Josef Elmer

II
1.2. ON-THE-FLY PITCH ESTIMATION

locate, because the difference function of a random signal decrease with a slope of a straight
line, as shown on figure 1.1.

Differencen function of the signal


45
All pass signal
40 Band limited signal

35

30

Amplitude
25

20

15

10

0
0 50 100 150 200 250 300 350
Samples difference index

Figure 1.1: Shows the difference function taken of both the all-passed signal
and the band limited signal. The figure shows that both signals approximately
converge with a straight line. Also notice the difference between the local
minima. The local minimas of the band limited signal are generally much
smaller than the all pass signal.

To locate the actual global minima, the difference function output needs to be unified. So if
the time delay is equal to the period of the pitch (P), the converging slope (see figure 1.1)
can be seen as a straight line of a unity slope. Correlating xt+P with xt will result in a global
maximum at the pitch P. Therefore by considering the covariance matrix of the random
vector X = [x j xt + ]T we get that:

cx () = E[(x m)(x m)T ] (1.3)

Considering this as WSS, two orthogonal eigenvectors can be found from the covariance
matrix, by decomposing it using diagonalization:

1 () 1 () + 2 () 1 () 2 ()
     
1 1 0 1 1
Cx () = = (1.4)
1 1 0 2 () 1 1 1 () 2 () 1 () + 2 ()

By assuming uniform distributions and equal means, the eigenvalues can be written like
this:
W
1 () (x j + x j+)2 (1.5)
j=1

W
2 () (x j x j+)2 (1.6)
j=1

III
CHAPTER 1. PITCH DETECTION

Because the eigen-values and vectors describes new basis for the covariance matrix, a uni-
fied difference function can be derived. Thinking of eigenvector 1 and eigenvalue 1 being a
1D basis of the 2D space, the corresponding eigenvector 2 and eigenvalue 2 can be thought
of as the error off this basis and vice versa. Therefore the unified difference function can be
written as the following:
W
(x j x j+r )2
2 () j=1
dt () = = (1.7)
1 () + 2 () W
2 x2j + x2j+r
j=1

The unified difference function is plotted figure 1.2.

Unified differencen function of the signal


1
All pass signal
0.8
Amplitude

0.6

0.4

0.2

0
0 50 100 150 200 250 300 350

1
Band limited signal
0.8
Amplitude

0.6

0.4

0.2

0
0 50 100 150 200 250 300 350
Samples difference index

Figure 1.2: Here the two signals are unified. Now the difference min the local
minima are really shown.

By Yin the the pitch is found at the global minima of the unified difference function (dt ). This
is found as a sample index and is equal to P. f0 is when calculated by the function:

fs
f0 = (1.8)
P
, where fs is the sampling frequency (fs).

IV
1.2. ON-THE-FLY PITCH ESTIMATION

1.2.3 Contradiction in results

By taking further notice to the two figures in figure 1.2, we find a contradiction. The two
signals doesnt seem to have the same global minimum, eventhough f0 of the two signals
should be equal. In the article [1] it does not say anything about "On-The-Fly" was devel-
oped for these types of band limited signals. But it is developed to find the actual pitch of
a signal, due to the fact, that the authors claim that the global minimum of the difference
function is not always equal to the pitch period. This is actually the case for our band limited
signal, where the situation is as follows:

The all-pass signal has a global minimum around index 95, which is equal to a pitch
of about 170 Hz.

The band limited signal has a global minimum around index 47, which is equal to a
pitch of about 340 Hz.

Surely one of these results must be wrong. Considering the test signal originates from a
man, with a pitch around 150 Hz, the result of the band limited signal must be wrong.

1.2.4 Idea of the "On-The-Fly"

The fundamental idea of the "On-The-Fly" algorithm is to find other candidates for f0 . This is

done by locating a number of the smallest minimas in dt , where the overall global minimum,
is still by offset selected to as f0 . Notice that at all time, the global minimum is still thought
of as being some harmonic of the actual pitch. If the global minimum is not equal to f0 , the
actual pitch should be found within some limits of this minimum, so it would be a candidate
of a harmonic frequency to the global minima.

The surrounding local minima are therefore to be tested in 2 ways:

1. A minimum differs only by a significant amount (threshold stage 1).


A small threshold (ATh1) is to be used in testing the difference in the minimas size. If
the difference from the found pitch-sample is within the limit, the one with the smaller
time period is considered the actual pitch. This has to be accompanied with another
larger threshold (TTh1) which is used to test the cohesion between the candidate har-
monic of the pitch minima and the new candidates. If the minimum does not lie with
the threshold of the harmonic, this sample should be discarded.

2. A minima differs only a relatively significant amount (threshold stage 2).


A larger threshold (ATh2) is to be used in testing the difference in the minimas size. If
the difference from the found pitch-sample is within the limit, the one with the smaller
time period is considered the actual pitch. This has to be accompanied with another
smaller threshold (TTh2) which is used to test the cohesion between the candidate
harmonic of the pitch minima and the new candidates. If the minimum does not lie
with the threshold of the harmonic, this sample should be discarded.

V
CHAPTER 1. PITCH DETECTION

Figure 1.3 is illustrating the 2 test conditions where the threshold stages are marked.

Unified differencen function of the signal with smallest mi


nimas
1
All pass signal
0.8 Minimas

Amplitude
0.6

0.4

0.2
TTh1 ATh1
0
0 50 100 150 200 250 300 350

1
Band limited signal
0.8 Minimas
Amplitude

0.6

0.4

0.2
TTh2 ATh2
0
0 50 100 150 200 250 300 350
Samples difference index

Figure 1.3: Same as figure 1.2, though here with the 5 smallest minimas
marked with green lines. The threshold stage 1, ATh1 and TTh1, is shown
in the upper figure, while threshold stage 2, ATh2 and TTh2, is shown in the
lower figure. Of course both thresholds need to be applied to each signal when
testing

1.3 The On-The-Fly algorithm and implemantation

The algorithm to On-The-Fly is relatively easy and is in figure 1.4 shown in a block diagram.

A B C
Calculate the unified Locate a number Do parabolic
difference function (dt ) (typically 5) of the interpolation of the
of a frame. smallest minima of dt. possible candidates.

D E F
Do threshold stage 1, Do threshold stage 2, Result of threshold stage
with the thresholds ATh1 with the thresholds ATh2 2 is the found pitch
and TTh1. and TTh2. (fundamental frequency)

Figure 1.4: The diagram of the On-The-Fly algorithm


A Here dt is calculated.

VI
1.3. THE ON-THE-FLY ALGORITHM AND IMPLEMANTATION


B Locating a number of the smallest local minima of dt can be a challenging task. First a
minimum must be defined. But can 2 local minima be separated with 1 or only a few
samples? If not, where should the limit be?
An example shows that surely minimas with only a few samples between them would
not be harmonics of each other. Typical speech signals are samples at fs=16000 Hz.
With the maximal pitch of 500 Hz, there would still have to be minimum (16000Hz/500Hz)

32 samples between minimas of the dt . But still a local minimum might occur outside
the harmonic and thereby eliminate the possibility of locating the actual harmonic to
be found.

In the implementation, an algorithm to find a number of minimas of the dt was devel-

oped. It works in the following way; First it locates all minimas of dt . Then any minima
larger that 0.4 is deleted from data, to assure the minimas could actually be the pitch.
Next the global minimum among all the minima is found and stored in ti , where ti is
the index to the minima. If any minima exists within 10 samples of the found global
minimum, these minimas are deleted from data along with the found minima, and the
procedure is run again until there is no more data or the number of minimas wanted
has been found.
The matlab code for this is found in section 1.8.2.

C The parabolic interpolation is needed to reduce quantization error in the candidate


pitch values. An example, say fs=16000 Hz, if a minima is the pitch and located at
sample 60, it equals a pitch of 267Hz. If the minima were found at sample 59, it would
equal a pitch of 271Hz. No pitch is possible to get between 267-271Hz. Further this
error gets higher, as the pitch rises. The parabolic interpolation solves this problem,
and is implemented as a simple curve fitting problem.
The matlab code for this is found in section 1.8.3.

D Among the candidate minimas (ti ), we need to know if there are a harmonics of the
global minimum is said to be the pitch. This is decided from the formula:

N tg < T T h1 , where N={2, 3, .., 6}

(1.9)
ti

, where tg is the index of the overall global minima. The threshold TTh1 here specifies
how much a harmonic can maximal differs from the global minimum. Those ti for
which at least one of the possible N-values satisfy the threshold are then tested against
threshold ATh1. ATh1 is a threshold of the difference in the minima value between tg
and ti . If any ti are within the 2 threshold values, tg is changed with the one, which
have the smaller timeperiod (if not tg itself). In this threshold set, TTh1 is set to 0.2 and
ATh1 is set to 0.07.
The matlab code for this is found in section 1.8.4.

E Bullet D and E are exactly the same, just with 2 different threshold sets. In this thresh-
old set, TTh2 is set to 0.05 and ATh2 is set to 0.2.

The fully implemented "On-The-Fly" algorithm can be seen in section 1.8.2.

VII
CHAPTER 1. PITCH DETECTION

1.3.1 Testresults of "On-The-Fly"

Though this paper a single frame has been used to show how the algorithm works. In figure
1.5 we see the final result of the algorithm on this frame.

Pitch found in the difference function by OnTheFly


1
All pass signal
0.8 f (pitch=165.84Hz)
0
Amplitude

0.6

0.4

0.2

0
0 50 100 150 200 250 300 350

1
Band limited signal
0.8 f (pitch=338.51Hz)
0
Amplitude

0.6

0.4

0.2

0
0 50 100 150 200 250 300 350
Samples difference index

Figure 1.5: The pitch found in a single frame by the "On The Fly"-algorithm.

The o in each figure indicates where the global minima of dt (offset pitch)
were found.

The figure clearly shows that the pitch is not the same in this frame. The pitch should be
around 150Hz, which is the result from the original signal. From the band passed signal a
pitch was not found anywhere near the actual pitch of the frame.
This could although be a insecurity in the algorithm. So in figure 1.6 150 frame of the signal
are shown.

VIII
1.4. CEPSTRAL BASED PITCH DETECTOR

OnTheFly pitch estimation of original signal


1 500

250

Amplitude of signal

Frequency of pitch
0 0

250
Org. speech signal
Pitch frequency
1 500
0 0.5 1 1.5 2 2.5
4
x 10

OnTheFly pitch estimation of bandpassed signal


1 500

250
Amplitude of signal

Frequency of pitch
0 0

250
Bandpassed speech signal
Pitch frequency
1 500
0 0.5 1 1.5 2 2.5
Sample index x 10
4

Figure 1.6: The pitch found by the "On The Fly"-algorithm.

In this figure we see that the found pitch from the original speech signal is far more conse-
quente, than the one found in the band passed signal. From this and the fact that the signal is
man with a pitch around 150Hz, we can conclude that "On The Fly" may be a good at detect-
ing pitch in full frequencyranged signals, but not in band passed signals2 . This algorithm is
therefore not useful in the project "Bandwidth expansion of narrowband signals".

1.4 Cepstral based pitch detector

The theory behind the cepstral detector is that the fourier transform of a pitched signal usu-
ally have a number of regularly peaks, who is representing the harmonic spectrum. When
log magnitude of a spectrum is taken, these peaks are reduced (their amplitude brought into
a usable scale). The result is a periodic waveform in the frequency domain, where the pe-
riod is related to the fundamental frequency of the original signal. This means that a fourier
transformation of this waveform has a peak representing the fundamental frequency.
This method has only shortly been tested in matlab, with a non satisfying result. Due to the
test, and time limitation, this pitch detector will not be explained further.

2 A short description the algorithms flaws in detecting pitch in band limited signal can be found in section

1.8.1

IX
CHAPTER 1. PITCH DETECTION

1.5 Pitch detection using the autocorrelation

When calculating the autocorrelation you get a lot of information of the speech signal. One
information is the pitch period (the fundamental frequency). To make the speech signal
closely approximate a periodic impulse train we must use some kind of spectrum flattening.
To do this we have chosen to use "Center clipping spectrum flattener". After the Center
clipping the autocorrelation is calculated and the fundamental frequency is extracted. An
overview of the system is shown in figure 1.7.

Speech Frame x(n)

LPF
Fc = 900 Hz

Cl = 30% of max{x(n)}

Center clip x(n)

Compare AC
Rn(k) for
Fs/350 <= K <= Fs/80

Compute
R = max{Rn(k)}

Compute
R = Rn(0)

Frame unvoiced No Test yes Pitch period


output = 0 R >= 30% of Rn(0) f0 = K + Fs/350

Figure 1.7: Overview of the autocorrelation pitch detector

1.5.1 Center clipping spectrum flattener

The meaning of center clipping spectrum flattener is to make a threshold (cl ) so only the
high and low pitches are saved. I our case (cl ) is chosen to 30% of the max amplitude. the
center clipping of a speech signal is shown in figure 1.8 on page XI

X
1.5. PITCH DETECTION USING THE AUTOCORRELATION

Input speech
A max

+C L

Time

-C L

Center clipped speech

Time

Figure 1.8: Center clipping

The center clipping is made due to the following definition:

CL =% of Amax (e.g. 30%)

if x(n) > CL then y(n)=x(n)- CL

if x(n) <= CL then y(n) = 0

1.5.2 Autocorrelation

The autocorrelation is a correlation of a variable with itself over time. The autocorrelation
can be calculated using the equation shown in equation 1.10

N.k1
R(k) = x(m)x(m + k) (1.10)
m=0

In this project we have used the Matlab function "xcorr" to calculate the autocorrelation.

XI
CHAPTER 1. PITCH DETECTION

1.5.3 Median smoothing

When we have calculated the pitch detection, we can see that there are some outliers. To
minimize these outliers, we have chosen to use a median smooth (Median Filter). The effect
of this median filter is shown in figure 1.9. The window length (n) have been tested with
n=3 and n=5. The window length have been chosen to n=5, because the pitch detector then
eliminate the outliers in the noisy part. One of the problem using the Median filter with n=5
is that it smooths over five windows. I our project that means that we will have to make a
delay in our system for 50 ms. The delay is calculated as:

number o f windows the length o f the window window overlap = 5 20ms 0, 5 = 50ms
(1.11)

In our project we have chosen not to deal with this problem, cause we are not going to
implement it as an real time system. If we should implement it as a real time system, we
would have to predict the pitch five windows ahead.

Pitch detection with median smooth n=3


1 500

Frequency of pitch
Amplitude of signal

0.5 250

0 0
Speech signal
0.5 median smooth 250
pitch detection
1 500
0 0.5 1 1.5 2 2.5
4
x 10
with median smooth n=5
1 500
Amplitude of signal

Frequency of pitch

0.5 250

0 0
Speech signal
0.5 median smooth 250
pitch detection
1 500
0 0.5 1 1.5 2 2.5
4
x 10
with median smooth n=7
1 500
Amplitude of signal

Frequency of pitch

0.5 250

0 0
Speech signal
0.5 median smooth 250
pitch detection
1 500
0
0 0.5
0.5 1
1 1.5
1.5 2
2 2.5
2.5
4
Sample index x
x 10
10
4

Figure 1.9: Median Smoothing

If the window length is set to n=7, shown in figure 1.9, the result seen perfect. The prob-
lem with such a long window is that the algorithm sometime finds a wrong fundamental
frequency when there are "speech pause".

XII
1.6. OVERALL TEST AND RESULTS

1.5.4 Test

The algorithm has been tested against result from the program "pitch analyzer 1.1", and has
shown that the auto correlation pitch detector is very accurate.

1.6 Overall test and results

Thought this paper a speech signal has been used as a testsignal. But since pitch of a random
testsignal isnt known, the algorithms has also been tested up againt signals with a known
pitch. Figure 1.10 shows the 2 implemented pitch detectors, estimating the pitch of a linear
swept frequency signal, which spectogram is shown in the top figure.

The spectogram of a Linear sweptfrequency signal


8000

7000

6000

5000
Frequency

4000

3000

2000

1000

0
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
Time

Pitch estimation of a Linear sweptfrequency signal (801000Hz)


500
1

Pitch frequency (Hz)


0.5
Amplitude

0 0

Linear sweptfrequency signal (Amplitude)


0.5 OnTheFly (pitch freq)
FastAutocorr (pitch freq)
FastAutocorr shifted 5 windows < (pitch freq)
1 Actual pitch of signal (pitch freq)
500
0 2000 4000 6000 8000 10000 12000 14000 16000
Samples

Figure 1.10: The top figure shows a spectogram of the linear swept frequency
signal, starting at 80 Hz and ending in 1000 Hz. The buttom figure shows
the 2 algorithms estimation of the pitch of the linear swept frequency signal.
NOTICE the pitch detectors are constructed to detect the pitch in the interval
80-350 Hz.

As the figure shows, both methods are equally good to estimate the pitch within their oper-
ation limits 80-350 Hz. Outside these limits the figure shows that both methods quite poor
to floor the pitch frequency, which would be the preferred outcome.
Further the Fast auto-correlation method is delayed 5 frames (50ms), because it uses a
median-smooth over 5 estimated pitches. This cant be compensated for and the only so-

XIII
CHAPTER 1. PITCH DETECTION

lution to this, is therefore to remove the smoothing, which will cause more spikes (see fig-
ure 1.9 on page XII).

Another important feature are the influence of noise in signals. This is tested with a 200 Hz
sinus signal, with a signal/noise ratio shifting from 100% to 0% in a timeinterval, shown in
figure 1.11.

Pitch estimation of a 200 Hz sinus signal with decreasing signal/noise ratio


500

1
400

0.8
300

0.6

200
0.4

Pitch frequency (Hz)


100
0.2
Amplitude

0 0

0.2
100

0.4
200

0.6

300
0.8 Signal with noise (Amplitude)
Signal/noise (%)
OnTheFly (pitch freq) 400
1
FastAutocorr (pitch freq)
FastAutocorr shifted 5 windows < (pitch freq)
500
0 2000 4000 6000 8000 10000 12000 14000 16000
Samples

Figure 1.11: Test of the influence of noise in a signal. Here the signal/noise
ratio reises with time.

The figure shows that the Fast-auto-correlation method is very robust against noisy signals,
while the On-The-Fly breaks down very early in the test. Because speech signals arent just
a single sinus signal, such signals may more or less be similar to a noisy sinus. Therefore the
Fast-auto-correlation would be preferred.

1.7 Conclusion

Between the 2 implemented methods of pitch detection, only the autocorrelation method are
usable in the project "Bandwidth expansion of narrowband signals". It was shown (figure
1.6) that the "On-The-Fly" algorithm could not estimate a reliable pitch from band limited
signals, which is a must in the project. The Fast-auto-correlation method, on the contrary,
had no problem with either type of signal and is robust againt noisy signals. The preferred
pitch detector is therefore the Fast-auto-correlation method.

XIV
1.8. APPENDIX

1.8 Appendix

1.8.1 On-The-Flys flaws when used on band limited signals

Tests of On-The-Fly algorithm showed that it was unable to estimate an accurate pitch of a
band limited signal. So why is that?
The problem is with the lower cut-off frequency of the band pass filter. The On-The-Fly
algorithm uses these lower frequencies harmonics to estimate the pitch. Because it is time
based and only used the minimas of the difference equation, the higher order harmonics
get more influence, than the lower order harmonics (see figure 1.2). Next the key idea is to
be the threshold stages is to select the candidate harmonic with the smallest time period,
resulting in a higher pitch. Comparing this key idea with the figure, the algorithm will most
likely select a higher pitch frequency, than it should.

1.8.2 Matlab implementation of the "On-The-Fly" algorithm


1 % This f u n c t i o n c a l c u l a t e s the fundamental frequency of a frame of data
2 % w i t h t h e OnTheFly a l g o r i t h m .
3
4 % I n p u t
5 % data : Frame o f d a t a f o r w i t h p i t c h i s t o be c a l c u l a t e d
6 % fs : Sampling frequency of si g n a l
7
8 % O u t p u t
9 % f0 : The c a l c u l a t e d f u n d a m e n t a l f r e q u e n c y o f t h e i n p u t d a t a
10 % 0 i s r e t u r n e d i f no p i t c h i s f o u n d
11
12 f u n c t i o n [ f 0 ] = OnTheFly ( d a t a , f s )
13
14 frameLength = len gt h ( data ) ; % Framelength
15 curveFitSamples = 3 ; % S a m p l es f ro m minima t o l e f t and r i g h t
16
17 % Calculate difference function
18 d = z e r o s ( 1 , f r am eLen g t h 1) ;
19 f o r i = 1 : f r am eLen g t h 1
20 d ( i ) = sum ( ( d a t a ( 1 : f r am eLen g t h i ) d a t a ( i + 1 : f r a m e L e n g t h ) ) . ^ 2 ) ;
21 d ( i ) = d ( i ) / ( 2 sum ( d a t a ( 1 : f r am eLen g t h i ) . ^ 2 + d a t a ( i + 1 : f r a m e L e n g t h ) . ^ 2 ) ) ;
22 end
23
24 % Fi n d g l o b a l minima and 4 l o c a l minima ( s m a l l e s t minima ) w i t h minimum 1 0
25 % s a m p l e s b e t w e e n m i n i m a s . Minimas can h a ve a maximum v a l u e o f 0 . 4
26 [ m i n i m as minCount ] = f i n d m i n ( d , 5 , 1 0 , 0 . 4 ) ;
27 %m i n s ( end : 1 : 2 , : ) = m i n s ( 2 : end , : ) ;
28 nMins = s i z e ( m i n i m as ) ;
29
30 % I f t o many l o c a l m i n i m a s a r e l o c a t e d by f i n d m i n , t h e s i g n a l i s l i k e l y t o
31 % be n o i s e , and t h e r e f o r e n o t t o be p r o c e s s e d . A l s o i f no m i n i m a s h a s b een
32 % f o u n d , t h e d a t a i s p r o p e r l y u n v o i c e d , and i s n o t t o be p r o c e s s e d .
33 i f minCount <= 2 0 & & nMins ( 1 ) > 0
34 % M i n i m i z a t i o n o f q u a t i z a t i o n e r r o r w i t h p a r a b l e c u r v e f i t t i n g f o r ea ch minima
35 f o r i = 1 : nMins ( 1 )
36 % xc o o d i n a t e s t o make c u r v e f i t t i n g f ro m
37 x = ( m i n i m as ( i , 1 )c u r v e F i t S a m p l e s ) : ( m i n i m as ( i , 1 ) + c u r v e F i t S a m p l e s ) ;
38
39 % Removes c o o r d i n a t e s o f x , wh i ch i s n o t w i t h i n t h e d a t a f r a m e
40 x ( f i n d ( x < 1 | x > ( f r am eLen g t h 1) ) ) = [ ] ;
41 y = d(x) ; % C o r r e s p o n d i n g v a l u e s o f xc o o r d i n a t e s
42 [ m i n i m as ( i , 1 ) m i n i m as ( i , 2 ) ] = p a r r e g ( x , y ) ; % C u r v e f i t t i n g and t o p p o i n t e x t r a c t i o n
43 end
44
45 % Thresholding stage 1
46 TTh1 = 0 . 2 ; ATh1 = 0 . 0 7 ; % Threshold s e t t i n g s
47 [ t g m i n s ] = f i n d f 0 ( TTh1 , ATh1 , m i n i m as ) ;
48
49 % Thresholding stage 2
50 TTh2 = 0 . 0 5 ; ATh2 = 0 . 2 ; % Threshold s e t t i n g s
51 [ t g m i n s ] = f i n d f 0 ( TTh2 , ATh2 , [ t g ; m i n i m as ] ) ;
52
53 f0 = f s / tg ( 1) ; % C a l c u l a t i o n of the fundamental frequency of the frame
54 else
55 f0 = 0 ; % Z ero i f f r a m e i s u n v o i c e d o r n o i s e !

XV
CHAPTER 1. PITCH DETECTION

56 end

Locating minimas

Main function in locating minimas of data:


1 % F u n c t i o n l o c a t e s nMins o f t h e s m a l l e s t m i n i m a s o f d a t a , wh ere m i n i m a s
2 % h a ve t o be s e p a r a t e d w i t h a t l e a s t s p r e d s a m p l e s . A l s o a minima can t
3 % e x c i d e maxMinValue !
4
5 % I n p u t
6 % data : Data t o l o c a t e m i n i m a s i n
7 % nMins : # o f minimas t o f i n d i n data ( i f p o s s i b l e )
8 % spred : Minimum # o f s a m p l e s b e t w e e n m i n i m a s l o c a t e d
9 % maxMinValue : Maximum v a l u e o f a minima
10
11 % O u t p u t
12 % ti : A nby 2 m a t r i x wh ere co l u m n 1 i s i n d e x e s o f minima and
13 % co l u m n 2 i s v a l u e s o f minima .
14 % m i n Co u n t : T o t a l number o f f o u n d minima i n d a t a
15
16 f u n c t i o n [ t i minCount ] = f i n d m i n ( d a t a , nMins , s p r e d , maxMinValue )
17
18 [ n m] = s i z e ( d a t a ) ; % Flips vector correct !
19 i f m == 1
20 data = data ;
21 end
22
23 m i n i m as = l o c a l m i n i m a ( d a t a ) ; % L o c a t e s ALL m i n i m a s i n d a t a a s i n d e x e s
24 m i n i m as ( : , 2 ) = d a t a ( m i n i m as ( : , 1 ) ) ; % Extends mins t o hold d a t a v a l u e s t o i n d e x e s o f minimas
25 minCount = l e n g t h ( m i n i m as ) ; % T o t a l number o f l o c a l m i n s f o u n d
26
27 i n d e x e s = f i n d ( m i n i m as ( : , 2 ) > = maxMinValue ) ;% Fi n d i n d e x e s o f minima t h a t e x c i d e s maxMinValue
28 m i n i m as ( i n d e x e s , : ) = [ ] ; % D e l e t e s m i n i m a s t h a t e x c i d e maxMinValue
29 [m n ] = s i z e ( m i n i m as ) ; % New s i z e o f m i n s m a t r i x
30
31 if m >= 1 % Only runs i f data are a v a i l a b l e
32 t i = z e r o s ( nMins , 2 ) ; % A l l o c a t e s number o f s m a l l e s t m i n i m a s wa n t ed
33 f o r i = 1 : nMins
34 [ minima mIndex ] = min ( m i n i m as ( : , 2 ) ) ;% F i n d s g l o b a l min among m i n i m a s
35 t i ( i , : ) = m i n i m as ( mIndex , : ) ; % C o p i e s f o u n d g l o b a l minima i n t r o new m a t r i c e
36 i n d e x e s = f i n d ( abs ( m i n i m as ( mIndex , 1 )m i n i m as ( : , 1 ) ) < s p r e d ) ; % Finds i n d e x e s o f minimas w i t h i n spred samples
o f f o u n d minima
37 m i n i m as ( [ mIndex i n d e x e s ] , : ) = [ ] ; % D e l e t e s minima w i t h i n s p r e d s a m p l e s o f f o u n d minima
38 i f l e n g t h ( m i n i m as ) < = 0 % Brea k l o o p i f no more d a t a
39 t i ( i + 1 : end , : ) = [ ] ; % Deletes used a l l o c a t e d space
40 break ;
41 end
42 end
43 else
44 t i = zeros ( 0 ,2 ) ; % A l l o c a t e s a 0 by 2 m a t r i x ( t o a v o i d r u n t i m e e r r o r )
45 end

Sub function that locates all minimas of data:


1 % m i n s = L o ca l Mi n i m a ( x )
2 %
3 % finds positions of a ll s t r i c t l o c a l minima i n i n p u t a r r a y
4
5 f u n c t i o n m i n s = Lo cal Mi n i m a ( x )
6
7 nPoints = length ( x ) ;
8 Mi d d l e = x ( 2 : ( n P o i n t s 1) ) ;
9 L e f t = x ( 1 : ( n P o i n t s 2) ) ;
10 Right = x ( 3 : nPoints ) ;
11 m i n s = 1 + f i n d ( Mi d d l e < L e f t & Mi d d l e < R i g h t ) ;
12
13 % W r i t t e n by Ken n et h D . H a r r i s
14 % T h i s s o f t w a r e i s r e l e a s e d u n d e r t h e GNU GPL
15 % www . gnu . o rg / c o p y l e f t / g p l . h t m l
16 % any comments , o r i f you make any e x t e n s i o n s
17 % l e t me know a t h a rri s@ a xo n . r u t g e r s . edu

XVI
1.8. APPENDIX

1.8.3 Curvefitting function


1 % This fun cti on c a lc u l at e s the toppoint of a c u r v e f i t t e d parable .
2 f u n c t i o n [ xx yy ] = p a r r e g ( x , y )
3 %x : xk o o r d i n a t e s t o do c u r v e f i t t i n g a ro u n d
4 %y : yk o o r d i n a t e s t o do c u r v e f i t t i n g a ro u n d
5
6 [m n ] = size (y) ; % V e c t o r x h a s t o s t a n d ( a mby 1 v e c t o r )
7 if n ~= 1
8 y = y ;
9 end
10 [m n ] = size (x) ; % V e c t o r y h a s t o s t a n d ( a mby 1 v e c t o r )
11 if n ~= 1
12 x = x ;
13 end
14
15 % Cu rve f i t t i n g i n t h e f o rm Ax=b
16 X = [ ones ( s i z e ( x ) ) x x . ^ 2 ] ; % X values
17 a = X\ y ; % Least sqaures s o l ut io n of i n c l i n a t i o n
18 xx = a ( 2 ) / ( 2 a ( 3 ) ) ; % Calculates x of toppoint of parable
19 yy = [ 1 xx xx . ^ 2 ] a ; % Calculates t of toppoint of parable

1.8.4 Threshold stage function


1 % Function th at loc a tes the fundamental frequency of t i according to input
2 % t h r e s h o l d s TTh and ATh .
3
4 % I n p u t
5 % TTh : Harmonic t h r e s h o l d
6 % ATh : A m p l i t u d e t h r e s h o l d
7 % ti : A nby 2 m a t r i x [ i n d e x _ o f _ m i n i m a , v a l u e _ o f _ m i n i m a ]
8 % Row 1 MUST be t h e c u r r e n t p i c k e d f u n d a m e n t a l f r e q u e n c y minima
9
10 % O u t p u t
11 % tg : New minima o f t h e f u n d a m e n t a l f r e q u e n c y
12 % ti : Rem a i n i n g m i n i m a s
13
14 f u n c t i o n [ t g , t i ] = f i n d f 0 ( TTh , ATh , t i )
15 N = [2 3 4 5 6]; % Threshold c o n s t a n t s to t e s t f o r harmonics
16
17 tg = t i ( 1 , :) ; % E x t r a c t c u r r e n t f u n d a m e n t a l f r e q u e n c y o f minimas
18 ti (1 ,:) = [] ; % D e l e t e s t h e c u r r e n t f u n d a m e n t a l f r e q u e n c y o f m i n i m a s f ro m t
19 t i = sortrows( ti ,1) ; % S o r t s ro ws a c c o r d i n g t o i n d e x e s ( a s c e n d i n g )
20 t i = t i ( end : 1 : 1 , : ) ; % R e v e r s e s t i . . . s o r t m u st be d e s e n d i n g
21 index = [ ] ; % Must t o a l l o c a t e d t o a v o i d r u n t i m e w a r n i n g s
22
23 % T e s t f o r c a n d i d a t e s o f h a rm o n i c m i n i m a s t o t h e f u n d a m e n t a l f r e q u e n c y minima
24 f o r i = 1 : l e n g t h (N) % Run e l e m e n t s i n n t i m e s
25 x = tg (1) . / t i ( : ,1 ) ; % Calculates tg / t i
26 n e w I n d e x e s = f i n d ( abs (N( i )x ) < TTh ) ; % F i n d s i n d e x e s o f t i , t h a t s a t i f i e s t h e t h r e s h o l d TTh
27 i f length ( newIndexes) > 0
28 index = [ index ; newIndexes ] ; % Adds t h e f o u n d i n d e x t o a g e n e r a l i n d e x v a r i a b l e
29 end
30 end
31
32 % T e s t among t h e c a n d i d a t e s o f a m i n i m a s t o t h e f u n d a m e n t a l f r e q u e n c y minima
33 % T e s t t h e c a n d i d a t e s i f t h e t h r e d s h o l d i s w i t h i n t h e t h r e s h o l d ATh
34 newTgIndex = 0 ; % I n d e x among i n d e x o f new f u n d a m e n t a l f r e q u e n c y minima
35 for i =1: length ( index ) % Runs o n l y among t h e h a rm o n i c c a n d i d a t e s
36 i f t i ( i n d e x ( i ) , 2 ) t g ( 2 ) < ATh % T e s t i f ATh t h r e s h o l d i f s a t i f i e d
37 newTgIndex = i ; % S a ve i n d e x o f i n d e x o f new f u n d a m e n t a l f r e q u e n c y minima
38 end
39 end
40 i f newTgIndex > 0 % I f a new f u n d a m e n t a l f r e q u e n c y was f o u n d , s e t new t g
41 tgTemp = t i ( i n d e x ( newTgIndex ) , : ) ; % Makes co p y o f new f u n d a m e n t a l f r e q u e n c y minima
42 t i ( index ( i ) , : ) = tg ; % C o p i e s o l d f u n d a m e n t a l f r e q u e n c y minima i n t o t i
43 t g = tgTemp ; % C o p i e s new f u n d a m e n t a l f r e q u e n c y minima i n t o t g
44 t i = sortrows ( ti ,1) ; % S o r t s ro ws a c c o r d i n g t o i n d e x e s ( a s c e n d i n g )
45 t i = t i ( end : 1 : 1 , : ) ; % R e v e r s e s t i . . . s o r t m u st be d e s e n d i n g
46 end

XVII
CHAPTER 1. PITCH DETECTION

1.9 Litterature list

[1] Saurabh Sood and Ashok Krishnamurthy: A Robust On-The Fly Pitch
(OTFP) Estimation Algorithm. The Ohio State University, Columbus OH.

[2] A. D. Cheveigne and H. Kawahara. Yin: A fundamental frequency


estimator for speech and music. Journal of the Acoustical Society of
America, 111(4), 2002.

[3] S. Chowdhury, A. K. Datta, and B. B. Chaudhuri. Pitch detection


algorithm using state phase analysis. Journal of the Acoustical
Society of India, 28(1), Jan. 2000.

[4] http://www.cnmat.berkeley.edu/~tristan/Report/node4.html

[5] http://www.ee.ucla.edu/~ingrid/ee213a/speech/vlad_present.pdf

XVIII