Vesa Välimäki
Report 37
December 1995
ISBN 951-22-2880-7
ISSN 0356-083X
TKK Offset 1995
Abstract
This work deals with digital waveguide modeling of acoustic tubes, such as bores of
musical woodwind instruments or the human vocal tract. The acoustic tube systems
considered in this work are those consisting of a straight cylindrical or conical tube
section or of a concatenation of several cylindrical or conical tube sections. Also, the
junction of three tube sections is studied. Of special interest for our application are
junctions where a side branch is connected to a cylindrical or conical tube since these
are needed in the simulation of woodwind instrument bores.
Basic waveguide models are generalized by employing the concept of fractional
delay, which means a fraction of the unit sample interval. A fractional delay is implemented using bandlimited interpolation. A novel discrete-time signal processing technique, deinterpolation, is defined. Applying fractional delay filtering techniques, a spatially discretized waveguide model is turned into a spatially continuous one. This
implies that the length of the digital waveguide can be adjusted as accurately as
required, and a change of the impedance of a waveguide may occur at any desired point
between sampling points. This kind of system is called a fractional delay waveguide
filter (FDWF). It is a discrete-time structure, yet a spatially continuous model of a
physical system.
The basic principles of digital waveguide modeling are first reviewed. Modeling
techniques for cylindrical and conical acoustic tubes are described, as well as methods
to simulate junctions of two or more of these sections. Different design methods for
both FIR and IIR (allpass) fractional delay filters are reviewed and the theoretical foundations of FDWFs are studied. Fractional delay extensions for acoustic tube model
structures are discussed and approximation errors due to fractional delay filters are analyzed. In addition, a new technique for eliminating transients due to time-varying filter
coefficients in recursive filters is introduced.
The models described in this work are directly applicable to physical modeling and
model-based sound synthesis of speech and wind instruments.
Keywords: digital signal processing, digital filter design, fractional delay, interpolation, acoustic tube, conical acoustic tube, vocal tract modeling, physical modeling,
waveguide synthesis, time-varying digital filters
Preface
This work was carried out at the Laboratory of Acoustics and Audio Signal Processing in
the Helsinki University of Technology (HUT), Espoo, Finland, during the years 1993–1995.
I am deeply grateful to my supervisor, Prof. Matti Karjalainen, for constant help
and friendship during all phases of my work and also for organizing financial support for
my research.
I have had the possibility to conduct part of this study at CARTES (Espoo, Finland). I
am grateful to Mr. Otto Romanowski (director of CARTES), Ms. Sini Sormunen, Mr.
Harri Kaimio, Mr. Jan G. Liljestrand, Mr. Tuomo Rönkkö, Mr. Erkka Suominen, Mr.
Juuso Voltti, and other people I have met in CARTES, for a pleasant and inspiring atmosphere.
I have also spent more and more time working at the Acoustics Laboratory in
Otaniemi, and I want to thank the personnel of this slightly overcrowded laboratory for
being friendly every day. I especially want to thank our secretary, Ms. Lea Söderman,
for her kind help in practical issues, and Mr. Jyri Huopaniemi and Mr. Martti Rahkila for
their assistance in questions associated with computer networks and file formats. Jyri
read the manuscript of this thesis and gave me several useful comments. I wish to thank
Mr. Timo Kuisma with whom I had fruitful discussions on the nature of deinterpolation.
We also worked together on two conference papers. Lic. Tech. Toomas Altosaar has
proofread many of my conference papers and has helped me to improve my English
writing. I am deeply indebted to him for this. At the Acoustics Laboratory, I have had
many interesting discussions with experienced scientists, such as Dr. Paavo Alku, Mr.
Juha Backman, and Dr. Unto K. Laine, which I gratefully acknowledge here. Paavo read
the manuscript and helped me with his insightful comments. My special thanks go to Mr.
Tommi Huotilainen, with whom I share an office room at the Acoustics Lab, for being a
tough opponent in the game of squash.
I wish to thank the pre-examiners of my thesis, Assoc. Prof. Olli Simula (Laboratory
of Computer and Information Science, HUT) and Assoc. Prof. Julius O. Smith
(CCRMA, Stanford University, California, USA) for their positive feedback and profound comments. I am especially happy that I have had the opportunity to discuss interesting issues related to physical modeling of musical instruments with Dr. Smith several
times during the years 1991–1995 all over the world (e.g., in Finland, Sweden,
Denmark, Japan, Italy, USA, Canada, and the Internet).
I would like to express my gratitude to Prof. Timo I. Laakso (SEMSE, University of
Westminster, London, U.K.) who has played an important role in this study: with him I
have written numerous papers that are related to this thesis. Furthermore, Timo gave me
valuable comments on my manuscript thus helping me to improve my text.
I also want to thank Prof. Iiro Hartimo (Laboratory of Signal Processing and
Computer Technology, HUT) who read an early version of my manuscript and gave me
useful feedback. Mr. Luis Costa worked hard to find language deficiencies and typing
mistakes in the manuscript. I want to thank him for his diligence. I learned a lot from his
comments.
Dr. Jean-Marc Jot (IRCAM, Paris, France) and I had some inspiring e-mail discussions on time-varying digital filters. Mr. Stephan Tassart (IRCAM, Paris, France) has
amused me with theoretically interesting questions related to Lagrange interpolation. I
have had some friendly debates with Mr. Maarten van Walstijn (Royal Conservatory, the
Hague, the Netherlands) via e-mail, and also in Ferrara, Padua and Venice, about fractional delays and conical waveguides. I want to express my gratitude to these fine persons for their interest in my work.
I have had the opportunity to write a journal paper with Prof. Tapio "Tassu" Takala
(Laboratory of Information Processing Technology, HUT). I thank him for fluent cooperation and friendship. I wish to thank my coauthor Dr. Jonathan Mackenzie (SEMSE,
University of Westminster, London, U.K.) for innovative co-operation. Mr. Rami
Hänninen (Laboratory of Information Processing Technology, HUT) has recently started
implementing a woodwind instrument model that employs fractional delay techniques
presented in this work. Discussions with him have helped me to understand how to better
describe some of the filter structures. My sincere thanks go also to Mr. Sami Linnainmaa
(HUT) for his clever comments on the Thiran allpass filter. I am grateful to Mr. Martti
Vainio and Mr. Olli Salmensaari for their help in getting hold of some interesting references that proved difficult to find.
I thank my good friend, mathematician, poet, and rock musician, Timo Kojonen, for
trying to weed out inaccurate mathematical notations in my work and especially for great
company in the evenings. I would also like to thank my fiancée Miia Jaatinen for love and
understanding, even when I have been under pressure. Luckily, she still shows no particular interest in digital signal processing, and has always succeeded in making me think of issues other than work or this thesis.
Finally, my warmest thanks go to my parents, Anneli and Pekka Välimäki, who have
constantly encouraged me to study as much as possible, and given me the opportunity to
do so.
The financial support of the Academy of Finland, the Finnish Cultural Foundation,
and the Jenny and Antti Wihuri Foundation is acknowledged.
Westend, Espoo
November 30, 1995
Vesa Välimäki
Table of Contents
List of Symbols ............................................................................................................. 12
List of Abbreviations ................................................................................................... 15
1 Introduction ............................................................................................................. 17
1.1 Historical Notes.................................................................................................. 17
1.2 Overview of the Work........................................................................................ 20
1.3 Contributions of the Author ............................................................................... 21
1.4 Related Publications........................................................................................... 22
2 Digital Waveguide Modeling .................................................................................. 25
2.1 Waveguide Modeling of Strings ........................................................................ 25
2.1.1 Wave Equation and the Traveling Wave Solution .................................. 25
2.1.2 Incorporating Losses and Dispersion ...................................................... 26
2.1.3 Karplus–Strong Algorithm and Its Extensions........................................ 27
2.1.4 Calibration of the Waveguide String Model ........................................... 29
2.1.5 Excitation Signal of the Waveguide String Model.................................. 32
2.1.6 Discussion ............................................................................................... 33
2.2 Waveguide Modeling of Cylindrical Tubes ....................................................... 34
2.2.1 The One-Dimensional Wave Equation.................................................... 35
2.2.2 A Model Composed of Coupled Uniform Tube Sections ....................... 35
2.2.3 KL Scattering Junction for Volume Velocity Waves.............................. 36
2.2.4 KL Scattering Junction for Pressure Waves ............................................ 38
2.2.5 Boundary Conditions of a Cylindrical Tube ........................................... 39
2.2.6 General Properties of the Cylindrical Tube Model ................................. 40
2.2.7 Sparse Tube Model.................................................................................. 41
2.2.8 Extensions to the Basic KL Model .......................................................... 42
2.3 Junction of Three Acoustic Tubes ..................................................................... 42
2.3.1 Three-Port Scattering Junction for Volume Velocity.............................. 43
2.3.2 Three-Port Scattering Junction for Sound Pressure ................................ 44
2.3.3 Side Branch in a Uniform Tube .............................................................. 45
2.4 Waveguide Modeling of Conical Tubes ............................................................ 48
2.4.1 Traveling Wave Solution for Spherical Waves ....................................... 49
2.4.2 Junction of Conical Tube Sections .......................................................... 50
2.4.3 Reflection Function with Taper Discontinuity Only ............................... 52
2.4.4 Closed and Open End of a Truncated Conical Tube ............................... 53
2.4.5 Discrete-Time Reflection and Transmission Filters................................ 54
2.4.6 Stability of the Reflection Filter .............................................................. 56
2.4.7 Modeling the Ends of a Conical Tube ..................................................... 56
2.4.8 Conclusions and Discussion .................................................................... 57
2.5 Waveguide Models for Wind Instruments ......................................................... 58
2.5.1 Need for Finger-Hole Models ................................................................. 59
2.5.2 Modeling of a Single Woodwind Finger Hole ........................................ 59
2.5.3 Discrete-Time Scattering Junction .......................................................... 61
2.6 Relation to Other Methods ................................................................................. 62
List of Symbols

ak
A          cross-sectional area
A(z)       allpass filter transfer function
           area ratio
c          speed of sound
           scaling coefficient
C(z)
D(z)       denominator of A(z)
E(e^jω)
f0         fundamental frequency
fN         Nyquist frequency (fN = fs / 2)
fs         sampling frequency
G          square matrix
g          coefficient vector
G(z)
h(n)       impulse response
hid(n)     ideal impulse response
hr(n)
H(z)       transfer function
Hid(z)     ideal transfer function
j          imaginary unit (j = √−1)
kn
l          length in meters
n          index variable
N          integer constant
nc
Na         advance time of the transient eliminator
p          sound pressure
q          volume velocity
R(ω)       reflection function
t          time variable
tk
T          sampling interval (T = 1 / fs)
T(z)       transmission function
           transformation matrix
V          Vandermonde matrix
vk(n)
v(n)       column vector
w(n)
W(z)
W(ω)
x(n)
xc(t)
X(z)
y(n)
yc(t)
Y          acoustic admittance
Y(z)
z          z-transform variable
Z          acoustic impedance
δ(k)
θ(ω)       phase response
τid(ω)     ideal delay
τg(ω)      group delay
τp(ω)      phase delay
ρ          density of air
ω          angular frequency
List of Abbreviations

CFD
DF      direct form
DFT     discrete Fourier transform
DSP     digital signal processing
DTFT    discrete-time Fourier transform
FD      fractional delay
FDWF    fractional delay waveguide filter
FIR     finite impulse response
FM      frequency modulation
GLS
IIR     infinite impulse response
IR      impulse response
KL      Kelly–Lochbaum
KS      Karplus–Strong
LS      least squares
MF      maximally flat
PPI
SISO    single-input single-output
STFT    short-time Fourier transform
TLI     tangent-line interpolation
WDF     wave digital filter
WGF     waveguide filter
ZZ      Zetterberg–Zhang
1 Introduction
The theme of this work is computational modeling of acoustic tubes. The models are
intended for use in sound synthesizers based on physical modeling. Such synthesizers
can be used for producing realistic sounds of woodwind instruments or the human
voice.
The technique used throughout this work is called digital waveguide modeling. The
term "digital waveguide" was proposed by Smith (CCRMA, Stanford University), who
has had a prominent role in the development of the theory of physically based sound
synthesis (Smith, 1987, 1992b, 1995a). The digital waveguide synthesis technique is
well suited to modeling of one-dimensional resonators, such as a narrow acoustic tube,
a vibrating string, or a thin bar.
vibration of strings is fed to electrical shakers that are attached to a violin body, the
strings of which have been carefully damped (Adrien, 1991). This approach has the
distinct advantage that the radiation properties of the instrument need not be simulated. In virtual-reality applications, however, this practical trick is not applicable, since
all the vibrating structures must create numerical data to be used by other parts of the
virtual reality system. Recently, a new implementation of modal synthesis, called
Modalys, has been developed at IRCAM (Eckel et al., 1995).
In the first half of this decade, waveguide synthesis (or, waveguide modeling) has
turned out to be the most important of all the physical modeling methods (Smith,
1995b). This applies to both the academic and industrial communities: a majority of
recent advances in the theory of physical modeling have concerned digital waveguide
techniques, and all existing commercial physical modeling synthesizers utilize digital
waveguides. This methodology is also used in this work and is reviewed in detail in
Chapter 2.
The first commercial synthesizer to employ a physical model was the Yamaha VL-1
Virtual Acoustic Synthesizer that was introduced early in 1994. It was based on digital
waveguide modeling techniques. Recently, other electronic instruments following the
same guidelines have been released, and more are likely to appear in the future. Physical
modeling is considered the most important novel digital sound synthesis technique in
the past decade. FM synthesis, the previous revolutionary technique, was introduced in
1973, and the first commercial product to employ it, the Yamaha DX-7, was introduced in 1983. Nowadays, FM synthesizers, along with sampling keyboards, are the
most widespread electronic musical instruments. Model-based synthesizers are bound to
become popular because they sound and behave like acoustic instruments and thus offer
more possibilities of expression than the earlier devices (Jaffe, 1995). A fascinating
future prospect is to build hybrid synthesizers employing physical modeling combined
with sampling and filtering techniques (Smith, 1995b).
Physical modeling is interesting also for research purposes. A physical model
enables independent adjustment of individual parameters of an instrument, such as the
mass or stiffness of a vibrating reed. This is typically very difficult with an acoustic
instrument. Thus a model can help to more clearly understand how an instrument can
and should be played and calibrated.
With a highly sophisticated model, it would be possible to design new acoustic
musical instruments, and these could be played before they are actually built.
Modification of existing instruments could also be tried out using such a system. Today
this is still a dream, since modern physical models that produce sound in real time are
greatly simplified.
The major strategy in the development of current waveguide synthesis models has
been to include only the most crucial mechanisms of the instrument and eliminate the
less significant ones. Properties of the human auditory system need to be considered,
since it is the human ear that finally judges whether the synthetic sound is satisfactory.
The rule of thumb is that those features of the instrument that do not bring any audible
improvement to the sound are discarded. Learning to do this also means that one has
understood the essential factors that make a musical instrument sound the way it does.
At the end of Chapter 3, a method to suppress these transients caused by changing the
coefficients is introduced.
Chapter 4 discusses the possibilities that the use of fractional delays offers. In short,
fractional delays are needed for two purposes in waveguide models. First, they are used
for adjusting the total length of the waveguide system, which in musical instrument models
corresponds to tuning. The fundamental frequency of a one-dimensional resonator is
determined by the time it takes for a sound signal to travel from one end of it to the
other. Obviously, this delay has to be very precisely modeled. Otherwise the waveguide
synthesizer sounds out of tune. In vocal tract models, the length of the digital waveguide determines the formant frequencies.
Second, fractional delays are required for simulation of waveguides where some
change in the medium takes place at an arbitrary point along the waveguide. For example, in a model of the human vocal tract the different vowels are generated by changing
its profile. This is achieved by moving the tongue, jaws or other articulators. Hence the
exact location where the change takes place is very significant because it determines
how the voice sounds.
Chapter 4 first presents a new signal processing technique, deinterpolation. Using
both interpolation and deinterpolation operations, a vocal tract model can simulate a
change of profile at any given point. A new feature of the proposed model is that,
thanks to fractional delays, it is possible to vary the length of each tube section
independently.
Furthermore, tube systems with side branches are discussed. A practical example of
an acoustic tube system with a side branch is the human vocal tract with both oral and
nasal tracts included. Another example is a finger hole in the side of a woodwind bore.
These systems can be modeled with digital waveguides employing fractional delay filters. Structures for both FIR and IIR fractional delay waveguide filters are presented.
Errors due to the fractional delay approximation are examined.
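To make the FIR case concrete (the thesis contains no program code; this sketch and its names are the editor's illustration of a standard design that Chapter 3 covers), the coefficients of a Lagrange, i.e., maximally flat FIR, fractional delay filter can be computed directly from the classical product formula:

```python
def lagrange_fd_coeffs(N, d):
    """Coefficients h(n) of an N-th order Lagrange (maximally flat FIR)
    fractional delay filter for a total delay of d samples (0 <= d <= N):
    h(n) = prod_{k != n} (d - k) / (n - k)."""
    h = []
    for n in range(N + 1):
        c = 1.0
        for k in range(N + 1):
            if k != n:
                c *= (d - k) / (n - k)
        h.append(c)
    return h

# The first-order case is plain linear interpolation between adjacent samples:
print(lagrange_fd_coeffs(1, 0.3))  # approximately [0.7, 0.3]
# For any order the coefficients sum to one (unity gain at DC):
print(round(sum(lagrange_fd_coeffs(3, 1.5)), 6))  # 1.0
```

The error of such a filter grows toward the Nyquist frequency, which is one source of the approximation errors examined in the text.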
Finally, Chapter 5 summarizes the results of this work and provides suggestions for
future research.
In Chapter 2 of this work, modeling of vibrating strings is also discussed as an introduction to the topic of waveguide synthesis. The author has contributed to the development and implementation of the string models. He also developed and programmed the
analysis methods described in Sections 2.1.4 and 2.1.5.
The design methods for fractional delay filters described in Chapter 3 were first
described in a report co-authored by this author (Laakso et al., 1994). The author made
considerable contributions to that work, describing the maximally flat design
techniques for both FIR and allpass filters (see Sections 3.3 and 3.4 of this work). He
also developed the Farrow structure for Lagrange interpolation (see Section 3.3.7).
Furthermore, the author has invented a new technique for minimizing the transients
caused by time-varying filter coefficients in recursive filters (Section 3.5.5). This problem is relevant not only to this work but arises in every DSP application where filter
coefficients need to be changed during the filtering operation. Here this technique is
utilized with allpass fractional delay filters, but the same theory and algorithm are
applicable to any recursive filter. The determination of the parameter Na (the advance
time of the transient eliminator), based on the cumulated energy of the recursive filter's
impulse response, is a result of cooperation with Prof. Laakso. He initiated this work,
but the author derived many of the results.
To conclude, in this work the author presents a new framework for simulation of
physical systems. His main message is that it is not necessary to suffer from discretization of time delays inside a discrete-time system, such as a digital waveguide model,
although its input and output signals are inevitably discretized in time. Consequently,
discretization of the spatial variable in simulation of wave propagation is not obligatory.
This limitation can be overcome using interpolation techniques described in this work.
Vesa Välimäki, Matti Karjalainen, and Timo Kuisma, "Articulatory control of a vocal tract model based on fractional delay waveguide filters," in Proc. IEEE Int. Symp. Speech, Image Processing and Neural Networks (ISSIPNN'94), Hong Kong, April 13–16, 1994, vol. 2, pp. 571–574.

Vesa Välimäki, Matti Karjalainen, and Timo Kuisma, "Articulatory speech synthesis based on fractional delay waveguide filters," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing (ICASSP'94), Adelaide, Australia, April 19–22, 1994, vol. 1, pp. 585–588.

Vesa Välimäki and Matti Karjalainen, "Digital waveguide modeling of wind instrument bores constructed of truncated cones," in Proc. Int. Computer Music Conf. (ICMC'94), Aarhus, Denmark, September 12–17, 1994, pp. 423–430.

Vesa Välimäki and Matti Karjalainen, "Improving the Kelly–Lochbaum vocal tract model using conical tube sections and fractional delay filtering techniques," in Proc. 1994 Int. Conf. Spoken Language Processing (ICSLP'94), Yokohama, Japan, September 18–22, 1994, vol. 2, pp. 615–618.

Timo I. Laakso, Vesa Välimäki, Matti Karjalainen, and Unto K. Laine, Crushing the Delay – Tools for Fractional Delay Filter Design. Report no. 35, Espoo, Finland: Helsinki University of Technology, Faculty of Electrical Engineering, Laboratory of Acoustics and Audio Signal Processing, 46 pages + figures + 2 tables, October 1994. A revised version entitled "Splitting the unit delay – tools for fractional delay filter design" has been accepted for publication in the IEEE Signal Processing Magazine, vol. 13, no. 1, January 1996.

Vesa Välimäki, Jyri Huopaniemi, Matti Karjalainen, and Zoltán Jánosy, "Physical modeling of plucked string instruments with application to real-time sound synthesis," presented at the 98th AES Int. Convention 1995, preprint no. 3956, 52 p., Paris, France, February 25–28, 1995. A revised version has been submitted to the Journal of the Audio Engineering Society, July 1995.

Vesa Välimäki, "A new filter implementation strategy for Lagrange interpolation," in Proc. IEEE Int. Symp. Circuits and Systems (ISCAS'95), Seattle, Washington, April 29–May 3, 1995, vol. 1, pp. 361–364.

Vesa Välimäki and Matti Karjalainen, "Implementation of fractional delay waveguide models using allpass filters," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing (ICASSP'95), Detroit, Michigan, May 9–12, 1995, vol. 2, pp. 1524–1527.

Vesa Välimäki, "Fractional delay waveguide filters for modeling acoustic tubes," in Proc. Second Int. Conf. Acoustics and Music Research (CIARM'95), Ferrara, Italy, May 19–21, 1995, pp. 41–46.

Vesa Välimäki, Timo I. Laakso, and Jonathan Mackenzie, "Elimination of transients in time-varying allpass fractional delay filters with application to digital waveguide modeling," in Proc. 1995 Int. Computer Music Conf. (ICMC'95), Banff, Canada, September 3–7, 1995, pp. 327–334.

Vesa Välimäki and Tapio Takala, "Virtual musical instruments – natural sound with physical models," submitted to Organised Sound, vol. 1, no. 1, April 1996 (special issue on sounds and sources).

Parts of the present work have been published in the same or slightly modified form in

Vesa Välimäki, Fractional Delay Waveguide Modeling of Acoustic Tubes. Report no. 34, Espoo, Finland: Helsinki University of Technology, Faculty of Electrical Engineering, Laboratory of Acoustics and Audio Signal Processing, 133 pages, July 1994.
∂²y/∂t² = (K/ε) ∂²y/∂x²     (2.1)

where t is time, x is distance, y(t, x) is the string displacement, K is the string tension, and ε is the mass density of the string. The general solution of the wave equation was published by d'Alembert in 1747 and is given by
Fig. 2.1 A digital waveguide consists of two delay lines in which the signal components travel in opposite directions. The amplitude of the signal at any point (n, m) is obtained by summing the two components.
y(x, t) = y^+(x − ct) + y^−(x + ct)     (2.2)

where y^+(x − ct) and y^−(x + ct) are two arbitrary twice-differentiable functions that represent waves traveling in opposite directions with speed c = √(K/ε).
When the traveling waveforms are bandlimited lowpass signals, it is straightforward
to sample the traveling wave solution of Eq. (2.2). This yields
y(n, m) = y^+(n, m) + y^−(n, m)     (2.3)
Here we have discretized both the temporal variable t and the spatial variable x by substituting

t = nT   for n = 0, 1, 2, …
x = mX   for m = 0, 1, 2, …     (2.4)

where T is the sampling interval, i.e., the inverse of the sampling rate, and X is the spatial sampling interval, which is related to T as

X = cT     (2.5)
i.e., the time derivative of displacement. Smith (1990) has shown that such distributed
losses can be modeled by a real gain factor associated with each unit delay of
the waveguide. The multipliers between each input and output point can be commuted
and combined, thus preserving the computational efficiency of the waveguide system.
Frequency-dependent losses, such as those due to the bridge where the strings are
attached, can be modeled by placing a linear digital filter between each unit delay element of the waveguide. These elements may also be commuted and combined so that
between each input-output pair the total loss is modeled correctly. The computational
load of the commuted system may be two or three orders of magnitude less than that of
the straightforward one.
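The commuting argument for the frequency-independent case can be shown with a toy computation (an editor's sketch, not from the original text): N per-sample gain factors g collapse into a single lumped gain g^N with no change in the result.

```python
def distributed_loss(x, g, N):
    """Apply the per-sample loss factor g at each of N unit delays:
    N multiplications per sample traversing the section."""
    for _ in range(N):
        x = g * x
    return x

def lumped_loss(x, g, N):
    """Commuted-and-combined form: one gain g**N for the whole N-sample section."""
    return (g ** N) * x

x, g, N = 1.0, 0.999, 50
a, b = distributed_loss(x, g, N), lumped_loss(x, g, N)
print(abs(a - b) < 1e-12)  # True: identical up to rounding, at a fraction of the cost
```

The same idea, with the scalar gains replaced by low-order filters, underlies the commuted frequency-dependent losses described above.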
Physical systems are always passive and stable. In waveguide simulation it is of
paramount importance to preserve the passive nature of the real-world system. A
waveguide model includes a feedback loop, and if the loop gain exceeds unity at any
frequency, the system becomes unstable. Hence, digital filters to be included in a digital
waveguide have to be designed so that their magnitude response is equal to or less than
unity at all frequencies. This perspective has to be remembered when any linear system, such as a fractional delay filter, is added to a waveguide system.
2.1.3 Karplus–Strong Algorithm and Its Extensions
Karplus and Strong (1983) proposed a simple algorithm for the synthesis of plucked
string sounds. The method is based on the idea that a wavetable, i.e., a table containing
a sampled waveform of an audio signal, is modified while it is read. This technique was
found (Smith, 1983) to be a special case of a string simulation studied by McIntyre,
Woodhouse, and Schumacher (McIntyre and Woodhouse, 1979; McIntyre et al., 1983),
and it was extended with these ideas in mind by Jaffe and Smith (1983). The Karplus–Strong (KS) algorithm is an important predecessor of current waveguide models
because it established the usefulness of highly simplified cases, and recently it has led to
quite detailed string instrument models (see, e.g., Karjalainen et al., 1993; Karjalainen
and Välimäki, 1993; Smith, 1993; Välimäki et al., 1995a).
The block diagram of the KS algorithm is shown in Fig. 2.2. The system is a recursive comb filter. The delay line (or wavetable) is initialized with white noise. The output of the delay line is fed to a lowpass filter that is called the loop filter. The filtering
result is the output of the system and it is also fed back to the delay line.
This technique does not produce a purely repetitive output signal as does the basic
wavetable synthesis (sampling) technique. Namely, those frequency components of the
random signal that coincide with the resonances of the comb filter will attenuate more
slowly than the other components. Thus, the signal in the delay line progressively turns into a
pseudo-periodic signal with a clearly perceivable fundamental frequency.
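The algorithm just described can be sketched in a few lines of Python. This is the editor's illustration, not code from the thesis; the loop filter is taken to be a simple two-point averager, as in the original Karplus and Strong (1983) formulation:

```python
import random

def karplus_strong(delay_length, num_samples, seed=0):
    """Basic Karplus-Strong: a delay line initialized with white noise and a
    two-point averager as the loop filter; the filter output is both the
    system output and the sample fed back into the delay line."""
    rng = random.Random(seed)
    buf = [rng.uniform(-1.0, 1.0) for _ in range(delay_length)]
    out = []
    prev = 0.0  # previous delay-line output, for the two-tap averager
    for _ in range(num_samples):
        x = buf.pop(0)
        y = 0.5 * (x + prev)  # lowpass loop filter (multiplication-free in fixed point)
        prev = x
        out.append(y)
        buf.append(y)  # feedback into the wavetable
    return out

tone = karplus_strong(delay_length=50, num_samples=2000)
# High harmonics decay fastest, so energy falls as the tone turns pseudo-periodic:
early = sum(v * v for v in tone[:500])
late = sum(v * v for v in tone[-500:])
print(late < early)  # True
```

Listening to `tone` at, say, an 8 kHz rate would give a plucked-string-like sound whose pitch is set by the delay length, as discussed next.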
In the original KS model, the loop filter Hl (z) is a two-tap averager which is easy to
implement without multiplications. This simple filter, however, cannot match the frequency-dependent damping of a physical string. We feel that an FIR filter is not suited
to this purpose, since its order would have to be rather high to match the desired characteristics.
Fig. 2.2 Block diagram of the Karplus–Strong algorithm: a delay line feeding a loop filter, whose output is both the system output and the sample fed back into the delay line.
Fig. 2.3 The generalized Karplus–Strong model consists of a string model S(z) cascaded with a plucking point equalizer P(z).
Instead we suggest the use of an IIR lowpass filter for simulating the damping characteristics of a physical string. From a physical point of view, it seems well motivated to
use a second or third-order all-pole filter as a loop filter (Fletcher and Rossing, 1991). It
is meaningful to use the simplest adequate loop filter, since one such filter is needed for
the model of each string. Also, the loop filter is not static: its coefficients
should be changed as a function of the string length and other playing parameters. This
is easiest to achieve using a low-order filter.
Furthermore, it is important to remember that the loop filter's magnitude response
must not exceed unity at any frequency since otherwise the system becomes unstable.
When the magnitude response of the filter is less than unity at all frequencies, the resulting sound will gradually decay. The timbre will also vary over time if the magnitude
response is not flat, because the harmonics of the signal will attenuate at different rates.
Dispersion can be introduced using a filter with a nonlinear phase response.
We have found that a reasonable approximation for the frequency-dependent damping in a string may be obtained using a first-order all-pole filter

Hl(z) = g (1 + a1) / (1 + a1 z^-1)     (2.6)

where g is the gain of the filter at ω = 0, and a1 is the filter coefficient that determines the cutoff frequency of the filter. For Hl(z) to be a stable lowpass transfer function, we require that −1 < a1 < 0. The numerator 1 + a1 scales the frequency response (divided by g) at ω = 0 to unity, thus allowing control of the low-frequency gain using the coefficient g. We require that g ≤ 1.
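The two requirements above (DC gain g, magnitude never above unity) are easy to check numerically. The following sketch is the editor's; the parameter values are illustrative, not taken from the thesis:

```python
import cmath
import math

def loop_filter_response(g, a1, w):
    """Magnitude response of the loop filter Hl(z) = g*(1 + a1)/(1 + a1*z^-1)
    of Eq. (2.6) at angular frequency w (radians per sample)."""
    z = cmath.exp(1j * w)
    return abs(g * (1 + a1) / (1 + a1 / z))

# Illustrative values: g just below unity, and -1 < a1 < 0 for a stable lowpass.
g, a1 = 0.995, -0.1

# The numerator scaling makes the DC gain exactly g:
print(round(loop_filter_response(g, a1, 0.0), 3))  # 0.995

# The response never exceeds unity, so the filter is safe inside a waveguide loop:
peak = max(loop_filter_response(g, a1, k * math.pi / 100) for k in range(101))
print(peak <= 1.0)  # True
```

For a1 in (−1, 0) the denominator magnitude is smallest at ω = 0, so the peak of the response always sits at DC and equals g; choosing g ≤ 1 therefore guarantees passivity.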
A comb filter P(z) may be cascaded with the string model to cause the filtering effect
due to the plucking position (Jaffe and Smith, 1983). Its transfer function is
P(z) = 1 + z - M Rf (z)
(2.7)
where Rf (z) is the reflection filter of one end of the string model and M is the delay (in
samples) between the excitation point in the upper and lower line of the original
waveguide model. In practice Rf (z) in (2.7) need not be modeled very accurately since
its magnitude response is only slightly less than unity at all frequencies. In practice the
most notable difference between the direct and reflected excitation is the inverted
phase, and thus the filter can be replaced by a constant multiplier rf = -1 + ε, where ε is
a small nonnegative real number. This modified system, which we call a generalized KS
model is illustrated in Fig. 2.3.
One of the major problems of the basic Karplus-Strong algorithm is that the fundamental frequency of the tone cannot be accurately controlled. The fundamental frequency f0 of the synthetic signal is determined by the length of the delay line, that is
Fig. 2.4 The block diagram of the generalized Karplus-Strong model S(z) for a vibrating string. The filter F(z) is an FIR or allpass-type fractional delay filter, L_Int is the length of the delay line (in samples), and Hl(z) is the loop filter that accounts for losses and dispersion. The input signal x_P(n) has been filtered with the plucking point equalizer (see Fig. 2.3).
f0 = fs/(M + 1/2)    (2.8)

where fs is the sampling rate, M is the length of the delay line in samples, and 1/2
refers to the phase delay (or, equivalently, the group delay) of the two-point averaging
filter. It is seen that Eq. (2.8) is a function of the integer-valued variable M. Thus the
fundamental frequency is quantized and an arbitrary pitch cannot be produced.
The length of the delay line loop, and consequently the fundamental frequency f 0 of
the synthetic signal, can be accurately controlled by adding a fractional delay filter in
the feedback loop of the Karplus-Strong algorithm. The block diagram of the string
model S(z) is shown in Fig. 2.4. Jaffe and Smith (1983) introduced a first-order allpass
filter for producing the desired fractional delay. Later, a linear interpolator (Sullivan,
1990) and a third-order Lagrange interpolator (Karjalainen and Laine, 1991) have been
used for this task. Chapter 3 of this work discusses various digital filter approximations
for the fractional delay.
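As an illustration, the loop of Fig. 2.4 can be sketched in a few lines of Python. The sketch uses linear interpolation (Sullivan, 1990) in place of the fractional delay filter F(z) together with the one-pole loop filter of Eq. (2.6); the function name and the default parameter values are arbitrary choices, not values from this work.

```python
import numpy as np

def ks_string(excitation, L, g=0.99, a1=-0.1, n_samples=2000):
    """Sketch of the string model S(z) of Fig. 2.4: a delay line of
    floor(L) samples, the one-pole loop filter Hl(z) of Eq. (2.6), and a
    linear interpolator standing in for the fractional delay filter F(z)."""
    L_int = int(np.floor(L))          # integer part of the loop delay
    frac = L - L_int                  # fractional part, 0 <= frac < 1
    delay = np.zeros(L_int + 1)       # delay[k] holds the output k+1 steps ago
    lp = 0.0                          # state of the one-pole loop filter
    y = np.zeros(n_samples)
    for n in range(n_samples):
        # linear interpolation between the taps at delays L_int and L_int + 1
        v = (1.0 - frac) * delay[L_int - 1] + frac * delay[L_int]
        # Hl(z) = g(1 + a1)/(1 + a1 z^-1) as a difference equation
        lp = g * (1.0 + a1) * v - a1 * lp
        x = excitation[n] if n < len(excitation) else 0.0
        out = x + lp                  # input plus the feedback loop
        delay = np.roll(delay, 1)
        delay[0] = out
        y[n] = out
    return y
```

Since |Hl| ≤ g < 1 at all frequencies and the interpolator is lossy as well, the loop is stable and a noise burst fed to the model decays as expected.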
2.1.4 Calibration of the Waveguide String Model
The synthesis system includes three parts that completely determine the character of the
synthetic sound: the string model, the plucking point equalizer, and the input sequence.
This implies that to calibrate the model to a particular instrument it is necessary to
estimate the values for the length of the delay line L, the coefficients of the loop filter
Hl (z), the delay M and the parameter rf of the plucking-point equalizer, and the input
signal x(n). In this section, parameter estimation procedures are described that will
extract these values.
The delay length L (in samples) determines the fundamental frequency f0 of the
synthetic signal as follows

f0 = fs/L    (2.9)

where fs is the sample rate. For pitch detection we have used a well-known method
based on the windowed autocorrelation function (Rabiner, 1977). An estimate for the
pitch is obtained by searching for the maximum of this function for each frame. Median
filtering may be used to remove erroneous results from the pitch estimate sequence. We
have successfully used three-point median filtering for this purpose.
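The pitch detection step can be sketched as follows; the window choice, lag limits, and frame handling are illustrative assumptions rather than the exact settings used in this work.

```python
import numpy as np

def autocorr_pitch(frame, fs, fmin=60.0, fmax=1200.0):
    """Pitch estimate from the maximum of the windowed autocorrelation."""
    frame = frame * np.hanning(len(frame))
    r = np.correlate(frame, frame, mode='full')[len(frame) - 1:]
    lo, hi = int(fs / fmax), int(fs / fmin)    # admissible lag range
    lag = lo + np.argmax(r[lo:hi])             # lag of the strongest peak
    return fs / lag

def median3(pitches):
    """Three-point median filtering of a pitch track (endpoints kept as-is)."""
    out = list(pitches)
    for i in range(1, len(pitches) - 1):
        out[i] = sorted(pitches[i - 1:i + 2])[1]
    return out
```

The median filter removes isolated octave or gross errors from the frame-by-frame pitch sequence without smearing the slow pitch glide of a plucked tone.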
The pitch of a plucked string tone is usually a function of time, decreasing slightly
after the attack and reaching a steady value within about 0.5 s. The problem then is to
determine the best estimate f0 for the fundamental frequency in a perceptual sense. In
practice a good solution is to use the average pitch value after some 500 ms as the
nominal value, since it is important to have a reliable pitch estimate towards the end of
the note. This improves the quality of synthesized tones. When the estimate f0 for the
fundamental frequency has been chosen, the effective length L of the delay line can be
computed as
L = fs/f0    (2.10)
When the loop filter has also been designed, we can subtract its phase delay τ_H(ω)
(at the estimated fundamental frequency) from L and define the nominal fractional delay
Lf as

Lf = L - τ_H(ω0) - ⌊L - τ_H(ω0)⌋    (2.11)

where ω0 = 2πf0, and ⌊·⌋ is the floor function [see Eq. (3.12)]. The delay Lf is used as
the desired delay in the design of the fractional delay filter F(z). In practice the phase
delay of this filter can be expressed as τ_f(ω0) = Mf + Lf, where Mf ∈ Z+ is the integral part of the phase delay that depends on the order of the fractional delay filter and
Lf is the fractional part.
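The bookkeeping of Eqs. (2.10) and (2.11) reduces to a few lines; in the sketch below, filter_int_delay stands for the integer part Mf of the phase delay of F(z), and the function name is hypothetical.

```python
import math

def loop_delay_parameters(fs, f0, loop_phase_delay, filter_int_delay):
    """Split the loop delay L = fs/f0 (Eq. 2.10) into the delay-line length
    and the fractional delay Lf of Eq. (2.11)."""
    L = fs / f0                          # effective loop length, Eq. (2.10)
    rest = L - loop_phase_delay          # delay left for z^-L_Int and F(z)
    Lf = rest - math.floor(rest)         # fractional part, Eq. (2.11)
    L_int = math.floor(rest) - filter_int_delay
    return L_int, Lf
```

The four contributions L_int + Mf + Lf + τ_H(ω0) then add up to the desired loop length L.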
Next we discuss how to measure the damping of the string as a function of frequency. All losses, including the reflections at the two ends of the string, are incorporated in the loop filter Hl(z). In order to design a loop filter we need to estimate the
damping factors at the harmonic frequencies of the sound signal. This is achieved using
the short-time Fourier transform (STFT) and tracking the amplitude of each harmonic.
The STFT of signal y(n) is a sequence of discrete Fourier transforms (DFT) defined
as
Y_m(k) = Σ_{n=0}^{N-1} w(n) y(n + mH) e^{-jω_k n}    for m = 0, 1, 2, …    (2.12a)

with

ω_k = 2πk/N    for k = 0, 1, 2, …, N - 1    (2.12b)
where N is the length of the DFT, w(n) is a window function, and H is the hop size or
time advance (in samples) per frame. In practice we compute each DFT using the FFT
algorithm. To obtain a suitable compromise between time and frequency resolution, we
use a window length of four times the period length of the signal. The overlap of the
windows is 50% implying that H is 0.5 times the window length. We apply excessive
zero-padding by filling the signal buffer with zeros to reach N = 4096 in order to
increase the resolution in the frequency domain. The spectral peaks corresponding to
harmonics can be found from the magnitude spectrum by first finding the nearest local
minimum on both sides of an assumed maximum. The largest magnitude between these
two local minima is assumed to be the harmonic peak. (Smith (1993b) has independently
developed a very similar analysis method for the waveguide string model.) The estimates for the frequency
and magnitude of the peak are fine-tuned by applying parabolic interpolation around
the local maximum and solving for the value and location of the maximum of this interpolating function. For details, see Smith and Serra (1987) or Serra (1989). The number
of harmonics to be detected is typically Nh < 20 (for the acoustic guitar). The STFT
analysis is applied to the portion of the signal that starts before the attack and ends
some 0.5–1 s after the attack.
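The analysis step can be sketched as below for a single harmonic; the Hanning window, the three-bin peak search, and the function signature are simplifying assumptions made for brevity.

```python
import numpy as np

def harmonic_envelope(y, fs, f0, n_frames, win_len, N=4096):
    """Track one harmonic with a zero-padded STFT (Eq. 2.12) and refine the
    peak by parabolic interpolation on the dB magnitude spectrum."""
    H = win_len // 2                        # 50% overlap
    win = np.hanning(win_len)
    freqs, mags = [], []
    for m in range(n_frames):
        frame = y[m * H : m * H + win_len] * win
        X = np.abs(np.fft.rfft(frame, N)) + 1e-12   # zero-padded to N bins
        k = int(round(f0 * N / fs))                 # bin of the assumed harmonic
        k = k - 1 + int(np.argmax(X[k - 1:k + 2]))  # snap to the local maximum
        a, b, c = 20.0 * np.log10(X[k - 1:k + 2])   # three dB values around it
        d = 0.5 * (a - c) / (a - 2.0 * b + c)       # fractional bin offset
        freqs.append((k + d) * fs / N)
        mags.append(b - 0.25 * (a - c) * d)         # height of the parabola
    return np.array(freqs), np.array(mags)
```

Running the tracker on a synthetic decaying partial returns a frequency track close to the true value and a monotonically decreasing envelope in dB.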
The spectral peak detection results in a sequence of magnitude and frequency pairs
for each harmonic. The sequence of magnitude values for one harmonic is called the
envelope curve of that harmonic. A straight line is matched to each envelope curve on a
dB scale since ideally the magnitude of every harmonic should decay exponentially,
i.e., linearly on a logarithmic scale. Measurements show that this idealized case is rarely
met in practice, and many different kinds of deviations are common. It is possible to
decrease the error in the fit by starting the fit at the maximum of the envelope curve and
by terminating it before the data gets mixed with the noise floor, e.g., after the decay of
about 40 dB. As a result, a collection of slopes b_k for k = 1, 2, …, Nh is obtained. The
corresponding loop gain of the string model at the harmonic frequencies is computed as
G_k = 10^{b_k L/(20H)}    for k = 1, 2, …, Nh    (2.13)

where b_k are the slopes of the envelopes, and H the hop size. The sequence G_k determines the prototype magnitude response of the loop filter Hl(ω) at the harmonic frequencies kf0, where k = 1, 2, …, Nh.
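Assuming the exponent b_k L/(20H) of Eq. (2.13), where b_k is in dB per analysis frame, the conversion from a fitted slope to a loop gain is a one-liner:

```python
def loop_gain(slope_db, L, H):
    """Eq. (2.13): map the fitted envelope slope of a harmonic (in dB per
    STFT frame) to the loop gain G_k of the string model, where L is the
    loop length in samples and H is the hop size in samples."""
    return 10.0 ** (slope_db * L / (20.0 * H))
```

The factor L/H converts the per-frame decay into the decay per round trip of the string loop, which is what the loop filter must reproduce.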
The desired phase response for Hl (w ) can be determined based on the frequency
estimates of the harmonics. We do not, however, try to match the phase response of the
filter. There are two principal reasons for this. First, we believe that it is much more
important to match the time constants of the partials of the synthetic sound than the frequencies of the partials since quite small deviations from harmonic frequencies occur in
the case of plucked string instruments. If we wanted to resynthesize strongly
inharmonic tones, such as those of a piano with its markedly stiff strings, the phase response of the loop
filter should be considered. The second reason is that as we want to use a low-order
loop filter, the complex approximation of the frequency response would not be very
successful. Thus we restrict ourselves to magnitude approximation only in the loop filter design.
The human ear is sensitive to the change of decay rate of a sinusoid, and in practice
we measure the time constant of some 20 lowest harmonics with the object of matching
the frequency response of the loop filter Hl (w ) to that data. Since we use a first-order
loop filter, it is clear that there are more restrictions than unknowns in this problem, and
only an approximate solution is possible. We have decided to use a weighted least
squares design (Välimäki et al., 1995a).
It is reasonable to choose a weighting function W(Gk ) that gives a larger weight to
the errors in the time constants of the slowly decaying harmonics since the hearing
tends to focus on their decay. A candidate for such a weighting function is
W(G_k) = 1/(1 - G_k)    (2.14)
We require that 0 ≤ G_k < 1 for all k, which is a physically reasonable assumption since
the system to be modeled is passive and stable.
The gain of the loop filter at low frequencies, i.e., g, can be chosen based on the loop
gain values of the lowest harmonics. In many cases it is good enough to set g = G0
(Välimäki et al., 1995a). The value for the coefficient a1 that minimizes the error function can then be found easily by differentiating the error function with respect to a1 and
finding the zero of the derivative. In practice we find a near-optimal solution in the following way: the value of the derivative is evaluated and, depending on the sign of the
result, a1 is changed by a small increment, the derivative is evaluated again, and so on.
After the derivative has reached a very small value, the iteration is terminated and the
final value for a1 is used in the synthesis model.
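The sign-of-derivative iteration can be sketched as follows, with a numerical derivative in place of an analytic one; the starting point, step size, and stopping threshold are arbitrary choices made for this sketch.

```python
import numpy as np

def fit_loop_filter(Gk, f0, fs, g, steps=4000, delta=1e-4):
    """Sign-of-derivative search for a1 in Hl(z) = g(1+a1)/(1+a1 z^-1),
    minimizing the weighted squared error between |Hl| and the measured
    loop gains Gk at the harmonic frequencies."""
    k = np.arange(1, len(Gk) + 1)
    w = 2.0 * np.pi * k * f0 / fs              # harmonic frequencies (rad)
    W = 1.0 / (1.0 - np.asarray(Gk))           # weighting of Eq. (2.14)

    def err(a1):
        H = np.abs(g * (1.0 + a1) / (1.0 + a1 * np.exp(-1j * w)))
        return np.sum(W * (H - Gk) ** 2)

    a1 = -0.5                                  # arbitrary starting point
    for _ in range(steps):
        d = (err(a1 + delta) - err(a1 - delta)) / (2.0 * delta)
        if abs(d) < 1e-9:                      # derivative has vanished
            break
        # step a1 against the sign of the derivative
        a1 = np.clip(a1 - delta * np.sign(d), -0.999, -1e-6)
    return float(a1)
```

As a convergence check, loop gains generated from a known filter are fed back into the fit and the original coefficient is recovered, mirroring the verification described in the text.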
We have verified the convergence of this design procedure in practice by analyzing
signals generated using the synthesis model. The loop filter designed based on the analysis data yields the same filter parameters (within numerical accuracy) as used in the
synthesis. Also, matches with natural tones have been found to be satisfactory in many
cases (Välimäki et al., 1995a).
It is well understood that when a string is set into vibration by plucking it, the
sound signal will lack those harmonics that have a node at the plucking point (see, e.g.,
Fletcher and Rossing, 1991). However, in general, the string is not plucked exactly at
the node of any of the lowest harmonics, and since the amplitude of the higher harmonics is rather small anyway, it is not always possible to accurately detect the
plucking point by simply searching for the missing harmonics in the magnitude spectrum. Another practical problem may occur because of nonlinear behavior of the string.
Namely, the amplitude of vibration of a weak harmonic can gain energy from other
modes so that its amplitude begins to rise, reaching a maximum about 100 ms after the
attack, and then begins to decay (Legge and Fletcher, 1984). This can often be seen in the
analysis of harmonic envelopes of the guitar. For these reasons we believe that a more
comprehensive understanding of the effect of the plucking point can be achieved by
studying the time-domain behavior of the string in terms of the short-time autocorrelation function. A frequency-domain technique for estimating the plucking point has been
reported by Bradley et al. (1995).
Estimation of the plucking position is an inherently difficult problem since a
recorded tone can include contributions of several delays of approximately the same
magnitude, e.g., early reflections from objects near the player, such as the floor, the
ceiling, or a wall. To minimize the effect of these additional factors, we suggest analysis
of tones recorded in an anechoic chamber.
It is, however, not absolutely obligatory to estimate and model the effect of the
plucking position. Its contribution can be left in the excitation signal obtained using
inverse filtering, which is described in the following section.
2.1.5 Excitation Signal of the Waveguide String Model
The input signal x(n) of the waveguide string model can be estimated using inverse filtering, i.e., filtering the signal y(n) (which is now assumed to be the output of the model) with the inverted transfer function S^-1(z) of the string model. The transfer function of the string model shown in Fig. 2.4 can be expressed as

S(z) = 1 / [1 - z^-L_Int F(z) Hl(z)]    (2.15)

where L_Int is the length of the delay line (in samples), F(z) is the transfer function of the fractional delay filter, and Hl(z) is the transfer function of the loop filter. If S(z) had zeros outside the unit circle, the inverse filter S^-1(z) would be unstable. In that case, inverse filtering would not yield an acceptable result. Fortunately, this is not a problem
in practice, as can be seen by substituting (2.6) into (2.15). Inverting this equation yields

S^-1(z) = [1 + a1 z^-1 - g(1 + a1) z^-L_Int F(z)] / (1 + a1 z^-1)    (2.16)
This technique is simple to apply, but since the order of the loop filter is low, the
harmonics are not canceled accurately using the inverse filter S^-1(z). The resulting x(n)
may suffer from high-frequency noise. The low-order loop filter was chosen primarily
to keep the synthesis algorithm efficient. However, inverse filtering is
an off-line procedure where accuracy is much more important than efficiency. We could
thus design a higher order filter to be used in the inverse filtering as suggested by
Laroche and Meillier (1994).
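A sketch of the inverse filtering for the special case F(z) = 1 (no fractional delay, an assumption made here for brevity): the FIR numerator of S^-1(z) in Eq. (2.16) is applied first, and the stable denominator 1 + a1 z^-1 is then divided out recursively.

```python
import numpy as np

def inverse_filter(y, L_int, g, a1):
    """Inverse filtering with S^-1(z) of Eq. (2.16) for F(z) = 1: apply the
    FIR numerator 1 + a1 z^-1 - g(1+a1) z^-L_int, then remove the
    denominator 1 + a1 z^-1 by a stable recursive division."""
    v = y.astype(float).copy()
    v[1:] += a1 * y[:-1]                       # + a1 z^-1 term
    v[L_int:] -= g * (1.0 + a1) * y[:-L_int]   # - g(1+a1) z^-L_int term
    x = np.zeros_like(v)
    for n in range(len(v)):                    # x(n) = v(n) - a1 x(n-1)
        x[n] = v[n] - (a1 * x[n - 1] if n > 0 else 0.0)
    return x
```

Since the denominator 1 + a1 z^-1 has its root inside the unit circle (|a1| < 1), the recursion is stable and the excitation is recovered exactly when y has itself been produced by the model.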
Another deficiency of the inverse filtering technique described above is that it does
not take into account the nonlinear effects of the physical strings. This could be
accounted for by using a time-varying inverse filter to cancel the transfer function of the
string. This filter can be designed based on the STFT analysis that was discussed in the
previous section.
The residual signals resulting from the analysis and inverse filtering of an original
string instrument tone can be directly applied as the excitation to the string model. The
excitation signal can be shortened by windowing the first 100 ms of the residual signal
using, e.g., the right half of a Hamming window. This truncated signal can be used as
the excitation of the synthesis model. In the case of the electric guitar, the length of the
residual signal used in resynthesis can be reduced to about 50 ms without any significant loss of information. This is due to the lack of body resonances.
Usually the residual signal of an acoustic guitar tone includes only a few low-frequency components of the body of the instrument. They have large time constants. The
most prominent ones are typically the air mode (the Helmholtz resonance) and the lowest mode of the back plate. Välimäki et al. (1995a) as well as Smith and Van Duyne
(1995) have suggested the extraction of these two strongly resonating modes. Once they
have been removed from the residual signal, the resulting signal is very short. The two
body resonances can be simulated using second-order all-pole filters.
2.1.6 Discussion
In this section we have discussed waveguide modeling of string instruments. We have
seen that these synthesis models have reached an advanced level: the model is compact
and efficient to implement but still mainly based on the physics of the instrument. The
analysis method enabled resynthesis of string tones or copying the timbre of a given
plucked string instrument. Smith and Van Duyne (1995) have used similar waveguide
techniques and some of their extensions for modeling the piano (see also Van Duyne
and Smith, 1995).
Plucked string instruments can be considered relatively easy from the viewpoint of
physical modeling, since their resonator is a simple linear system. Later in this chapter
we will tackle modeling of wind instruments which are strongly nonlinear systems with
a complicated linear resonator: a bore that contains finger holes or valves. For these reasons modeling is more complicated. Furthermore, nonlinearities make analysis tools
difficult to devise.
Fig. 2.5 Vocal tract profile and its approximation by cylindrical tube sections.
The one-dimensional wave equation for the sound pressure is

∂²p/∂x² = (1/c²) ∂²p/∂t²    (2.17)

where c is the propagation speed of the sound wave (about 340 m/s), p is the sound pressure in the tube, t is time, and x is the distance along the tube. This equation may be derived using Newton's second law of motion (Fletcher and Rossing, 1991). The second-order terms have been neglected in Eq. (2.17) and thus it is sometimes called the
linearized wave equation. This is, of course, an approximation, but it appears to be quite
adequate even with high sound pressures. This equation is practically always valid
when considering the human vocal tract or bores of musical instruments.
The general solution to the one-dimensional wave equation has the form
p(x,t) = p^+(x - ct) + p^-(x + ct)    (2.18)
where p^+ and p^- are arbitrary continuous functions. The term p^+(x - ct) represents a
pressure wave component traveling in the positive x direction at speed c, whereas
p^-(x + ct) represents a pressure wave traveling in the negative x direction at the same
speed. Provided that the signal traveling in the tube is bandlimited, this solution can be
discretized and a discrete-time waveguide model can be constructed as explained in
Section 2.1.1.
The exact shape of the pressure and volume velocity waves in a cylindrical tube
depends on the initial conditions and the boundary conditions. Typical cases for boundary conditions are a closed and an open end.
2.2.2 A Model Composed of Coupled Uniform Tube Sections
The characteristic impedance Z of an acoustic tube is defined by
Z = ρc/A = ρc/(πa²)    (2.19)

where ρ is the density of the medium, A is the cross-sectional area of the tube, and a is its radius.
Fig. 2.6 The junction of two cylindrical tube sections with characteristic impedances Z_k and Z_{k+1}.

The pressure and volume velocity wave components in a tube are related by the characteristic impedance as

Z = p^+(x,t)/u^+(x,t) = p^-(x,t)/u^-(x,t)    (2.20)

The total pressure in the kth tube section is the sum of its wave components

p_k(x,t) = p_k^+(t - τ) + p_k^-(t + τ)    (2.21)
where τ = x/c and Z_k is the characteristic impedance of the kth tube section. The total
volume velocity in the kth segment is the difference of the two components
u_k(x,t) = u_k^+(t - τ) - u_k^-(t + τ)    (2.22)
The pressure has to be continuous at the connection of the tubes, and the net flow into the junction must equal the net flow out of it, i.e.,

u_k(L,t) = u_{k+1}(0,t)    (2.23)

where L = cT is the length of the tube sections. We continue by substituting Eqs. (2.21) and
(2.22) into Eq. (2.23). This yields

Z_k [u_k^+(t - τ) + u_k^-(t + τ)] = Z_{k+1} [u_{k+1}^+(t) + u_{k+1}^-(t)]
u_k^+(t - τ) - u_k^-(t + τ) = u_{k+1}^+(t) - u_{k+1}^-(t)    (2.24)
From this pair of equations, it is easy to derive the equations describing the outgoing
volume velocity components u_k^- and u_{k+1}^+. They are
u_k^-(t + τ) = [(Z_{k+1} - Z_k)/(Z_{k+1} + Z_k)] u_k^+(t - τ) + [2Z_{k+1}/(Z_{k+1} + Z_k)] u_{k+1}^-(t)    (2.25a)

u_{k+1}^+(t) = [2Z_k/(Z_{k+1} + Z_k)] u_k^+(t - τ) + [(Z_k - Z_{k+1})/(Z_{k+1} + Z_k)] u_{k+1}^-(t)    (2.25b)
Based on the above equations it is reasonable to define the reflection coefficient for
the kth junction in the positive direction as (Liljencrants, 1985, p. 1-3; Fletcher and
Rossing, 1991, pp. 147–148)
r_k = (Z_{k+1} - Z_k)/(Z_{k+1} + Z_k) = (A_k - A_{k+1})/(A_k + A_{k+1})    (2.26a)
where Zk and Ak denote the acoustic impedance and cross-sectional area of the kth tube
section, respectively. The latter form has been devised using relation (2.19). It is seen
that |r_k| ≤ 1, since the areas (and consequently the impedances) are always nonnegative.
The reflection coefficient of the same junction in the negative direction is opposite in
sign. The transmission coefficient in the positive direction is written as
t_k = 2Z_k/(Z_{k+1} + Z_k) = 2A_{k+1}/(A_k + A_{k+1}) = 1 - r_k    (2.26b)

Using the reflection coefficient r_k, the pair of Eqs. (2.25) may be written as

u_k^-(t + τ) = r_k u_k^+(t - τ) + (1 + r_k) u_{k+1}^-(t)    (2.27a)

u_{k+1}^+(t) = (1 - r_k) u_k^+(t - τ) - r_k u_{k+1}^-(t)    (2.27b)
There is some confusion in the literature about the definition of the reflection coefficient for the two-port junction. In some references the definition is opposite in sign with respect to the definition used here (see, e.g., Rabiner and Schafer, 1978, pp. 84–85).
Fig. 2.7 a) The two-port scattering junction for acoustic volume velocity signals and b)
an equivalent one-multiplier junction.
The signal flow diagram describing the scattering of volume velocity at a junction of
two uniform tube segments is illustrated in Fig. 2.7a. A more economical implementation of this junction is obtained by rewriting equations (2.27a) and (2.27b) in the following way:
u_k^-(t + τ) = u_{k+1}^-(t) + w(t)    (2.28a)

u_{k+1}^+(t) = u_k^+(t - τ) - w(t)    (2.28b)

where

w(t) = r_k [u_k^+(t - τ) + u_{k+1}^-(t)]    (2.28c)
Since the term w(t) is common to Eqs. (2.28a) and (2.28b), it needs to be computed only
once. This one-multiplier configuration of the two-port junction is depicted in Fig. 2.7b.
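As a sketch, the one-multiplier junction computes w once, after which both outputs need only additions; the helper name below is hypothetical, and the four-multiplier form of Fig. 2.7a serves as a cross-check.

```python
def two_port_scatter(u_left_in, u_right_in, r):
    """One-multiplier KL junction, Eqs. (2.28): compute
    w = r (u_k^+ + u_{k+1}^-) once; the outputs are then sums."""
    w = r * (u_left_in + u_right_in)       # Eq. (2.28c), the only multiply
    u_left_out = u_right_in + w            # u_k^-,     Eq. (2.28a)
    u_right_out = u_left_in - w            # u_{k+1}^+, Eq. (2.28b)
    return u_left_out, u_right_out
```

For r = 0 the junction degenerates to a pass-through, as expected of two tube sections with equal areas.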
2.2.4 KL Scattering Junction for Pressure Waves
The two-port junction may also be derived for pressure signals. The total pressure in the
kth section is given by (2.18) and the net flow (total volume velocity) is written as
u_k(x,t) = u_k^+(t - τ) - u_k^-(t + τ) = (1/Z_k)[p_k^+(t - τ) - p_k^-(t + τ)]    (2.29)
The application of continuity requirements for both pressure and volume velocity
results in the following pair of equations
(1/Z_k)[p_k^+(t - τ) - p_k^-(t + τ)] = (1/Z_{k+1})[p_{k+1}^+(t) - p_{k+1}^-(t)]
p_k^+(t - τ) + p_k^-(t + τ) = p_{k+1}^+(t) + p_{k+1}^-(t)    (2.30)
By eliminating the term p_{k+1}^+(t) it is easy to solve for p_k^-(t + τ) and, similarly, by eliminating p_k^-(t + τ), for p_{k+1}^+(t):
p_k^-(t + τ) = [(Z_{k+1} - Z_k)/(Z_{k+1} + Z_k)] p_k^+(t - τ) + [2Z_k/(Z_{k+1} + Z_k)] p_{k+1}^-(t)    (2.31a)
Fig. 2.8 a) The two-port scattering junction for the acoustic pressure signal and b) the
equivalent one-multiplier junction.
p_{k+1}^+(t) = [2Z_{k+1}/(Z_{k+1} + Z_k)] p_k^+(t - τ) + [(Z_k - Z_{k+1})/(Z_{k+1} + Z_k)] p_{k+1}^-(t)    (2.31b)
The same reflection coefficient as in the case of volume velocity signals can still be
used. By using Eq. (2.26a), the pair of Eqs. (2.31a) and (2.31b) may be expressed as
p_k^-(t + τ) = r_k p_k^+(t - τ) + (1 - r_k) p_{k+1}^-(t)    (2.32a)

p_{k+1}^+(t) = (1 + r_k) p_k^+(t - τ) - r_k p_{k+1}^-(t)    (2.32b)
The scattering of the pressure wave at a junction of two uniform tube segments is illustrated in Fig. 2.8a. Note that the transmission coefficients differ from those defined for
volume velocity waves (see Fig. 2.7a).
A one-multiplier two-port junction for the sound pressure signals is obtained by
rewriting Eqs. (2.32a) and (2.32b) as follows
p_k^-(t + τ) = p_{k+1}^-(t) + w(t)    (2.33a)

p_{k+1}^+(t) = p_k^+(t - τ) + w(t)    (2.33b)

where

w(t) = r_k [p_k^+(t - τ) - p_{k+1}^-(t)]    (2.33c)
The one-multiplier scattering junction for sound pressure signals is shown in Fig. 2.8b.
2.2.5 Boundary Conditions of a Cylindrical Tube
Modeling of the end point of a cylindrical tube is achieved with the help of the reflection functions defined above. For a closed end, the radiation impedance Zrad tends to
infinity. Then the reflection coefficient for the closed end (from the positive side) is

r_M = lim_{Z_rad→∞} (Z_rad - Z_M)/(Z_rad + Z_M) = 1    (2.34)
where Z M is the characteristic impedance of the last tube section before the closed end.
Equation (2.34) means that the wave (either pressure or volume velocity) reflects back
from a closed end as such.
Fig. 2.9 A waveguide model composed of M cylindrical tube sections, with two delay lines and two-port scattering junctions S0, S1, …, SM.

For an open end, the reflection coefficient must be replaced by a reflection filter

R(z) = [Z_rad(z) - Z_M]/[Z_rad(z) + Z_M]    (2.35)
The radiated volume velocity signal is obtained by filtering the signal at the open end of
the tube using the filter 1 - R(z) as suggested by Eq. (2.27b) (see also Fig. 2.7). The
radiated pressure signal, however, is computed using the filter 1 + R(z) (see Fig. 2.8).
In some applications it is of interest to model radiation from an open tube in a flange.
This is the case, e.g., in vocal tract modeling. Laine (1982) has derived a simple discrete-time reflection filter that is suitable for speech synthesis. It is based on the use of a
first-order differentiator (with a real scaling factor) as a model for the radiation
impedance. This approximation leads to a first-order IIR reflection filter. This model
has been used successfully as an approximation for the radiation impedance of a woodwind instrument (Välimäki et al., 1992b).
2.2.6 General Properties of the Cylindrical Tube Model
The traditional Kelly-Lochbaum tube model is a ladder or lattice filter with a two-port
adaptor between every two unit delays. A model composed of M cylindrical tube sections is illustrated in Fig. 2.9. The model has two delay lines (a digital waveguide) and
M+1 two-port scattering junctions Sn .
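The pieces above can be combined into a sketch of the model of Fig. 2.9. The ideal terminations (closed input end, reflection +1; open far end, reflection -1) and the output scaling are simplifying assumptions of this sketch; the interior scattering uses the one-multiplier junctions of Eqs. (2.28).

```python
import numpy as np

def kl_tube_model(x, areas):
    """Sketch of a KL waveguide of M = len(areas) cylindrical sections:
    two delay rails with one-multiplier junctions between sections,
    an ideally closed input end, and an ideally open far end."""
    M = len(areas)
    r = [(areas[k] - areas[k + 1]) / (areas[k] + areas[k + 1])
         for k in range(M - 1)]                # Eq. (2.26a), in areas
    up = np.zeros(M)                           # right-going rail
    um = np.zeros(M)                           # left-going rail
    out = []
    for xn in x:
        up_new = np.empty(M)
        um_new = np.empty(M)
        up_new[0] = xn + um[0]                 # closed input end: r = +1
        for k in range(M - 1):                 # interior junctions
            w = r[k] * (up[k] + um[k + 1])     # Eq. (2.28c)
            um_new[k] = um[k + 1] + w          # Eq. (2.28a)
            up_new[k + 1] = up[k] - w          # Eq. (2.28b)
        um_new[M - 1] = -up[M - 1]             # open far end: r = -1
        out.append(2.0 * up_new[M - 1])        # radiated volume velocity
        up, um = up_new, um_new
    return out
```

With equal areas, all reflection coefficients vanish and an input impulse travels down the tube, returns sign-inverted from the open end, and repeats with a full period of 4M samples, as expected of a quarter-wavelength resonator.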
Bonder (1983a) has derived an expression for the formant frequencies of the cylindrical M-tube model. He has also shown six general properties of the model. These
properties are recapitulated below.
1) If the area of each tube is scaled by a common factor, the resonance frequencies
remain unchanged. This is a consequence of the fact that the reflection coefficients depend on the ratio of the tube impedances (or areas), not on their exact
values [see Eq. (2.26a)].
2) Formant frequencies are inversely proportional to the length of the tube segments.
This property suggests that if the lengths of all the tube segments are multiplied
by some factor x, then the resonance frequencies are those of the original model
multiplied by 1/x.
3) A tube model with a certain configuration has the same formant frequencies as the
configuration with tubes in the reverse order.
4) In case of an odd-tube, i.e., when the number of tube segments is odd (or can be
reduced to odd by combining pieces with the same diameter), the model has fixed
formants at certain frequencies, irrespective of the profile of the tube. These frequencies are given by
f_fix,k = (k + 1/2) cM/(2L)    for k = 0, 1, 2, …    (2.36)

where c is the speed of sound, M is the total number (odd) of tube sections, and
L = Ml is the overall length of the tube. The indices of the fixed formant frequencies are (M + 1)/2 + kM (for k = 0, 1, 2, …). A model consisting of an even
number of sections also has fixed formants if it can be reduced to an odd-tube.
number of sections also has fixed formants if it can be reduced to an odd-tube.
As an example of property 4, let us consider a 3-tube model (M = 3) with L = 17.5
cm. This model has fixed formants at frequencies 1500 Hz, 4500 Hz, 7500 Hz, etc.
These correspond to the resonance frequencies f 2 , f 5 , f 8 , and so on (Bonder, 1983a).
The other formants depend on the shape of the tube.
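Eq. (2.36) is straightforward to check numerically; note that the example frequencies 1500 Hz, 4500 Hz, and 7500 Hz quoted above are reproduced with a sound speed of about 350 m/s rather than the 340 m/s used earlier.

```python
def fixed_formants(c, M, L, n=3):
    """Eq. (2.36): the fixed formant frequencies f_fix,k = (k + 1/2) cM/(2L)
    of an odd M-tube model, independent of the tube profile."""
    return [(k + 0.5) * c * M / (2.0 * L) for k in range(n)]
```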
5) The lowest ⌊M/2⌋ formants of an M-tube model determine all the other formants except for the fixed ones: if f is one of the formants of an M-tube model,
then cM/(2L) - f and f + kcM/(2L) (for k = 1, 2, 3, …) are also its formants. Here
⌊·⌋ is the floor function defined by Eq. (3.12), and cM/(2L) corresponds to the
Nyquist frequency f_N of the system. Property 5 is another way of saying that the
frequency response of a sampled system is periodic with a period of f_N.
6) There are 2^⌊(M - 1)/2⌋ different tube configurations with the same formant
structure. This feature is a consequence of the previous property. For a 2-tube
model there is no degree of freedom because ⌊1/2⌋ = 0. For tube models with M ≥
3, there is always a number of different ways to choose the diameters of the
sections and still end up with the same formant frequencies. Property 6 is the
explanation for the difficulty of estimating the diameters of the tube sections
directly from an acoustic signal. This question has been investigated in Bonder
(1983b).
2.2.7 Sparse Tube Model
One possible generalization of the M-tube model is such that some of the two-port
junctions are removed (or their reflection coefficient is set to 0). This kind of model is
more efficient to implement but can still produce essentially a similar spectral envelope
as the corresponding acoustic tube system. The number of degrees of freedom is, of
course, less than that for the traditional M-tube KL model.
Usually an M-tube model is considered in the context of digital ladder or lattice filters. Probably for this reason, there do not seem to have been many studies of models
where fewer than M + 1 scattering junctions are employed with M unit delays.
Examples are works by Lacroix and Makai (1979), Frank and Lacroix (1986), and Chan
and Leung (1991). From the physical viewpoint this seems like a natural way to reduce
the complexity of the model. We shall call a model where fewer than M + 1 junctions are
used in connection with M unit delays (with the original sampling frequency) a sparse
tube model. A further generalization of the M-tube model using fractional delay filtering
techniques is introduced in Chapter 4 of this work.
2.2.8 Extensions to the Basic KL Model
There are two severe limitations in the basic Kelly-Lochbaum (KL) model:
1) variable tract length is not incorporated, and
2) all the tube sections have the same length.
Due to the first restriction it is impossible to exactly match given formants, since the
vocal tract length is in general not correctly modeled. The second one gives rise to difficulties in the control of the model. Namely, the mapping of the physical parameters to
the model parameters (reflection coefficients in the scattering junctions) is not simple.
The articulators move in the front-back dimension and the diameter of the tract also
varies, whereas in the model only the diameter of the tube can be adjusted at the fixed
points. Control would be more intuitive if the junctions could be moved.
There have been two solutions to these problems. Strube (1975) suggested the
use of a fractional delay element in place of the last unit delay element of the delay line.
He showed that the frequencies and bandwidths of the formants were then changed
compared with the ideal ones. Later, Lagrange interpolation was proposed by Laine
(1988).
The other method for the accurate adjustment of the vocal tract length is due to Wu
et al. (1987a, 1987b). They keep the number of sections in the model constant but scale
the length of each section by a common factor, which can in principle be an arbitrary
real number. This is achieved by changing the sample rate of the model. This, however,
does not require that the sample rate of the computer hardware and DA converters
should be variable, but the operation is done by software instead. A time-varying FIR
interpolating filter is used for sampling-rate conversion. A more efficient form of this
technique has been reported by Wright and Owens (1993).
A drawback of both techniques described above is that the length of each tube section cannot be independently controlled in the discrete-time model. In Chapter 4 of this
work we present a fractional delay waveguide model where the length of each tube section can be arbitrarily controlled. Consequently, the total length of the vocal tract is
continuously variable.
Fig. 2.10 A junction of three acoustic tube sections with admittances Y1, Y2, and Y3.

At the junction of three tubes, the sum of the volume velocities flowing into the junction must be zero:

Σ_{k=1}^{3} u_k(t) = 0    (2.37)

Writing the total velocities in terms of their wave components gives

u_1^+(t) - u_1^-(t) + u_2^+(t) - u_2^-(t) + u_3^+(t) - u_3^-(t) = 0    (2.38)

Furthermore, the sound pressure at the junction must be the same in every branch, i.e.,

p_1(t) = p_2(t) = p_3(t)    (2.39)
This rule gives another equation that is needed in the derivation of the three-port junction. Equations (2.37) and (2.39) are analogous to Kirchhoff's current and voltage laws,
respectively, used for electrical networks.
In the following the three-port junction for the acoustic volume velocity is derived.
The variable quantities are illustrated in Fig. 2.10. Superscript + means that the signal
propagates towards the junction, and - denotes the reverse direction. Subscript k refers
to the index of the branch.
The pressure in each branch of the junction at point x is expressed in the following
form:
p_k(x,t) = p_k^+(t - τ) + p_k^-(t + τ) = (1/Y_k)[u_k^+(t - τ) + u_k^-(t + τ)]    (2.40)
Applying the pressure continuity rule (2.39) at the junction gives

(1/Y_1)[u_1^+(t) + u_1^-(t)] = (1/Y_2)[u_2^+(t) + u_2^-(t)] = (1/Y_3)[u_3^+(t) + u_3^-(t)]    (2.41)
Using Eqs. (2.38) and (2.41), the outgoing volume velocity signals u1- (t), u2- (t), and
u3- (t) may now be solved in terms of the ingoing signals u1+ (t) , u2+ (t) , and u3+ (t) :
u_1^-(t) = [(Y_1 - Y_2 - Y_3) u_1^+(t) + 2Y_1 (u_2^+(t) + u_3^+(t))] / (Y_1 + Y_2 + Y_3)    (2.42a)

u_2^-(t) = [(Y_2 - Y_1 - Y_3) u_2^+(t) + 2Y_2 (u_1^+(t) + u_3^+(t))] / (Y_1 + Y_2 + Y_3)    (2.42b)

u_3^-(t) = [(Y_3 - Y_1 - Y_2) u_3^+(t) + 2Y_3 (u_1^+(t) + u_2^+(t))] / (Y_1 + Y_2 + Y_3)    (2.42c)

It is convenient to define the reflection coefficient of the kth branch as

r_k = (2Y_k - Y_1 - Y_2 - Y_3)/(Y_1 + Y_2 + Y_3)    (2.43)
Using this reflection coefficient for each branch, the Eqs. (2.42) may be rewritten as

$$u_1^-(t) = r_1 u_1^+(t) + (1 + r_1)\left[u_2^+(t) + u_3^+(t)\right] \qquad (2.44a)$$

$$u_2^-(t) = r_2 u_2^+(t) + (1 + r_2)\left[u_1^+(t) + u_3^+(t)\right] \qquad (2.44b)$$

$$u_3^-(t) = r_3 u_3^+(t) + (1 + r_3)\left[u_1^+(t) + u_2^+(t)\right] \qquad (2.44c)$$
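The scattering rule above can be sketched as a short program. The admittance and signal values below are arbitrary test data; the checks at the end verify the Kirchhoff-type conditions of Eqs. (2.38) and (2.41):

```python
def three_port_scatter(u_in, Y):
    """Outgoing volume velocity waves u_k^- from incoming waves u_k^+ at a
    parallel three-port junction with branch admittances Y_k, Eqs. (2.43)-(2.44)."""
    Ysum = sum(Y)
    r = [(2.0 * Yk - Ysum) / Ysum for Yk in Y]      # Eq. (2.43)
    total = sum(u_in)
    # u_k^- = r_k u_k^+ + (1 + r_k) * (sum of the other two incoming waves)
    return [r[k] * u_in[k] + (1.0 + r[k]) * (total - u_in[k]) for k in range(3)]

Y = [1.0, 0.5, 0.25]        # arbitrary branch admittances
u_plus = [0.8, -0.3, 0.1]   # arbitrary incoming volume velocity waves
u_minus = three_port_scatter(u_plus, Y)

# Physical checks: the net flow into the junction vanishes, Eq. (2.38),
# and the pressure p_k = (u_k^+ + u_k^-)/Y_k is identical in every branch, Eq. (2.41).
flow = sum(up - um for up, um in zip(u_plus, u_minus))
pressures = [(up + um) / Yk for up, um, Yk in zip(u_plus, u_minus, Y)]
```
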
The same derivation can be carried out for the pressure signals. In terms of pressure waves, Eq. (2.37) takes the form

$$Y_1\left[p_1^+(t) - p_1^-(t)\right] + Y_2\left[p_2^+(t) - p_2^-(t)\right] + Y_3\left[p_3^+(t) - p_3^-(t)\right] = 0 \qquad (2.45)$$

and the pressure continuity of Eq. (2.39) gives

$$p_1^+(t) + p_1^-(t) = p_2^+(t) + p_2^-(t) = p_3^+(t) + p_3^-(t) \qquad (2.46)$$

From the above two equations it is possible to solve for the outgoing pressure signals $p_1^-(t)$, $p_2^-(t)$, and $p_3^-(t)$. This results in the following three equations:

$$p_1^-(t) = \frac{2\left[Y_1 p_1^+(t) + Y_2 p_2^+(t) + Y_3 p_3^+(t)\right]}{Y_1 + Y_2 + Y_3} - p_1^+(t) \qquad (2.47a)$$

$$p_2^-(t) = \frac{2\left[Y_1 p_1^+(t) + Y_2 p_2^+(t) + Y_3 p_3^+(t)\right]}{Y_1 + Y_2 + Y_3} - p_2^+(t) \qquad (2.47b)$$

$$p_3^-(t) = \frac{2\left[Y_1 p_1^+(t) + Y_2 p_2^+(t) + Y_3 p_3^+(t)\right]}{Y_1 + Y_2 + Y_3} - p_3^+(t) \qquad (2.47c)$$
The reflection coefficient for each branch can be defined similarly as earlier for the
volume velocity signals. However, the transmission coefficients, which were simply
1 + rk for the kth branch in the case of volume velocity, are now different in every case.
Equations (2.47) can be rewritten as

$$p_1^-(t) = r_1 p_1^+(t) + (1 + r_2)p_2^+(t) + (1 + r_3)p_3^+(t) \qquad (2.48a)$$

$$p_2^-(t) = (1 + r_1)p_1^+(t) + r_2 p_2^+(t) + (1 + r_3)p_3^+(t) \qquad (2.48b)$$

$$p_3^-(t) = (1 + r_1)p_1^+(t) + (1 + r_2)p_2^+(t) + r_3 p_3^+(t) \qquad (2.48c)$$
Fig. 2.11 A three-port junction where a side branch is connected to a uniform tube.
Substituting $Y_1 = Y_2 = Y_0$ and $Y_3 = Y_s(\omega)$ into Eq. (2.43) gives the reflection function seen from the main tube,

$$R_a(\omega) = -\frac{Y_s(\omega)}{Y_s(\omega) + 2Y_0} = -\frac{Z_0}{Z_0 + 2Z_s(\omega)} \qquad (2.49)$$

and the reflection function seen from the side branch,

$$R_s(\omega) = \frac{Y_s(\omega) - 2Y_0}{Y_s(\omega) + 2Y_0} = \frac{Z_0 - 2Z_s(\omega)}{Z_0 + 2Z_s(\omega)} \qquad (2.50)$$
Note that the configuration in Fig. 2.11 is symmetrical, and hence the reflection coefficients $R_a(\omega)$ and $R_b(\omega)$ are identical and we may define

$$R_0(\omega) = R_a(\omega) = R_b(\omega) \qquad (2.51)$$

$$R_s(\omega) = -\left[1 + 2R_0(\omega)\right] \qquad (2.52)$$

Substituting $Y_1 = Y_2 = Y_0$ and $Y_3 = Y_s(\omega)$ into Eqs. (2.42), the scattering of the volume velocity waves at the side branch junction is described by

$$U_a^-(\omega) = \frac{2Y_0\left[U_a^+(\omega) + U_b^-(\omega) + U_s^-(\omega)\right]}{Y_s(\omega) + 2Y_0} - U_a^+(\omega) \qquad (2.53a)$$

$$U_b^+(\omega) = \frac{2Y_0\left[U_a^+(\omega) + U_b^-(\omega) + U_s^-(\omega)\right]}{Y_s(\omega) + 2Y_0} - U_b^-(\omega) \qquad (2.53b)$$

$$U_s^+(\omega) = \frac{2Y_s(\omega)\left[U_a^+(\omega) + U_b^-(\omega) + U_s^-(\omega)\right]}{Y_s(\omega) + 2Y_0} - U_s^-(\omega) \qquad (2.53c)$$

These equations can be rewritten using the reflection function $R_0(\omega)$ defined by Eq. (2.51) as

$$U_a^-(\omega) = R_0(\omega)U_a^+(\omega) + \left[1 + R_0(\omega)\right]\left[U_b^-(\omega) + U_s^-(\omega)\right] \qquad (2.54a)$$

$$U_b^+(\omega) = R_0(\omega)U_b^-(\omega) + \left[1 + R_0(\omega)\right]\left[U_a^+(\omega) + U_s^-(\omega)\right] \qquad (2.54b)$$

$$U_s^+(\omega) = -\left[1 + 2R_0(\omega)\right]U_s^-(\omega) - 2R_0(\omega)\left[U_a^+(\omega) + U_b^-(\omega)\right] \qquad (2.54c)$$
It is possible to derive an efficient configuration for the side branch junction for volume velocity that is reminiscent of the one-multiplier KL junction developed in Section
2.2.1. This is achieved by regrouping the terms in the above equations. This results in
$$U_a^-(\omega) = U_b^-(\omega) + U_s^-(\omega) + W(\omega) \qquad (2.55a)$$

$$U_b^+(\omega) = U_a^+(\omega) + U_s^-(\omega) + W(\omega) \qquad (2.55b)$$

$$U_s^+(\omega) = -U_s^-(\omega) - 2W(\omega) \qquad (2.55c)$$

$$W(\omega) = R_0(\omega)\left[U_a^+(\omega) + U_b^-(\omega) + U_s^-(\omega)\right] \qquad (2.55d)$$

The term W(ω) is common to Eqs. (2.55a)–(2.55c) and the reflection function R_0(ω) appears only in Eq. (2.55d). Thus the implementation of this side branch junction involves only one filtering operation.
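As a sketch, the regrouped junction of Eqs. (2.55) can be compared numerically against the direct form of Eqs. (2.54). R_0 is a filter in general; here it is frozen to an arbitrary real constant, standing in for the value of the reflection filter at one frequency:

```python
def side_branch_direct(ua_p, ub_m, us_m, r0):
    """Direct scattering equations, Eqs. (2.54), with a scalar r0."""
    ua_m = r0 * ua_p + (1 + r0) * (ub_m + us_m)
    ub_p = r0 * ub_m + (1 + r0) * (ua_p + us_m)
    us_p = -(1 + 2 * r0) * us_m - 2 * r0 * (ua_p + ub_m)
    return ua_m, ub_p, us_p

def side_branch_one_mult(ua_p, ub_m, us_m, r0):
    """Regrouped form, Eqs. (2.55): r0 is applied only once."""
    w = r0 * (ua_p + ub_m + us_m)    # Eq. (2.55d)
    ua_m = ub_m + us_m + w           # Eq. (2.55a)
    ub_p = ua_p + us_m + w           # Eq. (2.55b)
    us_p = -us_m - 2 * w             # Eq. (2.55c)
    return ua_m, ub_p, us_p

args = (0.7, -0.2, 0.4, -0.35)       # arbitrary wave values and reflection value
direct = side_branch_direct(*args)
regrouped = side_branch_one_mult(*args)
```

The two forms produce identical outputs; the regrouped version is preferable because the (generally expensive) reflection filtering is performed only once per sample.
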
In Välimäki et al. (1993b) this result was derived using characteristic impedances instead of admittances. A special case of the three-port junction, where the volume velocity $u_s^-$ was assumed to be zero, has been presented by Borin et al. (1990).
Side Branch Junction for Sound Pressure
We shall now derive the model for the side branch junction in terms of sound pressure signals. The equations for the pressure signals are obtained from Eqs. (2.47) by substituting $Y_1 = Y_2 = Y_0$, $Y_3 = Y_s(\omega)$, $p_1 = p_a$, $p_2 = p_b$, and $p_3 = p_s$. This yields
$$P_a^-(\omega) = \frac{2\left[Y_0 P_a^+(\omega) + Y_0 P_b^-(\omega) + Y_s(\omega)P_s^-(\omega)\right]}{Y_s(\omega) + 2Y_0} - P_a^+(\omega) \qquad (2.56a)$$

$$P_b^+(\omega) = \frac{2\left[Y_0 P_a^+(\omega) + Y_0 P_b^-(\omega) + Y_s(\omega)P_s^-(\omega)\right]}{Y_s(\omega) + 2Y_0} - P_b^-(\omega) \qquad (2.56b)$$

$$P_s^+(\omega) = \frac{2\left[Y_0 P_a^+(\omega) + Y_0 P_b^-(\omega) + Y_s(\omega)P_s^-(\omega)\right]}{Y_s(\omega) + 2Y_0} - P_s^-(\omega) \qquad (2.56c)$$
These equations can be simplified, using the reflection function R0 (w ) that has been
defined by Eq. (2.51), as
$$P_a^-(\omega) = R_0(\omega)P_a^+(\omega) + \left[1 + R_0(\omega)\right]P_b^-(\omega) - 2R_0(\omega)P_s^-(\omega) \qquad (2.57a)$$

$$P_b^+(\omega) = \left[1 + R_0(\omega)\right]P_a^+(\omega) + R_0(\omega)P_b^-(\omega) - 2R_0(\omega)P_s^-(\omega) \qquad (2.57b)$$

$$P_s^+(\omega) = \left[1 + R_0(\omega)\right]\left[P_a^+(\omega) + P_b^-(\omega)\right] - \left[1 + 2R_0(\omega)\right]P_s^-(\omega) \qquad (2.57c)$$

Regrouping the terms as in the volume velocity case yields

$$P_a^-(\omega) = P_b^-(\omega) + W(\omega) \qquad (2.58a)$$

$$P_b^+(\omega) = P_a^+(\omega) + W(\omega) \qquad (2.58b)$$

$$P_s^+(\omega) = P_a^+(\omega) + P_b^-(\omega) - P_s^-(\omega) + W(\omega) \qquad (2.58c)$$

$$W(\omega) = R_0(\omega)\left[P_a^+(\omega) + P_b^-(\omega) - 2P_s^-(\omega)\right] \qquad (2.58d)$$
Fig. 2.12 Approximation of the vocal tract profile using conical tube sections.
formant structure can be obtained using a sparse tube model with only a few sections.
In this section, we first introduce a digital filter model for a junction of two conical
tube sections with different tapers and diameters. A useful special case is a junction of
two conical sections with different tapers but the same diameter. This is applicable to
the modeling of at least partly conical acoustic tubes as well as modeling of arbitrary
tube profiles. In addition, a discrete-time model for the radiation and reflection at the
open end of a conical tube is developed. This model can be applied to the modeling of
wind instruments.
We first discuss wave propagation in a conical tube and derive the reflection and transmission functions that govern the scattering of pressure waves at a junction of two conical tube sections.
2.4.1 Traveling Wave Solution for Spherical Waves
The wave equation for spherical waves propagating in a conical tube is obtained by
writing the general three-dimensional wave equation in spherical coordinates and
assuming that the pressure p is a function of r and t only. This yields
$$\frac{1}{r^2}\frac{\partial}{\partial r}\!\left(r^2\frac{\partial p}{\partial r}\right) = \frac{1}{c^2}\frac{\partial^2 p}{\partial t^2} \qquad (2.59)$$
where r is the distance from the cone apex. This may be rewritten as
$$\frac{\partial^2 (rp)}{\partial t^2} = c^2\,\frac{\partial^2 (rp)}{\partial r^2} \qquad (2.60)$$
Substituting y = rp it is seen that the equation is of the same form as the one-dimensional wave equation [see Eq. (2.1)]

$$\frac{\partial^2 y}{\partial t^2} = c^2\,\frac{\partial^2 y}{\partial r^2} \qquad (2.61)$$
The general solution thus has a form similar to that given by Eq. (2.2), that is

$$y(r,t) = f^+(r - ct) + f^-(r + ct) \qquad (2.62)$$
where f + and f - are two traveling waves of arbitrary shape. The solution in terms of
pressure p is obtained by substituting y = rp so that
$$p(r,t) = \frac{1}{r}f^+(r - ct) + \frac{1}{r}f^-(r + ct) \qquad (2.63)$$
where $(1/r)f^+(r - ct)$ represents a pressure wave component propagating in the positive r direction with speed c, whereas $(1/r)f^-(r + ct)$ represents a pressure wave propagating in the negative r direction with the same speed.
The solution is similar to Eq. (2.2) except for the factor 1/r. This difference is interpreted so that the amplitude of a pressure wave is decreased as it travels further away
from the apex of the cone and, equivalently, the pressure is increased as the wave travels towards the apex. Note that the amplitude of the wave would be infinite at r = 0, and
therefore it is not possible to consider the behavior of the wave at the cone apex.
Fig. 2.13 A junction of conical tubes that have a different diameter at the junction. In
this example Ab > Aa , ra < 0, and rb > 0 .
2.4.2 Junction of Conical Tube Sections
In the following we derive equations that govern the reflection and transmission of
pressure waves at the junction of two conical tubes of different diameter and taper. We
shall not discuss the scattering of volume velocity waves, since this case is essentially
more complicated (see Ayers et al., 1985).
The following conditions must be fulfilled at the junction:
$$p_a^+(r,t) + p_a^-(r,t) = p_b^+(r,t) + p_b^-(r,t) \qquad (2.64)$$

$$u_a^+(r,t) - u_a^-(r,t) = u_b^+(r,t) - u_b^-(r,t) \qquad (2.65)$$
The physical interpretation of the above and the following equations is depicted in Fig. 2.13. Our notation follows the guidelines of Gilbert et al. (1990).
The admittances in different directions determine the ratio of volume velocity and
pressure components of the propagating wave, that is
$$Y_a^\pm = \frac{u_a^\pm}{p_a^\pm} \qquad (2.66a)$$

$$Y_b^\pm = \frac{u_b^\pm}{p_b^\pm} \qquad (2.66b)$$
On the other hand, the acoustic admittances of spherical waves in a conical tube are
defined as (see, e.g., Kinsler and Frey, 1950, pp. 300–303)

$$Y_a^\pm(\omega) = \frac{A_a}{\rho c}\left(1 \mp \frac{j}{k r_a}\right) \qquad (2.67a)$$
Note that in some references, e.g., Martínez and Agulló (1988) and Gilbert et al. (1990), the superscripts + and − are used to denote opposite directions with respect to our notation.
$$Y_b^\pm(\omega) = \frac{A_b}{\rho c}\left(1 \mp \frac{j}{k r_b}\right) \qquad (2.67b)$$

where k = ω/c is the wave number. Note that the acoustic admittance is always a function of frequency in the case of conical tubes. For this reason we use a frequency-domain description in the following.
Based on Eqs. (2.65) and (2.66) we may write
$$P_a^+(\omega)Y_a^+(\omega) - P_a^-(\omega)Y_a^-(\omega) = P_b^+(\omega)Y_b^+(\omega) - P_b^-(\omega)Y_b^-(\omega) \qquad (2.68)$$
Eliminating $P_b^+(\omega)$ using Eq. (2.64) and solving for $P_a^-(\omega)$ yields
$$P_a^-(\omega) = \frac{Y_a^+(\omega) - Y_b^+(\omega)}{Y_a^-(\omega) + Y_b^+(\omega)}\,P_a^+(\omega) + \frac{Y_b^+(\omega) + Y_b^-(\omega)}{Y_a^-(\omega) + Y_b^+(\omega)}\,P_b^-(\omega) \qquad (2.69)$$
Here the first term determines the pressure signal that reflects back from the junction
while the second term determines the transmitted pressure signal. Thus we may define
the reflection function R + (w ) and the transmission function T - (w ) in the following
way:
$$R^+(\omega) = \frac{Y_a^+(\omega) - Y_b^+(\omega)}{Y_a^-(\omega) + Y_b^+(\omega)} \qquad (2.70a)$$

$$T^-(\omega) = \frac{Y_b^+(\omega) + Y_b^-(\omega)}{Y_a^-(\omega) + Y_b^+(\omega)} \qquad (2.70b)$$
Similarly, we may eliminate Pa- (w ) from Eq. (2.68) and solve for Pb+ (w ). We then
obtain
$$P_b^+(\omega) = \frac{Y_a^+(\omega) + Y_a^-(\omega)}{Y_b^+(\omega) + Y_a^-(\omega)}\,P_a^+(\omega) + \frac{Y_b^-(\omega) - Y_a^-(\omega)}{Y_b^+(\omega) + Y_a^-(\omega)}\,P_b^-(\omega) \qquad (2.71)$$

The reflection and transmission functions for this direction are thus

$$R^-(\omega) = \frac{Y_b^-(\omega) - Y_a^-(\omega)}{Y_b^+(\omega) + Y_a^-(\omega)} \qquad (2.72a)$$

$$T^+(\omega) = \frac{Y_a^+(\omega) + Y_a^-(\omega)}{Y_b^+(\omega) + Y_a^-(\omega)} \qquad (2.72b)$$
Now we can derive general formulas for the reflection and transmission functions by
substituting Eqs. (2.67a) and (2.67b). After some algebraic manipulation this yields
$$R^+(\omega) = \frac{B-1}{B+1} - \frac{2Ba}{(B+1)(j\omega + a)} \qquad (2.73a)$$

$$R^-(\omega) = \frac{1-B}{1+B} - \frac{2a}{(1+B)(j\omega + a)} \qquad (2.73b)$$

$$T^+(\omega) = \frac{2B}{B+1}\left(1 - \frac{a}{j\omega + a}\right) = 1 + R^+(\omega) \qquad (2.73c)$$

$$T^-(\omega) = \frac{2}{B+1}\left(1 - \frac{a}{j\omega + a}\right) = 1 + R^-(\omega) \qquad (2.73d)$$
where B is the area ratio of the conical sections a and b at the junction, i.e.,

$$B = \frac{A_a}{A_b} \qquad (2.74)$$

and a is defined by

$$a = -\frac{c}{A_a + A_b}\left(\frac{A_a}{r_a} - \frac{A_b}{r_b}\right) \qquad (2.75)$$
A major difference between the junction of cylindrical tubes used in the KL model and that of the conical tubes presented above is that the scattering is now described by means of a filter instead of a real coefficient. Note, however, that in the limit $r_a, r_b \to \infty$ the reflection and transmission functions given by Eqs. (2.73) approach the reflection and transmission coefficients for a junction of two cylindrical tubes specified for pressure waves (see Section 2.2.4).
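As a numerical sanity check (with arbitrary geometry values), the functions of Eqs. (2.73) can be evaluated to verify the relation T⁺ = 1 + R⁺ and the cylindrical limit mentioned above:

```python
import math

c = 343.0                     # speed of sound (m/s)
Aa, Ab = 3e-4, 5e-4           # cross-sectional areas at the junction (m^2), arbitrary

def coeff_a(ra, rb):
    """Eq. (2.75)."""
    return -c / (Aa + Ab) * (Aa / ra - Ab / rb)

def R_plus(w, ra, rb):
    """Reflection function R+(w) of Eq. (2.73a)."""
    B = Aa / Ab                               # Eq. (2.74)
    a = coeff_a(ra, rb)
    return (B - 1) / (B + 1) - 2 * B * a / ((B + 1) * (1j * w + a))

def T_plus(w, ra, rb):
    """Transmission function T+(w) of Eq. (2.73c)."""
    B = Aa / Ab
    a = coeff_a(ra, rb)
    return 2 * B / (B + 1) * (1 - a / (1j * w + a))

w = 2 * math.pi * 1000.0      # evaluate at 1 kHz
r = R_plus(w, ra=-0.5, rb=0.8)
t = T_plus(w, ra=-0.5, rb=0.8)

# With very distant cone apices the junction behaves as one between two
# cylinders: R+ approaches (B - 1)/(B + 1) (cf. Section 2.2.4).
r_cyl = R_plus(w, ra=1e9, rb=1e9)
B = Aa / Ab
```
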
The expression for the reflection function R⁺ of a junction of conical tubes of different diameters and tapers has also been derived by Martínez and Agulló (1988), but using another approach. Also, they use the superscripts + and − to mean exactly the opposite.
Above we have derived equations for the reflection and transmission functions of a
junction of two conical tubes using acoustic admittances. This choice was made because
it leads to simpler formulas. As an example we give the equation for R + (w ) using
acoustic impedances to show the difference between these two formulations:
$$R^+(\omega) = \frac{\dfrac{1}{Z_a^+(\omega)} - \dfrac{1}{Z_b^+(\omega)}}{\dfrac{1}{Z_a^-(\omega)} + \dfrac{1}{Z_b^+(\omega)}} = \frac{Z_a^-(\omega)\left[Z_b^+(\omega) - Z_a^+(\omega)\right]}{Z_a^+(\omega)\left[Z_b^+(\omega) + Z_a^-(\omega)\right]} \qquad (2.76)$$
2.4.3 Junction of Conical Tubes with the Same Diameter

When the two conical sections have the same cross-sectional area at the junction (B = 1), the reflection and transmission functions of Eqs. (2.73) reduce to

$$R(\omega) = R^+(\omega) = R^-(\omega) = -\frac{a}{j\omega + a} \qquad (2.77a)$$

$$T(\omega) = T^+(\omega) = T^-(\omega) = 1 - \frac{a}{j\omega + a} \qquad (2.77b)$$
Fig. 2.14 A junction of conical tubes that have the same diameter A at the junction. In
this example ra < 0 and rb > 0.
where a is defined as in Eq. (2.75). However, now when the areas Aa and Ab are equal,
the coefficient a is simplified into the form
$$a = -\frac{c}{2r_a} + \frac{c}{2r_b} = \frac{c(r_a - r_b)}{2r_a r_b} \qquad (2.78)$$
where c is the speed of sound, and ra and rb are the distances from the (imagined) tips
of cones a and b, respectively, as illustrated in Fig. 2.14. Note that now the reflection
and transmission functions given by Eqs. (2.77) are the same seen from both sides of
the junction.
2.4.4 Closed and Open End of a Truncated Conical Tube
When the end of a truncated conical tube is closed, the radiation impedance Zb+ (w )
tends to infinity, which implies that the radiation admittance Y b+ (w ) tends to zero. Then
the reflection function of a spherical wave is given by
$$R^+(\omega) = \frac{Y_a^+(\omega) - 0}{Y_a^-(\omega) + 0} = \frac{1 - j/(kr_a)}{1 + j/(kr_a)} = \frac{j\omega + c/r_a}{j\omega - c/r_a} = 1 + \frac{2c/r_a}{j\omega - c/r_a} \qquad (2.79)$$
This equation can be used for modeling the reflection of an acoustic wave at the closed end of a conical tube. For $r_a = 0$, Eq. (2.79) is not well defined. This is not a problem in practice because that case is not physically meaningful. Note that in the limit $r_a \to \infty$ the transfer function approaches unity, which is the reflection coefficient for a closed end of a lossless cylindrical tube (see Section 2.2.5).
The acoustic impedance of an open end of a conical tube has not been formulated.
Thus it is not possible to derive the corresponding reflection function. Martínez and Agulló (1988) have proposed an ad hoc continuous-time reflection function that can be used as a first approximation:

$$r_0(t) = \begin{cases} -a^2 t\,e^{-at}, & t \ge 0 \\ 0, & t < 0 \end{cases} \qquad (2.80a)$$

with
2ra
Do
(2.80b)
where c is the speed of sound, $D_o$ is the diameter of the open end, $r_a$ the distance of the opening from the apex of the conical tube section, and e the base of the natural logarithm (e ≈ 2.71828). The function given by Martínez and Agulló approximates the reflection of a pressure wave from the open end of a conical tube in an infinite wall. This assumption is often used in models of speech production. The above reflection function is not a very accurate approximation, as may be observed by comparing the impulse responses shown in Martínez and Agulló (1988), but it may be good enough for a speech synthesizer.
Note that the case of a conical tube in an infinite wall is not interesting for modeling
wind instruments. Rather, we should find a proper approximation for the situation
where there is no wall of any kind, but the tube is in free space.
Another approximation for the radiation impedance of the open end of a conical tube
is obtained by using the impedance of a spherical source of an equivalent radiation area.
Interestingly enough, this impedance has the same form as that of an infinite conical
section.
2.4.5 Discrete-Time Reflection and Transmission Filters
In this section we discuss digital waveguide modeling of conical tube systems. We convert the continuous-time reflection functions derived in the previous sections into discrete-time filters using the impulse-invariant transformation. This approach has been suggested by Smith (1987) for use in digital waveguide modeling.
The reflection function for the taper and diameter discontinuity in a conical tube is
given by Eq. (2.73a) as
$$R(\omega) = \frac{B-1}{B+1} - \frac{2Ba}{(B+1)(j\omega + a)} \qquad (2.81)$$
where the coefficient a is defined by Eq. (2.75). The digital filter for the reflection junction is obtained by applying the impulse-invariant transform to Eq. (2.81):

$$R(z) = \frac{B-1}{B+1} - \frac{2BaT/(B+1)}{1 - e^{-aT}z^{-1}} \qquad (2.82)$$

When there is no discontinuity of diameter at the junction (B = 1), the reflection function of Eq. (2.77a) transforms into a first-order all-pole filter

$$R(z) = \frac{b_0}{1 + a_1 z^{-1}} \qquad (2.83a)$$

with

$$a_1 = -e^{-aT} \qquad (2.83b)$$
Fig. 2.15 The two-port junction that models the scattering of pressure signals at the
junction of two conical tubes when there is no discontinuity of diameter.
where a is defined as in Eq. (2.78). It is necessary to scale this transfer function to have
a gain of 1 at w = 0. This is achieved by setting
$$b_0 = -(1 + a_1) = -1 + e^{-aT} \qquad (2.83c)$$
The discrete-time model for the junction of two conical tubes that have the same diameter at the junction is illustrated in Fig. 2.15.
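The design of Eqs. (2.83) can be sketched as follows; the apex distances are arbitrary, and the checks confirm the gain normalization and the stability condition discussed in Section 2.4.6:

```python
import math

c = 343.0          # speed of sound (m/s)
fs = 44100.0       # sampling rate (Hz)
T = 1.0 / fs

def reflection_filter(ra, rb):
    """Coefficients (b0, a1) of R(z) = b0 / (1 + a1 z^-1), Eqs. (2.83)."""
    a = c * (ra - rb) / (2 * ra * rb)    # Eq. (2.78), equal-diameter junction
    a1 = -math.exp(-a * T)               # Eq. (2.83b)
    b0 = -(1 + a1)                       # Eq. (2.83c), gain normalization
    return b0, a1

b0, a1 = reflection_filter(ra=0.2, rb=0.1)   # arbitrary apex distances (m)

dc_gain = b0 / (1 + a1)    # forced to -1, matching R(w = 0) = -a/a = -1
stable = abs(a1) < 1.0     # pole at z = exp(-aT), inside the unit circle iff a > 0
```
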
Figure 2.16 shows the magnitude responses of the continuous-time filter R(w ) and
its discrete-time version R(e jw ) (obtained with the impulse-invariant method) with five
Fig. 2.16 Magnitude responses of the analog (dashed lines) and digital (solid lines)
reflection functions with taper discontinuity only. The sampling rate is 44.1
kHz.
Fig. 2.17 The stability of the reflection filter R(z) depends on the parameters ra and rb .
The filter is unstable in the shaded regions and stable in the white ones.
different values of $r_b$. The parameter $r_a$ is equal to 0.1 m in these examples. The effect of aliasing due to the impulse-invariant transformation is clearly seen by comparing the curves of the analog (dashed lines) and the digital (solid lines) filters. Note, however, that the magnitude response pairs differ by several dB only at frequencies above 10 kHz. At low frequencies, which are of great importance, the digital approximations coincide with the analog ones.
2.4.6 Stability of the Reflection Filter
The reflection filter R(z) defined by Eq. (2.83) is stable when $e^{-aT} < 1$, which implies
that a must be positive (since T > 0). The sign of a depends on the values of ra and rb
according to Eq. (2.78).
The regions of stability and instability of R(z) are illustrated in Fig. 2.17 on a
parameter plane where the abscissa corresponds to the value of ra and the ordinate to
the value of rb . There are three distinct regions where the filter R(z) is stable and three
where it is unstable. The unstable regions include half of all possible conical tube configurations; these cases cannot always be avoided in practice.
Fortunately it appears that having an unstable filter as a part of a larger system does
not necessarily imply that the overall system is unstable. Physical systems are passive
and they are always stable. Also digital models that simulate physically realizable systems have to be stable.
2.4.7 Modeling the Ends of a Conical Tube
The reflection of pressure waves at an open or a closed end of a conical tube depends on
the frequency as was discussed in Section 2.4.4. The continuous-time formulas that
describe the reflection at the end of a conical tube may be converted into digital filters
using the impulse-invariant transformation.
The impulse-invariant transform of the reflection function (2.79) for the closed end of a conical tube yields a first-order IIR filter

$$R(z) = 1 + \frac{b_0}{1 + a_1 z^{-1}} \qquad (2.84)$$
with the coefficients $b_0 = 2cT/r_a$ and $a_1 = -e^{cT/r_a}$. This transfer function is stable when $r_a < 0$ but unstable when $r_a > 0$.
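A minimal sketch of Eq. (2.84), confirming the stability statement above for sample values of $r_a$:

```python
import math

c = 343.0              # speed of sound (m/s)
T = 1.0 / 44100.0      # sampling interval (s)

def closed_end_filter(ra):
    """Coefficients (b0, a1) of R(z) = 1 + b0/(1 + a1 z^-1), Eq. (2.84)."""
    b0 = 2 * c * T / ra
    a1 = -math.exp(c * T / ra)
    return b0, a1

# ra < 0 (apex behind the closed end) puts the pole z = exp(cT/ra)
# inside the unit circle:
b0_s, a1_s = closed_end_filter(ra=-0.3)
# ...while ra > 0 places the pole outside it:
b0_u, a1_u = closed_end_filter(ra=0.3)
```
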
The open end of a conical tube in an infinite wall can be modeled using the continuous-time reflection function of Eq. (2.80) designed by Martínez (Martínez and Agulló, 1988). It is easily transformed into the frequency domain by using the property that differentiation in the frequency domain corresponds to multiplication by time in the time domain. Thus, the Laplace transform may be written as

$$R_0(s) = a^2\,\frac{d}{ds}\,\frac{1}{s+a} = -\frac{a^2}{(s+a)^2} \qquad (2.85)$$
The same property applies in the z-domain, where multiplication by nT corresponds to the operator $-Tz\,d/dz$, so that

$$R_0(z) = a^2 T z\,\frac{d}{dz}\,\frac{1}{1 - e^{-aT}z^{-1}} \qquad (2.86)$$
This yields

$$R_0(z) = \frac{b_1 z^{-1}}{1 + a_1 z^{-1} + a_2 z^{-2}} \qquad (2.87a)$$

where

$$b_1 = -a^2 T e^{-aT}, \qquad a_1 = -2e^{-aT}, \qquad a_2 = e^{-2aT} \qquad (2.87b)$$
The impulse response of the discrete-time filter given by Eq. (2.87) is a sampled version of Eq. (2.80a), that is

$$h(nT) = -a^2 nT e^{-anT} \quad \text{for } n = 0, 1, 2, \ldots \qquad (2.88)$$
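The correspondence between Eqs. (2.87) and (2.88) can be verified by running the difference equation of the filter; the value of a is arbitrary here, since Eq. (2.80b) is not evaluated:

```python
import math

T = 1.0 / 44100.0     # sampling interval (s)
a = 2000.0            # arbitrary positive rate constant (1/s)

p = math.exp(-a * T)
b1 = -a * a * T * p              # Eq. (2.87b)
a1, a2 = -2.0 * p, p * p

# Run the difference equation y(n) = b1*x(n-1) - a1*y(n-1) - a2*y(n-2)
# for a unit impulse input x(n) = delta(n).
N = 50
h, y1, y2 = [], 0.0, 0.0
for n in range(N):
    x1 = 1.0 if n == 1 else 0.0  # x(n-1) for the unit impulse
    y = b1 * x1 - a1 * y1 - a2 * y2
    h.append(y)
    y1, y2 = y, y1

# Compare with the sampled reflection function of Eq. (2.88).
expected = [-a * a * n * T * math.exp(-a * n * T) for n in range(N)]
err = max(abs(hv - ev) for hv, ev in zip(h, expected))
```
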
such as the saxophone. Furthermore, the stability of the overall waveguide model is an
issue that should be demonstrated in the general case or at least for certain useful configurations.
Another important topic for future research is to verify the accuracy of the digital
model by acoustic measurements. Tube systems including conical parts can be excited
by a short pulse and the response measured by a microphone. The actual impulse
response is then obtained by deconvolution of the pulse and the response. These results
may be compared to those obtained using a waveguide model. Agulló et al. (1994, 1995) have measured conical tube systems and shown that the results are in good agreement with the theoretical formulations reported in Martínez and Agulló (1988).
A major unsolved problem is the reflection function for the open end of a conical
tube. The exact mathematical solution is obviously involved. Hence we suggest a series
of measurements of impulse responses and reflection functions of open conical horns of
different taper and diameter. A useful digital reflection filter may then be designed by
applying one of the standard techniques used in digital signal processing.
$$L = \frac{l_{\mathrm{eff}}\,f_s}{c} \qquad (2.89)$$

where $f_s$ is the sampling rate and $l_{\mathrm{eff}}$ is the effective length of the bore. Note that in general L is not an integer and again a fractional delay filter must be used to implement a real-valued delay.
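A small worked example of Eq. (2.89), with an illustrative bore length and sampling rate (not taken from any particular instrument):

```python
c = 343.0       # speed of sound (m/s)
fs = 44100.0    # sampling rate (Hz)
l_eff = 0.66    # effective bore length (m), illustrative value

L = l_eff * fs / c       # Eq. (2.89): delay-line length in samples (about 84.86)
L_int = int(L)           # realizable with a chain of unit delays
d = L - L_int            # remainder: requires a fractional delay filter
```
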
The reflection model consists of a digital filter that brings about the frequency-dependent reflection and radiation of the sound wave at the end of the bore. Note that the reflection model has two outputs, one for the outgoing signal and the other for the reflected, ingoing signal.
The excitation model includes a nonlinearity which is an essential part of the sound
production mechanism in wind instruments. In the case of the clarinet, this system
models the operation of a reed that controls the air flow through the mouthpiece. The same nonlinearity is also suitable for other reed wind instruments, such as the saxophone (Cook, 1988).

Fig. 2.18 [Block diagram: Input → Excitation Model ⇄ bidirectional Delay Lines ⇄ Reflection Model → Output.]

In the flute, the nonlinearity models the airflow into the bore and
is similar to a sigmoid function (Karjalainen et al., 1991; Välimäki et al., 1992a; Välimäki, 1992).
Cook (1991, 1992) introduced a waveguide model that is capable of synthesizing
brass instrument tones. His system is also based on the principle shown in Fig. 2.18.
However, the nonlinearity has been obtained by modeling the lips of the player as a
mass-spring oscillator. For discrete-time simulation the differential equations have been
replaced with difference equations.
2.5.1 Need for Finger-Hole Models
Later in this work we will address the problem of simulating finger holes in the bore of
a woodwind instrument. Finger holes (or tone holes, or side holes) are a unique feature
of woodwinds. They have two essential functions:
1) controlling the fundamental frequency, and
2) influencing the sound quality.
The effect of open finger holes on the sound is characterized by the open-hole lattice
cutoff frequency (Benade, 1960, 1976). Above this frequency the sound waves mainly
radiate out of the tube and produce only weak resonances in the tube. Furthermore, even
closed tone holes have their effect, as they effectively lengthen and enlarge the bore in
their vicinity (Benade, 1960, 1976).
So far the tone holes have usually been neglected in digital waveguide models for
woodwind instruments (see Smith, 1986; Välimäki et al., 1992a, 1992b; Cook, 1992).
The obvious limitation in these models is that the fundamental frequency of a synthesized tone can only be controlled by changing the length of the delay lines that form the
digital waveguide model for the bore. From the viewpoint of the sound wave propagating inside the bore, this is a very unnatural procedure. Dropping or adding unit delays
causes a discontinuity in the signal which can be heard as an annoying click. Another
deficiency of simplified woodwind bore modeling is that the tone hole lattice cutoff
effect is not automatically included in the model. Thus, its effect has to be added separately, usually during the design of the radiation filter.
In addition to the above mentioned problems, the lack of tone holes limits the playing styles that are available for the user of the woodwind instrument model. Multiphonics, for example, cannot be produced by a model without finger holes. When
proper finger-hole models are included, multiphonics as well as other special effects,
like key claps, become available to the user of the computer model just as they are to the player of an acoustic instrument.
2.5.2 Modeling of a Single Woodwind Finger Hole
In traditional transmission-line models for woodwind instruments a side hole is represented as a T network (Keefe, 1982). A two-by-two linear transformation converts the T parameters into reflection and transmission coefficients. The T network model includes both shunt and series impedances. In practice, the series impedance has a small value and can be neglected (Coltman, 1979).
It is most important to simulate accurately the open finger holes since they determine the fundamental frequency of the tone. Closed holes have a considerably less significant effect, and for simplicity they are omitted in our model. In the following a computational model for an open hole is derived.
Model for an Open Finger Hole
The simplest approximation for the acoustic impedance $Z_s$ of a small open hole is a pure acoustic inertance, that is (see, e.g., Fletcher and Rossing, 1991)

$$Z_s(\omega) = j\omega\,\frac{\rho l}{A_s} \qquad (2.90)$$
where $\rho$ is the density of air, $A_s$ is the cross-sectional area of the finger hole, and l is its effective height (see Keefe, 1982, 1990). Multiplication by jω in the frequency domain corresponds to differentiation in the time domain. In the discrete-time model the impedance of an open hole can be approximated by means of a digital differentiator $H_D(z)$. The transfer function of a digital filter approximating Eq. (2.90) is then

$$Z_s(z) = \frac{\rho l}{A_s}\,H_D(z) \qquad (2.91)$$
Let us denote the discrete-time version of the reflection function of Eq. (2.49) by R(z). This reflection function is expressed as

$$R(z) = -\frac{Y_s(z)}{Y_s(z) + 2Y_0} = -\frac{Z_0}{Z_0 + 2Z_s(z)} \qquad (2.92)$$
where Ys (z) and Zs (z) denote the discrete-time approximations for the continuous-time
functions Ys (w ) and Zs (w ), respectively. When Eq. (2.91) is substituted into Eq. (2.92)
the digital reflection filter associated with the finger-hole junction is given by
R(z) = -
Z0
r c A0
=
(r c A0 ) + (2 r l As ) HD (z)
Z0 + 2 Zs (z)
(2.93)
1
(1 - z -1 )
T
(2.94)
where T is the sampling interval. This filter computes the difference of two successive
input samples and acts as a high-pass filter. Using this approximation Eq. (2.93) can be
simplified into the form
1+ a
1 + az -1
(2.95a)
2 A0 l
2 A0 l + AsTc
(2.95b)
Equation (2.95a) is the transfer function of a first-order all-pole filter. Since the coefficient a is always negative, this is a lowpass filter. At very low frequencies it acts like a constant multiplier, multiplying by −1.

Fig. 2.19 Analog (dashed lines) and digital (solid lines) approximations to the reflection function of a finger-hole junction.
The magnitude responses of the analog reflection function R0 (w ) computed using
Eq. (2.49) with the impedance defined in Eq. (2.90) and the digital reflection function
$R(e^{j\omega})$ of Eq. (2.95) are illustrated in Fig. 2.19. The sampling rate used in the discrete-time simulation is 44.1 kHz, i.e., T ≈ 22.7 µs. The upper curves are computed for a hole
radius of 8.0 mm and an effective height of 17.5 mm, and the lower curves for a hole
radius of 6.25 mm and an effective height of 31.7 mm. These examples correspond to
two different finger holes of the flute (Coltman, 1979). The bore radius in the flute is
9.5 mm. The approximation error in the magnitude response of the digital filter is less
than 1 dB at frequencies below 10 kHz.
Each finger-hole model requires one filter consistent with Eq. (2.95). By controlling
the filter coefficient a in the range [ a0 , 0] (where a0 is the nominal value for the open
hole) in a suitable way, the finger hole can be closed and opened as in real woodwind
instruments to attain desired changes in the effective length of the bore.
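A sketch of the filter design of Eqs. (2.95), using the flute tone-hole dimensions quoted above (hole radius 8.0 mm, effective height 17.5 mm, bore radius 9.5 mm, fs = 44.1 kHz):

```python
import math

c = 343.0                    # speed of sound (m/s)
T = 1.0 / 44100.0            # sampling interval (s)
A0 = math.pi * 0.0095 ** 2   # bore cross-sectional area (flute, radius 9.5 mm)
As = math.pi * 0.008 ** 2    # hole cross-sectional area (radius 8.0 mm)
l = 0.0175                   # effective height of the hole (m)

a = -2 * A0 * l / (2 * A0 * l + As * T * c)    # Eq. (2.95b); always negative

def R(z):
    """First-order all-pole reflection filter of Eq. (2.95a)."""
    return -(1 + a) / (1 + a / z)

dc = R(1.0)          # at w = 0 the filter multiplies by -1 (full inverting reflection)
nyq = abs(R(-1.0))   # strong attenuation at the Nyquist frequency (lowpass behavior)
```
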
2.5.3 Discrete-Time Scattering Junction
Let us consider the implementation of a finger-hole model via a three-port junction as
discussed in Section 2.3.3. The frequency-domain Eqs. (2.55) that describe the three-port junction for volume velocity waves can be expressed in the (discrete) time domain
Fig. 2.20 A computational model for scattering of volume velocity waves in a finger-hole junction.
as

$$u_a^-(n - D) = u_b^-(n - D) + u_s^-(n) + w(n) \qquad (2.96a)$$

$$u_b^+(n - D) = u_a^+(n - D) + u_s^-(n) + w(n) \qquad (2.96b)$$

$$u_s^+(n) = -u_s^-(n) - 2w(n) \qquad (2.96c)$$
where D is the real-valued delay corresponding to the position of the center of the finger
hole and w(n) is the output of the reflection filter R(z) (see Fig. 2.20), i.e.,
$$w(n) = r(n) * w_1(n) \qquad (2.97)$$

where r(n) is the impulse response of the reflection filter R(z), * denotes the convolution sum, and the input signal $w_1(n)$ is

$$w_1(n) = u_a^+(n - D) + u_b^-(n - D) + u_s^-(n) \qquad (2.98)$$
The signal flow graph for a finger-hole model is depicted in Fig. 2.20. In this figure
the variable quantities are the same as in Fig. 2.11.
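One update step of Eqs. (2.96)–(2.98) can be sketched as follows. The signal values are arbitrary, the one-pole filter realizes Eq. (2.95), and the delayed samples $u_a^+(n-D)$ and $u_b^-(n-D)$ are assumed to be read from the waveguide delay lines:

```python
a = -0.85   # reflection filter coefficient, arbitrary nominal open-hole value

class FingerHoleJunction:
    """One-pole reflection filter R(z) of Eq. (2.95a) inside the
    scattering junction of Eqs. (2.96)-(2.98)."""

    def __init__(self, a):
        self.a = a
        self.y1 = 0.0                     # previous filter output

    def filter(self, x):
        # difference equation of R(z) = -(1 + a)/(1 + a z^-1)
        y = -(1 + self.a) * x - self.a * self.y1
        self.y1 = y
        return y

    def step(self, ua_p, ub_m, us_m):
        """ua_p and ub_m are the delayed waves u_a^+(n-D) and u_b^-(n-D)
        read from the delay lines; us_m is u_s^-(n)."""
        w1 = ua_p + ub_m + us_m           # Eq. (2.98)
        w = self.filter(w1)               # Eq. (2.97)
        ua_m = ub_m + us_m + w            # Eq. (2.96a)
        ub_p = ua_p + us_m + w            # Eq. (2.96b)
        us_p = -us_m - 2 * w              # Eq. (2.96c)
        return ua_m, ub_p, us_p

junction = FingerHoleJunction(a)
out = junction.step(0.5, -0.1, 0.2)       # arbitrary signal values
```
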
a practical digital waveguide system also incorporates losses in terms of gain factors
less than one or digital (lowpass) filters whose magnitude response is not greater than
unity at any frequency.
The relationship of the feedback delay network and the digital waveguide network
has been studied by Smith and Rocchesso (1994). Recently, the digital waveguide
technique has been generalized to two dimensions (Van Duyne and Smith, 1993).
Further generalization to three or more dimensions is straightforward. Savioja et al.
(1994) have demonstrated that a three-dimensional waveguide mesh can be used for the
simulation of room acoustics.
2.7 Conclusions
In this chapter the fundamentals of waveguide modeling were discussed. Digital
waveguides provide a systematic method for designing computational models for
physical systems. So far this approach has primarily been applied to one-dimensional
resonators since two and three-dimensional extensions have been introduced only
recently.
The digital waveguide methods have been used for the synthesis of music and speech. Other techniques that lead to structures similar to WGFs were reviewed. These include the Karplus–Strong plucked string algorithm, the Kelly–Lochbaum vocal tract model, delay networks for reverberation, and wave digital filters.
One of the main drawbacks of the WGFs is the discretization of the time variable, which leads to inaccurate modeling of propagation delays. This is the motivation for
the use of fractional delays in waveguide models. In the next chapter we discuss the
theory of fractional delay filters.
An ideal delay element delays the continuous-time input signal $x_c(t)$ by $\tau$ seconds:

$$y_c(t) = x_c(t - \tau) \qquad (3.1)$$

The Fourier transform $X_c(\Omega)$ of the input signal is defined as

$$X_c(\Omega) = \int_{-\infty}^{\infty} x_c(t)e^{-j\Omega t}\,dt \qquad (3.2)$$
where $\Omega = 2\pi f$ is the angular frequency in radians per second. The Fourier transform $Y_c(\Omega)$ of the delayed signal $y_c(t)$ can be written in terms of $X_c(\Omega)$:
$$Y_c(\Omega) = \int_{-\infty}^{\infty} y_c(t)e^{-j\Omega t}\,dt = \int_{-\infty}^{\infty} x_c(t-\tau)e^{-j\Omega t}\,dt = e^{-j\Omega\tau}X_c(\Omega) \qquad (3.3)$$
The transfer function $H_{id}(\Omega)$ of the delay element can be expressed by means of the Fourier transforms $X_c(\Omega)$ and $Y_c(\Omega)$. This yields

$$H_{id}(\Omega) = \frac{Y_c(\Omega)}{X_c(\Omega)} = e^{-j\Omega\tau} \qquad (3.4)$$

The corresponding discrete-time operation is

$$y(n) = x(n - D) \qquad (3.5)$$

where $D = \tau/T$ is the desired delay as a multiple of the unit delay. Note that $\tau/T$ is generally irrational since $\tau$ is usually not an integral multiple of the sampling interval T.
Unfortunately, Eq. (3.5) is meaningful only for integral values of D. Then the samples
of the output sequence y(n) are equal to the delayed samples of the input sequence x(n),
and the delay element is called a digital delay line. However, if D were real, the delay
operation would not be this simple, since the output value would lie somewhere
between the known samples of x(n). The sample values of y(n) would then have to be
obtained by way of interpolation from the sequence x(n).
The spectrum of a discrete-time signal can be expressed by means of the discretetime Fourier transform (DTFT). In this integral transform, the time variable is discretized, but the frequency variable is continuous. The DTFT of signal x(n) is defined as
(see, e.g., Roberts and Mullis, 1987, pp. 8788; Jackson, 1989, pp. 106118)
$$X(\omega) = \sum_{n=-\infty}^{\infty} x(n)e^{-j\omega n}, \quad |\omega| \le \pi \qquad (3.6)$$
where w = 2pfT is the normalized angular frequency. The DTFT of the output signal
y(n) can be written as
$$Y(\omega) = \sum_{n=-\infty}^{\infty} y(n)e^{-j\omega n} = \sum_{n=-\infty}^{\infty} x(n-D)e^{-j\omega n} = e^{-j\omega D}X(\omega) \qquad (3.7)$$

so that the transfer function of the discrete-time delay element is

$$H_{id}(e^{j\omega}) = e^{-j\omega D}, \quad |\omega| \le \pi \qquad (3.8)$$
This is the same result as given in Eq. (3.4), only now the angular frequency variable is periodic
due to discretization in time. In order to be consistent with the z-transform notation used
commonly in digital signal processing, let us express the transfer function as
$$H_{id}(z) = \frac{Y(z)}{X(z)} = \frac{z^{-D}X(z)}{X(z)} = z^{-D} \qquad (3.9)$$
where $D \in \mathbb{R}_+$ is the length of the delay in samples. The delay D may be written in the form

$$D = D_{\mathrm{int}} + d \qquad (3.10)$$

where $0 \le d < 1$ is the fractional delay and the integer part $D_{\mathrm{int}}$ is given by

$$D_{\mathrm{int}} = \lfloor D \rfloor \qquad (3.11)$$

where $\lfloor\,\cdot\,\rfloor$ is the greatest integer function. It is often called the floor function and is defined as

$$\lfloor x \rfloor = \max\{k \in \mathbb{Z} \mid k \le x\} \qquad (3.12)$$

The block diagram of the ideal delay element is illustrated in Fig. 3.1.
Note that the z-transform representation in (3.9) is used in the Fourier transform
sense so that z = e jw . In principle, the z-transform is defined only for integral powers of
z and thus, if D were real, the term z - D should be written as an infinite series making
the notation unnecessarily involved.
To understand how to produce a fractional delay using a discrete-time system it is
necessary to discuss interpolation techniques. Interpolation of a discrete-time signal is
based on the fact that the amplitude of the corresponding continuous-time bandlimited
signal changes smoothly between the sampling instants.
Fig. 3.1 The ideal delay element: $x(n) \rightarrow z^{-D} \rightarrow y(n)$.
According to the sampling theorem, a bandlimited signal $x_c(t)$ can be reconstructed from its samples $x(nT)$ as

$$x_c(t) = \sum_{n=-\infty}^{\infty} x(nT)\,\frac{\sin\!\left[\dfrac{\omega_s}{2}(t - nT)\right]}{\dfrac{\omega_s}{2}(t - nT)} = \sum_{n=-\infty}^{\infty} x(nT)\,\mathrm{sinc}\!\left[\frac{\omega_s}{2\pi}(t - nT)\right] \qquad (3.13)$$
where w s = 2pf s is the sampling angular frequency in radians per second and T is the
corresponding sampling interval, i.e., T = 1 f s . The sinc function is defined as
$$\mathrm{sinc}(t) = \frac{\sin(\pi t)}{\pi t} \qquad (3.14)$$
The reconstruction of Eq. (3.13) may be interpreted as filtering with an ideal lowpass filter whose impulse response is

$$h_c(t) = \frac{\sin\!\left(\dfrac{\omega_s t}{2}\right)}{\dfrac{\omega_s t}{2}} = \mathrm{sinc}\!\left(\frac{\omega_s t}{2\pi}\right) \qquad (3.15)$$
for $t \in \mathbb{R}$. This impulse response, which Shannon called the Whittaker cardinal function, converts a discrete-time signal to a continuous-time one. In delay applications, however,
it is necessary to know the value of a signal at a single time instant between the
samples. The desired result may be obtained by shifting Eq. (3.15) by D and then
sampling it at equidistant points. Hence, the output y(n) of the ideal discrete-time fractional delay element is computed as
y(n) = x(n − D) = Σ_{k=-∞}^{∞} x(k) sinc(n − D − k)   (3.16)
for n ∈ ℤ and D ∈ ℝ. Here we have simplified the notation by setting the sampling rate f_s to 1 (and consequently, the sampling interval T = 1), since with a discrete-time system description it is not necessary to fix the actual sampling rate.
It can be concluded that producing a fractional delay requires reconstruction of the
discrete-time signal and shifted resampling of the resulting continuous-time signal. In
Eq. (3.16) these two operations have been combined.
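The combined operation of Eq. (3.16) can be sketched in a few lines of code. The following Python fragment (an illustrative sketch, not part of the thesis) approximates the infinite sum by keeping only a finite number of sinc terms around the output instant; the function name and tap count are assumptions made for the example.

```python
import math

def sinc(t):
    """Eq. (3.14): sinc(t) = sin(pi t)/(pi t), with sinc(0) = 1."""
    return 1.0 if t == 0.0 else math.sin(math.pi * t) / (math.pi * t)

def frac_delay_sinc(x, D, taps=64):
    """Approximate y(n) = x(n - D) via a truncated form of Eq. (3.16).

    x    : input samples, treated as zero outside their support
    D    : real-valued delay in samples
    taps : sinc samples kept on each side of the center point n - D
    """
    y = []
    for n in range(len(x)):
        center = int(round(n - D))
        acc = 0.0
        for k in range(center - taps, center + taps + 1):
            if 0 <= k < len(x):
                acc += x[k] * sinc(n - D - k)
        y.append(acc)
    return y

# Delay a sampled sinusoid by 0.4 samples and compare with the exact result,
# checked away from the edges of the finite record.
f = 0.05
x = [math.sin(2.0 * math.pi * f * n) for n in range(200)]
y = frac_delay_sinc(x, 0.4)
err = max(abs(y[n] - math.sin(2.0 * math.pi * f * (n - 0.4)))
          for n in range(80, 120))
```

For an integer delay the kept sinc samples reduce to a unit impulse, so the output is an exactly shifted copy of the input.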
Substituting z = e^{jω} into Eq. (3.9) gives the frequency response of the ideal delay element,

H_id(e^{jω}) = e^{-jωD},  |ω| ≤ π   (3.17)
The frequency response of the ideal delay element has the following magnitude and
phase characteristics
|H_id(e^{jω})| ≡ 1   (3.18a)

Θ_id(ω) = arg{H_id(e^{jω})} = −Dω   (3.18b)
In other words, the delay element is a linear-phase allpass system. Its phase delay and
the group delay are, respectively,
τ_{p,id}(ω) = −Θ_id(ω)/ω = D   (3.19a)

and

τ_{g,id}(ω) = −dΘ_id(ω)/dω = D   (3.19b)
where Θ_id(ω) is the ideal phase response as defined in Eq. (3.18b). From these results
it can be concluded that the ideal delay element passes all the frequency components of
an incoming signal with the same delay D.
The inverse discrete-time Fourier transform of Eq. (3.17) gives the impulse response
(IR) of an ideal delay element [compare with Eq. (3.16)] as
h_id(n) = (1/2π) ∫_{-π}^{π} H_id(e^{jω}) e^{jωn} dω = (1/2π) ∫_{-π}^{π} e^{-jωD} e^{jωn} dω = (1/2π) ∫_{-π}^{π} e^{jω(n−D)} dω

= sin[π(n − D)]/[π(n − D)] = sinc(n − D)  for all n ∈ ℤ   (3.20)
If D is an integer (d = 0), the IR of the delay element is zero at all sampling points except at n = D, that is,

d = 0  ⇒  h_id(n) = { 1 for n = D; 0 otherwise }   (3.21)

In this case, the delay element is implemented by a cascade of unit delays as discussed earlier. When D is a fractional number, i.e., 0 < d < 1, the IR has non-zero values at all index values n ∈ ℤ, or

0 < d < 1  ⇒  h_id(n) ≠ 0 for all n   (3.22)
Fig. 3.2 (Upper) The sinc function (dashed line) shifted by D = 3.0. The circles indicate the sampled values of the sinc function. All the sample values except the
centermost are zero when the delay is an integer, that is d = 0. (Lower) The
sinc function shifted by D = 3.3. The dotted line indicates the center point of
the shifted sinc function. This figure illustrates the fact that all the sample values (circles) are non-zero in the case of a fractional delay, i.e., d ≠ 0.
These time-domain properties of the ideal delay element are illustrated in Fig. 3.2. In
the upper part of this figure, the delay D is an integer and only one sample is non-zero
because the zero crossings of the sinc function coincide with the other sampling points.
In the lower part of Fig. 3.2, however, the delay D is a fractional number and all the samples on the interval (−∞, ∞) are non-zero.
To conclude, the impulse response hid (n) of the delay element is a shifted and sampled version of the sinc function which is infinitely long. Due to this the IR corresponds
to a noncausal filter which cannot be made causal by a finite shift in time. In addition,
the filter is not BIBO stable since the impulse response (3.20) is not absolutely
summable. This kind of filter is nonrealizable. To produce a realizable fractional delay
filter, some finite-length approximation for the sinc function has to be used.
Before considering these approximations we note one particular property of the ideal transfer function that makes the fractional delay approximation difficult. The imaginary part of the ideal transfer function at ω = π is

Im{H_id(e^{jω})}|_{ω=π} = −sin(πD)   (3.23)
This implies that when D is a fractional number, the transfer function has a complex value at ω = π. Discrete-time filters with real coefficients have the property that

Im{H(e^{jω})}|_{ω=π} = 0   (3.24)

This implies that, at the Nyquist frequency, the approximation error cannot be smaller than |sin(πD)|. Cain and Yardim (1994) call this the Tarczynski bound.
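The bound is easy to probe numerically. This small Python sketch (illustrative, not from the thesis) measures the complex error at ω = π for truncated-sinc FD filters of increasing order; the error never falls below |sin(πd)| no matter how large the order is made.

```python
import math

def sinc(t):
    return 1.0 if t == 0.0 else math.sin(math.pi * t) / (math.pi * t)

def nyquist_error(h, D):
    """|H(e^{j pi}) - e^{-j pi D}|: approximation error at the Nyquist frequency."""
    H = sum(hn * (-1.0) ** n for n, hn in enumerate(h))  # real for real h
    return math.hypot(H - math.cos(math.pi * D), math.sin(math.pi * D))

d = 0.3                        # fractional part of the delay
bound = abs(math.sin(math.pi * d))
errors = []
for N in (3, 7, 15, 31):
    D = (N - 1) // 2 + d       # keep the delay near the filter center
    h = [sinc(n - D) for n in range(N + 1)]  # truncated-sinc FD filter
    errors.append(nyquist_error(h, D))
```

Every entry of `errors` stays at or above `bound`, even though the error at low frequencies keeps shrinking as N grows.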
Some of the useful fractional delay approximation techniques will be discussed in
the following sections.
3.1.5 Conclusion and Discussion
Up to now we have discussed the theoretical background of noninteger digital delay, or
fractional delay. The relationship between the underlying analog signal and fractional
delay was clarified. Furthermore, the characteristics of the ideal fractional delay element were reviewed and it was shown that the ideal FD system (i.e., ideal bandlimited
interpolator) is nonrealizable, since the corresponding impulse response is infinitely
long and noncausal. Thus, approximation techniques have to be employed to obtain a
finite-length causal implementation of FD.
Formerly, similar theoretical aspects have been considered in the context of signal
recovery and multirate signal processing. Implementation of a fractional delay requires
that the system is in principle able to compute the amplitude of the signal at any time
instant between known samples. Thus, the FD problem and recovery of an analog signal
from its samples are essentially equivalent.
The major differences between the FD problem and sampling-rate conversion are as
follows.
1) In FD processing only one new sample per sampling interval needs to be computed, whereas in interpolation for increasing the sampling rate, several new
samples per original sample interval may be needed.
2) In the FD problem the fractional interval d to be approximated is in general irrational, whereas in sample-rate conversions, simple rational ratios are often used.
The underlying theory of these two DSP problems is, of course, the same.
Even today, FD processing is not a well-known topic in signal processing. Discussion of this topic has appeared in only a few textbooks (see, e.g., Crochiere and Rabiner, 1983, pp. 271–274; Regalia, 1993, pp. 948–953).
3.2 FIR Fractional Delay Filters

The transfer function of an Nth-order FIR filter is

H(z) = Σ_{n=0}^{N} h(n) z^{-n}   (3.25)
where N is the order of the filter and h(n) (n = 0, 1, ..., N) are the real coefficients that
form the impulse response of the FIR filter. Note that the length of the impulse response
(i.e., the number of the filter coefficients) is
L=N+1
(3.26)
In the design procedure our aim is to minimize the error function defined by
E(e^{jω}) = H(e^{jω}) − H_id(e^{jω})   (3.27)

i.e., the difference between the approximating and the ideal frequency response.
3.2.1 Least Squared Integral Error Design
The intuitively most attractive method for designing realizable FD filters is certainly the
truncation of the ideal IR defined by Eq. (3.20). This method minimizes the least
squared (LS) error function ELS which is equal to the L2 norm (integrated squared
magnitude) of the error frequency response E(e^{jω}), that is,

E_LS = (1/π) ∫_0^π |E(e^{jω})|² dω = (1/π) ∫_0^π |H(e^{jω}) − H_id(e^{jω})|² dω   (3.28)
Using Parseval's relation this equation can be converted into the time domain. This results in

E_LS = Σ_{n=-∞}^{∞} [h(n) − h_id(n)]² = Σ_{n=-∞}^{∞} h_id²(n) − 2 Σ_{n=-∞}^{∞} h(n) h_id(n) + Σ_{n=-∞}^{∞} h²(n)   (3.29)
From this equation, it is possible to derive a closed-form solution for the squared integral error in the case of fractional delay approximation. According to Parseval's relation, the first term in (3.29) can be evaluated in the following way:

Σ_{n=-∞}^{∞} h_id²(n) = (1/π) ∫_0^π |H_id(e^{jω})|² dω = (1/π) ∫_0^π |e^{-jωD}|² dω = 1   (3.30)
The second and third term of Eq. (3.29) include the coefficients of the Nth-order FIR filter and thus the summation indices can be limited. The closed-form solution is

E_LS = 1 + Σ_{n=0}^{N} [h²(n) − 2 h(n) h_id(n)]   (3.31)
The error is smallest when the retained coefficients are centered around the maximum value, i.e., the central point of h_id(n). The approximation error resulting from this approach may be written as

E_LS = Σ_{n=-∞}^{-1} h_id²(n) + Σ_{n=N+1}^{∞} h_id²(n)   (3.32)
It is seen that the approximation error decreases as N increases. This is intuitively clear.
The impulse response h(n) of the LS FD FIR filter can be expressed as
h(n) = { sinc(n − D), for 0 ≤ n ≤ N; 0, otherwise }   (3.33)
The delay D should be located between the two central taps of the filter when N is
odd (L = N + 1 is even), or within half a sample from the central tap when N is even (L
odd), since then the approximation error is smallest. This means that the delay D should
be chosen so that the following inequality is valid:
(N − 1)/2 ≤ D ≤ (N + 1)/2   (3.34)
For odd-order FIR interpolators (N odd and L even) this simply implies that the integer
part Dint of the delay has to be chosen in the following way:
D_int = (N − 1)/2   (3.35)
When N is even (L odd) the integer part of the delay should be chosen so that
D_int = { N/2, when 0 ≤ d < 1/2; N/2 − 1, when 1/2 ≤ d < 1 }   (3.36)
This will ensure that point D is located within half a sample from the midpoint of the interpolator and consequently that the approximation error is the smallest possible with this FIR design technique.
Above we have assumed that the FIR filter coefficients are truncated from the
beginning of the ideal impulse response. This is not the only possible choice. The index
M of the first non-zero sample should be chosen in the following way:
M = { round(D) − N/2, for even N; ⌊D⌋ − (N − 1)/2, for odd N }   (3.37)
where round(·) denotes the operation of rounding to the nearest integer, and ⌊·⌋ is the greatest integer function as given by Eq. (3.12). The FIR filter h(n) will be causal if M ≥ 0. If M < 0, the filter will be noncausal and thus nonrealizable. An integer must then be added to D in order to shift the central point of the sinc function so that the requirement of Eq. (3.34) is fulfilled. This is equivalent to adding unit delays to the delay line.
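Equations (3.33) and (3.37) translate directly into code. In this hypothetical Python helper the returned list holds the coefficients h(M), h(M+1), ..., h(M+N):

```python
import math

def sinc(t):
    return 1.0 if t == 0.0 else math.sin(math.pi * t) / (math.pi * t)

def ls_fd_fir(N, D):
    """Truncated-sinc (LS) FD FIR filter per Eqs. (3.33) and (3.37).

    Returns (M, h): M is the index of the first retained sample and
    h[i] is the coefficient h(M + i) = sinc(M + i - D).
    (Note: Python's round() uses round-half-to-even at exact .5 values.)
    """
    if N % 2 == 0:
        M = round(D) - N // 2             # even order
    else:
        M = math.floor(D) - (N - 1) // 2  # odd order
    return M, [sinc(M + i - D) for i in range(N + 1)]

# M = 2: the filter itself approximates a delay of 1.3 samples, and the
# remaining 2 samples are implemented by unit delays.
M, h = ls_fd_fir(3, 3.3)
```

The delay D − M realized by the filter satisfies the inequality (3.34) by construction.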
Fig. 3.3 Magnitude responses (upper) and phase delay curves (lower) of a third-order
FIR filter designed by truncating the shifted sinc function. The curves are
plotted for 11 fractional delay values between 1.0 and 2.0. Note that the
magnitude responses for N - D are the same as for D.
However, unit delays cannot be added if the actual delay D, including the integer part, is
important. Another method to make the FIR filter causal is to choose a smaller value for
N, that is, to design a filter of lower order. This will obviously make the approximation
less accurate.
If Eq. (3.34) is valid, then M = 0 and the entire delay D is implemented by the FIR
filter. This case corresponds to the best (highest-order) causal approximation but also to
the heaviest computational load that an LS FIR filter can offer. If M > 0 then part of the
desired delay has to be implemented by a sequence of unit delays while the rest of that
delay is approximated by the FD filter. The implementation of a fractional delay with
FIR filters will be discussed further in Sections 3.5.2 and 4.1.
Truncating the shifted sinc function is an easy way to design FD FIR filters. This
approach has been proposed, e.g., by Sivanand et al. (1991) (see also Sivanand, 1992)
and Cain et al. (1994). However, it is often not useful since truncation of the impulse
response introduces ripple to the frequency response. This is called the Gibbs phenomenon. It causes the maximum deviation from the ideal frequency response to remain
approximately constant irrespective of the filter order. This goes for both the magnitude
and the phase response.
The magnitude and phase delay characteristics of an LS FD FIR filter are illustrated
in Fig. 3.3. The order N of the filter is 3 and thus the best approximation (M = 0) is obtained with delay values 1 ≤ D < 2. Were N = 4, M would be 0 for 1.5 ≤ D < 2.5.
Notice the overshoot in both the magnitude and the phase delay curves. It is seen that the phase delay curves are symmetrical with respect to the delay of 1.5 samples, which corresponds to the central point of the filter's impulse response.
3.2.2 LS Design with Reduced Bandwidth
A variation of the LS design technique is to use a lowpass interpolator as a prototype
filter instead of a fullband filter (Laakso et al., 1994). The ideal solution is then defined
in the interval [0, απ] where 0 < α < 1. The solution corresponding to Eq. (3.33) is now

h(n) = { sin[απ(n − D)]/[π(n − D)], for M ≤ n ≤ M + N; 0, otherwise }   (3.38)
with M defined as in Eq. (3.37). It appears that the Gibbs phenomenon will be reduced considerably, as desired, but the usable bandwidth will shrink by the factor α. Thus we conclude that narrowing the bandwidth does not necessarily improve the LS FD filter design.
Another variation of the basic LS design is to use a reduced bandwidth with a smooth transition band function (Parks and Burrus, 1987, pp. 63–70). This will make the impulse response of the FD element decay fast. The IR will still be infinitely long and must be truncated, but the Gibbs phenomenon is guaranteed to be reduced, since the discontinuity is not as sharp as originally. A good choice for the transition band is a low-order spline multiplied by e^{-jωD}. Using this design the magnitude response of the FD FIR filter will remain constant with high precision. However, the price paid is that the phase response will be severely nonlinear (Laakso et al., 1994).
3.2.3 Windowing the Ideal Impulse Response
A well-known method to reduce the Gibbs phenomenon in FIR filter design is to use a
bell-shaped window function for weighting in the time domain (see, e.g., Parks and Burrus, 1987, pp. 71–83). Truncation corresponds to windowing with a rectangular
window function. An extensive tutorial on window functions and their properties has
been written by Harris (1978). The impulse response of an FIR filter designed by the
windowed LS method can be written in the form
h(n) = { w(n − D) sinc(n − D), for M ≤ n ≤ M + N; 0, otherwise }   (3.39)
Note that the midpoint of the window function w(n) of length N + 1 has been shifted by
D so that the shifted sinc function will be windowed symmetrically with respect to its
center. The window function w(n − D) is asymmetric, however. Many window functions, such as the Hamming and von Hann windows, can be easily delayed by a fractional value D (Laakso et al., 1994; Cain and Yardim, 1994; Cain et al., 1994, 1995) whereas some others cannot. For instance, there is no known method to design exact fractionally shifted Dolph–Chebyshev or Saramäki windows for an arbitrary D. This is
because these windows have been designed in the frequency domain. An approximative
technique for shifting these windows has been proposed by Laakso et al. (1995b).
It is readily seen that the approximation error E_LS [see Eq. (3.29)] will be larger with a window function than without it [see Eq. (3.32)]. Thus, this design method does not
minimize the LS error measure, but it is an ad hoc modification of the LS technique. In
general, the frequency response of an FD FIR filter designed using this technique has a
lower ripple but also a wider transition band than a corresponding LS filter (see, e.g.,
Parks and Burrus, 1987, p. 73). A drawback is that the control of the magnitude error is
difficult when changing the parameters of, e.g., the Kaiser or the Dolph–Chebyshev window.
The windowing method is suitable for real-time systems where the fractional delay is
changed since the coefficients can be updated quickly. The samples of the sinc and the
window function can be stored in memory for several values of D. The filter coefficients for delay values between the stored ones may be obtained, e.g., by linear interpolation (Smith and Gossett, 1984). Smith (1992a) has studied the implementation and
error analysis of this kind of interpolator in detail.
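As an illustration of Eq. (3.39), the sketch below applies a fractionally shifted Hamming window to the truncated sinc. The centered continuous form of the Hamming window used here, w(t) = 0.54 + 0.46 cos(2πt/N), is an assumption made for this example; the thesis only requires that the window midpoint is shifted by D.

```python
import math

def sinc(t):
    return 1.0 if t == 0.0 else math.sin(math.pi * t) / (math.pi * t)

def hamming_centered(t, N):
    """Length-(N+1) Hamming window written in centered form and evaluated
    at a possibly non-integer argument t (assumed form, see the text)."""
    return 0.54 + 0.46 * math.cos(2.0 * math.pi * t / N)

def windowed_fd_fir(N, D):
    """Windowed-sinc FD FIR design in the spirit of Eq. (3.39).
    Returns (M, h) with h[i] = w(M + i - D) * sinc(M + i - D)."""
    if N % 2 == 0:
        M = round(D) - N // 2
    else:
        M = math.floor(D) - (N - 1) // 2
    Dp = D - M          # delay realized by the filter itself
    return M, [hamming_centered(i - Dp, N) * sinc(i - Dp)
               for i in range(N + 1)]

M, h = windowed_fd_fir(20, 10.3)
dc_gain = sum(h)        # close to unity for this lowpass design
```

For an integer delay the window value at the sinc center is exactly one, so the design still degenerates to a pure delay.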
3.2.4 General Least Squares FIR Approximation of a Complex Frequency
Response
In principle, the FIR fractional delay filter with the smallest LS error in the defined
approximation band is accomplished by defining the response only in that part of the
frequency band and by leaving the rest out of the error measure as a "don't care" band
(Laakso et al., 1994). This scheme also enables frequency-domain weighting of the LS
error. This technique is called general least squares (GLS) FIR filter approximation, and
it results in the following error function
E_GLS = (1/π) ∫_0^{απ} W(ω) |E(e^{jω})|² dω = (1/π) ∫_0^{απ} W(ω) |H(e^{jω}) − H_id(e^{jω})|² dω   (3.40)

where the error is defined in the lowpass frequency band [0, απ] only and W(ω) is the nonnegative frequency-domain weighting function [which has nothing to do with the time-domain window function w(n)]. The derivation of the bandlimited squared integral error function for the case W(ω) ≡ 1 is presented in Appendix A. Now it is shown how
a filter design algorithm can be obtained based on this approach.
The DTFT of the FIR filter can be written as

H(e^{jω}) = h^T e   (3.41a)

where h is the coefficient vector

h = [h(0) h(1) ⋯ h(N)]^T   (3.41b)

and e is defined as

e = [1 e^{-jω} ⋯ e^{-jNω}]^T   (3.41c)
The error function (3.40) can now be rewritten in the following way:
E_GLS = (1/π) ∫_0^{απ} W(ω) [h^T e − H_id(e^{jω})] [h^T e − H_id(e^{jω})]^* dω   (3.42)
where the superscript * denotes complex conjugation. This equation can be further
elaborated as
E_GLS = (1/π) ∫_0^{απ} W(ω) { h^T C h − 2 h^T Re[H_id(e^{jω}) e^*] + |H_id(e^{jω})|² } dω   (3.43a)
where
C = Re{e e^H} =
[ 1, cos ω, ⋯, cos(Nω);
  cos ω, 1, ⋯, cos[(N − 1)ω];
  ⋮, ⋮, ⋱, ⋮;
  cos(Nω), cos[(N − 1)ω], ⋯, 1 ]   (3.43b)
Here the superscript H stands for the Hermitian operation, i.e., transposition with conjugation. The error function can further be expressed as
E_GLS = h^T P h − 2 h^T p₁ + p₀   (3.44a)

where

P = (1/π) ∫_0^{απ} W(ω) C dω   (3.44b)

p₁ = (1/π) ∫_0^{απ} W(ω) [Re{H_id(e^{jω})} c − Im{H_id(e^{jω})} s] dω   (3.44c)

p₀ = (1/π) ∫_0^{απ} W(ω) |H_id(e^{jω})|² dω   (3.44d)

c = [1 cos ω ⋯ cos(Nω)]^T   (3.44e)

and

s = [0 sin ω ⋯ sin(Nω)]^T   (3.44f)
The optimal solution in terms of the L2 norm is obtained by solving for the minimum
of the error measure (3.44a). The unique minimum-error solution is found by setting its
derivative with respect to h to zero. This results in the following normal equation
2Ph − 2p₁ = 0   (3.45)

whose solution is

h = P^{-1} p₁   (3.46)
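For the unweighted case W(ω) ≡ 1 the elements of P and p₁ follow from Eqs. (3.44b) and (3.44c) by direct integration: with 0-based indices, P[k][l] = α sinc(α(k − l)) and p₁[k] = α sinc(α(k − D)). The Python sketch below (illustrative names, a small Gaussian-elimination solver in place of a library routine) implements this design:

```python
import math

def sinc(t):
    return 1.0 if t == 0.0 else math.sin(math.pi * t) / (math.pi * t)

def solve(A, b):
    """Tiny Gaussian elimination with partial pivoting (modifies A and b)."""
    n = len(b)
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        b[c], b[p] = b[p], b[c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for cc in range(c, n):
                A[r][cc] -= f * A[c][cc]
            b[r] -= f * b[c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (b[r] - sum(A[r][c] * x[c] for c in range(r + 1, n))) / A[r][r]
    return x

def gls_fd_fir(N, D, alpha):
    """GLS FD FIR design with unity weighting on the band [0, alpha*pi]."""
    P = [[alpha * sinc(alpha * (k - l)) for l in range(N + 1)]
         for k in range(N + 1)]
    p1 = [alpha * sinc(alpha * (k - D)) for k in range(N + 1)]
    return solve(P, p1)

h_full = gls_fd_fir(3, 1.3, 1.0)  # alpha = 1: fullband approximation
h_half = gls_fd_fir(3, 1.3, 0.5)  # halfband design as in Fig. 3.4
```

With α = 1 the matrix P becomes the identity and the solution collapses to the fullband LS filter h(n) = sinc(n − D) of Eq. (3.33), which is a useful sanity check.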
Fig. 3.4 Magnitude responses (upper) and phase delay curves (lower) of a third-order
GLS FIR filter with α = 0.5. The curves are plotted for 11 fractional delay values between 1.0 and 2.0. Note that the magnitude responses for N − D are the same as for D.
An important special case is obtained when the weighting function is not used, i.e., W(ω) ≡ 1, since the solution can be given in closed form. Then the elements of P and p₁ can be expressed as
P_{k,l} = (1/π) ∫_0^{απ} cos[(k − l)ω] dω,  k, l = 1, 2, …, L   (3.47)

and

p_{1,k} = (1/π) ∫_0^{απ} cos[(k − 1 − D)ω] dω,  k = 1, 2, …, L   (3.48)
Note that matrix P is independent of the delay parameter D and only needs to be
inverted once. The resulting FIR filter coefficients depend on the choice of frequency
band parameter a and the weighting function.
Figure 3.4 presents a third-order GLS filter with unity weighting function W(ω) ≡ 1 and α = 0.5 (i.e., halfband approximation). This result can be compared with the fullband LS FIR approximation of Fig. 3.3. It is seen that the maximum error in both magnitude and phase delay has been reduced in the approximation band. At high frequencies the error has slightly increased since now the approximation error outside the passband is excluded from the error measure.
The filter coefficients can be determined so that the frequency response matches that of a linear-phase prototype filter at K discrete frequencies Ω_k,

H(e^{jΩ_k}) = e^{-jΩ_k N/2},  k = 1, 2, …, K   (3.49)

implying that

Σ_{n=0}^{N} h(n) e^{-jnΩ_k} = e^{-jΩ_k N/2},  k = 1, 2, …, K   (3.50)
where N/2 is the delay of the filter. The filter coefficients can be solved from (3.50) for
a chosen total delay D which is close to N/2. This can be expressed in matrix form as
E_Ω h = e_D   (3.51a)

where

E_Ω = [ 1, e^{-jΩ₁}, e^{-j2Ω₁}, ⋯, e^{-jNΩ₁};
        1, e^{-jΩ₂}, e^{-j2Ω₂}, ⋯, e^{-jNΩ₂};
        ⋮, ⋮, ⋮, ⋱, ⋮;
        1, e^{-jΩ_K}, e^{-j2Ω_K}, ⋯, e^{-jNΩ_K} ]   (3.51b)

and

e_D = [e^{-jDΩ₁} e^{-jDΩ₂} ⋯ e^{-jDΩ_K}]^T   (3.51c)
Fig. 3.5 The magnitude and phase delay responses of a third-order (N = 3) approximately equiripple FIR FD filter. The curves are plotted for 11 delay values
between 1.0 and 2.0. Note that the magnitude responses for N - D are the
same as those for D.
Equation (3.51a) can be written using real-valued quantities as

P_Ω h = p_Ω   (3.52a)

with

P_Ω = [ C_Ω; S_Ω ]   (3.52b)

and

p_Ω = [ c_D; s_D ]   (3.52c)

where the matrices and vectors contain appropriate cosine and sine elements such that E_Ω = C_Ω − jS_Ω and e_D = c_D − js_D. The coefficient vector of the almost-equiripple FIR FD filter is obtained as

h = P_Ω^{-1} p_Ω   (3.53)
Note that the cosine–sine matrix P_Ω depends only on the prototype filter but not on the delay parameter D. Thus it can be inverted once and the same inverse matrix can be used for computing FD filters for any value of D.

In the maximally flat FIR filter design the error function and its N first derivatives are required to vanish at ω = 0, that is,

(d^k/dω^k) E(e^{jω})|_{ω=0} = 0  for k = 0, 1, 2, …, N   (3.54)
where E(e^{jω}) is the complex error function defined by Eq. (3.27). Equation (3.54) can
thus be written as

(d^k/dω^k) [Σ_{n=0}^{N} h(n) e^{-jωn} − e^{-jωD}]|_{ω=0} = 0  for k = 0, 1, 2, …, N   (3.55)
For k = 0, Eq. (3.55) gives

Σ_{n=0}^{N} h(n) − 1 = 0  ⇔  Σ_{n=0}^{N} h(n) = 1   (3.56)
This is the requirement that the coefficients of the FIR filter must sum up to unity, i.e., the magnitude response at ω = 0 has to be equal to 1. For k = 1, Eq. (3.55) yields
−j Σ_{n=0}^{N} n h(n) + jD = 0  ⇔  Σ_{n=0}^{N} n h(n) = D   (3.57)
and for k = 2

−Σ_{n=0}^{N} n² h(n) + D² = 0  ⇔  Σ_{n=0}^{N} n² h(n) = D²   (3.58)
and so on until k = N.
The N + 1 requirements of Eq. (3.55) can be collected together as

Σ_{n=0}^{N} n^k h(n) = D^k  for k = 0, 1, 2, …, N   (3.59)
This is a set of N + 1 linear equations and may be rewritten in the matrix form as
Vh = v   (3.60)
where

V = [ 0⁰, 1⁰, 2⁰, ⋯, N⁰;
      0¹, 1¹, 2¹, ⋯, N¹;
      0², 1², 2², ⋯, N²;
      ⋮, ⋮, ⋮, ⋱, ⋮;
      0^N, 1^N, 2^N, ⋯, N^N ]
  = [ 1, 1, 1, ⋯, 1;
      0, 1, 2, ⋯, N;
      0, 1, 4, ⋯, N²;
      ⋮, ⋮, ⋮, ⋱, ⋮;
      0, 1, 2^N, ⋯, N^N ]   (3.61a)

h = [h(0) h(1) h(2) ⋯ h(N)]^T   (3.61b)

and

v = [1 D D² ⋯ D^N]^T   (3.61c)
The solution of Eq. (3.60) is

h = V^{-1} v   (3.62)

where V^{-1} may be evaluated by applying Cramer's rule (see, e.g., Hildebrand, 1974, p.
541). The solution is given in an explicit form as
h(n) = ∏_{k=0, k≠n}^{N} (D − k)/(n − k)  for n = 0, 1, 2, …, N   (3.63)
where D is the fractional delay and N is the order of the FIR filter. The coefficients h(n)
can be seen to be equal to those of the Lagrange interpolation formula for equally spaced abscissas (see, e.g., Hildebrand, 1974, pp. 89–91).
Ko and Lim (1988) and Hermanowicz (1992) have proposed a more general maximally flat FD approximation where the frequency ω₀ of maximal flatness is arbitrarily chosen from the interval [0, π]. The closed-form solution is (Hermanowicz, 1992)

h(n) = e^{jω₀(n − D)} ∏_{k=0, k≠n}^{N} (D − k)/(n − k)  for n = 0, 1, 2, …, N   (3.64)

Note that ω₀ = 0 results in Eq. (3.63). If ω₀ = π, the filter coefficients are real and they are the same as the Lagrange interpolation coefficients except that the sign of every second one of them has been toggled. This implies that the filter has been modulated to the Nyquist frequency, i.e., its frequency response has been shifted by π.
In the other cases, that is 0 < ω₀ < π, the coefficients h(n) will be complex-valued.
Complex FIR filters are computationally much more expensive than the real-valued FIR
filters of the same order. We prefer the real-valued FIR interpolators because they are
computationally efficient and conceptually simple, and also because in audio signal processing it is normally appropriate to have the smallest approximation error at low frequencies.
3.3.2 Classical Approach
The Lagrange interpolation method is based on the well-known result in polynomial
algebra that using an Nth-order polynomial it is possible to match N + 1 given arbitrary
points (see, e.g., Davis, 1963, pp. 24–26). The uniformly spaced Lagrange interpolation
formula can be derived in many different ways. The simplest of them is to approximate
a function x(t), t ∈ ℝ, known at N + 1 integral points n = 0, 1, 2, ..., N using Nth-order polynomials h(n, D). The polynomial approximation x̂(D) in the range [0, N] is given by

x̂(D) = Σ_{n=0}^{N} h(n, D) x(n)   (3.65)
Each polynomial h(n, D) must take the value zero at the other integral abscissas, that is,

h(n, m) = δ(n − m)  for m = 0, 1, 2, …, N   (3.66)

which leads to the form

h(n, D) = C_n ∏_{k=0, k≠n}^{N} (D − k)   (3.67)

The requirement that h(n, D) = 1 when D = n is then fulfilled by the scaling constant

C_n = 1 / [n(n − 1)⋯1 · (−1)⋯(n − N + 1)(n − N)]   (3.68)
Substitution of Eq. (3.68) into (3.67) yields Eq. (3.63), the Lagrange interpolation formula. This approach is a simple but purely mathematical way to derive the solution. It does not throw light upon the signal processing aspects of the technique as the derivation given in Section 3.3.1 does.
3.3.3 Symmetry of the Lagrange Interpolation Coefficients
One remarkable feature of Lagrange interpolation is that the coefficients h(n) for the
fractional delay N − D are the same as those for D, but in reverse order, that is,

h(n, D) = h(N − n, N − D)  for n = 0, 1, 2, …, N   (3.69)

This is verified by writing Eq. (3.63) as

h(n, D) = ∏_{k=0, k≠n}^{N} (D − k)/(n − k)   (3.70)

and similarly,

h(N − n, N − D) = ∏_{k=0, k≠N−n}^{N} (N − D − k)/(N − n − k)
= ∏_{k=0, k≠n}^{N} (k − D)/(k − n)
= ∏_{k=0, k≠n}^{N} (D − k)/(n − k)   (3.71)

The equality of the second and third row of Eq. (3.71) is obtained by changing the sign of every term in the denominator and numerator of the second row. This is valid, since there is the same number of terms (N) in both the denominator and the numerator. It is seen that the last row of Eq. (3.71) is equivalent to Eq. (3.70).
In the first-order case (N = 1), Eq. (3.63) reduces to the linear interpolator with coefficients

h(0) = 1 − D,  h(1) = D   (3.72)

Note that in the first-order case D = d, since the integer part of D is zero. The formulas
for the filter coefficients of the Lagrange interpolator with N = 2 are

h(0) = (1/2)(D − 1)(D − 2),  h(1) = −D(D − 2),  h(2) = (1/2)D(D − 1)   (3.73)

and with N = 3,

h(0) = −(1/6)(D − 1)(D − 2)(D − 3),  h(1) = (1/2)D(D − 2)(D − 3),
h(2) = −(1/2)D(D − 1)(D − 3),  h(3) = (1/6)D(D − 1)(D − 2)   (3.74)
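The product form of Eq. (3.63) is only a few lines of code. A minimal Python sketch (illustrative naming):

```python
def lagrange_fd(N, D):
    """Lagrange FD FIR coefficients from the product form of Eq. (3.63)."""
    h = []
    for n in range(N + 1):
        c = 1.0
        for k in range(N + 1):
            if k != n:
                c *= (D - k) / (n - k)
        h.append(c)
    return h

h = lagrange_fd(3, 1.5)   # [-0.0625, 0.5625, 0.5625, -0.0625]
```

The coefficients agree with the closed forms (3.73) and (3.74), they always sum to unity [Eq. (3.56)], and for integer D the response collapses to a shifted unit impulse.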
The product term of the Lagrange interpolation formula can be expressed using binomial coefficients as

∏_{k=0, k≠n}^{N} (D − k)/(n − k) = (−1)^{N−n} binom(D, n) binom(D − n − 1, L − n − 1)  for n = 0, 1, 2, …, N   (3.75a)

where L = N + 1 and

binom(D, n) = Γ(D + 1) / [n! Γ(D − n + 1)]   (3.75b)

where Γ(·) is the gamma function. (The binomial coefficient must be evaluated using the gamma function when one of its parameters is real or non-positive.) It is easy to verify Eq. (3.75) by examining the form in Eq. (3.70). This result is valid for both even
and odd N.
When N is even, the samples of the sinc function under the window of length L = N + 1 can be expressed as (Kootsookos and Williams, 1995)

h_id(n) = sinc(n − D) = sin[π(n − D)]/[π(n − D)] = (−1)^n sin(πD)/[π(D − n)]  for n = 0, 1, 2, …, N (N even)   (3.76)

Now the formula of the Lagrange interpolation coefficients can be manipulated in the following way (Kootsookos and Williams, 1995):

h(n, D) = (−1)^{N−n} binom(D, n) binom(D − n − 1, N − n) = (−1)^{N−n} binom(D, n) binom(D − n, N − n) (D − N)/(D − n)
= (−1)^{N−n} binom(N, n) binom(D, N) (D − N)/(D − n) = (−1)^{N−n} binom(N, n) binom(D, N + 1) (N + 1)/(D − n)
= [π(N + 1)/sin(πD)] binom(D, N + 1) binom(N, n) sinc(n − D)   (3.77)
Fig. 3.6 Examples of the scaling coefficient of the binomial window for computation of
even-order Lagrange interpolation coefficients (solid line: N = 2; dashed line:
N = 4; dash-dot line: N = 6).
where w_bin(n) is the binomial window function defined by

w_bin(n) = binom(N, n)  for n = 0, 1, 2, ..., N   (3.78)

and C_bin,e(D) is the scaling coefficient (in the case of even N)

C_bin,e(D) = [π(N + 1)/sin(πD)] binom(D, N + 1)   (3.79)
Note that the binomial window approaches the Gaussian function as N approaches infinity. The scaling coefficient C_bin,e(D) is illustrated in Fig. 3.6 for some low-order Lagrange interpolators when D − N/2 varies between −1 and 1.
Odd N

When N is odd, the sinc function can be rewritten as

h_id(n) = sinc(n − D) = sin[π(n − D)]/[π(n − D)] = (−1)^{N−n} sin(πD)/[π(n − D)]  for n = 0, 1, 2, …, N (N odd)   (3.80)

Now the formula of the Lagrange interpolation coefficients can be manipulated in the following way:

h(n, D) = (−1)^{N−n} binom(D, n) binom(D − n − 1, N − n) = (−1)^{N−n} binom(N, n) binom(D, N + 1) (N + 1)/(D − n)
= −[π(N + 1)/sin(πD)] binom(D, N + 1) binom(N, n) sinc(n − D)   (3.81)

and the scaling coefficient in the case of odd N is

C_bin,o(D) = −[π(N + 1)/sin(πD)] binom(D, N + 1)   (3.82)
Figure 3.7 shows the scaling coefficient for some low-order Lagrange interpolators
(with odd N) for fractional delay values 0 ≤ d < 1.
General Result

The two results can be combined by defining a scaling coefficient C_bin(D, N) as

C_bin(D, N) = (−1)^N [π(N + 1)/sin(πD)] binom(D, N + 1)   (3.83)

and the general form of the window-based design of the Nth-order Lagrange interpolator can be expressed as

h(n) = C_bin(D, N) w_bin(n) sinc(n − D)  for n = 0, 1, 2, …, N   (3.84)
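Equations (3.83) and (3.84) can be checked numerically against the product form of Eq. (3.63). In the Python sketch below the generalized binomial coefficient is evaluated with the gamma function as in Eq. (3.75b); D must not be an integer, since sin(πD) vanishes there (the product form handles that case directly). Function names are illustrative.

```python
import math

def sinc(t):
    return 1.0 if t == 0.0 else math.sin(math.pi * t) / (math.pi * t)

def binom(a, b):
    """Generalized binomial coefficient via the gamma function, Eq. (3.75b)."""
    return math.gamma(a + 1.0) / (math.gamma(b + 1.0) * math.gamma(a - b + 1.0))

def lagrange_fd(N, D):
    """Reference: product form of Eq. (3.63)."""
    return [math.prod((D - k) / (n - k) for k in range(N + 1) if k != n)
            for n in range(N + 1)]

def lagrange_fd_window(N, D):
    """Window-based form of Eq. (3.84): h(n) = C_bin(D, N) w_bin(n) sinc(n - D)."""
    C = (-1.0) ** N * math.pi * (N + 1) / math.sin(math.pi * D) * binom(D, N + 1)
    return [C * binom(N, n) * sinc(n - D) for n in range(N + 1)]

h_prod = lagrange_fd(3, 1.3)
h_win = lagrange_fd_window(3, 1.3)
```

The two routes produce the same coefficients for both odd and even orders, which confirms the sign convention of Eq. (3.83).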
Fig. 3.7 Examples of the scaling coefficient of the binomial window for computation of
odd-order Lagrange interpolation coefficients (solid line: N = 1; dashed line: N
= 3; dash-dot line: N = 5).
3.3.6 Approximation Errors of Lagrange Interpolation
The approximation error of Lagrange interpolation depends drastically on the fractional
part d of the interpolation interval D. The error vanishes when d = 0. Namely, if D is an
integer, Eq. (3.63) can be written as
h(n) = δ(n − D)  for n = 0, 1, 2, …, N   (3.85)

where δ(n) is the unit impulse function

δ(n) = { 1 for n = 0; 0 otherwise }   (3.86)
In other words, the impulse response of the Lagrange interpolator reduces to a delayed
unit impulse when D is an integer. This implies that the Lagrange interpolation polynomial goes through the known signal samples.
The worst case is obtained with d = 0.5. Then, in the odd-order case, the impulse
response of the interpolator is even-symmetric with respect to its center of gravity, or
h(n) = h(N − n)  for n = 0, 1, 2, …, N   (3.87)
Fig. 3.8 The magnitude (upper) and phase delay (lower) response of a linear interpolator for eleven different fractional delay values (D = 0, 0.1, 0.2, ..., 1.0). Note
that there are only six different curves in the upper figure (not eleven), because
the magnitude responses for fractional delays d and 1 − d are the same. In the
lower figure, the ideal phase delay in each case is illustrated by a dotted line.
The interpolator is thus a linear-phase FIR filter and its phase response is exactly −Dω as it should be. Unfortunately, for an odd-order FIR filter this also means that it has a real zero at the Nyquist frequency (see, e.g., Parks and Burrus, 1987, pp. 23–26) and
consequently the total error is larger than for any other value of d. For an even-order
Lagrange interpolator there is no particular symmetry when d = 0.5, but the approximation error in both magnitude and phase is the largest.
The magnitude and phase responses of the first, second, and third-order Lagrange
interpolators are illustrated in Figs. 3.8–3.10. Note that in these figures the normalized
frequency 0.5 corresponds to the Nyquist frequency. It is seen that both the magnitude
and the phase delay curves coincide with the ideal response at low frequencies as
expected.
Cain et al. (1994) have compared the approximation error of Lagrange interpolation
with that of other FD FIR filters. They concluded that it is preferable for applications
where a low-order FD filter is needed and a fullband approximation is not necessary.
From the viewpoint of waveguide models, Lagrange interpolation has two favorable
features:
1) it is accurate at low frequencies and
Fig. 3.9 The magnitude (upper) and phase delay (lower) response of a second-order
Lagrange interpolator (N = 2) for eleven different fractional delay values (D =
0.5, 0.6, ..., 1.0, ..., 1.4, 1.5). Note that the magnitude responses for fractional delays D and N − D are the same. In the lower figure, the ideal phase delay in
each case is illustrated by a dotted line.
2) it never overestimates the amplitude of the signal when the delay has been chosen so that (N − 1)/2 ≤ D ≤ (N + 1)/2 when N is odd and (N/2) − 1 ≤ D ≤ (N/2) + 1 when N is even.
The first advantage is justified by the fact that audio signals are usually lowpass signals
and thus it is wise to use an approximation technique that has the smallest error at low
frequencies.
The second property is called passivity and it implies that the magnitude response of
the Lagrange interpolator is less than or equal to one for the mentioned values of D.
This property is advantageous because in digital waveguide models the interpolator is normally used inside a feedback loop, and then it is extremely important to keep the loop gain below unity. Otherwise the system may become unstable. Since a Lagrange interpolator is a passive filter, the interpolation error only decreases the loop gain but
never increases it. It is interesting that in the case of even-order Lagrange interpolators,
the filter is passive on a range of two unit delays, while odd-order filters only have a
range of one unit delay.
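The passivity property is easy to probe numerically. The Python sketch below (a dense frequency grid rather than a formal proof) evaluates the largest magnitude response of a third-order Lagrange interpolator inside and outside the recommended range 1 ≤ D ≤ 2:

```python
import math
import cmath

def lagrange_fd(N, D):
    """Lagrange FD FIR coefficients, product form of Eq. (3.63)."""
    return [math.prod((D - k) / (n - k) for k in range(N + 1) if k != n)
            for n in range(N + 1)]

def max_magnitude(h, points=512):
    """Largest |H(e^{jw})| of an FIR filter on a dense grid over [0, pi]."""
    best = 0.0
    for i in range(points + 1):
        w = math.pi * i / points
        H = sum(hn * cmath.exp(-1j * w * n) for n, hn in enumerate(h))
        best = max(best, abs(H))
    return best

inside = max_magnitude(lagrange_fd(3, 1.5))    # within [1, 2]: passive
outside = max_magnitude(lagrange_fd(3, 0.25))  # outside: gain exceeds unity
```

The first value stays at or below one, while the second exceeds one, in line with the discussion above.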
It has been experimentally noticed that the magnitude response of the Lagrange
interpolator exceeds unity when the delay parameter is out of the optimal range. An
example of this phenomenon is given in Fig. 3.11 for the third-order Lagrange interpo-
Fig. 3.10 The magnitude (upper) and phase delay (lower) response of a third-order
Lagrange interpolator (N = 3) for 11 equally spaced fractional delay values
(D = 1.0, 1.1, 1.2, ..., 2.0).
lator when the parameter D varies from 0 to 1. (In this case the delay parameter should
be in the range 1 D 2). When 0.5 D < 1, the magnitude response is greater than
one at all other frequencies except w = 0. When 0 < D < 0.5, the magnitude response
exceeds unity at middle frequencies but shows lowpass behavior at the high end. (Note
that the scale of the magnitude response in Fig. 3.11 is different than in Figs. 3.83.10.)
Also the phase delay approximation appears to be substantially worse now than when D
is on the optimal range (compare with the phase response in Fig. 3.10).
The squared integral error has been computed for Lagrange interpolators of different order using Eq. (3.31). Figure 3.12 shows the error as a function of the delay parameter for odd-order filters N = 1, 3, and 5, and Fig. 3.13 for even-order filters N = 2, 4, and 6. The odd-order Lagrange interpolators have an error curve that is symmetrical with respect to the point D = N/2. The lobe between the middle taps has approximately the form of the sine-squared function (Laakso et al., 1995a).
The even-order Lagrange interpolators also have error curves that are symmetrical with respect to D = N/2 (see Fig. 3.13). Note, however, that individual lobes of the error curves are asymmetrical. In practice the shape of these curves suggests that the lowest error is obtained when the even-order interpolator is used for the values (N - 1)/2 ≤ D ≤ (N + 1)/2. This is the same requirement as for the odd-order filter, although the even-order filter is passive when N/2 - 1 ≤ D ≤ N/2 + 1.
Fig. 3.11 The magnitude (upper) and phase delay (lower) responses of a third-order Lagrange interpolator (N = 3) for 11 equally spaced fractional delay values (D = 0.0, 0.1, 0.2, ..., 1.0). Note that now D is varied on a non-optimal range since D ≤ (N - 1)/2.
As seen above, both even- and odd-order Lagrange interpolators can be used for FD approximation. This is in contrast to some statements in the literature that recommend the use of odd-order interpolators only (see, e.g., Erup et al., 1993, p. 999). However, in dynamic FD applications even-order interpolators introduce a practical problem: when d passes the value 0.5, the locations of the filter taps have to be changed to keep d within half a sample of the center point of the filter. This ensures that the overall approximation error is always as small as possible. However, this change of filter taps causes a discontinuity in the output signal. The odd-order Lagrange interpolators do not have this problem, since the filter taps need to be moved only when D passes an integer value.
[Fig. 3.12: squared integral error vs. delay D for odd-order Lagrange interpolators (N = 1, 3, and 5).]
Lagrange interpolation may be interpreted as fitting an Nth-order polynomial

    p(D) = Σ_{k=0}^{N} c(k) D^k    (3.88)

that takes on the value x(n) when D = n. The coefficients c(k) are solved from a set of N + 1 linear equations. Farrow (1988) suggested that every filter coefficient of an FIR interpolating filter could be expressed as an Nth-order polynomial in the delay parameter D. He stated that this results in N + 1 FIR filters with constant coefficients. The above approach to Lagrange interpolation is seen to be related to Farrow's idea.
After publication of the theory of this new structure, it was found out that the same idea had already been mentioned by Erup et al. (1993, Appendix, pp. 1007-1008), but without derivation. This is, however, independent work. To the knowledge of the author, the derivation given here has not been published by anybody else.
[Fig. 3.13: squared integral error vs. delay D for even-order Lagrange interpolators (N = 2, 4, and 6).]

The input-output relation of the interpolator may be written in the z-domain as

    Y(z) = H(z)X(z)    (3.89)
where X(z) and Y(z) are the z-transforms of the input and output signals x(n) and y(n), respectively, and the transfer function H(z) is now expressed as a polynomial in D (instead of z^{-1}):
    H(z) = Σ_{k=0}^{N} C_k(z) D^k    (3.90)
The familiar requirement that the output sample should be one of the input samples for integer D may be written in the z-domain as

    Y(z) = z^{-D} X(z)    for D = 0, 1, 2, ..., N    (3.91)

Together with Eqs. (3.89) and (3.90) this leads to the following N + 1 conditions:

    Σ_{k=0}^{N} C_k(z) D^k = z^{-D}    for D = 0, 1, 2, ..., N    (3.92)
In matrix form these conditions are written as

    | 1  0   0   ...  0   | | C_0(z) |   | 1      |
    | 1  1   1   ...  1   | | C_1(z) |   | z^{-1} |
    | 1  2   4   ...  2^N | | C_2(z) | = | z^{-2} |    (3.93)
    | ...                 | |  ...   |   |  ...   |
    | 1  N  N^2  ...  N^N | | C_N(z) |   | z^{-N} |

or, more compactly, as

    Uc = z    (3.94)

where U is the (N + 1) × (N + 1) Vandermonde matrix above, the coefficient vector c is

    c = [C_0(z)  C_1(z)  C_2(z)  ...  C_N(z)]^T    (3.95)

and

    z = [1  z^{-1}  z^{-2}  ...  z^{-N}]^T    (3.96)

The solution is obtained as

    c = U^{-1} z    (3.97)
The inverse matrix U^{-1}, which we shall denote by Q, may be solved using Cramer's rule. The rows of the inverse Vandermonde matrix Q contain the filter coefficients used in the new structure, and thus it is convenient to write

    Q = [q_0  q_1  q_2  ...  q_N]^T    (3.98)

so that

    C_n(z) = Σ_{k=0}^{N} q_n(k) z^{-k}    for n = 1, 2, ..., N    (3.99)

The coefficients q_n(k) of the FIR filters C_n(z) are thus computed by inverting the Vandermonde matrix U.
By setting D = 0 in Eq. (3.92), it is seen that

    Σ_{k=0}^{N} C_k(z) 0^k = C_0(z) ≡ 1    (3.100)

This implies that the transfer function C_0(z) = 1 regardless of the order of the interpolator. The other transfer functions C_n(z) given by Eq. (3.99) are Nth-order polynomials in z^{-1}, that is, they are Nth-order FIR filters.
We shall call the implementation technique described above the Farrow structure of Lagrange interpolation. A remarkable feature of this form is that the transfer functions C_n(z) are fixed for a given order N. The interpolator is directly controlled by the fractional delay D, i.e., no computationally intensive coefficient update is needed when D is changed.

Fig. 3.14 The Farrow structure of Lagrange interpolation implemented using Horner's method. The overall transfer function has been formulated as a polynomial in the delay parameter D. The transfer functions C_n(z) are Nth-order FIR filters with constant coefficients for a given N.
The Farrow structure is most efficiently implemented using Horner's method (see, e.g., Hildebrand, 1974, p. 28), that is,

    Σ_{k=0}^{N} C_k(z) D^k = C_0(z) + [C_1(z) + [C_2(z) + ... + [C_{N-1}(z) + C_N(z)D] D ...] D] D    (3.101)

For N = 1, the matrix equation (3.93) reduces to

    | 1  0 | | C_0(z) |   | 1      |
    | 1  1 | | C_1(z) | = | z^{-1} |    (3.102)

whose solution is

    | C_0(z) |   | 1  0 |^{-1} | 1      |   |  1  0 | | 1      |   | 1           |
    | C_1(z) | = | 1  1 |      | z^{-1} | = | -1  1 | | z^{-1} | = | -1 + z^{-1} |    (3.103)
It is seen that C_1(z) = z^{-1} - 1 when N = 1. The overall transfer function H(z) of the Farrow structure of linear interpolation is written as

    H(z) = 1 + (z^{-1} - 1)D    (3.104)

Fig. 3.15 a) The direct-form FIR filter structure for linear interpolation and b) the equivalent Farrow structure.
Note that in this case D = d. The linear interpolator may thus be implemented by the structure illustrated in Fig. 3.15b. It is seen that the Farrow structure is as efficient as the direct-form nonrecursive structure (Fig. 3.15a) when N = 1, since the number of operations is the same in both. If multiplication is more expensive than addition, as it is in VLSI implementations, then the Farrow structure (Fig. 3.15b) is preferable.
For N = 2, Eq. (3.93) is written as

    | 1  0  0 | | C_0(z) |   | 1      |
    | 1  1  1 | | C_1(z) | = | z^{-1} |    (3.105)
    | 1  2  4 | | C_2(z) |   | z^{-2} |

The inverse of the coefficient matrix is

    Q = U^{-1} = |  1    0    0   |
                 | -3/2  2   -1/2 |    (3.106)
                 |  1/2 -1    1/2 |

so that the subfilters are

    C_0(z) = 1,   C_1(z) = -3/2 + 2z^{-1} - (1/2)z^{-2},   C_2(z) = 1/2 - z^{-1} + (1/2)z^{-2}    (3.107)

The overall transfer function is then

    H(z) = C_0(z) + [C_1(z) + C_2(z)D]D    (3.108)
(3.108)
100
x(n)
-1
-1
1/2
1/2
3/2
y(n)
Fig. 3.16 The Farrow structure for a second-order (N = 2) Lagrange interpolator. This
structure involves 6 additions and 6 multiplications. The best fractional delay
approximation is obtained when 0.5 D 1.5.
Some multiplications can be shared, since certain coefficients of the subfilters are equal except possibly for the sign [e.g., the third coefficient of C_1(z) and C_2(z)]. These special cases are often met with the inverse Vandermonde matrix.
Modified Farrow Structure
The Farrow structure can be made more efficient by changing the range of the parameter D so that the integer part is removed. The new parameter range is 0 ≤ d ≤ 1 (for odd N) or -0.5 ≤ d ≤ 0.5 (for even N); this corresponds to the substitution D = d + M, where M = ⌊N/2⌋. The change can be obtained by introducing a transformation matrix T defined by

    T_{n,m} = C(m, n) M^{m-n}   for m ≥ n
    T_{n,m} = 0                 for m < n        (3.109)

where C(m, n) denotes the binomial coefficient. For N = 2 this yields

    T = | 1  1  1 |
        | 0  1  2 |    (3.110)
        | 0  0  1 |

so that

    Q~ = TQ = | 1  1  1 | |  1    0    0   |   |  0    1    0   |
              | 0  1  2 | | -3/2  2   -1/2 | = | -1/2  0    1/2 |    (3.111)
              | 0  0  1 | |  1/2 -1    1/2 |   |  1/2 -1    1/2 |

The subfilters of the modified structure are thus

    C~_0(z) = z^{-1},   C~_1(z) = -1/2 + (1/2)z^{-2},   C~_2(z) = 1/2 - z^{-1} + (1/2)z^{-2}    (3.112)
Fig. 3.17 The modified Farrow structure for a second-order (N = 2) Lagrange interpolator. This structure involves 4 additions and 4 multiplications. In this case the best fractional delay approximation is obtained when -0.5 ≤ d ≤ 0.5. Note, however, that the structure produces an extra delay of one sample, so that the actual delay then lies within the range [0.5, 1.5].
Figure 3.17 shows the implementation of this modified second-order Farrow structure of Lagrange interpolation. It is seen that now only 4 multiplications and additions
are needed in contrast with 6 in the basic implementation shown in Fig. 3.16.
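The operation of Fig. 3.17 can be sketched in plain Python as follows (the function name and the zero initial state are our assumptions); the subfilter outputs, taken from the rows of Q~ in Eq. (3.111), are combined by Horner's rule in d:

```python
def modified_farrow2(x, d):
    """Second-order modified Farrow interpolator (cf. Fig. 3.17), built from
    the rows of Q~: C~0(z) = z^-1, C~1(z) = (z^-2 - 1)/2,
    C~2(z) = (1 - 2 z^-1 + z^-2)/2. Total delay is 1 + d, -0.5 <= d <= 0.5."""
    x1 = x2 = 0.0                            # delay-line state
    y = []
    for s in x:
        c0 = x1                              # C~0 output
        c1 = 0.5 * (x2 - s)                  # C~1 output
        c2 = 0.5 * (s - 2.0 * x1 + x2)       # C~2 output
        y.append(c0 + (c1 + c2 * d) * d)     # Horner evaluation in d
        x2, x1 = x1, s
    return y

# On a ramp the parabolic interpolator is exact: total delay 1 + 0.5 = 1.5.
print(modified_farrow2([0, 1, 2, 3, 4], 0.5)[2:])   # [0.5, 1.5, 2.5]
```

Since polynomial inputs up to second order are interpolated exactly, the ramp comes out shifted by exactly 1.5 samples once the start-up transient has passed.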
The final example of the modified Farrow structure of Lagrange interpolation presents the third-order system. The inverse Vandermonde matrix Q associated with the third-order case is given by

    Q = U^{-1} = | 1  0  0   0 |^{-1}   |   1     0     0     0  |
                 | 1  1  1   1 |      = | -11/6   3   -3/2   1/3 |    (3.113)
                 | 1  2  4   8 |        |   1   -5/2    2   -1/2 |
                 | 1  3  9  27 |        |  -1/6  1/2  -1/2   1/6 |
This coefficient matrix does not yield a very efficient implementation, since there are only two pairs of coefficients that can be combined. The coefficient matrix Q~ for the modified structure is obtained by multiplying Q by T, which yields
    Q~ = TQ = | 1  1  1  1 |       |   0     1     0     0  |
              | 0  1  2  3 |  Q  = | -1/3  -1/2    1   -1/6 |    (3.114)
              | 0  0  1  3 |       |  1/2   -1    1/2    0  |
              | 0  0  0  1 |       |  -1/6  1/2  -1/2   1/6 |
The block diagram of this system is shown in Fig. 3.18. It is seen that 11 additions and
9 multiplications are needed in the implementation of this system.
For comparison, consider third-order Lagrange interpolation implemented using the direct-form FIR filter structure with the coefficients updated according to the technique suggested in Section 3.3.4. The computational load for the coefficient update would be 3 additions and 10 multiplications, and for the computation of the output 3 additions and 4 multiplications. Altogether this yields 6 additions and 14 multiplications, making 20 operations, the same number as with the modified Farrow structure. The number of multiplications, however, is smaller in the case of the modified Farrow structure.

Fig. 3.18 The modified Farrow structure for a third-order (N = 3) Lagrange interpolator. This structure involves 11 additions and 9 multiplications. In this case the best fractional delay approximation is obtained when 0 < d ≤ 1. Note, however, that the structure produces an extra delay of one sample, so that the actual delay lies within the range [1, 2].
3.3.8 Related Polynomial Interpolation Techniques
Lagrange interpolation is one representative of a class of polynomial interpolation
techniques. Other interpolation methods based on polynomials have not been widely
studied. Here we discuss some examples from recent DSP literature.
Matsui et al. (1991) proposed a tangent-line interpolation (TLI) technique for fractional delay approximation. They used it to control the fundamental frequency of a
speech synthesizer. The basic idea of the TLI method is to approximate the value of a
sampled signal in the neighborhood of a known sample x(n) by a straight line that takes
on the value x(n) at point n. This technique is not equivalent to linear interpolation since
the slope of the line is computed from the previous and next sample values. This results
in a three-tap FIR filter with coefficients
    h(0) = (1 - D)/2,   h(1) = 1,   h(2) = (D - 1)/2    (3.115)
The frequency-domain properties of the TLI method were studied in Välimäki (1994b). It was found that there is an overshoot in the magnitude response of the filter with a maximum at about f_s/4. In the worst case (d = 0.5) this overshoot is about 1 dB. The phase delay of the TLI filter is exact at ω = 0, as in the case of Lagrange interpolators. Unfortunately, the phase delay approaches the value D = 1 almost linearly as a function of frequency. Thus this method is unsuitable for tasks where the accuracy of the fractional delay approximation is important.
The piecewise parabolic interpolation (PPI) technique uses a four-tap FIR filter whose coefficients are second-order polynomials in the fractional delay. Its last coefficient is

    h(3) = a d^2 - a d    (3.116)

where a is a real-valued parameter and d is the fractional delay (0 ≤ d ≤ 1). This interpolation technique produces a constant delay of one sample plus a delay of approximately d. Note that the coefficients h(0) and h(3) are identical. With a = 0, the PPI filter reduces to linear interpolation.
Erup et al. (1993) recommend the PPI method with parameter value a = 0.5 for FD approximation. This filter was analyzed in Välimäki (1994b). Both the magnitude response and the phase delay are almost flat at low frequencies. At the high end the magnitude decreases and the phase delay approaches 1 or 2, depending on d. The main drawback of this technique is that the magnitude response exceeds unity at middle frequencies. In the worst case (d = 0.5), the overshoot, which peaks at about 0.2 f_s, is 0.74 dB, or slightly less than 9%.
The properties of the PPI filter are quite comparable to those of low-order Lagrange
interpolators. Furthermore, the PPI filter can be implemented very efficiently with the
Farrow structure (Erup et al., 1993). This method is preferred over the third-order
Lagrange interpolator if the small overshoot in its magnitude response is not a serious
drawback. In waveguide models, the PPI filter could be used for tuning the delay-line
lengths if the loop gain were much less than unity at middle frequencies, that is, if a
great deal of losses were included. In this case the overshoot would not risk the stability
of the waveguide system.
Splines are a class of polynomial interpolation techniques that have several applications in numerical computation. They have been tried in many DSP applications, such as image processing (Hou and Andrews, 1978; Unser et al., 1993b), sampling-rate conversion (e.g., Cucchi et al., 1991; Zölzer and Boltze, 1994), and signal reconstruction (Unser et al., 1992, 1993a). Aldroubi et al. (1992) have proved that a cardinal spline interpolator converges toward the ideal interpolator as the order of the filter approaches infinity. An interesting topic of future research is to compare this technique with other known methods, e.g., Lagrange interpolation.
3.3.9 Conclusion and Discussion
In this section the maximally flat design of an FD FIR filter, which is equivalent to classical Lagrange interpolation, was discussed. It appears to be well suited to waveguide modeling because the approximation error is small at low frequencies and the filter is passive (i.e., its magnitude response never exceeds unity) when the delay parameter is within the optimal range. It was shown that both the even- and odd-order Lagrange interpolation filters can also be designed by windowing the shifted and sampled sinc function with a scaled binomial window.
A novel implementation technique for Lagrange interpolation, the Farrow structure, was also derived.
[Fig. 3.19: block diagram of the Nth-order direct-form allpass filter with coefficients a_1, ..., a_N.]
    A(z) = z^{-N} D(z^{-1}) / D(z) = [a_N + a_{N-1} z^{-1} + ... + a_1 z^{-(N-1)} + z^{-N}] / [1 + a_1 z^{-1} + ... + a_{N-1} z^{-(N-1)} + a_N z^{-N}]    (3.117)
where N is the order of the filter, D(z) is the denominator polynomial, and the filter
coefficients ak (k = 1, 2, ..., N) are real. The block diagram of the allpass filter is
depicted in Fig. 3.19.
The numerator of the allpass transfer function is a mirrored version of the denominator polynomial. The poles (i.e., roots of the denominator) of a stable allpass filter are located inside the unit circle in the complex plane, and as a result its zeros (i.e., roots of the numerator) are located outside the unit circle: each zero has the same angle as the corresponding pole, but its radius is the inverse of the pole radius. For this reason the magnitude response
of an allpass filter is flat. It is expressed as
    |A(e^{jω})| = |e^{-jωN} D(e^{-jω})| / |D(e^{jω})| ≡ 1    (3.118)
Obviously, the name allpass filter comes from the above property that this filter passes
signal components of all frequencies without attenuating or boosting them.
The frequency response of the allpass filter can be expressed in the form

    A(e^{jω}) = e^{jΘ_A(ω)}    (3.119)

This form stresses the fact that the main feature of an allpass filter is its phase response Θ_A(ω). It can be written as

    Θ_A(ω) = arg A(e^{jω}) = -Nω + 2Θ_D(ω)    (3.120)

where Θ_D(ω) is the phase of 1/D(e^{jω}).
The term allpass filter is sometimes used to denote any discrete-time system (FIR or IIR) whose magnitude response is almost flat at all frequencies (|H(e^{jω})| ≈ 1). Throughout this text this term is used only for recursive discrete-time filters which have the exact allpass property, that is, |H(e^{jω})| ≡ 1.
The phase function Θ_D(ω) is obtained as

    Θ_D(ω) = arg[1/D(e^{jω})] = arctan[ Σ_{k=0}^{N} a_k sin(kω) / Σ_{k=0}^{N} a_k cos(kω) ] = arctan( a^T s / a^T c )    (3.121a)

where

    a = [1  a_1  a_2  ...  a_N]^T    (3.121b)
    s = [0  sin ω  sin(2ω)  ...  sin(Nω)]^T    (3.121c)
    c = [1  cos ω  cos(2ω)  ...  cos(Nω)]^T    (3.121d)
The phase delay τ_{p,A}(ω) of an allpass filter can be expressed with the help of Eq. (3.120) as

    τ_{p,A}(ω) = -Θ_A(ω)/ω = N - 2τ_{p,D}(ω)    (3.122)

where τ_{p,D}(ω) is the phase delay of 1/D(e^{jω}). The corresponding group delay is given by

    τ_{g,A}(ω) = -dΘ_A(ω)/dω = N - 2τ_{g,D}(ω)    (3.123)
The ideal phase function of an FD filter is

    Θ_id(ω) = -ωD    (3.124)

as may be seen from the ideal frequency response given by Eq. (3.8). The phase error of an allpass filter can be defined as the deviation from the desired phase function Θ_id(ω):

    ΔΘ(ω) = Θ_id(ω) - Θ_A(ω) = 2 arctan( a^T s_b / a^T c_b )    (3.125)

where a is the coefficient vector given by Eq. (3.121b), and s_b and c_b are defined as
    s_b = [sin β(ω)  sin(β(ω) - ω)  ...  sin(β(ω) - Nω)]^T    (3.126a)
    c_b = [cos β(ω)  cos(β(ω) - ω)  ...  cos(β(ω) - Nω)]^T    (3.126b)

and β(ω) as

    β(ω) = (1/2)[Θ_id(ω) + Nω]    (3.126c)

For the FD specification Θ_id(ω) = -ωD with D = N + d, this reduces to

    β(ω) = -ωd/2    (3.127)
The least-squares (LS) phase error criterion is defined as

    E_LS = (1/π) ∫_0^π W(ω) [ΔΘ(ω)]^2 dω    (3.128)

where W(ω) is a nonnegative weighting function. Lang and Laakso (1994) have introduced an iterative algorithm for the design of the coefficient vector a. The algorithm typically converges to the desired solution, but this cannot be guaranteed.
The LS phase delay error can be defined as

    E_LS^(τp) = (1/π) ∫_0^π W(ω) [Δτ_p(ω)]^2 dω = (1/π) ∫_0^π W(ω) [ΔΘ(ω)/ω]^2 dω
              = (1/π) ∫_0^π [W(ω)/ω^2] [ΔΘ(ω)]^2 dω    (3.129)
In other words, the phase delay error solution is obtained by introducing an additional weighting function 1/ω^2 into the phase error criterion, Eq. (3.128). Such allpass filters may be designed using a modified version of the iterative algorithm mentioned above.
There exists a large variety of algorithms for equiripple or minimax allpass filter
design in terms of phase, group delay, or phase delay (see Laakso et al., 1994 for references). New iterative techniques for equiripple phase error and phase delay error design
are described in Laakso et al. (1994).
It appears that, just as with the FD FIR filters, a digital allpass filter well suited for FD approximation in audio signal processing is obtained by using the maximally flat design at ω = 0. This technique thus deserves more attention than the others.
3.4.3 Maximally Flat Group Delay Design of FD Allpass Filters
The maximally flat group delay design is the only known FD allpass filter design technique that has a closed-form solution (Laakso et al., 1992, 1994). Here we discuss this
solution.
Thiran (1971) proposed an analytic solution for the coefficients of an all-pole lowpass filter with a maximally flat group delay response at the zero frequency
    a_k = (-1)^k C(N, k) Π_{n=0}^{N} (2d + n)/(2d + k + n)    for k = 0, 1, 2, ..., N    (3.130)

where C(N, k) denotes the binomial coefficient and d is the fractional group delay parameter.
(3.131)
It has been shown by Thiran (1971) that when d > 0 the denominator polynomial obtained by the above method has its roots within the unit circle in the complex plane, i.e., the filter is stable. A drawback of Thiran's design technique is that the magnitude response of the all-pole lowpass filter cannot be controlled. In the allpass design, however, this kind of problem does not exist. Thus it seems that the result of Thiran is better suited to the design of allpass than all-pole filters.
As can be seen from Eq. (3.123), the group delay of an allpass filter is twice that of its all-pole counterpart. Thus Thiran's solution may easily be applied to the design of allpass filters by replacing d with d/2. The solution for the coefficients of a maximally flat (MF) allpass filter is thus (Laakso et al., 1992)
    a_k = (-1)^k C(N, k) Π_{n=0}^{N} (d + n)/(d + k + n)    for k = 0, 1, 2, ..., N    (3.132)
Note that a0 = 1 always and thus the coefficient vector a need not be scaled. This
closed-form solution to the allpass filter approximation is called the Thiran allpass filter.
The allpass filters designed using Eq. (3.132) approximate a group delay of N + d samples. This kind of parametrization is rather impractical. For this reason we substitute d = D - N into Eq. (3.132). As a result we obtain the following formula:
    a_k = (-1)^k C(N, k) Π_{n=0}^{N} (D - N + n)/(D - N + k + n)    for k = 0, 1, 2, ..., N    (3.133)
Now the delay parameter D refers to the actual delay rather than the offset from N
samples. In Table 3.1 we give the coefficients for the Thiran allpass filters with N = 1,
2, and 3.
Table 3.1 Coefficients of the Thiran allpass filters of order N = 1, 2, and 3.

            a_1                   a_2                               a_3
    N = 1   -(D-1)/(D+1)
    N = 2   -2(D-2)/(D+1)         (D-1)(D-2)/[(D+1)(D+2)]
    N = 3   -3(D-3)/(D+1)         3(D-2)(D-3)/[(D+1)(D+2)]          -(D-1)(D-2)(D-3)/[(D+1)(D+2)(D+3)]
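As a quick numerical sketch (plain Python; the function name is ours), Eq. (3.133) can be evaluated directly and checked against the first-order entry of Table 3.1:

```python
from math import comb

def thiran_coeffs(N, D):
    """Coefficients a_0..a_N of the order-N Thiran allpass filter,
    Eq. (3.133); D is the total delay in samples (stable for D > N - 1)."""
    a = []
    for k in range(N + 1):
        c = (-1.0) ** k * comb(N, k)
        for n in range(N + 1):
            c *= (D - N + n) / (D - N + k + n)
        a.append(c)
    return a

# Check against Table 3.1 for N = 1: a1 = -(D - 1)/(D + 1)
D = 1.3
a = thiran_coeffs(1, D)
assert abs(a[0] - 1.0) < 1e-12
assert abs(a[1] + (D - 1.0) / (D + 1.0)) < 1e-12
```

As noted above, a_0 = 1 falls out of the formula automatically, so no scaling of the coefficient vector is needed.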
Fig. 3.20 The phase delay of a first-order (N = 1) Thiran allpass filter for delay values D = 0, 0.25, 0.5, ..., 3. The dashed lines indicate the ideal phase delay, which is constant.
Thiran's proof of stability implies that this allpass filter will be stable when D > N. We have experimentally observed that the Thiran allpass filter is also stable when the delay is in the range N - 1 < D < N. At D = N - 1, one of the poles (as well as one zero) of the Thiran allpass filter lies on the unit circle, making the filter only marginally stable. When D < N - 1, one or more poles lie outside the unit circle and the filter is unstable.
In Figs. 3.20, 3.21, and 3.22, the phase delay responses of first-, second-, and third-order MF allpass interpolators are presented for several values of the delay parameter D, starting at D = N - 1. The ideal phase delay (constant at all frequencies) is illustrated with a dotted line in each figure. It is seen that the Thiran allpass filter is most accurate at low frequencies. This is an expected result, since Thiran designed the denominator polynomial so that the group delay and its N derivatives evaluated at the zero frequency match the desired values. As the filter order is increased, the phase delay curves remain nearly constant up to higher frequencies.
The closed-form design of the first-order allpass filter for varying the length of a
delay line has been discussed in papers by Jaffe and Smith (1983) and Smith and
Friedlander (1985). They obtained the solution by first expressing the phase response of
the first-order allpass filter by its Taylor series with the expansion point z = 0 and then
truncating this series after the first term. The expression for the filter coefficient a1 of
the first-order (N = 1) allpass filter is the same as that given in Table 3.1.
[Fig. 3.21: the phase delay of a second-order (N = 2) Thiran allpass filter for delay values D = 1, 1.25, ..., 4; cf. Fig. 3.20.]
The squared integral error of the allpass approximation may be evaluated as

    E_S = (1/π) ∫_0^π |E(e^{jω})|^2 dω = (1/π) ∫_0^π |A(e^{jω}) - H_id(e^{jω})|^2 dω
        = (1/π) ∫_0^π |e^{jΘ_A(ω)} - e^{-jωD}|^2 dω
        = (1/π) ∫_0^π [e^{jΘ_A(ω)} - e^{-jωD}][e^{-jΘ_A(ω)} - e^{jωD}] dω
        = (2/π) ∫_0^π {1 - cos[Θ_A(ω) + ωD]} dω    (3.134)
Integration of the last form is difficult, and we thus use numerical methods for investigating the approximation error. We use numerical integration to evaluate the error for Thiran allpass filters of different order. Figure 3.23 gives the error E_S as a function of the difference D - N.
[Fig. 3.22: the phase delay of a third-order (N = 3) Thiran allpass filter for delay values D = 2, 2.25, ..., 5.]
Let us assume that the delay parameter D takes values in a range of one sample,

    D_0 ≤ D < D_0 + 1    (3.135)

To solve for the optimal choice of D_0, we may evaluate the average error E_ave, written as

    E_ave(D_0) = ∫_{D_0}^{D_0+1} E_S(D) dD    (3.136)
We can evaluate this integral numerically. Figure 3.24 presents the average error as a
function of D0 for some low-order Thiran allpass filters. The minimum of each curve is
indicated by a circle in Fig. 3.24. These points correspond to the optimal D_0 for each filter. The values of D_0 and the corresponding average errors E_ave are presented in Table 3.2.

Fig. 3.23 Squared integral error of Thiran allpass filters of orders N = 1 to 6 (top to bottom).
Some general observations can be made from Fig. 3.24. Apparently, the approximation error of the Thiran allpass filter is quite small for delay values N - 1 < D < N, reaching a minimum at the value given in Table 3.2. For much larger values of D, the approximation error increases considerably. It is seen in Fig. 3.24 that the average error does not increase very rapidly in the neighborhood of the minimum. Thus, if the best possible accuracy is not essential, one may use a simple rule of thumb to choose the range for D: for the Thiran allpass filter, a close-to-minimum error is obtained when D_0 = N - 0.5, which means that D is in the range

    N - 0.5 ≤ D < N + 0.5    (3.137)

Table 3.2 The optimal values for D_0 and minimum average errors for low-order Thiran allpass filters. E_ave(N - 0.5) is the average error when D_0 = N - 0.5.

    N    D_0,opt    E_ave(D_0,opt)    E_ave(N - 0.5)
    1    0.418      0.040             0.043
    2    1.403      0.030             0.033
    3    2.396      0.025             0.027
    4    3.392      0.022             0.024
    5    4.390      0.019             0.022
    6    5.389      0.018             0.020

Fig. 3.24 The average error of low-order Thiran allpass filters as a function of D_0. The filter orders from top to bottom are N = 1, 2, 3, 4, 5, and 6. The minimum of each error function is indicated by a circle.
A statement similar to that made for FIR FD filters applies here: the total delay of the Thiran allpass filter increases linearly with the order N.
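The numerical evaluation of E_S in Eq. (3.134) can be sketched as follows (Python with NumPy; the function names and the midpoint-rule grid are our choices). Close to the optimal range the error is far smaller than outside it:

```python
import numpy as np
from math import comb

def thiran_coeffs(N, D):
    """Thiran allpass denominator coefficients a_0..a_N, Eq. (3.133)."""
    a = []
    for k in range(N + 1):
        c = (-1.0) ** k * comb(N, k)
        for n in range(N + 1):
            c *= (D - N + n) / (D - N + k + n)
        a.append(c)
    return np.asarray(a)

def squared_error(N, D, K=2048):
    """E_S = (1/pi) * integral_0^pi |A(e^jw) - e^{-jwD}|^2 dw,
    approximated by the midpoint rule on K frequency points."""
    w = (np.arange(K) + 0.5) * np.pi / K
    a = thiran_coeffs(N, D)
    k = np.arange(N + 1)
    Dpos = np.exp(-1j * np.outer(w, k)) @ a     # D(e^{jw})
    Dneg = np.exp(+1j * np.outer(w, k)) @ a     # D(e^{-jw})
    A = np.exp(-1j * N * w) * Dneg / Dpos       # allpass frequency response
    return np.mean(np.abs(A - np.exp(-1j * w * D)) ** 2)

print(squared_error(1, 1.2) < squared_error(1, 0.5))   # True
```

Note that D must not equal N exactly in this sketch (the product formula then hits 0/0); at such points the filter reduces to a pure delay and the error is zero anyway.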
3.4.5 Discussion
This section has dealt with allpass filter approximation of fractional delay. Digital allpass filters are a good choice for this task since their magnitude response is exactly flat
and the design can concentrate on the phase properties. The Thiran interpolator was discussed in detail since it is easy to design with closed-form formulas and it is very accurate at low frequencies.
The design of general IIR filters (with different numerator and denominator polynomials) for FD approximation has not been much discussed in the DSP literature. One
example is the paper by Tarczynski and Cain (1994) where reduced-bandwidth IIR filters are optimized iteratively. The design of optimal IIR filters is a difficult problem but
new techniques are certainly worth trying since generally speaking recursive filters are
more powerful than FIR filters. This is a fascinating topic for future research.
Fig. 3.25 A variable-length digital delay line implemented using FIR interpolation.
    h = [h(0)  h(1)  h(2)  ...  h(N)]^T    (3.138)
For simplicity of notation, the dependency of the coefficients h(n) on the delay D is not
explicitly shown. It should be remembered that we should use h(n, D) in all the equations. Various techniques for determining the interpolation coefficients h(n) have been
discussed earlier in this chapter.
An estimate for the ideal interpolated signal value x(n - M - D) is computed as (see Fig. 3.25)

    x^(n - M - D) = Σ_{k=0}^{N} h(k) x(n - M - k) = h^T x_n    (3.139a)

where M is the number of unit delays in the delay line before the FIR interpolator [as defined by Eq. (3.37)] and the vector of signal samples to be used in interpolation is

    x_n = [x(n - M)  x(n - M - 1)  x(n - M - 2)  ...  x(n - M - N)]^T    (3.139b)
The length of a delay line can now be changed continuously by computing new values
for the FIR filter coefficients h(n) at every sampling interval.
In the following we concentrate on the case where the delay parameter is changed at time index n_c. The delay parameter is thus a function of time:

    D(n) = D_1 for n < n_c,   D(n) = D_2 for n ≥ n_c    (3.140)
If the next delay value D_2 does not fulfill the requirement (N - 1)/2 ≤ D_2 < (N + 1)/2, the filter taps of the FIR FD filter have to be moved to obtain the best approximation (for a given approximation method and given N). In other words, the taps of the FIR filter have to be moved with respect to the delay line if d passes the value 0 when N is odd, or if d passes the value 0.5 when N is even. This also implies that an integer must be added to or subtracted from D_2 to keep the delay within the optimal range.
Special attention must be paid to correctly implement the delay change where D2 >
D1 and the filter taps need to be moved: the transient signal will be completely eliminated if the delayed values of the input signal are available, i.e., the delay line is long
enough. One solution is to postpone the change of delay parameter for the time that it
takes to store the input signal values into the buffer memory (i.e., to wait until the delay
line is filled with samples needed by the FIR filter) and change the delay only then. This
will, however, cause an extra processing delay which depends on the value of the delay
parameter and which may not be acceptable.
(3.141)

Here the N/2 extra samples are required because the fractional point D should be located near the central tap of the FIR interpolator. This issue was discussed in Section 3.2.1 for the LS FIR filter design, but it applies to other FD FIR filters as well.
It is also necessary to consider the case D_2 < D_1 when the filter taps should be moved. Since the filter taps should be located in such a way that (N - 1)/2 ≤ D_2 < (N + 1)/2, a delay D_2 < (N - 1)/2 cannot be approximated with good accuracy. (This discussion is irrelevant for the case N = 1, since for the first-order filter the smallest delay in the optimal range is zero.) There are two possible solutions to the problem of a small fractional delay:
1) to approximate the delay outside the optimal range, or
2) to reduce the order N of the FIR FD filter.
Both methods imply reduced accuracy in the FD approximation. The choice depends on the application. In digital waveguide modeling, when Lagrange interpolation is employed, it is better to use a lower-order filter (choice 2), since the Lagrange interpolator is not a passive filter outside its optimal range. From the discussion in Section 3.3.6 we can also deduce that this solution will actually give a smaller total error than using a higher-order filter outside its optimal range. Notice that the above discussion is mainly relevant to real-time implementation of an FIR delay filter.
The use of an FIR interpolator is favorable in the sense that no disturbing transients will be generated when the delay D is changed. However, a discontinuity will occur at the output. If the change in the coefficient values is small enough, the discontinuity will be unnoticeable, but if the change is large, a serious disturbance may occur for certain signals. For computational savings it may be necessary to update the interpolation coefficients less often than every sampling interval, say every 10 or 100 sampling cycles. It is recommended to verify experimentally that the updates are frequent enough that the discontinuities do not cause trouble.
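The scheme described above, recomputing the FIR interpolator coefficients whenever the delay changes, can be sketched in plain Python as follows (the function names, the buffer length, and the choice of third-order Lagrange interpolation are our assumptions):

```python
def lagrange3(d):
    """Third-order Lagrange FD coefficients for a total delay of 1 + d
    samples (0 <= d <= 1 keeps D = 1 + d in the optimal range [1, 2])."""
    D = 1.0 + d
    h = []
    for k in range(4):
        c = 1.0
        for i in range(4):
            if i != k:
                c *= (D - i) / (k - i)
        h.append(c)
    return h

def variable_delay(x, delay_at):
    """Delay line whose fractional length delay_at(n) >= 1 may change at
    every sample; the interpolator coefficients are simply recomputed."""
    buf = [0.0] * 64                  # delay-line memory, zero initial state
    y = []
    for n, s in enumerate(x):
        buf.insert(0, s)              # buf[m] == x(n - m)
        buf.pop()
        D = delay_at(n)
        M = int(D) - 1                # integer delay before the interpolator
        d = D - int(D)                # fractional part, 0 <= d < 1
        h = lagrange3(d)
        y.append(sum(h[k] * buf[M + k] for k in range(4)))
    return y

# A ramp delayed by 2.5 samples: after start-up, y(n) = n - 2.5.
print(variable_delay(list(range(12)), lambda n: 2.5)[8])   # 5.5
```

This sketch ignores the transient issues discussed above when the integer part of D jumps; it simply reads whatever happens to be in the buffer at the new tap positions.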
3.5.3 Transients in Time-Varying Recursive FD Filters
Due to the recursive nature of IIR filters, abrupt changes in their coefficients cause disturbances in the future values of the internal state variables and transients in the output values of the filter. These disturbances depend on the filter's impulse response, so they are in principle infinitely long. In the case of stable filters, however, the disturbances decay exponentially and can be treated as having a finite length. In practice, these transients are often serious enough to cause problems, such as clicks in audio applications, and they are typically the most serious problem in the implementation of tunable or time-varying recursive filters.

In off-line processing, where the entire input signal is stored in the computer memory and filtering is executed afterwards, the main concern is to take into account the edge effects due to the finite data size (see, e.g., Hamming, 1983): near the ends of the signal buffer some of the FIR filter's input signal values are missing (assumed to be zero). This would, in general, give rise to an error (transient) in the output signal. In most cases this transient at the beginning and at the end of the output signal can be removed by truncating the output signal.
The waveform of the transient signal depends on the structure of the filter. For example, recursive filters implemented with the direct form (DF) I and DF II structures will yield different transient signals after a change of filter coefficients, even when their initial and target coefficients and the input signal are identical.
If the coefficients of a recursive filter are changed at constant time intervals, the disturbance caused by the transient bursts can yield a quasiperiodic signal. In audio applications, this is heard as a tonal sound.
3.5.4 Transient Elimination Methods for Time-Varying Recursive Filters
Despite the importance of the problem, there exists only a handful of research reports
on strategies for elimination of transients in time-varying recursive filters. To the
knowledge of the author, there are altogether five different techniques for this task:
1) a cross-fading method,
2) gradual variation of coefficients using interpolation (Mourjopoulos et al., 1990),
3) intermediate coefficient matrix (Rabenstein, 1988),
4) an input-switching method (Verhelst and Nilens, 1986), and
5) updating of the state vector (Zetterberg and Zhang, 1988).
These methods are briefly described in this section.
Cross-Fading Method
The cross-fading method is the most intuitive of all transient elimination techniques. In
this method, typically two copies, H1 (z) and H2 (z), of the time-varying filter are used.
The input signal is connected to both of these filters.
Let us assume that first filter H1 (z) is used for filtering the input signal. When the
coefficients need to be changed, the filter H2 (z) is given the target filter coefficients.
During a transition time Δn, the outputs of the filters are cross-faded (e.g., using linear interpolation) so that afterwards only the output of filter H2(z) is connected to the overall output.
This method has been used, for example, in audio applications such as auralization,
where the incident angle of the binaurally processed audio signal is changed using digital filters designed based on measurements of head-related transfer functions (Jot et al.,
1995). In this application it may be necessary to cross-fade between three or four filters,
if the elevation angle, in addition to the horizontal angle, is being taken into account.
The overall output signal is then a linear combination of the filter outputs.
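A minimal sketch of the cross-fading method follows. This is our illustration, not code from the thesis; the helper names and the linear fade law are our assumptions.

```python
import numpy as np

def direct_form_filter(b, a, x):
    """Direct-form IIR filtering, y(n) = sum b[k] x(n-k) - sum a[k] y(n-k),
    with a[0] assumed to be 1 (kept dependency-free for this sketch)."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
        acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0)
        y[n] = acc
    return y

def crossfade_filters(x, b1, a1, b2, a2, nc, dn):
    """Cross-fading method: run two copies of the time-varying filter on the
    same input and linearly cross-fade their outputs over dn samples
    starting at index nc."""
    y1 = direct_form_filter(b1, a1, x)  # filter with the initial coefficients
    y2 = direct_form_filter(b2, a2, x)  # filter with the target coefficients
    g = np.clip((np.arange(len(x)) - nc) / dn, 0.0, 1.0)  # fade-in gain 0 -> 1
    return (1.0 - g) * y1 + g * y2
```

The cost is two full filter evaluations during the transition, which is the main drawback of the method.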
Gradual Variation Technique
In gradual variation of coefficients one also allows a transition time Δn (in samples) during which the coefficients monotonically change from their initial values to the target values (Mourjopoulos et al., 1990; Zölzer et al., 199). Mourjopoulos et al. (1990) defined a filter variation parameter η as

η = (M + 1) / (Δn T)    (3.142)

where M is the number of intermediate coefficient sets and T is the sampling interval. This parameter describes the number of intermediate coefficient sets per unit time. Note that it is assumed that each intermediate coefficient set is running for several sampling cycles.
Mourjopoulos et al. (1990) studied the useful range of values for the parameter η by analyzing the transients in an audio filtering application using a computational model of human auditory properties. Their results were that when η is small (i.e., a small number of intermediate filter coefficient sets is used) the transition is perceivable because there are discontinuities in the filter parameters; at least 18 to 35 states per second, depending on parameter settings, should be used to ensure a smooth coefficient transition. Abrupt changes generate audible transients. Mourjopoulos et al. (1990) conclude that these distortions are most easily heard when the input signal is a sine wave. Speech and music sounds mask the majority of the transient effects.
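A minimal sketch of generating the M intermediate coefficient sets follows. This is our illustration; linear interpolation is our assumption, since the method only requires a monotonic change from the initial to the target values.

```python
import numpy as np

def intermediate_coefficient_sets(c_init, c_target, M):
    """Gradual variation sketch: M intermediate coefficient sets obtained by
    linear interpolation between the initial and target coefficients
    (cf. Eq. (3.142): eta = (M + 1)/(dn*T) sets per unit time)."""
    c_init = np.asarray(c_init, float)
    c_target = np.asarray(c_target, float)
    steps = np.arange(1, M + 1) / (M + 1)  # fractions strictly between 0 and 1
    return [c_init + s * (c_target - c_init) for s in steps]

# Three intermediate sets between two first-order coefficient pairs:
sets = intermediate_coefficient_sets([0.9, -0.5], [0.5, 0.1], M=3)
```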
Nikolaidis et al. (1993) have considered a variation of the gradual variation technique where filter parameters (instead of coefficients) are interpolated. In this paper
they also propose a generalization where two filters, H1 (z) and H2 (z), are used in parallel. Both filters receive the input signal all the time but the output signal of only one
of them is connected to the output of the overall system. This goes on for nu sampling cycles, where nu is determined as

nu = Δn / (M + 1)    (3.143)

At the same moment when the output of the filter H1(z) is connected to the overall output, the coefficients of H2(z) are changed to correspond to the next intermediate set and its output is disconnected. After nu samples, the output is switched again and the coefficients of H1(z) are changed. Nikolaidis et al. (1993) also propose a two-speed implementation where the off-time filter runs at a higher sampling rate. With this approach this filter in effect has a longer time (a multiple of nu) to settle before its output is switched on.
Intermediate Coefficient Matrix
Another technique to reduce the transients is the use of an intermediate coefficient
matrix (Rabenstein, 1988). There are altogether three coefficient matrices involved in
one change of filter characteristics: the initial, the intermediate, and the target matrices.
The intermediate matrix is used for one sampling cycle to compensate for the transient.
This approach involves assumptions on the nature of the input signal. The drawback of
this technique is that the filter has to be implemented in a state-variable form which is
far more expensive in terms of operations than a direct-form realization. For this reason
we feel that this technique is not suitable for real-time DSP applications.
Input-Switching Method
The input-switching method was introduced by Verhelst and Nilens (1986). Their
application was speech synthesis using a cascade formant filter bank where the coefficients of the filter sections are time-varying.

Updating of the State Vector

In the state-vector updating method (Zetterberg and Zhang, 1988), the recursive filter is described in state-variable form as

v(n + 1) = F v(n) + q x(n)    (3.144a)

y(n) = g^T v(n) + g0 x(n)    (3.144b)
where x(n) and y(n) are the input and output signal of the filter, respectively. The column vector v(n) contains the state variables, and the matrix F, vectors q and g, and
coefficient g0 contain the filter coefficients. These matrices and vectors depend on the
implementation structure of the filter.
The state-variable vector can be expressed as a function of the input signal x(n) and
coefficient matrices when the coefficients have been changed at time index nc, that is
(Zetterberg and Zhang, 1988)
120
k =0
v(n) =
n-1
F 2n-nc v(nc ) + F 2n-1-k qx(k), n > nc
k =nc
(3.145)
where v(0) is the initial state of the filter, and F1 and F2 are the coefficient matrices before and after sample index nc, respectively. After the coefficients have been changed, the state vector can be expressed as [substituting the upper form of Eq. (3.145) into the lower one and elaborating]
v(n) = F2^(n−nc) F1^nc v(0) + Σ_{k=0}^{n−1} F2^(n−1−k) q x(k) + F2^(n−nc) Δv(nc)    (3.146a)

with

Δv(nc) = Σ_{k=0}^{nc−1} [F1^(nc−1−k) − F2^(nc−1−k)] q x(k)    (3.146b)
The first term in (3.146a) is the contribution of the initial conditions to the state of the
filter. We assume that the initial transient has died out so that this term can be
neglected. The second term is the response of the filter to the input after the parameters
have changed, and the last term represents the transient in the state vector due to the coefficient change. Note that Δv(nc) depends on the difference of the coefficient matrices F1 and F2, which implies that a small change in the coefficients causes a transient response with a smaller amplitude than a large change.
A way to completely eliminate the transient caused by the change of coefficients is to subtract the term Δv(nc) from the state vector at time nc (Zetterberg and Zhang, 1988). Equivalently, the state vector can be replaced with the following vector (Välimäki et al., 1995b, 1995d)
v2(nc) = Σ_{k=0}^{nc−1} F2^(nc−k−1) q x(k)    (3.147)
For the direct form II structure, the coefficient matrices and vectors of the state-variable representation (3.144) are

v(n) = [v1(n) v2(n) ⋯ vN(n)]^T    (3.148a)

F = | −a1  −a2  ⋯  −aN−1  −aN |
    |  1    0   ⋯   0      0  |
    |  0    1   ⋯   0      0  |
    |  ⋮         ⋱          ⋮  |
    |  0    0   ⋯   1      0  |    (3.148b)

q = [1 0 ⋯ 0]^T    (3.148c)

g = [b1 − b0 a1   b2 − b0 a2  ⋯  bN − b0 aN]^T    (3.148d)

g0 = b0    (3.148e)

where ak and bk are the denominator and numerator coefficients of the recursive filter, respectively (k = 0, 1, 2, …, N). The coefficient a0 is equal to 1. The next-state and output equations of the state-variable representation are given by Eqs. (3.144a) and (3.144b), respectively. The transfer function of the recursive filter is

H(z) = B(z)/A(z) = (b0 + b1 z^−1 + ⋯ + bN z^−N) / (1 + a1 z^−1 + ⋯ + aN z^−N)    (3.149)
A key observation is that the state vector of a DF II structure is affected by the past
input values only in proportion to the magnitude of the impulse response hr (n) of its
recursive part, that is
Fig. 3.27 The transient elimination system: the input x(n) feeds both the signal filter H(z) and the transient eliminator 1/A(z), which provides the updated state vector v2(n); the coefficient update D(n) of the signal filter is deferred by Na samples (z^−Na).
hr(n) = 1,    n = 0
hr(n) = q^T F^(n−1) q,    n > 0    (3.150)
This is because the vector g, which contains the feedforward coefficients, does not affect the state vector in (3.144a). The elements of the updated state vector (at time n = nc)

v2(nc) = [v2,1(nc) v2,2(nc) ⋯ v2,N(nc)]^T    (3.151)

can be rewritten by means of the impulse response h2,r(n) of the recursive part of the filter H2(z):

v2,m(nc) = Σ_{k=0}^{nc−1} h2,r(nc − m − k) x(k) = yr(nc − m),    m = 1, 2, …, N    (3.152)
where yr (n) is the output signal of the recursive part of the IIR filter.
Thus, the knowledge of the effective length of the impulse response hr (n) (i.e., how
many values of this impulse response are observably nonzero for the application) helps
to estimate how many past input samples need to be taken into account in updating the
state vector. The effective length of an infinite impulse response can be determined
based on its time constant or by specifying an energy-based criterion. These approaches
are discussed in Appendix B of this thesis. Denoting the effective length of the filter's impulse response (in samples) by Na, we can replace expression (3.147) by (Välimäki et al., 1995b, 1995d)
v2(nc) ≈ Σ_{k=nc−Na}^{nc−1} F2^(nc−k−1) q x(k) = Σ_{k=1}^{Na} F2^(k−1) q x(nc − k)    (3.153)
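The effective-length criterion can be made concrete for an exponentially decaying impulse response. The sketch below is our illustration with a −60 dB decay threshold (Appendix B discusses time-constant and energy-based criteria; this particular formula is our assumption).

```python
import numpy as np

def effective_length(pole_radius, level_db=-60.0):
    """Estimate the effective length Na (in samples) of the exponentially
    decaying impulse response |h(n)| ~ r^n of a recursive part dominated by
    a pole of radius r: the index after which the response has decayed
    below the given level in dB."""
    r = abs(pole_radius)
    if r == 0.0:
        return 1
    return int(np.ceil(level_db / (20.0 * np.log10(r))))

# Pole radius 0.41 (the target Thiran filter of the examples below):
Na = effective_length(0.41)
```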
Let us consider the implementation of the transient elimination technique when the time-varying filter H(z) = B(z)/A(z) is implemented using the DF II structure. The transient
elimination method then works as shown in Fig. 3.27: the input signal x(n) is fed into
two systems, the actual IIR filter H(z) (hereafter called the signal filter) and the transient eliminator that consists of its recursive part 1/A(z). When the filter coefficients are
changed, the coefficients of the eliminator are updated immediately. The coefficients of
the signal filter are updated Na samples after that, and at the same time the state vector
is copied from the transient eliminator to the signal filter. The parameter Na is called the
advance time (in samples) of the transient eliminator. The larger the value of this
parameter, the more the transient signal is being suppressed.
The state variables of the transient eliminator are updated according to the following
Fig. 3.28 The output signal of a first-order Thiran allpass filter when the delay parameter is changed from 1.42 to 0.42 at time index 20. The upper part shows the
output signal of the time-varying filter without transient elimination (solid
line) and the ideal output signal (dashed line). The lower part is the transient
signal which is obtained as the difference of the two upper signals.
equation:

v2,k(n) = v2,k−1(n − 1) for k = 2, 3, …, N    (3.154a)

v2,1(n) = x(n − 1) − Σ_{k=1}^{N} ak v2,k(n − 1)    (3.154b)

It is seen that Eqs. (3.154a) and (3.154b) give a state-variable description of an allpole
filter without output. The output signal of the transient eliminator is not needed since
the purpose of this filter is to provide the state vector for the signal filter.
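As an illustration of this state update, the following sketch (ours; names assumed) computes the eliminator state by running the recursive part 1/A2(z) of the target filter over the Na most recent input samples, starting from a zero state, as in Eq. (3.153).

```python
import numpy as np

def eliminator_state(a2, x_recent):
    """Approximate the transient-free state vector v2(nc) by running the
    recursive part of the target filter over the most recent input samples.
    a2 = [1, a1, ..., aN] are the target denominator coefficients;
    x_recent holds the last Na inputs, oldest sample first. In the DF II
    structure the state variables are delayed outputs of the recursive part."""
    a2 = np.asarray(a2, float)
    N = len(a2) - 1                  # filter order; a2[0] is assumed to be 1
    w = np.zeros(N)                  # recursive-part outputs w(n-1), ..., w(n-N)
    for x in x_recent:
        w0 = x - np.dot(a2[1:], w)   # w(n) = x(n) - sum a_k w(n-k)
        w = np.concatenate(([w0], w[:-1]))
    return w                         # state vector [v1, v2, ..., vN] at time nc
```

The approximation error decreases as Na grows, since more of the decaying impulse response of 1/A2(z) is taken into account.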
This implementation technique requires that the coefficient update is deferred by Na
sample cycles and during that time the eliminator is executed in parallel with the signal
filter as illustrated in Fig. 3.27. If only one eliminator is used, it is then possible to change the coefficients at most every Na-th sample. This lag can be compensated for if the next coefficient values are known beforehand at least Na samples prior to the change. Then it is not necessary to defer the coefficient update, but instead the eliminator can be started Na samples before the actual time of change.
The above method of transient elimination can be used with practically all kinds of
Fig. 3.29 The same example as in Fig. 3.28 but now the signal with the solid line is
obtained using the new transient elimination technique with advance time Na
= 0. This corresponds to clearing the state of the allpass filter at time index
nc.
recursive filters. The technique is most efficiently implemented when the feedforward
and feedback part of the filter have their own state variables or when only the recursive
part affects the state of the filter, such as in the case of the DF II structure. In these
cases the transient eliminator can be an allpole filter. On the other hand, if the state
variables depend on both the recursive and the nonrecursive parts, the transient eliminator must be a pole-zero filter.
Two-Speed Implementation of the New Method
The new transient elimination technique can be modified to enable instantaneous coefficient update as often as needed, even every sampling interval (Välimäki et al., 1995b, 1995d). This can be realized by keeping the last Na input samples in a buffer memory and executing the necessary computations in a single sample interval by running the eliminator at Na times the sampling rate of the signal filter. This two-speed method, of course, requires sufficiently fast hardware.
Fig. 3.30 The same example as in Fig. 3.28. The signal with the solid line is obtained
using the new transient elimination technique with advance time Na = 1.
3.5.6 Examples with a Time-Varying Thiran Allpass Filter
We have experimented with the new transient elimination method described in Section
3.5.5 to verify its quality. In the following examples, the input signal is a low-frequency
sine signal. The delay parameter D of a first-order Thiran allpass filter is changed from
1.418 to 0.418 at time index nc = 20. The corresponding pole radii are 0.173 and 0.410,
respectively. Note that the decay rate of the transient is primarily determined by the
pole of the target filter (i.e., Thiran allpass filter with D = 0.418).
The result without transient elimination is given in Fig. 3.28. The upper part shows
the output signal of the filter with a solid line and the output signal of an ideal time-varying FD filter with a dashed line. The ideal result is obtained by concatenating two
shifted sine signals (i.e., output signals of two ideal interpolators). The lower part of
Fig. 3.28 gives the difference of the two output signals. This signal can be interpreted as
the transient. An exponentially decaying transient with peak value 0.41 is produced.
In Fig. 3.29, the state of the filter is cleared at the time of coefficient change. This
case is obtained with the transient eliminator when the advance time parameter Na is set
to zero. The amplitude of the transient is now 0.58, substantially larger than without
using the eliminator. This trick is definitely not worth using in practice.
Figure 3.30 presents the result obtained with advance time Na = 1. The amplitude of
the transient is 0.24. In effect, the first sample of the transient in Fig. 3.29 has now been
truncated. It appears that Na should be larger to suppress the transient.
Fig. 3.31 The same example as in Fig. 3.28. The signal with the solid line is obtained
using the new transient elimination technique with advance time Na = 4. The
slow oscillation in the lower figure is caused by the phase error of the Thiran
allpass filter.
Finally, Fig. 3.31 shows the result with Na = 4. Now the maximum amplitude of the
difference signal is 0.0099. The low-frequency oscillation due to the phase approximation error of the Thiran allpass filter is clearly visible in Fig. 3.31.
3.6 Conclusions
Methods for the design of FIR and allpass fractional delay digital filters were studied.
The maximally flat design of both the FIR and allpass interpolating filters has turned
out to be useful in the context of waveguide modeling and they are also the only FD
filter designs that have closed-form formulas for the coefficients. These techniques were
thus discussed in detail. A novel implementation technique for Lagrange interpolation
was introduced in Section 3.3.7. A new kind of parametrization was proposed for the
Thiran allpass filter that was originally introduced by Laakso et al. (1992). The approximation errors were investigated and an optimal range for the delay parameter of Thiran
allpass filters of different orders was determined.
Finally, this chapter discussed time-varying fractional delay filters. FIR filters do not suffer from transients when their coefficients or the locations of filter taps are changed, unless the filter has been implemented in an incorrect way. The output signal of an FIR filter will, however, be discontinuous at the time of the change. Recursive
filters suffer from both problems: transients and discontinuities. Several transient elimination methods from recent DSP literature were reviewed.
A new technique to suppress the transients in recursive filters was proposed and its
quality was analyzed. This method can suppress the transient signal as much as needed
by increasing the advance time parameter. The amount of computation is directly proportional to this parameter. The implementation costs of this technique also depend on
the structure of the recursive filter. The direct-form II realization results in an efficient
method where effectively 1.5 filters need to be implemented and executed in parallel:
the signal filter itself and a copy of its recursive part. This solution is computationally
more efficient than any of the previously known transient elimination techniques.
In the next chapter, interpolated waveguide filter structures that employ fractional
delay filters are developed.
4.1 Deinterpolation
Another basic operation employed in fractional delay waveguide models side by side
with interpolation is called deinterpolation. It is needed when connecting digital
waveguides at fractional points. Deinterpolation was first introduced in Välimäki et al. (1993a). Earlier, Karjalainen and Laine (1991) had been using this technique for modeling a vibrating string, but the novelty of this technique was not recognized at that time.
4.1.1 Introduction to Deinterpolation
Interpolation is used for estimating the value of the signal between known samples. In
contrast, deinterpolation is used for computing the sample values when the amplitude of
128
b)
x(n D)
x(n)
129
x(n D)
x(n)
Delay Line
Delay Line
Fig. 4.1 a) Interpolation from a delay line. b) Deinterpolation into a delay line.
x(n D)
a)
z1
z d
z1
z1
x(n D)
b)
z1
z d
Fig. 4.2 More detailed diagrams of a) interpolation and b) deinterpolation showing the
fractional delay elements.
the signal between sampling points is known. Thus, deinterpolation can be understood
as an inverse operation to interpolation in the sense that it is an operation used for
computing the input of a system while interpolation estimates the value of the output of
a system.
In Fig. 4.1a, an output is located at an arbitrary point on the delay line, and the output
value is generally obtained by bandlimited interpolation. In Fig. 4.1b, an input is located
at an arbitrary point along the delay line. If the input point does not correspond to one
of the sample points of the delay line but is directed between them, bandlimited deinterpolation must be applied to spread the incoming signal to the adjacent signal samples
of the delay line.
More detailed versions of the block diagrams given in Fig. 4.1 are presented in Fig. 4.2, showing the individual unit and fractional delay elements. One of the unit delay elements of the delay line has been split into two pieces, that is

z^−1 = z^−(d + d̄) = z^−d z^−d̄    (4.1)

where d is the fractional delay (0 < d < 1) and d̄ is the complementary fractional delay defined by

d̄ = 1 − d    (4.2)
Footnote: The name deinterpolation has been given to this operation instead of the more obvious "inverse interpolation" because in mathematics, inverse interpolation denotes the problem of finding an estimate for the argument x when the value of the function y = f(x) is given (see, e.g., Hildebrand, 1974, pp. 68-70 or Kreyszig, 1988, p. 969). Deinterpolation is thus not the same operation as inverse interpolation. Note also that the operation of decimation, which is used in multirate signal processing as an inverse operation of interpolation, is yet another technique.

By examining the two cases presented in Figs. 4.2a and 4.2b, it can be understood that a nonrecursive deinterpolator approximates the complementary total delay

D̄ = N − D    (4.3)
where N is the order of the FIR interpolator. The fractional part d̄ of D̄ is related to d according to Eq. (4.2). The relation of D and D̄ is illustrated in Fig. 4.4 for an even filter order N.
The ideal FIR interpolator and the corresponding ideal FIR deinterpolator can thus be presented as systems with transfer functions z^−D and z^−D̄, respectively.

Fig. 4.4 A nonrecursive interpolator approximates the delay D, but a nonrecursive deinterpolator approximates the complementary delay D̄. Also shown are the fractional delay d and the complementary fractional delay d̄.

Were an Nth-order FIR interpolator and the corresponding Nth-order FIR deinterpolator cascaded, the resulting system would merely cause an incoming sequence x(n) to be delayed by N samples.
If the impulse response of an FIR interpolating filter approximating delay D is reversed in time, the resulting filter approximates the complementary delay D̄. This statement is easily confirmed by considering a hypothetical Nth-order linear-phase interpolating filter with the frequency response

H(e^jω) = |H(e^jω)| e^−jωD    (4.4)

The inversion in time corresponds to inverting the sign of the phase function. To make the filter causal, it is necessary to add a delay of N samples. The resulting time-reversed and delayed interpolator has the frequency response

Hrev(e^jω) = e^−jωN H(e^−jω) = e^−jωN |H(e^jω)| e^jωD = |H(e^jω)| e^−jω(N−D) = |H(e^jω)| e^−jωD̄    (4.5)

where D̄ is the complementary total delay defined by (4.3). This clarifies the relation of
the nonrecursive interpolator and deinterpolator. In Section 3.3.3 it was shown that for
Lagrange interpolation the coefficients for the delay N - D are the same as for D, but in
reversed order.
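This time-reversal property is easy to check numerically. The sketch below is our illustration, using the Lagrange closed-form design (the helper is repeated here for self-containment; its name is ours).

```python
import numpy as np

def lagrange_fd_coeffs(N, D):
    """Nth-order Lagrange FD FIR coefficients for total delay D."""
    h = np.ones(N + 1)
    for n in range(N + 1):
        for k in range(N + 1):
            if k != n:
                h[n] *= (D - k) / (n - k)
    return h

N, D = 3, 1.3
h_interp = lagrange_fd_coeffs(N, D)   # approximates z^(-D)
h_deint = h_interp[::-1]              # time reversal of the impulse response
h_comp = lagrange_fd_coeffs(N, N - D) # design for the complementary delay N - D
```

The reversed response h_deint coincides with the design for the complementary delay, as stated above for Lagrange interpolation.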
4.1.3 Implementing Deinterpolation Using an FIR Filter
Nonrecursive deinterpolation can be thought of as a superposition of the signal values
onto the delay line using the transposed FIR filter structure with interpolating coefficients (Välimäki et al., 1993a). The transpose of a digital filter structure is obtained by changing the direction of all branches, by replacing all branch nodes with summation nodes (i.e., adders) and adders with branch nodes, and by interchanging the input and output nodes of the system (see, e.g., Jackson, 1989, pp. 74-76).
Fig. 4.5 The standard direct-form FIR filter structure that can be used for nonrecursive interpolation when interpolating coefficients h(n) are used.

Fig. 4.6 The transposed direct-form FIR filter structure that can be used for nonrecursive deinterpolation.
The standard direct-form FIR filter is shown in Fig. 4.5 and its transpose in Fig. 4.6.
Note that the transposed structure in Fig. 4.6 has been vertically and horizontally mirrored to have the signal propagate from left to right or downwards.
The direct-form FIR filter computes a discrete convolution between its input
sequence x(n) and the coefficients h(n) whereas the transposed FIR structure copies
each input sample, multiplies each of them by one coefficient h(n) and adds the products to the delay line, which is represented by the state variables vk (n) for k = 1, 2, 3,
..., N. The output of the transpose structure is computed as the sum of the input sample
x(n) multiplied by one of the weighting coefficients h(n) and the value of the state variable v1 (n). This can be expressed using the state-variable description which is a standard tool in discrete-time signal processing. The next-state equation is given by
v(n + 1) = F^T v(n) + g x(n)    (4.6a)

y(n) = v1(n) + h(0) x(n)    (4.6b)

where we have defined the state vector (i.e., the delay line) as

v(n) = [vN(n) vN−1(n) ⋯ v1(n)]^T    (4.6c)
the N × N matrix F as

F = | 0  1  0  ⋯  0 |
    | 0  0  1     0 |
    | ⋮        ⋱  ⋮ |
    | 0  0  0  ⋯  1 |
    | 0  0  0  ⋯  0 |    (4.6d)

and the coefficient vector g as

g = [h(N) h(N − 1) ⋯ h(1)]^T    (4.6e)
The FIR deinterpolator uses the same transpose structure with the coefficients in reversed order. Its next-state and output equations are

v(n + 1) = F^T v(n) + g_rev x(n)    (4.7a)

y(n) = v1(n) + hrev(0) x(n)    (4.7b)

with the coefficient vector

g_rev = [hrev(N) hrev(N − 1) ⋯ hrev(1)]^T    (4.7c)

The vector v(n) and matrix F are defined as in Eqs. (4.6c) and (4.6d) above.

Fig. 4.7 Implementation of nonrecursive deinterpolation using the transpose FIR structure with a reversed impulse response.
Equivalently, one may imagine that both the delay line of the transpose structure as
well as the order of state variables vk (n) have to be reversed to obtain the FIR realization of deinterpolation. Figure 4.7 illustrates the implementation of nonrecursive deinterpolation by the transpose nonrecursive structure with a reversed impulse response.
Apart from the ordering of coefficients this filter is equivalent to the transpose structure
of Fig. 4.6.
Based on the above discussion it is understood that the transfer function of the FIR
deinterpolator can be written as
Hd(z) = Σ_{n=0}^{N} hd(n) z^−n = Σ_{n=0}^{N} h(N − n) z^−n    (4.8)
This implies that if the transfer function H(z) of the FIR interpolator approximates the transfer function z^−D, then Hd(z) approximates z^−D̄, as discussed earlier.
4.1.4 Adding a Signal to a Fractional Point of a Delay Line Using an FIR
Deinterpolator
Consider addition of a signal to an arbitrary fractional point on a delay line as illustrated
in Fig. 4.1b. The implementation of this system by means of nonrecursive deinterpolation is presented in Fig. 4.8. Note that now the deinterpolator does not have independent
state variables as was assumed in the preceding section, but employs the unit delays of
the delay line.
When a signal w(n) is deinterpolated onto the delay line, the resulting signal may be
expressed as
Fig. 4.8 Adding a signal to a fractional point of a delay line using a deinterpolator.
x̃(n − M − k) = x(n − M − k) + hrev(k) w(n) for k = 0, 1, 2, …, N    (4.9)

where hrev(n) are the coefficients of the deinterpolation filter, and x(k) and x̃(k) are the signals in the delay line before and after deinterpolation, respectively. For the next sampling cycle, the updated signal values are shifted so that we can write

x̃(n − M − k − 1) = x̃(n − M − k) for k = 0, 1, 2, …, N − 1    (4.10)

In vector notation, the deinterpolation operation (4.9) can be written as

x̃ = x + h_rev w(n)    (4.11a)

where the tap vector of the delay line is

x = [x(n − M) x(n − M − 1) ⋯ x(n − M − N)]^T    (4.11b)
and h_rev is the coefficient vector of the FIR deinterpolator containing the coefficients hrev(n), which are the same as those used in interpolation but in reversed order, i.e., hrev(n) = h(N − n) for n = 0, 1, 2, …, N:

h_rev = [hrev(0) hrev(1) ⋯ hrev(N)]^T = [h(N) h(N − 1) ⋯ h(0)]^T    (4.11c)
The vector notation shows that the result of the deinterpolation appears in the delay line
itself.
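In code, this superposition is a small loop over the filter taps. The sketch below is our illustration (names and buffer layout are our assumptions): one sample w is weighted by the reversed interpolator coefficients and added to the N + 1 taps starting M samples into the delay-line buffer.

```python
import numpy as np

def deinterpolate_add(delay_line, w, M, h):
    """Nonrecursive deinterpolation into a delay line: superimpose the
    sample w, weighted by the reversed coefficients h_rev(n) = h(N - n),
    onto the taps delay_line[M], ..., delay_line[M + N]."""
    h_rev = h[::-1]                       # interpolator coefficients reversed
    for k in range(len(h)):
        delay_line[M + k] += h_rev[k] * w  # cumulative addition to the line
    return delay_line
```

Note the cumulative additions: each tap is read, updated, and written back, which is why deinterpolation may be somewhat more expensive than interpolation in software, as discussed next.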
4.1.5 Discussion
The same number of additions and multiplications is needed for the implementation of
interpolation and deinterpolation. However, in deinterpolation, cumulative additions to
a delay line are needed and in software implementation they usually require more than
one instruction, e.g., a read from memory, an addition, and a save to memory. Thus, in
practice deinterpolation may be computationally more expensive than interpolation.
It can be argued that deinterpolation could be implemented just as well by a direct-form FIR interpolator that approximates the delay of D̄ samples instead of D. This is true and seems to make the definition of a new technique such as deinterpolation quite unnecessary. However, the main advantage of deinterpolation can be clearly observed
by examining addition to the delay line as illustrated in Fig. 4.8. If this system were
realized by a direct-form FIR interpolator, N extra unit delays would be required. Also,
the implementation would not be as intuitive, since the fractional delay would have to
be processed separately and the result be added to the delay line signal at a point located
about N/2 samples before the fractional point D. In the following, other benefits of deinterpolation will become clear. Especially the implementation of fractional delay junctions of waveguides is straightforward using nonrecursive deinterpolation.
Fig. 4.9 Block diagram of the waveguide model for an ideal string.
Fig. 4.10 FD waveguide filter for modeling a vibrating string of arbitrary length.
Fig. 4.11 Two alternative configurations for the implementation of a fractional delay
junction of two digital waveguides.
In certain cases the two delay lines that form the waveguide may be combined and a
more efficient realization results. For example, the system of Fig. 4.10 could be implemented by a single delay line with a reflection coefficient r^2 at one end. A single fractional delay filter could then be used to control its length. Only the delay between the
input and output points of the one-delay-line version is different from that of the
implementation of Fig. 4.10. This is usually acceptable. Other properties of the modified waveguide system remain the same.
4.2.2 Connecting Waveguides at Arbitrary Points
Deinterpolation proves to be an extremely useful operation in the context of fractional
delay junctions of digital waveguides. An example of the simplest FD junction is the
fractional delay endpoint shown in Fig. 4.10 above. Other elementary cases are
a junction of two waveguides,
a junction of a side branch connected to a waveguide, and
a junction of three waveguides.
A junction indicates a change of impedance. A junction of two waveguides disappears or reduces to pure transmission if the impedances of the connected waveguides
are equal. When more waveguides are coupled, the signal inside each one of them sees
that the input impedance of the junction is equal to the parallel connection of the
impedances of the other waveguides. Thus scattering occurs. Part of the wave is transmitted through the junction to the other waveguides while the rest is reflected back.
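The scattering described above can be sketched for pressure waves at an N-port parallel junction. The formula below is the classical one (junction pressure as an admittance-weighted average of the incoming waves), presented as our illustration rather than a formula copied from this chapter.

```python
def junction_scattering(pressures_in, admittances):
    """N-port parallel (pressure) scattering junction: the junction pressure
    is pj = 2 * sum(Yi * pi_in) / sum(Yi), and each outgoing wave is
    pi_out = pj - pi_in."""
    Ytot = sum(admittances)
    pj = 2.0 * sum(Y * p for Y, p in zip(admittances, pressures_in)) / Ytot
    return [pj - p for p in pressures_in]
```

With two equal admittances the junction reduces to pure transmission, in agreement with the statement above that a junction of two waveguides with equal impedances disappears.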
Figure 4.11 depicts two ways of connecting two digital waveguides at fractional
points. In Fig. 4.11a the two waveguides are represented by four distinct delay lines:
delay lines 1 and 2 form one digital waveguide and lines 3 and 4 the other. In contrast,
in Fig. 4.11b the left-going and right-going delay lines of the two waveguides have been combined. In both cases, the signals entering the two-port scattering junction are interpolated out of the delay lines and the outgoing (transmitted) signals are deinterpolated onto the delay lines.
In the case of Fig. 4.11a some extra unit delays have to be reserved for each delay
line when FIR FD filters are used for realizing interpolation and deinterpolation. This is
because FIR interpolators need to make use of signal samples on both sides of the interpolation point.

Fig. 4.12 An example of a fractional delay junction of three digital waveguides: a side branch in a waveguide.

The system of Fig. 4.11b does not usually require these extra data, since
there are signal samples on both sides of the junction anyway. It is, nevertheless, important to make sure that the junction is not located nearer than about N/2 unit delays away
from either end of the delay lines. Otherwise the accuracy of the interpolators has to be
reduced by lowering the order of the FD filter.
The above remarks are valid for FD junctions of three or more waveguides as well. A
special case of a junction of three waveguides is illustrated in Fig. 4.12. In this example
a side branch represented by delay lines 3 and 4 is connected to a waveguide. The threeport junction determines how much of each input signal is reflected back and how much
is transmitted to the other waveguides. The input signals of the junction are obtained by
interpolation and the output signals are superimposed onto the delay lines by deinterpolation. Later in this chapter this structure is applied to modeling of a finger hole of a
woodwind instrument. It is also useful for implementing the junction of the vocal and
nasal tract in speech production models.
Fig. 4.13 The profile of the vocal tract and its approximation by a) cylindrical and b) conical tube sections of different lengths.
4.3.1 Interpolated Scattering Junction
As discussed earlier in this chapter, the length of a waveguide may be adjusted by
applying interpolation and deinterpolation techniques. With the help of these operations
it is moreover possible to locate a scattering junction at a fractional point of a waveguide. The block diagram of a single fractional delay, or interpolated, two-port junction
is presented in Fig. 4.14. The input signals of the scattering junction (denoted by S)
are now obtained using interpolation and the output signals of the junction are fed back
to the delay lines using deinterpolation. The wave scattering component is computed by
Fig. 4.14 Block diagram of the interpolated two-port scattering junction. The block denoted by S is a two-port scattering junction.
multiplying the difference of the interpolated signals by the reflection coefficient r.
A more detailed block diagram of a single FD two-port junction is shown in Fig.
4.15. This example is valid for pressure signals. In this figure it is seen that only the
reflected and transmitted signal components have to be explicitly computed; the direct
signals propagating in the two delay lines need no attention since the effect of the junction is superimposed onto the delay lines. Note that in the lower delay line, the interpolator approximates the complementary delay d̄ = 1 − d and the deinterpolator the delay d. In other words, the roles of these two operations are interchanged.
Based on the observation that only the reflected and transmitted signals have to be
explicitly interpolated and deinterpolated we may simplify the block diagrams for the
FD junctions. Figure 4.16 illustrates the junction computation for the volume velocity
and the acoustic pressure. The signal paths starting from the delay line indicate that the
signal is obtained via interpolation. Consequently, the paths leading to the delay line
denote that the signal has to be deinterpolated onto the delay line. It is seen that two
Fig. 4.15 The FD two-port scattering junction for the pressure signal.
Fig. 4.16 Diagrams of the FD two-port junctions for a) volume velocity and b) pressure signals, showing that only the reflected and transmitted signal components for both directions are actually computed.
Fig. 4.17 Fractional delay M-tube model for sound pressure.
interpolations and two deinterpolations are required for the realization of an FD two-port junction, i.e., four filters are needed to turn a basic KL junction into a fractional delay one.
A complete fractional delay M-tube model is shown in Fig. 4.17. In principle an arbitrary number of interpolated two-port junctions may be used between the endpoints of
the waveguide. One of the endpoints of the model may be fixed (the left end in Fig.
4.17) while the other has to be interpolated to enable continuous change of the total
length. In principle, this model does not suffer from the restrictions of the traditional KL model that were discussed in Section 2.2.8 (see also Välimäki et al., 1994b). A vocal tract model based on the FD waveguide approach can be controlled in a more straightforward way, since the scattering junctions are movable (Välimäki et al., 1994a; Kuisma, 1994).
4.3.2 Implementing the FD Scattering Junction Using FIR Filters
Interpolation and deinterpolation can be implemented by means of FIR filters as
described in Section 4.1. The computation of the interpolated KL junction using Nth-order FIR filters is depicted in detail in Fig. 4.18. It is assumed that the filter coefficients h(n) of all the interpolating and deinterpolating FIR filters in this configuration
are the same. The FIR filter coefficients h(n) can be designed using one of the methods
discussed in Chapter 3.
The signal value is first interpolated out from both the upper and the lower delay
line, i.e.
p+(n, m + D) = hᵀ p+(n, m)    (4.12a)

p−(n, m + D) = hᵀ p−(n, m)    (4.12b)

where

h = [h(0)  h(1)  h(2)  ⋯  h(N)]ᵀ    (4.12c)

p+(n, m) = [p+(n, m)  p+(n, m + 1)  p+(n, m + 2)  ⋯  p+(n, m + N)]ᵀ    (4.12d)

p−(n, m) = [p−(n, m)  p−(n, m + 1)  p−(n, m + 2)  ⋯  p−(n, m + N)]ᵀ    (4.12e)
Here n is the time index and m the spatial index as was discussed in the beginning of
Chapter 2. Now it is again helpful to use the spatial variable although it is in principle
Fig. 4.18 Implementation of an FD two-port junction for pressure signals using Nth-order FIR interpolators and deinterpolators. The interpolating coefficients h(n) are assumed to be the same for all the filters.
redundant. Note that the interpolation is essentially a spatial operation in this context,
and it is appropriate to call D the fractional spatial index.
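The interpolation of Eq. (4.12) and its transpose operation, deinterpolation, can be sketched in Python. A first-order (linear) FD filter is used only to keep the example short, and the function names are illustrative.

```python
def interpolate(line, m, h):
    """Read a wave variable at a fractional position: the inner product
    of the FD filter taps h with N + 1 consecutive delay-line samples
    starting at index m (cf. Eq. 4.12a)."""
    return sum(hn * line[m + n] for n, hn in enumerate(h))

def deinterpolate(line, m, h, w):
    """Superimpose the scalar w onto the delay line around index m:
    the transpose of the interpolation operation."""
    for n, hn in enumerate(h):
        line[m + n] += hn * w

d = 0.4
h = [1.0 - d, d]              # first-order (linear) FD interpolator
line = [0.0, 1.0, 2.0, 3.0]
y = interpolate(line, 1, h)   # value at the fractional position 1 + d
deinterpolate(line, 1, h, 1.0)
```

For the ramp-shaped delay-line contents the linear interpolator is exact: the value read at position 1.4 is 1.4, and deinterpolating a unit sample spreads it over taps 1 and 2 with weights 0.6 and 0.4.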
The difference of the interpolated signals p+(n, m + D) and p−(n, m + D) is computed and multiplied by the reflection coefficient r, that is

w(n) = r[p+(n, m + D) − p−(n, m + D)] = r hᵀ[p+(n, m) − p−(n, m)]    (4.13)
The resulting signal w(n) is then deinterpolated onto both delay lines. We can write the
updated delay line vectors as
p+(n, m) = p+(n, m) + h w(n)    (4.14a)

p−(n, m) = p−(n, m) + h w(n)    (4.14b)

Since the last term h w(n) of the above equations is the same, it is economical to compute it separately and then use it in both equations. Also, as shown by Eq. (4.13), it is practical to first compute the difference of the signal vectors p+(n, m) and p−(n, m) and only then evaluate the inner product with hᵀ. Thus it is seen that by reorganizing the computation it is possible to achieve a reduction of complexity. In Fig. 4.19, the two
interpolators as well as the two deinterpolators have been combined. This does not
reduce the number of additions (or subtractions), which is still 4N + 3. However, in the straightforward configuration (Fig. 4.18) the number of multiplications is 4N + 5, but in the optimized one (Fig. 4.19) only 2N + 3. The reduction in the number of multiplications is thus considerable.
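The reorganized computation of Fig. 4.19 can be sketched as follows: one interpolation of the difference signal, one multiplication by r, and one shared deinterpolation, giving the 2N + 3 multiplications counted above. The sketch assumes, as in the text, that the same coefficients h are used for all four filters; the function name is illustrative.

```python
def fd_junction_update(upper, lower, m, r, h):
    """Reorganized FD two-port junction update (cf. Eqs. 4.13-4.14):
    interpolate the difference of the two lines, scale by r, and
    deinterpolate the single scattered signal onto both lines."""
    N = len(h) - 1
    # one interpolation of the difference signal: N + 1 multiplications
    diff = sum(h[n] * (upper[m + n] - lower[m + n]) for n in range(N + 1))
    w = r * diff                      # 1 multiplication
    hw = [hn * w for hn in h]         # N + 1 multiplications, shared
    for n in range(N + 1):            # only additions from here on
        upper[m + n] += hw[n]
        lower[m + n] += hw[n]
    return w

upper = [1.0, 3.0]
lower = [0.0, 0.0]
w = fd_junction_update(upper, lower, 0, 0.5, [0.5, 0.5])
```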
The transfer function of the prototype FIR filter used for interpolating and deinterpolating is defined as

H(z) = Σ_{n=0}^{N} h(n) z^{−n}    (4.15)
It is assumed that the filter coefficients h(n) have been designed so that the filter approximates ideal bandlimited interpolation in some sense. The nominal value of the total delay of this filter is D, which consists of an integer part and the fractional delay d.
The following notation is used: the superscripts + and − denote that the filter operates on the signal that propagates in the positive or the negative direction, respectively; the subscripts i and d indicate that the filter is used for interpolation or deinterpolation, respectively.
Let us first write the transfer function for each of the four FIR filters involved in the
FD junction. The signal value is interpolated from the upper delay line by the FIR interpolator with the transfer function Hi+ (z), that is
Hi+(z) = Σ_{n=0}^{N} hi+(n) z^{−n} = Σ_{n=0}^{N} h(n) z^{−n} = H(z)    (4.16)
The deinterpolator that superimposes the scattered signal onto the upper delay line has the transfer function Hd+(z). This filter approximates the complementary delay D̄ = N − D, where d̄ = 1 − d is the complementary fractional delay. Thus the impulse response of the filter is the mirror image of the prototype filter H(z), that is, hd+(n) = h(N − n) for n = 0, 1, ..., N, and the transfer function may be expressed as
Hd+(z) = Σ_{n=0}^{N} hd+(n) z^{−n} = Σ_{n=0}^{N} h(N − n) z^{−n} = z^{−N} Σ_{n=0}^{N} h(n) z^{n} = z^{−N} H(z^{−1})    (4.17)
In the lower delay line the fractional delays d and d̄ have changed their roles, as is seen, e.g., in Fig. 4.15. Thus, the interpolator Hi−(z) approximates the complementary fractional delay d̄ and the deinterpolator Hd−(z) the fractional delay d. This implies that now the impulse response of the interpolator must be reversed and the transfer function Hi−(z) is written as
Hi−(z) = Σ_{n=0}^{N} hi−(n) z^{−n} = Σ_{n=0}^{N} h(N − n) z^{−n} = z^{−N} H(z^{−1})    (4.18)
The transfer function Hd- (z) of the filter that deinterpolates the scattered signal onto
the lower delay line is thus
Hd−(z) = Σ_{n=0}^{N} hd−(n) z^{−n} = Σ_{n=0}^{N} h(n) z^{−n} = H(z)    (4.19)
It is seen that the interpolator of the upper line and the deinterpolator of the lower
line have the same transfer function, that is
Hi+(z) = Hd−(z) = H(z)    (4.20)
The equivalence of these transfer functions is a consequence of the fact that they both approximate the same delay D. Similarly,

Hd+(z) = Hi−(z) = z^{−N} H(z^{−1})    (4.21)

since both the filters Hd+(z) and Hi−(z) approximate the complementary delay D̄.
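The mirror-image relation can be checked numerically: if a filter h approximates the delay d at low frequencies, the reversed filter approximates N − d. A quick sketch with the linear interpolator follows; the tolerance values are illustrative.

```python
import cmath

def phase_delay(h, w):
    """Phase delay -arg(H(e^jw))/w of an FIR filter h at frequency w."""
    H = sum(hn * cmath.exp(-1j * w * n) for n, hn in enumerate(h))
    return -cmath.phase(H) / w

d = 0.4
h = [1.0 - d, d]                # approximates the delay d (here N = 1)
h_rev = h[::-1]                 # mirrored coefficients h(N - n)
w = 0.1 * cmath.pi              # a low normalized frequency

pd = phase_delay(h, w)          # close to d = 0.4
pd_rev = phase_delay(h_rev, w)  # close to N - d = 0.6
```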
4.3.4 Choice of FIR FD Filter Design
In the above discussion on interpolated waveguide models, the properties of the FIR
fractional delay filters used for interpolation and deinterpolation have not been taken
into account. The theory of the FDWFs is independent of the choice of filter design
method. Let us now consider how to choose the design technique.
The FIR FD filter should yield a good approximation of bandlimited interpolation at
least at low frequencies (from about 50 Hz to a few kHz). This is because the human
auditory system is more sensitive at these frequencies than at higher frequencies.
Strictly speaking the filter design technique should take into account the properties of
human hearing. One possibility is to specify an auditory-based weighting function in the
frequency domain and use that as an error weighting function in a filter design algorithm. For example, the complex LS design and minimax (Chebyshev) design algorithms allow the use of an arbitrary non-negative weighting function (Laakso et al.,
1994).
However, the author feels that it is unnecessarily involved to try to design FIR filters
that are optimal in the auditory sense. Bandlimited approximation of ideal interpolation
can be considered a more practical choice. It is apparent that oversampling by a small
factor is necessary in order to obtain a satisfactory approximation with a low-order FD
filter. The upper part of the frequency band is left as a guard band.
FIR FD filter design methods that give good approximation on a limited low-frequency band include Lagrange interpolation, general complex least-squares design, and
equiripple design (Laakso et al., 1994). Lagrange interpolation has several good properties (see Section 3.3): it naturally gives a low-frequency approximation, its coefficients
can be computed using a closed-form formula, and its magnitude response does not
exceed unity.
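The closed-form coefficients mentioned above can be sketched directly; this is a transcription of the standard Lagrange FD product formula, and the function name is not from the text.

```python
def lagrange_fd(N, D):
    """Nth-order Lagrange FD filter coefficients for a total delay of
    D samples (0 <= D <= N), computed with the closed-form product."""
    h = [1.0] * (N + 1)
    for n in range(N + 1):
        for k in range(N + 1):
            if k != n:
                h[n] *= (D - k) / (n - k)
    return h

# Third-order filter with a half-sample fractional delay (D = 1.5):
# a symmetric, linear-phase impulse response.
h = lagrange_fd(3, 1.5)
```

Note that the coefficients always sum to one, since the filter is exact at dc.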
In the general complex LS design (see Section 3.2.4), the approximation band can be
specified (leaving the rest of the frequency band as a don't-care band). The filter coefficients are designed by solving a matrix equation. In principle, the complexity of the design should not influence the choice of the filter since it is possible to design the coefficients beforehand and store them in a lookup table. In most practical cases, this is also the most efficient implementation technique for coefficient update.
From the viewpoint of waveguide modeling, a drawback of the general complex LS design is that its magnitude response oscillates around unity, potentially exceeding it at several frequencies. When the FD filter is used in a feedback loop, the magnitude
several frequencies. When the FD filter is used in a feedback loop, the magnitude
response of the filter must not be greater than one at any frequency. Otherwise the overall system may become unstable. Hence, it is necessary to determine the maximum of
the magnitude response of the filter for each delay value D, and then scale the output of
the filter (or, equivalently, the filter coefficients) by the inverse of the maximum value.
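A simple way to perform this normalization is to evaluate the magnitude response on a dense frequency grid and divide the coefficients by the maximum; the sketch below assumes a grid of 512 points is dense enough, and the function name is illustrative.

```python
import cmath, math

def scale_to_unit_peak(h, grid=512):
    """Divide FIR coefficients by the peak of |H(e^jw)| over [0, pi],
    so that the magnitude response never exceeds one (up to the
    accuracy of the frequency grid)."""
    peak = max(
        abs(sum(hn * cmath.exp(-1j * (math.pi * k / grid) * n)
                for n, hn in enumerate(h)))
        for k in range(grid + 1))
    return [hn / peak for hn in h]

# Example: this filter has peak magnitude 1.2 at w = 0, so scaling
# forces its dc gain (and thus its maximum) down to one.
h_scaled = scale_to_unit_peak([0.5, 0.7])
```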
Also the equiripple (minimax) design results in a filter which has an oscillating
magnitude response. Thus scaling of the filter is necessary in order to restore the stability of the interpolated waveguide filter model. Oetken (1979) has proposed a simple
Fig. 4.20 Definitions of the transfer functions.
design technique that yields an approximately equiripple solution (see Section 3.2.5).
This technique involves design of a linear-phase prototype filter, solving the frequencies
where the approximation error (deviation from the ideal interpolator) is zero, and a
matrix multiplication.
In order to have an FDWF system that can be implemented efficiently, it would be
advantageous to use low-order FIR FD filters. Even-length (i.e., odd-order) FIR FD filters are easier to use especially in time-varying systems, since the filter taps need to be
moved only when the integer part of the total delay D is changed. Based on experiments, it has been found that a third-order interpolating filter is a suitable compromise
in terms of implementation costs and quality.
An easy first choice for an FIR FD filter design technique is Lagrange interpolation.
If it is good enough, it is not necessary to consider the other methods. The general complex LS design and the equiripple design yield a smaller approximation error at middle frequencies and give a wider usable frequency band. Later, in Section 4.3.7, these design methods are applied to the simulation of an acoustic tube.
4.3.5 Reflection and Transmission Functions
We proceed by studying the reflection and transmission transfer functions of the FD
junction. These results were originally published in Välimäki et al. (1994b) but the
notation differs slightly from that used here. The definitions of the transfer functions are
illustrated in Fig. 4.20.
The transfer function through the FD junction in the positive direction, i.e., T + (z) in
Fig. 4.20, may then be expressed as
T+(z) = z^{−N} + r Hi+(z) Hd+(z) = z^{−N}[1 + r |H(z)|²]    (4.22)
It can be seen from this z-transform that the impulse response (IR) through the port consists of a delayed unit impulse plus the convolution of h(n) and the time-reversed (and
delayed) IR hrev (n) scaled by r. The length of the IR is thus 2N + 1. Note that when the
transfer functions Hi+ (z) and Hd+ (z) are cascaded, their phase functions cancel each
other out and a linear-phase IR results.
Similarly, a linear-phase transfer function is obtained for transmission in the negative
direction (see Fig. 4.20)
Strictly speaking the last equality in (4.22) is true only when z = e^{jω}, but we use this notation for convenience.
Fig. 4.21 The magnitude and phase delay responses of the reflection function R+(e^{jω}) of the FD two-port junction. In the upper part the magnitude of the reflection function using first- (dashed line) and third-order (solid line) interpolators is illustrated. The corresponding phase delay curves are shown in the lower part. The dotted line corresponds to the desired curve. In this example r = 0.5 and d = 0.4.
T−(z) = z^{−N} − r Hi−(z) Hd−(z) = z^{−N}[1 − r |H(z)|²]    (4.23)

The reflection function in the positive direction (see Fig. 4.20) is

R+(z) = r Hi+(z) Hd−(z) = r [H(z)]²    (4.24)

which does not have a linear phase in general. Correspondingly, the reflection function in the negative direction is

R−(z) = −r Hi−(z) Hd+(z) = −r z^{−2N} [H(z^{−1})]²    (4.25)
Here the transfer functions of both the interpolation and deinterpolation filters appear in
the time-reversed and delayed form. Also this transfer function does not in general have
a linear phase.
Fig. 4.22 The magnitude (upper) and phase delay (lower) responses of the transmission function T+(e^{jω}) of the FD KL junction. The dashed lines correspond to the linear interpolator and the solid lines to the third-order Lagrange interpolator. The dotted line corresponds to the desired ideal curve. In this example r = 0.5 and d = 0.4.
By setting z = e^{jω} in the above transfer functions it is possible to study the behavior of a single FD two-port junction. In the following examples we use first- and third-order Lagrange interpolators and show the magnitude response and phase delay in two cases. In these examples r = 0.5, corresponding to a configuration where two tubes with cross-sectional areas A1 and A2 = 3A1 are connected. The fractional delay is d = 0.4, which is nearly the worst case.
In Fig. 4.21 the reflection function R+(z) is studied. In the upper part of Fig. 4.21 the magnitude response |R+(e^{jω})| obtained from Eq. (4.24) is shown for the cases where the first- (N = 1) and third-order (N = 3) Lagrange interpolators are used for the
implementation of the FD two-port junction. Both magnitude responses have lowpass
characteristics, which follows from the lowpass nature of the Lagrange interpolators.
Remember that the error due to FIR FD approximation is applied twice in the reflection
function as demonstrated by Eq. (4.24).
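These observations can be spot-checked at a single low frequency using the reflection and transmission expressions of Eqs. (4.22) and (4.24) with the linear interpolator; the tolerances below are illustrative.

```python
import cmath, math

def H(h, w):
    """Frequency response of an FIR filter h at normalized frequency w."""
    return sum(hn * cmath.exp(-1j * w * n) for n, hn in enumerate(h))

r, d = 0.5, 0.4
h = [1.0 - d, d]           # first-order (linear) FD interpolator, N = 1
w = 0.1 * math.pi          # a low frequency

Hw = H(h, w)
R = r * Hw ** 2                                    # reflection, Eq. (4.24)
T = cmath.exp(-1j * w) * (1 + r * abs(Hw) ** 2)    # transmission, Eq. (4.22)

R_mag = abs(R)                 # ideally r = 0.5
R_pd = -cmath.phase(R) / w     # ideally 2d = 0.8
T_mag = abs(T)                 # ideally 1 + r = 1.5
T_pd = -cmath.phase(T) / w     # exactly N = 1: linear phase
```

At this frequency the reflection magnitude and phase delay are already close to their ideal values, while the transmission phase delay is exactly constant, illustrating the phase cancellation of the cascaded interpolator and deinterpolator.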
In the lower part of Fig. 4.21 the phase delay response is plotted for both cases. We have subtracted a delay of 2 samples from the phase delay of the third-order interpolator to view the difference of the two responses more clearly. The ideal phase delay 2d is represented by a dotted straight line. The phase delay of both responses is identical with the ideal one at low frequencies, but at high frequencies the curves approach zero. (In the worst case, i.e., when the fractional delay is 0.5, the odd-order Lagrange interpolators have exactly linear phase, and the effect of the phase error would then not show in the figures. For this reason we use a delay value that is nearly but not exactly 0.5.)
The transmission function T+(z) of the two-port FD junction is examined in Fig. 4.22. The magnitude response |T+(e^{jω})| obtained from Eq. (4.22) is again shown for both the linear and the third-order interpolator (see the upper part of Fig. 4.22). At low frequencies both curves coincide with the ideal result (dotted straight line), but at high frequencies they approach unity because the magnitude responses of the Lagrange interpolators approach zero. The third-order interpolator yields a slightly better result. The
phase delay of both transmission functions is constant corresponding to a linear phase
function. This is due to the cancellation of the phase responses of the interpolators and
deinterpolators, as mentioned earlier.
From these examples we may derive some general properties of the FD KL junction
implemented using Lagrange interpolation.
- The reflected signal is a lowpass filtered version of the input signal. The phase of the reflected signal can be severely distorted at high frequencies.
- Transmission through the junction attenuates high frequencies of the signal but does not cause any phase distortion.
4.3.6 Analysis of Errors Due to FD Approximation
The approximation errors of the FD filters degrade the performance of the waveguide
model. The interpretation illustrated in Fig. 4.23 helps in the error analysis (Välimäki et al., 1994b). The blocks e^{−jωD} and e^{−jωD̄} denote the frequency responses of the ideal fractional delay and the ideal complementary fractional delay, respectively. Here we have normalized the sampling rate to 1, i.e., the sample interval T is also equal to 1. The complementary total delay D̄ in Fig. 4.23 is defined by Eq. (4.3) as
Fig. 4.23 Temporal flow diagram of the FD KL junction for pressure waves with ideal fractional delays of lengths D and D̄. The frequency response functions Gi+(e^{jω}) and Gi−(e^{jω}) include the interpolation error and Gd+(e^{jω}) and Gd−(e^{jω}) the deinterpolation error, and r is the reflection coefficient. Note that in this figure the direction of the lower delay line has been reversed to avoid crossing of signal paths.
Fig. 4.24 The magnitude responses of the FD two-tube model with linear interpolation (solid line) and ideal interpolation (dashed line). The dotted line shows the magnitude response of a uniform tube.
D̄ = N − D    (4.26)
The frequency response functions G(e jw ) include the approximation error due to the
fractional delay filters. Were ideal interpolating filters used, the frequency responses
G(e jw ) would be equal to unity. These functions are defined as
Gi+(e^{jω}) = e^{jωD} Hi+(e^{jω}) = 1 − e^{jωD} Ei+(e^{jω})    (4.27a)

Gd+(e^{jω}) = e^{jωD̄} Hd+(e^{jω}) = 1 − e^{jωD̄} Ed+(e^{jω})    (4.27b)

Gi−(e^{jω}) = e^{jωD̄} Hi−(e^{jω}) = 1 − e^{jωD̄} Ei−(e^{jω})    (4.27c)

Gd−(e^{jω}) = e^{jωD} Hd−(e^{jω}) = 1 − e^{jωD} Ed−(e^{jω})    (4.27d)
where the error functions E represent the difference between the frequency response of
the ideal delay element and the approximation. They are defined by
Ei+(e^{jω}) = e^{−jωD} − Hi+(e^{jω})    (4.28a)

Ed+(e^{jω}) = e^{−jωD̄} − Hd+(e^{jω})    (4.28b)
Fig. 4.25 The magnitude responses of the FD two-tube model with third-order Lagrange interpolation (solid line) and ideal interpolation (dashed line). The dotted line presents the magnitude response of a uniform tube.
Ei−(e^{jω}) = e^{−jωD̄} − Hi−(e^{jω})    (4.28c)

Ed−(e^{jω}) = e^{−jωD} − Hd−(e^{jω})    (4.28d)
By using the above definitions it is possible to study the effects of the FD approximation errors due to the two-port junction. Setting all E(e^{jω}) = 0 or, equivalently, all G(e^{jω}) = 1, results in the ideal frequency response.
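For the linear interpolator, the error function of Eq. (4.28a) can be evaluated directly, illustrating that the FD approximation error vanishes at low frequencies and grows toward the Nyquist frequency; `err` is an illustrative name.

```python
import cmath, math

d = 0.4
h = [1.0 - d, d]      # linear FD interpolator; its total delay is D = d

def err(w):
    """|E(e^jw)| = |e^(-jwD) - H(e^jw)|: the deviation of the FD filter
    from the ideal fractional delay element at frequency w."""
    Hw = h[0] + h[1] * cmath.exp(-1j * w)
    return abs(cmath.exp(-1j * w * d) - Hw)

e_low = err(0.05 * math.pi)    # nearly ideal at low frequencies
e_mid = err(0.5 * math.pi)
e_high = err(0.9 * math.pi)    # large error near the Nyquist frequency
```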
4.3.7 Simulation of a Two-Tube System Using FIR FD Filters
As an example we shall investigate the frequency response of a two-tube model implemented with an interpolated waveguide model. The total length of the tube corresponds to eight unit delays. The lengths of the two tube sections are 3.5 and 4.5 samples, and the scattering junction is thus located halfway between unit delays. This can be considered the worst case for FIR FD filters since their approximation error is largest when d = 0.5. The reflection coefficient at the junction of the two tubes is 0.5. One end of the tube is assumed to be closed and the other one open, but in order to simplify the example the terminations have been approximated with constant reflection coefficients r1 = 0.9 and r2 = −0.9.
Figures 4.24 and 4.25 show the magnitude responses of this simulated system with
first and third-order Lagrange interpolation, respectively. The solid line in both figures
Fig. 4.26 The magnitude responses of the FD two-tube model with general LS FIR design (N = 3, a = 0.5) (solid line) and ideal interpolation (dashed line). The dotted line presents the magnitude response of a uniform tube.
is obtained by substituting the sampled frequency response of the interpolating filter
while the dashed line results from setting the error frequency response to zero as proposed above. This corresponds to the case of ideal interpolation, or equivalently, an
infinite sampling rate. The dotted line in figures 4.24 and 4.25 shows the magnitude
response of a uniform tube that is eight unit delays long. (The uniform tube is obtained
when the reflection coefficient is zero in the two-tube model.) The formants of the
uniform tube are equally spaced along the linear frequency axis.
It is seen in both cases that at low frequencies the magnitude response of the FD
model coincides with the ideal one, whereas at high frequencies it joins the magnitude
response of the uniform tube. This is because the interpolators are lowpass filters that
have a transmission zero at the Nyquist frequency, and thus the reflection coefficient of
the FD junction effectively approaches zero (i.e., the uniform case) when the frequency
is increased. The third-order interpolator (Fig. 4.25) yields a much better result at
middle frequencies than the linear interpolator (Fig. 4.24).
For comparison, the two-tube system was simulated using general LS (GLS) and
equiripple FIR FD filters. These filter design methods were discussed in Sections 3.2.4 and 3.2.5, respectively. The magnitude responses of these two-tube simulations are presented in Figs. 4.26 and 4.27. Also in these figures the solid line gives the result of the
FIR filter approximation, the dashed line the ideal result, and the dotted line the magnitude response of a uniform tube. The bandwidth of approximation for the GLS and
Fig. 4.27 The magnitude responses of the FD two-tube model with equiripple (Oetken) FIR interpolation (N = 3, a = 0.5) (solid line) and ideal interpolation (dashed line). The dotted line presents the magnitude response of a uniform tube.
equiripple filters was a = 0.5, i.e., the results are optimal up to the normalized frequency 0.25 in Figs. 4.26 and 4.27. The filter coefficients of the GLS and the equiripple
filter were divided by 1.0157 and 1.0224, respectively, in order to force the maximum
of the magnitude response to be equal to one.
To enable comparison of the four simulated results, the errors in the amplitudes of the formants were computed in each case. The results are collected in Table 4.1. The number in parentheses after the name of the filter design method indicates the order N of the FIR filter approximation. The leftmost column gives the formant number and the second column the normalized center frequency of each formant. The error in each case is the difference of the peak magnitude of the approximation and the ideal magnitude response. Note that there are also considerable errors in the center frequencies of the formants at high frequencies in Figs. 4.24-4.27, but they are not presented in Table 4.1.

Table 4.1 Formant amplitude errors (dB) in the two-tube simulations (see text).

Formant    fk/fs     Oetken (3)
   1       0.021     0.410
   2       0.10      0.340
   3       0.15      0.0798
   4       0.22      0.164
   5       0.28      0.642
   6       0.34      3.94
   7       0.42      3.53
   8       0.46      5.13
It is seen that the magnitude error of the Lagrange interpolators increases with frequency. The errors of the GLS and Oetken (equiripple) filter approximations are less
than 1 dB for the five lowest formants. At high frequencies the errors are several dB in
all cases.
It can be concluded that the use of FIR FD filters for the realization of an interpolated waveguide model leads to a low-frequency approximation of a tube system.
Implementation of a vocal tract model, or other acoustic tube simulation, thus requires oversampling and possibly a high-order interpolator to achieve a small enough approximation error up to a desired frequency. In Välimäki et al. (1994a) it was proposed that using a sampling rate of 22 kHz and third-order Lagrange interpolation yields an effective bandwidth of about 5 kHz, which is quite adequate for high-quality speech synthesis.
4.3.8 Conclusion
In this section we tackled simulation of an acoustic tube system, such as the human
vocal tract, with fractional delay filters. Combined use of nonrecursive interpolation and
deinterpolation enabled us to derive a structure for an interpolated scattering junction
which can be located at an arbitrary point along a digital waveguide. The main motivation for this new structure is the elimination of the most severe limitation of the KL
vocal tract model: the use of equal-length tube sections. In the proposed new model,
the locations of the junctions are not restricted by sampling of the waveguide system.
Realization of the interpolated scattering junction using FIR-type fractional delay
filters was considered. The effects of approximation error on the reflection and transmission functions of the scattering junction, as well as on the frequency response of a
simulated tube system were discussed. FIR interpolation yields a low-frequency approximation of a waveguide system. The application of Lagrange interpolation, general LS
FIR design, and Oetken's almost-equiripple FIR FD design were considered for the
implementation of interpolated waveguide tube models. High-quality simulation
requires oversampling by a factor of two and at least four-tap FD filters for interpolation
and deinterpolation.
In Section 4.6 we consider implementation of acoustic tube models using allpass FD
filters. Next we tackle the simulation of an interpolated three-way scattering junction.
(4.29a)
(4.29b)
(4.29c)
where D is the real-valued spatial index corresponding to the position of the center of
the finger hole, and signal w(n) is the output of the reflection filter R(z), i.e.,
w(n) = r(n) * w1(n)    (4.29d)
where r(n) is the impulse response of the reflection filter R(z), * denotes the convolution sum, and the input signal w1 (n) is
w1(n) = ua+(n, m + D) + ub−(n, m + D) + us−(n)    (4.29e)
The signal flow graph for a finger-hole model has been depicted in Chapter 2 in Fig.
2.20.
The input signals of the tone-hole model, ua+ (n, m + D) and ub- (n, m + D), can be
computed at an arbitrary point on the digital waveguide using interpolation. When FIR
interpolation is employed, the approximations to these signals can be written in vector
form as
ua+(n, m + D) = hᵀ u+(n, m)    (4.30a)

ub−(n, m + D) = hᵀ u−(n, m)    (4.30b)

where

h = [h(0)  h(1)  h(2)  ⋯  h(N)]ᵀ    (4.30c)

u+(n, m) = [u+(n, m)  u+(n, m + 1)  ⋯  u+(n, m + N)]ᵀ    (4.30d)

u−(n, m) = [u−(n, m)  u−(n, m + 1)  ⋯  u−(n, m + N)]ᵀ    (4.30e)
and N is the order of the FIR interpolator. The coefficients h(n) can be, e.g., the
Lagrange interpolation coefficients (see Section 3.3).
The results ua- (n, m + D) and ub+ (n, m + D) of the computation in Eqs. (4.29a) and
(4.29b) also have fractional spatial indices m + D and should thus be stored between
the samples of the delay line. In practice this is not necessary, since the signal
ub−(n, m + D) really is the same as ua−(n, m + D): both are part of the lower delay line of the waveguide. Also ua+(n, m + D) is the same as ub+(n, m + D). The subscripts a
and b essentially refer to the order of computation. This situation is exactly the same
as in the case of the FD KL junction (see Section 4.3.2) where the direct signal was not
processed at all. Thus the computations of the FD finger-hole model can be expressed in
vector form as
u+(n, m) = u+(n, m) + h[us−(n) + w(n)]    (4.31a)

u−(n, m) = u−(n, m) + h[us−(n) + w(n)]    (4.31b)
(4.32a)
(4.32b)
(4.32c)
Here the signal w(n) is the output of the reflection filter R(z). Its input signal w1 (n)
defined by Eq. (4.29e) also involves fractional indices, and thus it has to be computed
using interpolation, that is
w1(n) = hᵀ u+(n, m) + hᵀ u−(n, m) + us−(n) = hᵀ[u+(n, m) + u−(n, m)] + us−(n)    (4.33)
The last form suggests an efficient way to calculate the input for the reflection filter R(z): first add the values in the two delay lines of the digital waveguide and only after that interpolate. Note that as the signals u+(n) and u−(n) propagate in opposite directions, the coefficients of the interpolating filter appear in time-reversed order with respect to one of them. In other words, in the upper delay line the FIR interpolating filter approximates the fractional delay d, while in the lower line it approximates the complementary fractional delay d̄.
Note that altogether two FD filterings are involved in the interpolated finger-hole model: interpolation is needed when Eq. (4.33) is evaluated and deinterpolation when Eq. (4.32c) is evaluated. In addition, there is a simple digital filter R(z) in the system, and two additions are needed. It is seen that there is considerably more computation in this model than, e.g., in the FD KL model discussed in Section 4.3.
4.4.2 Approximation Error Due to FD Filters
As in the case of the FD KL junction, the approximation error due to the fractional
delay filters is in effect applied twice to the FD three-port: first in the interpolation of
the delay-line value and then in the deinterpolation of the result back to the waveguide.
This trick in the interpolated finger-hole model can be compared with the interpolated tube model
discussed above in Section 4.3. Figure 4.19 illustrates the idea that both interpolation and deinterpolation
only need to be applied once in an FD scattering junction.
Fig. 4.28 A signal flow diagram of a finger-hole model showing the interpolation and deinterpolation errors.
It is worth pointing out that the interpolated signal values are only used in the computation of the effect of the junction and deinterpolation is only applied to the result of this
calculation. Thus, the interpolation error does not affect the signal that would anyway
propagate in the waveguide if the junction were not there.
Figure 4.28 shows a high-level block diagram of the finger-hole model. The transfer
functions E_i(z) and E_d(z) represent the errors of the interpolating and deinterpolating
filters, respectively. In this diagram the waveguide is considered ideal, i.e., no errors due
to the digital implementation are present.
The effect of the FD approximation in the finger-hole model derived above can be
studied by computing the transfer function of a simple bore model. In this experiment
one end of the tube is assumed to be ideally closed (i.e., reflection coefficient is 1) and
the other end is assumed to be matched (or equivalently an infinite-length tube is considered). A finger hole is placed at a distance K = 50.5 unit samples from the closed
end. The transfer function of this system is
H(e^{jω}) = -2E_i(e^{jω})R(e^{jω})e^{-j2ωK} / [1 - E_i(e^{jω})R(e^{jω})E_d(e^{jω})e^{-j2ωK}]    (4.34)

where the transfer functions representing the interpolation and deinterpolation errors are

E_i(e^{jω}) = e^{jωD}H_i(e^{jω})    (4.35a)
E_d(e^{jω}) = e^{jωD}H_d(e^{jω})    (4.35b)
Figure 4.29 presents the magnitude responses for the ideal analog system, a system
with the reflection filter R(z) but with ideal fractional delays, and a realistic digital system with the digital reflection filter and an FD interpolator and deinterpolator implemented using first-order Lagrange interpolation (i.e., linear interpolation). The sampling
rate used in this example is 44 kHz. It is seen that in this worst case (i.e., the fractional
delay is d = 0.5) the resonances are only slightly disturbed in the frequency band of
interest, i.e., below 10 kHz. Thus it seems to be sufficient to use simple linear interpolators in the implementation of finger-hole models.
Fig. 4.29 The magnitude responses of the ideal bore model (dashed line), a model with a digital reflection function but ideal interpolators (dotted line), and its realistic implementation with first-order Lagrange interpolation and deinterpolation (solid line).
4.4.3 Implementation Issues
A single model for an open finger hole is computationally relatively efficient. However,
woodwind instruments have several, up to approximately twenty finger holes, and if all
of them were simulated by the technique introduced above, the computational load
would be extremely large. Fortunately, it is not necessary to implement all the open
holes, since only the first two or three open holes determine the fundamental pitch of the
instrument (Benade, 1960).
The closed holes have only a slight influence on the effective length of the bore and
they also act as lowpass filters. These effects can be taken into account by far simpler
methods than using a fractional delay three-port junction. Thus, it is suggested that a
woodwind bore model with finger holes be implemented based on two or three FD
junctions that model the first open holes. When a hole is closed, the junction is removed
and a new one is created at another location in the waveguide. Using this strategy, the typical
playing techniques of woodwind instruments, including the key claps, can be simulated.
More tone holes may be needed for producing multiphonics.
Fig. 4.30 Implementation of a variable-length digital delay line by means of a first-order allpass filter.
form of an allpass filter that has a minimum number of delay elements, since the delays
can be shared with the delay line as shown in Fig. 4.30. This scheme can be extended
for higher-order allpass filters.
The structure illustrated in Fig. 4.30 implements the following difference equation:

y(n) = a1[x(n - M) - y(n - 1)] + x(n - M - 1) ≈ x(n - M - D)    (4.36)
Note, however, that the structure in Fig. 4.30 is not a direct-form structure and thus it
may not be apparent that the underlying difference equation is in fact Eq. (4.36).
The coefficients of the interpolating allpass filter can be computed applying one of
the techniques presented in Section 3.4. The simplest and the most useful allpass design
technique for audio signal processing is the maximally-flat group delay, or Thiran,
design (see Section 3.4.3).
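The difference equation (4.36) with the first-order Thiran coefficient can be sketched in a few lines of Python; the coefficient formula a1 = (1 - D)/(1 + D) is the standard first-order Thiran (maximally flat group delay) design, and the function names are ours:

```python
# Sketch of the variable-length delay line of Fig. 4.30: an integer delay
# of M samples followed by a first-order Thiran allpass whose coefficient
# a1 = (1 - D)/(1 + D) gives exactly D samples of delay at DC.
def thiran1_coeff(D):
    return (1.0 - D) / (1.0 + D)

def fd_delay_line(x, M, D):
    """Implements y(n) = a1*[x(n-M) - y(n-1)] + x(n-M-1), Eq. (4.36)."""
    a1 = thiran1_coeff(D)
    y, y_prev = [], 0.0
    for n in range(len(x)):
        xm = x[n - M] if n >= M else 0.0
        xm1 = x[n - M - 1] if n >= M + 1 else 0.0
        y_prev = a1 * (xm - y_prev) + xm1
        y.append(y_prev)
    return y

# On a ramp input the output settles to n - M - D, since the allpass delay
# is exact at DC: here 39 - 3 - 0.5 = 35.5.
ramp = [float(n) for n in range(40)]
print(fd_delay_line(ramp, M=3, D=0.5)[-1])   # approximately 35.5
```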
4.5.2 Implementing Interpolation and Deinterpolation with Allpass Filters
Figure 4.31 illustrates the first-order allpass filter structure that realizes an output at a
fractional point of a delay line. This system is comparable to that presented in Fig. 4.30.
The principal difference between these structures is that in Fig. 4.31 we assume that the
delay line continues after the fractional point and the allpass filter merely approximates
the signal value at one point along that delay line, not at its end point as in Fig. 4.30.
When the filter coefficient a1 has been chosen in an appropriate way, the output y(n) of
this structure approximates the value of signal x(n) at point M + D along the delay line,
that is
y(n) ≈ x(n - M - D)    (4.37)
The structure shown in Fig. 4.31 directly implements the following difference equation:
y(n) = a1[ x(n - M) - y(n - 1)] + x(n - M - 1)
(4.38)
This time we use the first-order allpass structure with two unit delay elements and a
minimum number of multiplications. This choice is appropriate since one of the unit
delays can be shared with the delay line to be processed and the other unit delay is
reserved for the recursive part of the filter. The allpass structure with a minimum number of delay elements would not be more efficient since the recursion prevents the
sharing of the unit delay with the delay line. For different structures for the implementation of first-order allpass filters, see Mitra and Hirano (1974).
Next we develop a recursive version of deinterpolation using an allpass filter.
Fig. 4.31 A first-order allpass interpolator is used for the implementation of an output at a fractional point of a delay line.
[Fig. 4.32 and Eq. (4.39), the allpass deinterpolation structure and its difference equation, are not legible in this copy]

A_d(z) = (ā1 + z^{-1}) / (1 + ā1 z^{-1})    (4.40)
Note that the filter coefficients a1 and ā1 are not equal in these structures, and they do
not, in general, have a simple relation. In allpass interpolation the coefficient should be
chosen so that the filter approximates the delay D + d, where d is the fractional delay.
The coefficient ā1 of the allpass deinterpolator, however, should be chosen so that the
allpass filter approximates the delay D + d̄, where d̄ = 1 - d is the complementary fractional
delay.
Fig. 4.33 Derivation of the FD scattering junction employing commutation. (a) The
original FD scattering junction. (b) The FD elements have been pushed
through the branch nodes and adders. (c) The structure that is implemented
with two first-order allpass filters.
The block diagram presented in Fig. 4.33c is one possible allpass implementation of
the FD junction. Other configurations can also be derived, but this one has some desirable properties. We realize the double fractional delay elements using allpass filters
A_D(z) and A_D̄(z). Consequently, the desired delay D and the complementary total delay D̄
are defined by

D = 2d    (4.41)
D̄ = 2d̄ = 2 - D    (4.42)

This implies that when the noninteger part d of the position of the junction can have
values in the interval [0, 1], the delays implemented by the two allpass filters will be in
the interval [0, 2].
Let us consider implementation of the allpass FD junction using the Thiran allpass
filter that was discussed in Sections 3.4.3 and 3.4.4. The choice of the optimal range for
the delay parameter D was studied in Section 3.4.4. For example, the first-order Thiran
interpolator yields best results when the delay varies between D0 = 0.42 and D0 + 1 =
1.42. For Thiran allpass filters of different orders, the optimal range is also different.
However, an easy rule of thumb is to keep the delay parameter in the range
[N - 0.5, N + 1.5], where N is the order of the allpass filter.
The implementation of the allpass FD junction described above needs special care,
since both allpass filters used in this structure must be able to approximate fractional
delays within two unit delays. In the case of the first-order Thiran allpass filter, the
junction can be implemented as shown in Fig. 4.33c only when the junction is located
0.25 to 0.75 samples away from the nearest unit delay. Only then is the delay of the
allpass filters A_D(z) and A_D̄(z) within the desired range [0.5, 1.5].
When the delay d is smaller than 0.25 or larger than 0.75, the delay D approximated
by the filter A_D(z) goes beyond the desired range. In order to change the delay of the
reflection function, we may add or subtract unit delays to or from that allpass filter
simply by moving its input point one unit delay in either direction. If the input point of
the allpass filter A_D(z) is moved one unit delay to the left in Fig. 4.33c (i.e., the input of
the allpass filter is taken one sample interval earlier), then one sample of delay must be
added to D in order to maintain the overall delay. Similarly, if the input point is
switched one unit delay to the right (i.e., the input to the allpass filter is delayed by one
unit delay), one sample of delay must be subtracted from the filter's nominal delay D.
Another possibility is to change the location of the filter's output point, i.e., the point
where the output of the allpass filter A_D(z) is added to the lower delay line in Fig. 4.33c.
These are two alternative methods that help to maintain the nominal delay of the allpass
filter within the desired range. One of these techniques must be used for the complementary allpass filter A_D̄(z) as well.
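The tap-shifting rule above can be summarized in a small helper; the function and the offset convention are ours, and only the first-order Thiran range [0.5, 1.5] from the text is assumed:

```python
# Sketch of the delay-range bookkeeping for the first-order Thiran case:
# the nominal allpass delay D = 2d must stay in [0.5, 1.5], so when
# d < 0.25 the input tap is moved one sample earlier (offset -1, delay +1)
# and when d > 0.75 one sample later (offset +1, delay -1).
def junction_allpass_delay(d):
    D = 2.0 * d          # Eq. (4.41)
    offset = 0           # shift of the allpass filter's input point
    if D < 0.5:
        D, offset = D + 1.0, -1
    elif D > 1.5:
        D, offset = D - 1.0, +1
    return D, offset

print(junction_allpass_delay(0.125))   # (1.25, -1)
print(junction_allpass_delay(0.5))     # (1.0, 0)
print(junction_allpass_delay(0.875))   # (0.75, 1)
```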
The above discussion concerned the first-order Thiran allpass filter. However, for higher-order filters a similar procedure, i.e., changing the location of the input or output point, should be used to
maintain the delay of the filter in its best range. In the case of higher-order allpass filters, one or more unit delays must usually be added to the filter's nominal delay value
because these filters cannot implement an arbitrarily small delay. Remember that the
Thiran allpass filter is unstable for delay values D < N - 1. For example, the second-order Thiran filter cannot approximate a delay smaller than one sample.
4.6.2 Approximation Errors due to an Allpass Fractional Delay Junction
Let us consider the approximation error due to allpass interpolators. In the structure of
Fig. 4.33c, there are no FD elements in the straight path in either direction. Thus there
is no approximation error in the transmission function through this allpass FD junction.
The only limitation of this solution is that the point where the transmission occurs is not explicitly implemented, and the signal value at that point cannot be
directly obtained. At the open end of a tube this causes an error in the delay of the output signal (radiated sound). This does not affect the formant structure or other important properties of the signal, and it can easily be avoided, if desired, by incorporating an
additional FD filter at the output.
The reflection from the FD junction suffers from an approximation error, since the
double FDs in Fig. 4.33b are realized using allpass filter approximations A_D(z) and
A_D̄(z). When the MF (Thiran) allpass FD filters are used, the nature of this error can be
predicted from the phase delay responses given in Section 3.4.3.
Fig. 4.34 Transfer function of a two-tube system realized using first-order Thiran allpass filters (solid line). For comparison, the simulation result obtained with
ideal interpolation (dashed line) is also given. The location of the junction is
d = 0.25.
The approximation error is zero when the nominal delay of the filter is 1.0. This
implies that the approximation error disappears when the junction is located halfway
between unit delays, i.e., d = 0.5. This is in sharp contrast with the FIR FD junctions,
where the approximation error is largest when d = 0.5.
For other values of fractional delay, the approximation error is nonzero, and its effect
can be characterized as follows. At very low frequencies the junction (impedance discontinuity) is located very nearly in the correct position, but with increasing frequency
the junction tends to move towards some erroneous location. This is because the Thiran
allpass filter approximates the delay accurately at low frequencies but poorly at high
frequencies. We may assume that at low frequencies the formant structure of a tube
model implemented with these allpass filters is similar to the ideal one, whereas at
higher frequencies the center frequencies and amplitudes of formants are incorrect. The
following example confirms that this is actually the case.
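This behavior can be seen directly from the phase delay of the first-order Thiran allpass; a small numerical sketch (ours, not from the text):

```python
import cmath, math

# The first-order Thiran allpass realizes its nominal delay D exactly at
# DC but not at high frequencies, so the effective position of the
# junction drifts with frequency; for D = 1 (a1 = 0) the filter is a pure
# unit delay and the error vanishes.
def thiran1_phase_delay(D, w):
    a1 = (1.0 - D) / (1.0 + D)
    z = cmath.exp(1j * w)
    H = (a1 + 1.0 / z) / (1.0 + a1 / z)
    return -cmath.phase(H) / w

print(thiran1_phase_delay(0.5, 0.01 * math.pi))  # essentially 0.5
print(thiran1_phase_delay(0.5, 0.8 * math.pi))   # drifts well above 0.5
print(thiran1_phase_delay(1.0, 0.8 * math.pi))   # exactly 1.0: no error when D = 1
```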
4.6.3 Simulation of a Two-Tube Model with Allpass Filters
In order to compare the properties of allpass and FIR implementations of the interpolated waveguide model, a two-tube model with one FD junction was simulated. The
example system is the same as the one presented in Section 4.3.7. The total length of the
Fig. 4.35 Transfer function of a two-tube system realized using first-order Thiran allpass filters (solid line). For comparison, the simulation result obtained with
ideal interpolation (dashed line) is also given. The location of the junction is
d = 0.75.
tube system is eight unit delays. The junction between the tubes is located between the
third and fourth unit delays, and the reflection coefficient is 0.5. One end of the tube is
assumed to be closed and the other one open. To simplify the example, the terminations
have been approximated with purely resistive loads, r1 = 0.9 and r2 = -0.9.
Figures 4.34 and 4.35 show the magnitude of the transfer function of this system for
two junction positions. The solid line has been computed using first-order MF allpass
filters and the dashed line was obtained with ideal interpolators. The sampling rate is 22
kHz. In Fig. 4.34, the location of the FD junction is d = 0.25, corresponding to delay
parameters D = 0.5 and D̄ = 1.5. In Fig. 4.35, the junction is situated at d = 0.75 between
the unit delays, which means that D = 1.5 and D̄ = 0.5. These can be considered the
worst-case situations, since the delay values are on the edge of the allowed range.
The transfer functions of the ideal and the approximating system are nearly identical at
low frequencies, but at high frequencies the deviation is quite large. However, the magnitude error of the fourth formant (at about 4.7 kHz) is only 1 dB in Fig. 4.34 and 2 dB
in Fig. 4.35. These figures should be compared with Figs. 4.24-4.27, which have been
computed with FIR interpolators. It can be concluded that the result obtained with the
first-order allpass filters is clearly better (at low and middle frequencies) than with the
linear interpolators and about as good as with the third-order Lagrange interpolators.
The computational complexity of the allpass filter implementation is less than that of
[Fig. 4.36, an alternative FD junction structure, and the surrounding text are not legible in this copy]
junction and has been combined with the CFD element. This helps to realize an allpass
junction with better accuracy. The structure of Fig. 4.36 is aimed at an implementation using
a first-order allpass filter. Then the fractional delays z^{-d} would be implemented with
two allpass filters whose delay parameters would vary within [0, 1]. The delay
parameters of the other two allpass filters (approximating transfer functions z^{-d̄-1})
would vary in the range [1, 2].
Unfortunately, this design proves to be useless. Let us consider the transmission
function through the junction from the left to the right. We denote the allpass filter
approximations for the FD and CFD elements by A_D(z) and A_D̄(z), respectively. The
transmission function is then z^{-2} - rA_D(z)A_D̄(z). We would like this to be an allpass
function, or at least a passive function. However, the approximation errors in the two
allpass filters ensure that the phase function of rA_D(z)A_D̄(z) is nonlinear, and although
this transfer function and also that of the other signal path, z^{-2}, are both allpass functions (the product of two allpass filters is also an allpass filter), the resulting transmission function does not have this property. In fact, a sum of two allpass filters is in general not an allpass function: the differences in the phase functions of the summed functions lead to cancellation or emphasis of different frequencies in the magnitude spectrum. This approach can be, and has been, used for designing lowpass and highpass filters, for example.
The above explanation implies that the transmission functions through the FD junction of Fig. 4.36b are not guaranteed to be either allpass or passive. This may endanger
the stability of the overall waveguide system, which consists of several feedback loops.
For this reason, the structure is useless, although in theory it has been derived in a
correct way.
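The claim that the resulting transmission is not allpass is easy to verify numerically; the sketch below (ours, with an illustrative reflection coefficient r = 0.9 and junction position d = 0.5) evaluates the magnitude of z^{-2} - rA_D(z)A_D̄(z) realized with first-order Thiran filters approximating delays of 0.5 and 1.5 samples:

```python
import cmath, math

# The transmission z^-2 - r*A_D(z)*A_Dbar(z) is a sum of two allpass
# terms whose phases drift apart at high frequencies, so its magnitude
# is not flat; a true allpass would have constant magnitude.
def thiran1(D, z):
    a1 = (1.0 - D) / (1.0 + D)
    return (a1 + 1.0 / z) / (1.0 + a1 / z)

def transmission_mag(r, D, Dbar, w):
    z = cmath.exp(1j * w)
    return abs(z**-2 - r * thiran1(D, z) * thiran1(Dbar, z))

r = 0.9
mags = [transmission_mag(r, 0.5, 1.5, k * math.pi / 16) for k in range(1, 16)]
print(min(mags), max(mags))   # an allpass response would give equal values
```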
167
us+
a)
2
ua+
z-1
z- d
R(z)
ua-
z-1
z- d
1 + R(z )
R(z)
1 + R(z )
z-d
z-1
ub+
z-d
z-1
ub-
z-1
ub+
z-1
ub-
z-1
ub+
z-1
ub-
us+
b)
2
ua+
z-1
z- d
z-d
1 + R(z )
z-d
z- d
R(z)
R(z)
z-d
z- d
ua-
z-1
z- d
z-d
1 + R(z )
us+
c)
2
ua+
z-1
z-1
AD (z)
R(z)
z-1
AD (z)
ua-
z-1
R(z)
z-1
Fig. 4.37 Derivation of the finger-hole model structure where the fractional delays are
approximated using allpass filters. a) The finger-hole junction located at an
arbitrary point (between unit delays) of a digital waveguide. b) The FD elements have been pushed inside the junction. c) The final structure that can be
implemented using two identical reflection filters R(z) and two allpass filters.
The two FD elements z^{-d} on the left of Fig. 4.37b can be combined into a single block z^{-2d}, while the two CFD elements z^{-d̄} on the right combine into z^{-2d̄}.
These double FD elements are implemented using allpass filters A_D(z) and A_D̄(z).
Consequently, for these allpass filters the desired delay D and the complementary total
delay D̄ are defined by D = 2d and D̄ = 2d̄. This implies that when the noninteger part d
of the position of the junction is varied in the interval [0, 1], the delays approximated by
these two allpass filters have values in the interval [0, 2]. Also shown in Fig. 4.37c is an
efficient implementation of transmission functions 1 + R(z) as a sum of the direct signal
and the output of the filter R(z).
The unit delay element in the middle of Fig. 4.37c adjusts the delay between the signals from the upper and lower delay lines before their addition. Here we have simplified
the structure considerably by neglecting the FD elements z^{-d} and z^{-d̄} that should precede
the adder in the two signal paths. It is believed that this approximation does not affect
the essential properties of the finger-hole model: in effect we have only neglected a small delay
at the output of the system. This simplification does, however, reduce the number of additions
and multiplications, since it saves the implementation of one more fractional delay
filter.
The finger-hole model of Fig. 4.37c can be implemented using first-order allpass filters
A_D(z) and A_D̄(z). The total computational load of the finger-hole model is thus not very
large, since only two one-pole reflection filters and two first-order allpass filters need to
be implemented.
4.8 Conclusions
In this chapter, implementation techniques for fractional delay waveguide filters have
been discussed. Interpolation is a standard method for estimating the value of a signal
between its known sample values. A novel technique, deinterpolation, was introduced.
It can be interpreted as an inverse operation of interpolation in the sense that interpolation is used for computing the output at an arbitrary point of a discrete-time delay line,
whereas deinterpolation is used for locating a corresponding input point. Both operations can be implemented using recursive or nonrecursive digital filters, but deinterpolation is more intuitively implemented using an FIR filter structure. The use of allpass
filters for interpolation and deinterpolation was also discussed.
Interpolation and deinterpolation provide means for creating spatially continuous
digital waveguide models. The simplest examples of such systems are a variable-length
delay line and a variable-length waveguide. These and the more elaborate waveguide
structures are called fractional delay waveguide filters. They are constructed of arbitrarily many digital waveguides that are connected to each other at fractional points. This
approach was applied to the time-domain modeling of acoustic tubes and realization of
two-port and three-port fractional delay junctions. Both FIR and allpass fractional delay
filters were employed for this purpose.
The properties of the FIR and allpass techniques were compared. The results can be
summarized in the following way: FIR structures are conceptually simpler since the
locations of filter taps have a direct relation to the impulse response of the filter and
thus also to the locations of the interpolated input and output points along a digital
waveguide. The use of maximally-flat FIR FD filters or maximally-flat group-delay allpass filters leads to a low-frequency approximation of an ideal tube structure.
Oversampling by a small integral factor is required to obtain a reasonable quality of
approximation. The benefits of allpass filter structures are an identically flat magnitude
response in the FD approximation and a smaller approximation error with a lower-order
filter than that obtainable with an FIR filter structure with a comparable number of
arithmetic operations.
The FIR filter implementation of an FD junction is perhaps more intuitive than the
allpass filter approach. This is due to the equivalence of the coefficient vector and
impulse response of the FIR filter. In Section 4.3 it was shown how the FD KL junction
can be efficiently implemented by FIR filters when half of the multiplications are combined. Approximation errors due to the FD filters were also analyzed. The
reflection and transmission functions of the FD KL junction were derived and, as an
example, the magnitude response of a two-tube model was studied when first- and third-order Lagrange interpolating filters were used.
Modeling of a side branch at an arbitrary point of a cylindrical tube was also studied
in Section 4.4. The main application considered was simulation of finger holes of
woodwind instruments. Another potential application for this structure is waveguide
modeling of the connection of the human oral and nasal tract.
In Sections 4.6 and 4.7, allpass filter models for fractional delay junctions of two and
three tubes were developed. These structures have not been studied in the DSP literature
before.
The same is not true for IIR FD filters; only one subclass of recursive FD filters, the allpass filters, has been thoroughly studied (Laakso et al., 1994). The author is aware of
only one study on fractional delay approximation using pole-zero filters that are not allpass filters (Tarczynski and Cain, 1994).
For real-time applications, it may be interesting to study fractional delay filters,
both FIR and IIR, that produce a very small total delay. The FIR filters discussed in
this study have a total delay of approximately N/2 samples (where N is the filter order),
since they interpolate between their centermost state variables. Also, the allpass filters
studied here have a large total delay, typically about N samples.
Fractional delay filters have started to become popular in several areas of digital signal processing (see the very beginning of Chapter 3 for references). There is no doubt
that many more signal processing problems can be solved more efficiently, or better
than before, when someone realizes that fractional delays are of help. Potential future
research projects in this area are, for example, irrational sampling rate conversion or
time-varying fractional resampling of discrete-time signals using allpass or recursive
FD filters.
The above-mentioned two new problems are also examples of applications where the
new transient elimination method proposed in this thesis may be employed. Other possible uses can be found in many different kinds of filtering problems where the parameters of a recursive filter need to be changed during operation, such as in equalization of
audio signals.
Nowadays it is not feasible to use very high-order FD filters in real-time applications, such as waveguide synthesis, at least when only a single DSP processor is
available. However, as the computing power of signal processing hardware increases it
will become possible to utilize higher-order interpolators and deinterpolators and still
run the application in real time. This implies that the difference between the practical
FD approximation and the ideal fractional delay will be reduced. At the same time it
will become more and more realistic to talk about virtually continuous processing of
discrete-time signals and systems.
References
Adrien, J. M. 1991. The missing link: modal synthesis. In G. De Poli, A. Piccialli, and C. Roads (eds.) Representations of Musical Signals. Cambridge, Massachusetts, MIT Press, pp. 267–297.
Agulló, J., Cardona, S., and Keefe, D. H. 1994. Time-domain measurements of reflection functions for discontinuities in wind-instrument air columns. Proc. Stockholm Music Acoustics Conf. (SMAC'93), Stockholm, Sweden, July 28–August 1, 1993, pp. 465–469.
Agulló, J., Cardona, S., and Keefe, D. H. 1995. Time-domain deconvolution to measure reflection functions for discontinuities in waveguides. J. Acoust. Soc. Am., vol. 97, no. 3, March, pp. 1950–1957.
Aldroubi, A., Unser, M., and Eden, M. 1992. Cardinal spline filters: stability and convergence to the ideal sinc interpolator. Signal Processing, vol. 28, no. 2, August, pp. 127–138.
Alkhairy, A., Christian, K., and Lim, J. 1991. Design of FIR filters by complex Chebyshev approximation. Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing (ICASSP'91), Toronto, Canada, May 14–17, 1991, vol. 3, pp. 1985–1988.
Armstrong, J., and Strickland, D. 1993. Symbol synchronization using signal samples and interpolation. IEEE Trans. Communications, vol. 41, no. 2, February, pp. 318–321.
Ayers, R. D., Lowell, J. E., and Mahgerefteh, D. 1985. The conical bore in musical acoustics. Am. J. Phys., vol. 53, no. 6, June, pp. 528–537.
Benade, A. H. 1960. On the mathematical theory of woodwind finger holes. J. Acoust. Soc. Am., vol. 32, pp. 1591–1608.
Benade, A. H. 1976. Fundamentals of Musical Acoustics. London, Oxford University Press. 596 p. Also: 1990. New York, Dover Publications.
Beranek, L. L. 1954. Acoustics. Cambridge, Massachusetts, The Acoustical Society of America. 491 p. Reprinted in 1986.
Bonder, L. J. 1983a. The n-tube formula and some of its consequences. Acustica, vol. 52, no. 4, March, pp. 216–226.
Bonder, L. J. 1983b. Equivalency of lossless n-tubes. Acustica, vol. 53, no. 4, August, pp. 193–200.
Borin, G., de Poli, G., Puppin, S., and Sarti, A. 1990. Generalizing the physical model timbral class. Proc. Colloquium on Physical Modeling, Grenoble, France, September 1990.
Bradley, K., Cheng, M., and Stonick, V. L. 1995. Automated analysis and computationally efficient synthesis of acoustic guitar strings and body. Proc. 1995 IEEE ASSP Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, New York, October 15–18, 1995, pages not numbered.
de Bruin, G., and van Walstijn, M. 1995. Physical models of wind instruments: a generalized excitation coupled with a modular tube simulation platform. J. New Music Research, vol. 24, no. 2, June, pp. 148–163.
Cadoz, C., Luciani, A., and Florens, J. 1984. Responsive input devices and sound synthesis by simulation of instrumental mechanisms: the CORDIS system. Computer Music J., vol. 8, no. 3, pp. 60–73. Reprinted in C. Roads (ed.) 1989. The Music Machine. Cambridge, Massachusetts, MIT Press, pp. 495–508.
Cain, G. D., and Yardim, A. 1994. The tunable fractional delay filter: passport to fine-grain delay estimation. Proc. Fourth COST 229 Workshop on Adaptive Methods and Emergent Techniques for Signal Processing and Communications, Ljubljana, Slovenia, April 5–7, 1994, pp. 9–24. Also: Electrotechnical Review, Ljubljana, Slovenia, vol. 61, no. 4, pp. 232–242.
Cain, G. D., Murphy, N. P., and Tarczynski, A. 1994. Evaluation of several FIR fractional-sample delay filters. Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing (ICASSP'94), Adelaide, Australia, April 19–22, 1994, vol. 3, pp. 621–624.
Cain, G. D., Yardim, A., and Henry, P. 1995. Offset windowing for FIR fractional-sample delay. Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP'95), Detroit, Michigan, May 9–12, 1995, vol. 2, pp. 1276–1279.
Caussé, R., Kergomard, J., and Lurton, X. 1984. Input impedance of brass musical instruments: comparison between experiment and numerical models. J. Acoust. Soc. Am., vol. 75, no. 1, January, pp. 241–254.
Chaigne, A., Askenfelt, A., and Jansson, E. V. 1990. Temporal synthesis of string instrument tones. Speech Transmission Laboratory Quarterly Progress and Status Report (STL-QPSR), no. 4, October–December. Stockholm, Sweden: Royal Institute of Technology (KTH), pp. 81–100.
Chaigne, A. 1992. On the use of finite differences for musical synthesis: application to plucked string instruments. J. d'Acoustique, vol. 5, no. 2, April, pp. 181–211.
Chaigne, A., and Askenfelt, A. 1994. Numerical simulations of piano strings. I. A physical model for a struck string using finite difference methods. J. Acoust. Soc. Am., vol. 95, no. 2, February, pp. 1112–1118.
Chan, C. F., and Leung, S. H. 1991. A vocoder using high-order LPC filter with very few non-zero coefficients. Proc. Second European Conf. Speech Communications and Technology (EUROSPEECH'91), Genova, Italy, September 24–26, 1991, vol. 2, pp. 839–842.
Coltman, J. W. 1979. Acoustical analysis of the Boehm flute. J. Acoust. Soc. Am., vol. 65, no. 2, February, pp. 499–506.
Cook, P. R. 1988. Implementation of single reed instruments with arbitrary bore shapes using digital waveguide filters. Stanford, California, Stanford University, Dept. of Music, CCRMA, Tech. Report no. STAN-M-50, May 1988. 7 p.
Cook, P. R. 1989. Synthesis of the singing voice using a physically parametrized model of the human vocal tract. Proc. 1989 Int. Computer Music Conf. (ICMC'89), Columbus, Ohio, October 2–5, 1989, pp. 69–72.
Cook, P. R. 1990. Identification of Control Parameters in an Articulatory Vocal Tract Model, With Applications to the Synthesis of Singing Voice. Ph.D. dissertation, Stanford, California, Stanford University, Dept. of Music, CCRMA Tech. Report no. STAN-M-68, December 1990. 171 p.
Cook, P. R. 1991. TBone: an interactive waveguide brass instrument synthesis workbench for the NeXT machine. Proc. 1991 Int. Computer Music Conf. (ICMC'91), Montreal, Canada, October 16–20, 1991, pp. 297–299.
Cook, P. R. 1992. A meta-wind-instrument physical model, and a meta-controller for real time performance control. Proc. 1992 Int. Computer Music Conf. (ICMC'92), San Jose, California, October 14–18, 1992, pp. 273–276.
Cook, P. R. 1993. SPASM: a real-time vocal tract model editor/controller and Singer: the companion software synthesis system. Computer Music J., vol. 17, no. 1, spring, pp. 30–44.
Cook, P. R. 1995. Integration of physical modeling for synthesis and animation. Proc. 1995 Int. Computer Music Conf. (ICMC'95), Banff, Canada, September 3–7, 1995, pp. 525–528.
Crochiere, R. E., and Rabiner, L. R. 1983. Multirate Digital Signal Processing. Englewood Cliffs, New Jersey, Prentice-Hall, Inc. 411 p.
Cucchi, S., Desinan, F., Parladori, G., and Sicuranza, G. 1991. DSP implementation of arbitrary sampling frequency conversion for high quality sound application. Proc. 1991 IEEE Int. Conf. Acoust., Speech, Signal Processing (ICASSP'91), Toronto, Canada, May 14–17, 1991, vol. 5, pp. 3609–3612.
References
Hildebrand, F. B. 1974. Introduction to Numerical Analysis. Second Edition. New York, McGraw-Hill. Also: 1987. New York, Dover. 669 p.

Hiller, L., and Ruiz, P. 1971. Synthesizing musical sounds by solving the wave equation for vibrating objects: parts I and II. J. Audio Eng. Soc., vol. 19, no. 6, pp. 462–470 and vol. 19, no. 7, pp. 542–550.

Hirschman, S. E. 1991. Digital Waveguide Modeling and Simulation of Reed Woodwind Instruments. Engineering thesis. Stanford, California, Stanford University, Dept. of Music, CCRMA Tech. Report no. STAN-M-72, July 1991. 253 p.

Hou, H. S., and Andrews, H. C. 1978. Cubic splines for image interpolation and digital filtering. IEEE Trans. Acoust., Speech, Signal Processing, vol. 26, no. 6, November, pp. 508–517.

Huopaniemi, J., Karjalainen, M., Välimäki, V., and Huotilainen, T. 1994. Virtual instruments in virtual rooms – a real-time binaural room simulation environment for physical models of musical instruments. Proc. 1994 Int. Computer Music Conf. (ICMC'94), Aarhus, Denmark, September 12–17, 1994, pp. 455–462.

Jackson, L. B. 1989. Digital Filters and Signal Processing. Second Edition. Norwell, Massachusetts, Kluwer Academic Publishers. 410 p.

Jaffe, D., and Smith, J. O. 1983. Extensions of the Karplus-Strong plucked string algorithm. Computer Music J., vol. 7, no. 2, summer, pp. 56–69. Reprinted in C. Roads (ed.) 1989. The Music Machine. Cambridge, Massachusetts, MIT Press, pp. 481–494.

Jaffe, D. 1995. Ten criteria for evaluating synthesis techniques. Computer Music J., vol. 19, no. 1, spring, pp. 76–87.

Jánosy, Z., Karjalainen, M., and Välimäki, V. 1994. Intelligent synthesis control with applications to a physical model of the acoustic guitar. Proc. 1994 Int. Computer Music Conf. (ICMC'94), Aarhus, Denmark, September 12–17, 1994, pp. 402–406.

Jerri, A. J. 1977. The Shannon sampling theorem – its various extensions and applications: a tutorial review. Proc. IEEE, vol. 65, no. 11, November, pp. 1565–1596.

Jot, J.-M., and Chaigne, A. 1991. Digital delay networks for designing artificial reverberators. Presented at the 90th AES Convention, Paris, France, February 19–22, 1991, preprint no. 3030.

Jot, J.-M., Larcher, V., and Warusfel, O. 1995. Digital signal processing issues in the context of binaural and transaural stereophony. Presented at the 98th AES Convention, Paris, France, February 25–28, 1995, preprint no. 3980.

Jury, E. I. 1964. Theory and Application of the z-Transform Method. New York, Wiley, p. 298.

Kabasawa, Y., Ishizaka, K., and Arai, Y. 1983. Simplified digital model of the lossy vocal tract and vocal cords. Proc. 11th Int. Congr. Acoust., Paris, France, July 19–27, 1983, vol. 4, pp. 175–178.

Karam, L. J., and McClellan, J. H. 1995. Complex Chebyshev approximation for FIR filter design. IEEE Trans. Circuits and Systems–II: Analog and Digital Signal Processing, vol. 42, no. 3, March, pp. 207–216.

Karjalainen, M., and Laine, U. K. 1991. A model for real-time sound synthesis of guitar on a floating-point signal processor. Proc. 1991 IEEE Int. Conf. Acoust., Speech, Signal Processing (ICASSP'91), Toronto, Canada, May 14–17, 1991, vol. 5, pp. 3653–3656.

Karjalainen, M., Laine, U. K., Laakso, T. I., and Välimäki, V. 1991. Transmission-line modeling and real-time synthesis of string and wind instruments. Proc. 1991 Int. Computer Music Conf. (ICMC'91), Montreal, Canada, October 16–20, 1991, pp. 293–296.

Karjalainen, M., and Välimäki, V. 1993. Model-based analysis/synthesis of the acoustic guitar. Proc. Stockholm Music Acoustics Conf. (SMAC 93), Stockholm, Sweden, July 28–August 1, 1993, pp. 443–447.
Karjalainen, M., Välimäki, V., and Jánosy, Z. 1993. Towards high-quality synthesis of the guitar and string instruments. Proc. 1993 Int. Computer Music Conf. (ICMC'93), Tokyo, Japan, September 10–15, 1993, pp. 56–63.

Karjalainen, M., Huopaniemi, J., and Välimäki, V. 1995a. Direction-dependent physical modeling of musical instruments. Proc. 15th Int. Congr. Acoustics (ICA'95), Trondheim, Norway, June 26–30, 1995, vol. 3, pp. 451–454.

Karjalainen, M., Välimäki, V., Hernoux, B., and Huopaniemi, J. 1995b. Explorations of wind instruments using digital signal processing and physical modeling techniques. Proc. 1995 Int. Computer Music Conf. (ICMC'95), Banff, Canada, September 3–7, 1995, pp. 509–516. A revised version will be published in the Journal of New Music Research, vol. 24, no. 4, December 1995.

Karplus, K., and Strong, A. 1983. Digital synthesis of plucked-string and drum timbres. Computer Music J., vol. 7, no. 2, summer, pp. 43–55. Reprinted in Curtis Roads (ed.) 1989. The Music Machine. Cambridge, Massachusetts, MIT Press, pp. 467–479.

Keefe, D. H. 1982. Theory of the single woodwind tone hole. J. Acoust. Soc. Am., vol. 72, no. 3, September, pp. 676–687.

Keefe, D. H. 1990. Woodwind air column models. J. Acoust. Soc. Am., vol. 88, no. 1, July, pp. 35–51.

Kelly, J. L., Jr., and Lochbaum, C. C. 1962. Speech synthesis. Proc. Fourth Int. Congr. Acoust., Copenhagen, Denmark, September 1962, Paper G42, pp. 1–4.

Kinsler, L. E., and Frey, A. R. 1950. Fundamentals of Acoustics. New York, John Wiley & Sons. 516 p.

Ko, C. C., and Lim, Y. C. 1988. Approximation of a variable-length delay line by using tapped delay line processing. Signal Processing, vol. 14, no. 4, June, pp. 363–369.

Kootsookos, P. J., and Williamson, R. C. 1995. FIR approximation of fractional sample delay systems. To be published in the IEEE Trans. Circuits and Systems–II: Analog and Digital Signal Processing.

Kreyszig, E. 1988. Advanced Engineering Mathematics. Sixth Edition. New York, John Wiley & Sons. 1294 p. and appendices.

Kroon, P., and Atal, B. S. 1991. On the use of pitch predictors with high temporal resolution. IEEE Trans. Signal Processing, vol. 39, no. 3, March, pp. 733–735.

Kuisma, T. M. 1994. Computational Modelling of the Vocal Tract and Speech Synthesis by Using Fractional Delay Filters. M.Sc. thesis (in Finnish). Espoo, Finland, Helsinki University of Technology, Faculty of Electrical Engineering, Laboratory of Acoustics and Audio Signal Processing, June 14, 1994. 75 p.

Laakso, T. I., Välimäki, V., Karjalainen, M., and Laine, U. K. 1992. Real-time implementation techniques for a continuously variable digital delay in modeling musical instruments. Proc. 1992 Int. Computer Music Conf. (ICMC'92), San Jose, California, October 14–18, 1992, pp. 140–141.

Laakso, T. I., Välimäki, V., Karjalainen, M., and Laine, U. K. 1994. Crushing the Delay – Tools for Fractional Delay Filter Design. Espoo, Finland, Helsinki University of Technology, Faculty of Electrical Engineering, Laboratory of Acoustics and Audio Signal Processing, Report no. 35, October 1994. 46 p. and figures. A revised version entitled "Splitting the unit delay – tools for fractional delay filter design" will be published in the IEEE Signal Processing Magazine, vol. 13, no. 1, January 1996.

Laakso, T. I., Välimäki, V., and Henriksson, J. 1995a. Tunable downsampling using fractional delay filters with applications to digital TV transmission. Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing (ICASSP'95), Detroit, Michigan, May 9–12, 1995, vol. 2, pp. 1304–1307.

Laakso, T. I., Saramäki, T., and Cain, G. D. 1995b. Asymmetric Dolph-Chebyshev, Saramäki, and transitional windows for fractional delay FIR filter design. Proc. 38th Midwest Symposium on Circuits and Systems (MWSCAS'95), Rio de Janeiro, Brazil, August 13–16, 1995. To be published in December 1995.
Lacroix, A., and Makai, B. 1979. A novel vocoder concept based on discrete time acoustic tubes. Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing (ICASSP'79), Washington, DC, April 2–4, 1979, pp. 73–76.

Laine, U. K. 1982. Modelling of lip radiation impedance in z-domain. Proc. 1982 IEEE Int. Conf. Acoust., Speech, Signal Processing (ICASSP'82), Paris, France, May 3–5, 1982, vol. 3, pp. 1992–1995. Reprinted in Laine (1989).

Laine, U. K. 1988. Digital modelling of a variable-length acoustic tube. Proc. 1988 Nordic Acoustical Meeting (NAM'88), Tampere, Finland, June 15–17, 1988, pp. 165–168. Reprinted in Laine (1989).

Laine, U. K. 1989. Studies on Modelling of Vocal Tract Acoustics with Applications to Speech Synthesis. Doctoral thesis. Espoo, Finland, Helsinki University of Technology, Faculty of Electrical Engineering, Laboratory of Acoustics and Audio Signal Processing, Report no. 32, March 1989.

Lang, M., and Laakso, T. I. 1994. Simple and robust method for the design of allpass filters using least-squares phase error criterion. IEEE Trans. Circuits and Systems–II: Analog and Digital Signal Processing, vol. 41, no. 1, January, pp. 40–48.

Laroche, J., and Meillier, J.-L. 1994. Multi-channel excitation/filter modeling of percussive sounds with application to the piano. IEEE Trans. Speech and Audio Processing, vol. 2, no. 2, April, pp. 329–344.

Legge, K. A., and Fletcher, N. H. 1984. Nonlinear generation of missing modes on a vibrating string. J. Acoust. Soc. Am., vol. 76, no. 1, July, pp. 5–12.

Liljencrants, J. 1985. Speech Synthesis with a Reflection-Type Line Analog. Doctoral dissertation. Dept. of Speech Communication and Music Acoustics, Royal Institute of Technology, Stockholm, Sweden, August 1985.

Liu, G.-S., and Wei, C.-H. 1990. Programmable fractional sample delay filter with Lagrange interpolation. Electronics Letters, vol. 26, no. 19, 13th September, pp. 1608–1610.

Liu, G.-S., and Wei, C.-H. 1992. A new variable fractional sample delay filter with nonlinear interpolation. IEEE Trans. Circuits and Systems–II: Analog and Digital Signal Processing, vol. 39, no. 2, February, pp. 123–126.

Maeda, S. 1977. On a simulation method of dynamically varying vocal tract: reconsideration of the Kelly–Lochbaum model. Proc. Symp. Articulatory Modeling and Phonetics, Grenoble, France, July 10–12, 1977, pp. 281–288.

Maeda, S. 1982. A digital simulation method of the vocal-tract system. Speech Communication, vol. 1, no. 3–4, December, pp. 199–229.

Marques, J. S., Tribolet, J. M., Trancoso, I. M., and Almeida, L. B. 1989. Pitch prediction with fractional delays in CELP coding. Proc. European Conf. Speech Communications and Technology (EUROSPEECH'89), Paris, France, September 1989, vol. 2, pp. 509–512.

Marques, J. S., Trancoso, I. M., Tribolet, J. M., and Almeida, L. B. 1990. Improved pitch prediction with fractional delays in CELP coding. Proc. 1990 IEEE Int. Conf. Acoust., Speech, Signal Processing (ICASSP'90), Albuquerque, New Mexico, April 3–6, 1990, vol. 2, pp. 665–668.

Martínez, J., and Agulló, J. 1988. Conical bores. Part I: reflection functions associated with discontinuities. J. Acoust. Soc. Am., vol. 84, no. 5, November, pp. 1613–1619.

Martínez, J., Agulló, J., and Cardona, S. 1988. Conical bores. Part II: multiconvolution. J. Acoust. Soc. Am., vol. 84, no. 5, November, pp. 1620–1627.

Matsui, K., Pearson, S. D., Hata, K., and Kamai, T. 1991. Improving naturalness of text-to-speech synthesis using natural glottal source. Proc. 1991 IEEE Int. Conf. Acoust., Speech, Signal Processing (ICASSP'91), Toronto, Canada, May 2–5, 1991, vol. 2, pp. 769–772. Also: Research Report no. 3, Speech Technology Laboratory, Santa Barbara, California, September 1992, pp. 21–27.
McIntyre, M. E., and Woodhouse, J. 1979. On the fundamentals of bowed string dynamics. Acustica, vol. 43, no. X, pp. 83–108.

McIntyre, M. E., Schumacher, R. T., and Woodhouse, J. 1983. On the oscillations of musical instruments. J. Acoust. Soc. Am., vol. 74, no. 5, November, pp. 1325–1345.

Medan, Y. 1991. Using super resolution pitch in waveform speech coders. Proc. 1991 IEEE Int. Conf. Acoust., Speech, Signal Processing (ICASSP'91), Toronto, Canada, May 2–5, 1991, vol. 1, pp. 633–636.

Meyer, P., Wilhelms, R., and Strube, H. W. 1989. A quasiarticulatory speech synthesizer for the German language running in real time. J. Acoust. Soc. Am., vol. 86, no. 2, August, pp. 523–539.

Minocha, S., Dutta Roy, S. C., and Kumar, B. 1993. A note on the FIR approximation of a fractional sample delay. Int. J. Circuit Theory and Appl., vol. 21, no. 3, May–June, pp. 265–274.

Mitra, S. K., and Hirano, K. 1974. Digital all-pass networks. IEEE Trans. Circuits and Systems, vol. CAS-21, no. 5, September, pp. 688–700.

Moorer, J. A. 1979. About this reverberation business. Computer Music J., vol. 3, no. 2, pp. 13–28. Reprinted in C. Roads and J. Strawn (eds.) 1985. Foundations of Computer Music, Cambridge, Massachusetts, MIT Press, pp. 605–639.

Morse, P. M., and Ingard, U. K. 1968. Theoretical Acoustics. Princeton, New Jersey, Princeton University Press. 927 p.

Mourjopoulos, J. N., Kyriakis-Bitzaros, E. D., and Goutis, C. E. 1990. Theory and real-time implementation of time-varying digital audio filters. J. Audio Eng. Soc., vol. 38, no. 7/8, July/August, pp. 523–536.

Mrayati, M., Carré, R., and Guérin, B. 1988. Distinctive regions and modes: a new theory of speech production. Speech Communication, vol. 7, no. 3, pp. 257–286.

Nikolaidis, S. S., Mourjopoulos, J. N., and Goutis, C. E. 1993. A dedicated processor for time-varying digital audio filters. IEEE Trans. Circuits and Systems–II: Analog and Digital Signal Processing, vol. 40, no. 7, July, pp. 452–455.

Oetken, G., Parks, T. W., and Schüßler, H. W. 1975. New results in the design of digital interpolators. IEEE Trans. Acoust., Speech, Signal Processing, vol. 23, no. 3, June, pp. 301–309.

Oetken, G. 1979. A new approach for the design of digital interpolating filters. IEEE Trans. Acoust., Speech, Signal Processing, vol. 27, no. 6, December, pp. 637–643.

Oppenheim, A. V., and Schafer, R. W. 1975. Digital Signal Processing. Englewood Cliffs, New Jersey, Prentice-Hall. 585 p.

Parks, T. W., and Burrus, C. S. 1987. Digital Filter Design. New York, John Wiley & Sons. 342 p.

Preuss, K. 1989. On the design of FIR filters by complex approximation. IEEE Trans. Acoustics, Speech, and Signal Processing, vol. 37, no. 5, May, pp. 702–712.

Proakis, J. G., and Manolakis, D. G. 1992. Digital Signal Processing: Principles, Algorithms, and Applications (Second Edition). New York, Macmillan. 969 p.

Pyfer, M. F., and Ansari, R. 1987. The design and application of optimal FIR fractional-slope phase filters. Proc. 1987 IEEE Int. Conf. Acoust., Speech, Signal Processing (ICASSP'87), Dallas, Texas, April 6–9, 1987, vol. 2, pp. 896–899.

Rabenstein, R. 1988. Minimization of transient signals in recursive time-varying digital filters. Circuits, Systems, and Signal Processing, vol. 7, no. 3, pp. 345–359.

Rabiner, L. R. 1977. On the use of autocorrelation analysis for pitch detection. IEEE Trans. Acoustics, Speech, and Signal Processing, vol. 25, no. 1, February, pp. 24–33.
Rabiner, L. R., and Schafer, R. W. 1978. Digital Processing of Speech Signals. Englewood Cliffs, New Jersey, Prentice-Hall. 512 p.

Regalia, P. A. 1993. Special filter designs. In Sanjit K. Mitra and James F. Kaiser (eds.). Handbook for Digital Signal Processing. New York, John Wiley & Sons. pp. 907–980.

Roberts, R. A., and Mullis, C. T. 1987. Digital Signal Processing. Reading, Massachusetts, Addison-Wesley. 578 p.

Rocchesso, D. 1993. Multiple feedback delay networks for sound processing. Proc. X Colloquio di Informatica Musicale, Milan, Italy, December 2–4, 1993, pp. 202–209.

Rocchesso, D., and Turra, F. 1993. A real time clarinet model on Mars workstation. Proc. X Colloquio di Informatica Musicale, Milan, Italy, December 2–4, 1993, pp. 210–213.

Rocchesso, D., and Smith, J. O. 1994. Circulant feedback delay networks for sound synthesis and processing. Proc. 1994 Int. Computer Music Conf. (ICMC'94), Aarhus, Denmark, September 12–17, 1994, pp. 378–381.

Rodet, X. 1984. Time domain formant-wave-function synthesis. Computer Music J., vol. 8, no. 3, fall, pp. 15–31.

Rodet, X., Potard, Y., and Barrière, J.-B. 1984. The Chant project: from synthesis of the singing voice to synthesis in general. Computer Music J., vol. 8, no. 3, fall, pp. 15–31. Reprinted in C. Roads (ed.) 1989. The Music Machine. Cambridge, Massachusetts, MIT Press, pp. 449–466.

Savioja, L., Rinne, T. J., and Takala, T. 1994. Simulation of room acoustics with a 3-D finite difference mesh. Proc. 1994 Int. Computer Music Conf. (ICMC'94), Aarhus, Denmark, September 12–17, 1994, pp. 463–466.

Schafer, R. W., and Rabiner, L. R. 1973. A digital signal processing approach to interpolation. Proc. IEEE, vol. 61, no. 6, June, pp. 692–702.

Schroeder, M. R. 1962. Natural sounding artificial reverberation. J. Audio Eng. Soc., vol. 10, no. 3, July, pp. 219–223.

Schroeter, J., and Sondhi, M. M. 1994. Techniques for estimating vocal-tract shapes from the speech signal. IEEE Trans. Speech and Audio Processing, vol. 2, no. 1, January, pp. 133–150.

Schulist, M. 1990. Improvements of a complex FIR filter design algorithm. Signal Processing, vol. 20, no. 1, May, pp. 81–90.

Serra, X. 1989. A System for Sound Analysis/Transformation/Synthesis Based on a Deterministic plus Stochastic Decomposition. Ph.D. dissertation. Stanford, California, Stanford University, Dept. of Music, CCRMA Tech. Report no. STAN-M-58, October 1989. 151 p.

Shannon, C. E. 1949. Communication in the presence of noise. Proc. IRE, vol. 37, no. 1, January, pp. 10–21.

Sivanand, S., Yang, J. F., and Kaveh, M. 1991. Focusing filters for wide-band direction finding. IEEE Trans. Signal Processing, vol. 39, no. 2, February, pp. 437–445.

Sivanand, S. 1992. Variable-delay implementation using digital FIR filters. Proc. 1992 Int. Conf. Signal Processing Theory and Applications (ICSPAT'92), Boston, Massachusetts, November 2–5, 1992, pp. 1238–1243.

Smith, J. O. 1983. Techniques for Digital Filter Design and System Identification with Application to the Violin. Ph.D. dissertation. Stanford, California, Stanford University, Dept. of Music, CCRMA Tech. Report no. STAN-M-14, June 1983. 260 p.
Smith, J. O., and Gossett, P. 1984. A flexible sampling-rate conversion method. Proc. 1984 IEEE Int. Conf. Acoust., Speech, Signal Processing (ICASSP'84), San Diego, California, March 19–21, 1984, vol. 2, pp. 19.4.1–4.

Smith, J. O., and Friedlander, B. 1985. Adaptive interpolated time-delay estimation. IEEE Trans. Aerospace and Electronic Systems, vol. 21, no. 3, March, pp. 180–199.

Smith, J. O. 1985. A new approach to digital reverberation using closed waveguide networks. Proc. 1985 Int. Computer Music Conf. (ICMC'85), Vancouver, Canada, August 19–22, 1985, pp. 47–53. Also: Stanford, California, Stanford University, Dept. of Music, CCRMA Tech. Report no. STAN-M-31, July 1985. 7 p. Reprinted in Smith (1987).

Smith, J. O. 1986. Efficient simulation of the reed-bore and bow-string mechanisms. Proc. 1986 Int. Computer Music Conf. (ICMC'86), The Hague, The Netherlands, October 20–24, 1986, pp. 275–280. Reprinted in Smith (1987).

Smith, J. O. 1987. Music Applications of Digital Waveguides. Stanford, California, Stanford University, Dept. of Music, CCRMA Tech. Report no. STAN-M-39, May 27, 1987. 181 p.

Smith, J. O., and Serra, X. 1987. PARSHL: An analysis/synthesis program for non-harmonic sounds based on a sinusoidal representation. Proc. 1987 Int. Computer Music Conf. (ICMC'87), Urbana, Illinois, August 1987, pp. 290–297.

Smith, J. O. 1990. Efficient Yet Accurate Models for Strings and Air Columns using Sparse Lumping of Distributed Losses and Dispersion. Stanford, California, Stanford University, Dept. of Music, CCRMA Tech. Report no. STAN-M-67, December 1990. 15 p.

Smith, J. O. 1991. Waveguide simulation of non-cylindrical acoustic tubes. Proc. 1991 Int. Computer Music Conf. (ICMC'91), Montreal, Canada, October 16–20, 1991, pp. 304–307.

Smith, J. O. 1992a. Bandlimited interpolation – introduction and algorithm. Unpublished manuscript, November 5, 1992.

Smith, J. O. 1992b. Physical modeling using digital waveguides. Computer Music J., vol. 16, no. 4, winter, pp. 74–87.

Smith, J. O. 1993a. Efficient synthesis of stringed musical instruments. Proc. 1993 Int. Computer Music Conf. (ICMC'93), Tokyo, Japan, September 10–15, 1993, pp. 64–71.

Smith, J. O. 1993b. Structured sampling synthesis – automated construction of physical modeling synthesis parameters and tables from recorded sounds. CCRMA Associates Presentation, Stanford University, Stanford, California, May 1993.

Smith, J. O., and Rocchesso, D. 1994. Connections between feedback delay networks and waveguide networks for digital reverberation. Proc. 1994 Int. Computer Music Conf. (ICMC'94), Aarhus, Denmark, September 12–17, 1994, pp. 376–377.

Smith, J. O., and Van Duyne, S. A. 1995. Commuted piano synthesis. Proc. 1995 Int. Computer Music Conf. (ICMC'95), Banff, Canada, September 3–7, 1995, pp. 335–342.

Smith, J. O. 1995a. Discrete-time modeling of acoustic systems. Manuscript in preparation. Preliminary version published for the CCRMA Associates Conference, Stanford University, Stanford, California, May 1995. 110 p.

Smith, J. O. 1995b. Physical modeling synthesis update. Unpublished manuscript. To be published in Computer Music J., vol. 20, no. 2, summer 1996.

Sondhi, M. M. 1983. An improved vocal tract model. Proc. 11th Int. Congr. Acoust., Paris, France, July 19–27, 1983, vol. 4, pp. 167–170.

Sondhi, M. M., and Schroeter, J. 1987. A hybrid time-frequency domain articulatory speech synthesizer. IEEE Trans. Acoust., Speech, Signal Processing, vol. 35, no. 7, July, pp. 955–966.
Välimäki, V., and Karjalainen, M. 1994a. Digital waveguide modeling of wind instrument bores constructed of truncated cones. Proc. 1994 Int. Computer Music Conf. (ICMC'94), Aarhus, Denmark, September 12–17, 1994, pp. 423–430.

Välimäki, V., and Karjalainen, M. 1994b. Improving the Kelly–Lochbaum vocal tract model using conical tube sections and fractional delay filtering techniques. Proc. 1994 Int. Conf. Spoken Language Processing (ICSLP'94), Yokohama, Japan, September 18–22, 1994, vol. 2, pp. 615–618.

Välimäki, V., Huopaniemi, J., Karjalainen, M., and Jánosy, Z. 1995a. Physical modeling of plucked string instruments with application to real-time sound synthesis. Presented at the 98th AES Convention, Paris, France, February 25–28, 1995, preprint no. 3956, 52 p. Submitted to the Journal of the Audio Engineering Society.

Välimäki, V. 1995a. A new filter implementation strategy for Lagrange interpolation. Proc. IEEE Int. Symp. Circuits and Systems (ISCAS'95), Seattle, Washington, April 29–May 3, 1995, vol. 1, pp. 361–364.

Välimäki, V., and Karjalainen, M. 1995. Implementation of fractional delay waveguide models using allpass filters. Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing (ICASSP'95), Detroit, Michigan, May 9–12, 1995, vol. 2, pp. 1524–1527.

Välimäki, V. 1995b. Fractional delay waveguide filters for modeling acoustic tubes. Proc. Second Int. Conf. Acoustics and Music Research (CIARM'95), Ferrara, Italy, May 19–21, 1995, pp. 41–46.

Välimäki, V., Laakso, T. I., and Mackenzie, J. 1995b. Time-varying fractional delay filters. Proc. Finnish Signal Processing Symp. (FINSIG'95), Espoo, Finland, June 2, 1995, pp. 118–122.

Välimäki, V., Karjalainen, M., and Huopaniemi, J. 1995c. Measurement, estimation, and modeling of wind instruments using DSP techniques. Proc. 15th Int. Congr. Acoustics (ICA'95), Trondheim, Norway, June 26–30, 1995, vol. 3, pp. 513–516.

Välimäki, V., Laakso, T. I., and Mackenzie, J. 1995d. Elimination of transients in time-varying allpass fractional delay filters with application to digital waveguide modeling. Proc. 1995 Int. Computer Music Conf. (ICMC'95), Banff, Canada, September 3–7, 1995, pp. 327–334.

Välimäki, V., and Takala, T. 1995. Virtual musical instruments – natural sound with physical models. Submitted to Organised Sound, vol. 1, no. 1, April 1996 (special issue on sounds and sources).

Van Duyne, S. A., and Smith, J. O. 1993. Physical modeling with the 2-D digital waveguide mesh. Proc. 1993 Int. Computer Music Conf. (ICMC'93), Tokyo, Japan, September 10–15, 1993, pp. 40–47.

Van Duyne, S. A., and Smith, J. O. 1994. A simplified approach to modeling dispersion caused by stiffness in strings and plates. Proc. 1994 Int. Computer Music Conf. (ICMC'94), Aarhus, Denmark, September 12–17, 1994, pp. 407–410.

Van Duyne, S. A., and Smith, J. O. 1995. Developments for the commuted piano. Proc. 1995 Int. Computer Music Conf. (ICMC'95), Banff, Canada, September 3–7, 1995, pp. 319–326.

van Walstijn, M., and de Bruin, G. 1995. Conical waveguide filters. Proc. Second Int. Conf. Acoustics and Music Research (CIARM'95), Ferrara, Italy, May 19–21, 1995, pp. 47–54.

Verhelst, W., and Nilens, P. 1986. A modified-superposition speech synthesizer and its applications. Proc. 1986 IEEE Int. Conf. Acoust., Speech, Signal Processing (ICASSP'86), Tokyo, Japan, April 7–11, 1986, vol. 3, pp. 2007–2010.

Wright, G. T. H., and Owens, F. J. 1993. An optimized multirate sampling technique for the dynamic variation of the vocal tract length in the Kelly–Lochbaum speech synthesis model. IEEE Trans. Speech and Audio Processing, vol. 1, no. 1, January, pp. 109–113.

Wu, H. Y., Badin, P., Cheng, Y. M., and Guérin, B. 1987a. Vocal tract simulation: implementation of continuous variations of the length in a Kelly–Lochbaum model, effects of area function spatial sampling. Proc. 1987 IEEE Int. Conf. Acoust., Speech, Signal Processing (ICASSP'87), Dallas, Texas, April 6–9, 1987, vol. 1, pp. 9–12.
Wu, H. Y., Badin, P., Cheng, Y. M., and Guérin, B. 1987b. Continuous variation of the vocal tract length in a Kelly–Lochbaum type speech production model. Proc. Eleventh Int. Congr. Phonetic Sciences (XIth ICPhS), Tallinn, Estonia, August 1–7, 1987, vol. 2, pp. 340–343.

Zetterberg, L. H., and Zhang, Q. 1988. Elimination of transients in adaptive filters with application to speech coding. Signal Processing, vol. 15, no. 4, December, pp. 419–428.

Zölzer, U., Redmer, B., and Bucholtz, J. 1993. Strategies for switching digital audio filters. Presented at the 95th AES Convention, New York, October 7–10, 1993, preprint no. 3714, 6 p. and 15 figures.

Zölzer, U., and Boltze, T. 1994. Interpolation algorithms: theory and application. Presented at the 97th AES Convention, San Francisco, California, November 10–13, 1994, preprint no. 3898, 22 p.
Appendix A
Bandlimited Squared Integral Error of FIR FD Filters
The least squares (LS) error function $E_{LS}$ is the $L_2$ norm (integrated squared magnitude) of the error frequency response $E(e^{j\omega})$. In the following, we derive the bandlimited version of the error function $E_{LS}$, that is,

$$E_{LS} = \frac{1}{\pi}\int_0^{\alpha\pi} \left|E(e^{j\omega})\right|^2 d\omega = \frac{1}{\pi}\int_0^{\alpha\pi} \left|H(e^{j\omega}) - H_{id}(e^{j\omega})\right|^2 d\omega \qquad (A.1)$$

where $0 < \alpha \le 1$. The error function can be rewritten using vectors and matrices, i.e.,

$$E_{LS} = \frac{1}{\pi}\int_0^{\alpha\pi} \left[\mathbf{h}^T\mathbf{e} - H_{id}(e^{j\omega})\right]\left[\mathbf{h}^T\mathbf{e} - H_{id}(e^{j\omega})\right]^* d\omega
= \frac{1}{\pi}\int_0^{\alpha\pi} \left(\mathbf{h}^T\mathbf{C}\mathbf{h} - 2\mathbf{h}^T\,\mathrm{Re}\!\left\{H_{id}^*(e^{j\omega})\,\mathbf{e}\right\} + \left|H_{id}(e^{j\omega})\right|^2\right) d\omega \qquad (A.2)$$

Here $\mathbf{h} = [h(0)\ h(1)\ \cdots\ h(N)]^T$ is the coefficient vector of the FIR filter and $\mathbf{e}$ is

$$\mathbf{e} = \left[1\ \ e^{-j\omega}\ \cdots\ e^{-jN\omega}\right]^T \qquad (A.3)$$

and

$$\mathbf{C} = \mathrm{Re}\{\mathbf{e}\mathbf{e}^H\} =
\begin{bmatrix}
1 & \cos(\omega) & \cdots & \cos(N\omega) \\
\cos(\omega) & 1 & \cdots & \cos[(N-1)\omega] \\
\vdots & & \ddots & \vdots \\
\cos(N\omega) & \cos[(N-1)\omega] & \cdots & 1
\end{bmatrix} \qquad (A.4)$$

Consequently, the error function takes the quadratic form

$$E_{LS} = \mathbf{h}^T \bar{\mathbf{C}} \mathbf{h} - 2\mathbf{h}^T \mathbf{p}_1 + p_0 \qquad (A.5a)$$

where

$$\bar{\mathbf{C}} = \frac{1}{\pi}\int_0^{\alpha\pi} \mathbf{C}\, d\omega \qquad (A.5b)$$

$$\mathbf{p}_1 = \frac{1}{\pi}\int_0^{\alpha\pi} \left[\mathrm{Re}\!\left\{H_{id}(e^{j\omega})\right\}\mathbf{c} - \mathrm{Im}\!\left\{H_{id}(e^{j\omega})\right\}\mathbf{s}\right] d\omega \qquad (A.5c)$$

$$\mathbf{c} = [1\ \cos(\omega)\ \cdots\ \cos(N\omega)]^T \qquad (A.5d)$$

$$\mathbf{s} = [0\ \sin(\omega)\ \cdots\ \sin(N\omega)]^T \qquad (A.5e)$$

$$p_0 = \frac{1}{\pi}\int_0^{\alpha\pi} \left|H_{id}(e^{j\omega})\right|^2 d\omega = \frac{1}{\pi}\int_0^{\alpha\pi} d\omega = \alpha \qquad (A.5f)$$

since the magnitude of the ideal fractional delay response $H_{id}(e^{j\omega}) = e^{-jD\omega}$ is unity. The elements of $\bar{\mathbf{C}}$ are obtained as

$$\bar{C}_{kl} = \frac{1}{\pi}\int_0^{\alpha\pi} \cos[(k-l)\omega]\, d\omega = \frac{\sin[(k-l)\alpha\pi]}{(k-l)\pi} = \alpha\,\mathrm{sinc}[(k-l)\alpha] \qquad (A.6)$$

and similarly the elements of $\mathbf{p}_1$ as

$$p_1(k) = \frac{1}{\pi}\int_0^{\alpha\pi} \cos[(D-k)\omega]\, d\omega = \frac{\sin[(D-k)\alpha\pi]}{(D-k)\pi} = \alpha\,\mathrm{sinc}[(D-k)\alpha] \qquad (A.7)$$

where the product-to-sum identities

$$\cos(a)\cos(b) = \frac{1}{2}\left[\cos(a-b) + \cos(a+b)\right], \qquad \sin(a)\sin(b) = \frac{1}{2}\left[\cos(a-b) - \cos(a+b)\right]$$

have been used. The middle term of the error function is thus

$$2\mathbf{h}^T\mathbf{p}_1 = 2\alpha \sum_{n=0}^{N} h(n)\,\mathrm{sinc}[(n-D)\alpha] \qquad (A.8)$$

and the bandlimited LS error function can finally be written as

$$E_{LS} = \mathbf{h}^T \bar{\mathbf{C}} \mathbf{h} - 2\alpha \sum_{n=0}^{N} h(n)\,\mathrm{sinc}[(n-D)\alpha] + \alpha \qquad (A.9)$$
Appendix B
Effective Length of the Infinite Impulse Response
As a measure for the length of the impulse response of a recursive filter, we define the
effective length N P as the number of impulse response samples h(n) which bring about
P% or more of its total energy, i.e.,
$$E_P = \sum_{n=0}^{N_P} h^2(n) \ge \frac{P}{100}\,E \qquad (B.1)$$
where n is the discrete time index and E is the total energy of the impulse response
defined as
$$E = \sum_{n=0}^{\infty} h^2(n) = \frac{1}{2\pi}\int_0^{2\pi} \left|H(e^{j\omega})\right|^2 d\omega \qquad (B.2)$$
and $H(e^{j\omega})$ is the frequency response of the system with the impulse response $h(n)$. The length $N_P$ is the value of the time index $n$ in (B.1) at which $P\%$ of the energy of the impulse response has arrived. The choice of the value for $P$ depends on the application.
In the following we derive closed-form expressions for the $P\%$ energy length of the first- and second-order all-pole filters. The result for the second-order case is particularly interesting because higher-order filters are often realized as a cascade or parallel connection of second-order sections.
First-Order All-Pole Filter
The transfer function of the first-order all-pole filter is

$$H(z) = \frac{1}{1 + a_1 z^{-1}} \qquad (B.3)$$

and the corresponding impulse response is

$$h(n) = \begin{cases} 0, & \text{for } n < 0 \\ (-a_1)^n, & \text{for } n \ge 0 \end{cases} \qquad (B.4)$$
[Figure]

Fig. B.1 Effective length $N_P$ (in samples, vertical axis 0 to 20) of the first-order all-pole filter as a function of the filter coefficient $a_1$ (horizontal axis $-1$ to $1$). The curves are for the values $P = 90\%$ (solid line), $P = 95\%$ (dashed line), and $P = 99\%$ (dotted line).
The energy accumulated up to time index $N_P$ is

$$E_P = \sum_{n=0}^{N_P} h^2(n) = \sum_{n=0}^{N_P} \left(a_1^2\right)^n = \frac{1 - \left(a_1^2\right)^{N_P+1}}{1 - a_1^2} \qquad (B.5)$$

and the total energy of the impulse response is

$$E = \frac{1}{1 - a_1^2} \qquad (B.6)$$

For a given $P$, $N_P$ can be solved from (B.1) by substituting Eqs. (B.5) and (B.6), that is,

$$N_P = \left\lceil \frac{\log(1 - P/100)}{\log(a_1^2)} - 1 \right\rceil = \left\lfloor \frac{\log(1 - P/100)}{\log(a_1^2)} \right\rfloor \qquad (B.7)$$

where $\lceil\cdot\rceil$ and $\lfloor\cdot\rfloor$ denote the ceiling and floor operations, which correspond to rounding upwards and downwards; the two forms coincide whenever the logarithm ratio is not an integer. Note that a practical value for $N_P$ must be an integer. The base of the logarithm in (B.7) is arbitrary, e.g., 10. Figure B.1 shows the effective length $N_P$ of the one-pole filter as a function of the filter coefficient $a_1$ for $P = 90\%$, 95%, and 99%.
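The closed-form result (B.7) can be checked against a direct accumulation of the impulse-response energy. A minimal Python sketch (function names are illustrative, not from the original text):

```python
import math

def effective_length_first_order(a1, P):
    """Closed-form P% energy length of H(z) = 1/(1 + a1*z^-1), Eq. (B.7):
    N_P = ceil(log(1 - P/100) / log(a1^2) - 1)."""
    return math.ceil(math.log(1 - P / 100.0) / math.log(a1 * a1) - 1)

def effective_length_by_simulation(a1, P):
    """Accumulate h(n)^2 = a1^(2n) until P% of the total energy E = 1/(1 - a1^2)."""
    target = (P / 100.0) / (1 - a1 * a1)   # (P/100) * E
    acc, n, h2 = 0.0, 0, 1.0               # h(0)^2 = 1
    while acc + h2 < target:
        acc += h2
        h2 *= a1 * a1
        n += 1
    return n

# The formula and the brute-force count agree for positive and negative a1.
for a1 in (-0.9, -0.5, 0.3, 0.8):
    for P in (90, 95, 99):
        assert effective_length_first_order(a1, P) == effective_length_by_simulation(a1, P)
```

For example, $a_1 = 0.8$ and $P = 90\%$ give $N_P = 5$, i.e., the first six samples already carry 90% of the energy.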
Second-Order All-Pole Filter

The transfer function of the second-order all-pole filter is

$$H(z) = \frac{1}{1 - 2r\cos(\theta)z^{-1} + r^2 z^{-2}} \qquad (B.8)$$

The poles of this filter are at $z = re^{\pm j\theta}$, and for stability it is required that $r < 1$. The corresponding impulse response can be expressed as

$$h(n) = \begin{cases} 0, & n < 0 \\ \dfrac{r^n}{\sin\theta}\,\sin[(n+1)\theta], & n \ge 0 \end{cases} \qquad (B.9)$$

The total energy of this impulse response is

$$E = \frac{1}{2\pi}\int_0^{2\pi} \left|H(e^{j\omega})\right|^2 d\omega = \frac{1 + r^2}{\left(1 - r^2\right)\left(1 + r^4 - 2r^2\cos 2\theta\right)} \qquad (B.10)$$

The effective length $N_P$ is now determined from

$$\sum_{n=0}^{N_P} h^2(n) \ge E - E_{100-P} \qquad (B.11)$$

where $E_{100-P} = (1 - P/100)E$ is the residual energy allowed in the tail of the impulse response. The tail energy can be bounded by setting $\sin^2[(n+1)\theta] \le 1$:

$$E_{100-P} = \frac{1}{\sin^2\theta}\sum_{n=N_P+1}^{\infty} r^{2n}\sin^2[(n+1)\theta] \le \frac{1}{\sin^2\theta}\sum_{n=N_P+1}^{\infty} r^{2n} = \frac{r^{2(N_P+1)}}{\left(1 - r^2\right)\sin^2\theta} \qquad (B.12)$$

Setting this bound equal to the allowed residual energy and solving for $N_P$ gives

$$N_P = \left\lceil \frac{\log\!\left[(1 - P/100)\,E\left(1 - r^2\right)\sin^2\theta\right]}{\log(r^2)} - 1 \right\rceil \qquad (B.13)$$

Equation (B.12) overestimates the residual energy $E_{100-P}$, and consequently the estimate for $E_P$ in (B.11) becomes too small. This is because it has been assumed that $\sin^2[(n+1)\theta] \le 1$. Thus, Eq. (B.13) gives a value that is too large for $N_P$.

The estimate in (B.13) for $N_P$ of the second-order recursive filter can be improved by using $\sin^2 x = \frac{1}{2}(1 - \cos 2x)$ and by writing (B.12) in the form $E_{100-P} = G_{P,1} - G_{P,2}$ with

$$G_{P,1} = \frac{r^{2(N_P+1)}}{2\left(1 - r^2\right)\sin^2\theta} \qquad (B.14)$$

and

$$G_{P,2} = \frac{r^{2(N_P+1)}}{2\sin^2\theta}\sum_{n=0}^{\infty} r^{2n}\cos(2n\theta + \varphi_P) \qquad (B.15)$$

where $\varphi_P = 2(N_P + 2)\theta$. Using $\cos(x + y) = \cos(x)\cos(y) - \sin(x)\sin(y)$ and the following sum formulas (from (Gradshteyn and Ryzhik, 1964), Eq. 1.447)

$$\sum_{k=0}^{\infty} p^k\sin(k\theta) = \frac{p\sin\theta}{1 - 2p\cos\theta + p^2} \qquad (B.16)$$

$$\sum_{k=0}^{\infty} p^k\cos(k\theta) = \frac{1 - p\cos\theta}{1 - 2p\cos\theta + p^2} \qquad (B.17)$$

the sum in (B.15) evaluates to

$$G_{P,2} = \frac{r^{2(N_P+1)}}{2\sin^2\theta}\cdot\frac{\cos\varphi_P\left(1 - r^2\cos 2\theta\right) - r^2\sin\varphi_P\sin 2\theta}{1 - 2r^2\cos(2\theta) + r^4} \qquad (B.18)$$

The numerator in (B.18) is of the form $A\cos\varphi_P + B\sin\varphi_P$ and is therefore bounded by its extreme values

$$F_{\max} = \sqrt{\left(1 - r^2\cos 2\theta\right)^2 + \left(r^2\sin 2\theta\right)^2} = \sqrt{1 - 2r^2\cos(2\theta) + r^4} \qquad (B.19)$$

$$F_{\min} = -F_{\max} \qquad (B.20)$$

from which one yields the maximum and the other the minimum of $G_{P,2}$. Using (B.14) and these extreme values, the upper bound for $N_P$ can be solved as

$$N_{P,\max} = \left\lceil \frac{\log\!\left[(1 - P/100)E / G_{\max}\right]}{\log(r^2)} - 1 \right\rceil \qquad (B.21)$$

and the corresponding lower bound as

$$N_{P,\min} = \left\lceil \frac{\log\!\left[(1 - P/100)E / G_{\min}\right]}{\log(r^2)} - 1 \right\rceil \qquad (B.22)$$

with

$$G_{\max,\min} = \frac{1}{2\sin^2\theta}\left[\frac{1}{1 - r^2} \pm \frac{F_{\max}}{1 - 2r^2\cos(2\theta) + r^4}\right] \qquad (B.23)$$
[Figure]

Fig. B.2 The effective length $N_P$ ($P = 90\%$) of the second-order all-pole filter as a function of the pole angle $\theta$ (horizontal axis $\theta/\pi$ from 0 to 0.9; $r = 0.9$). The solid line gives the actual value for $N_P$ (obtained by numerical simulation) and the dashed lines present its upper and lower limits. The uppermost curve results from the simple approximation (B.13).
Closed-form formulas for the effective length of the infinite impulse responses of first-
and second-order recursive systems were derived above. These measures are based on
the idea of determining when P% of the total energy of the impulse response has
arrived. Formulas for higher-order systems are more complicated, but results can also
be obtained by numerical simulation.
Simple Estimation of the Effective Length
The effective length of an infinite impulse response can also be estimated based on the
time constant of the filter. This approach is often used in analog electronics. The time
constant τ of the first-order all-pole filter can be solved from

    r^n = e^{−n/τ}   (B.24)

where r is the radius of the pole, i.e., r = |a_1|. Now the time constant can be solved as

    τ = −1 / ln(r)   (B.25)
A simple approximation for the time constant can be derived by considering the Taylor
series of the function ln(r), that is,

    ln(r) = (r − 1) − (1/2)(r − 1)^2 + (1/3)(r − 1)^3 − …   (B.26)

When this series is truncated after its first term, we obtain the following approximation
for the time constant of the one-pole filter:

    τ ≈ 1 / (1 − r)   (B.27)
This is also an approximation for the effective length of the impulse response of a
second-order all-pole filter. This is because the envelope of its impulse response decays
as r^n [see Eq. (B.9)].
This simple approximation for the time constant can be used for estimating the
length of the impulse response. For example, the impulse response has decayed by about
60 dB after 7τ samples. More accurate estimates for the effective length of impulse
responses can be computed using the formulas derived above from the cumulated
energy of the impulse response.
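These rules of thumb can be illustrated with a short sketch (the pole radii below are arbitrary examples). Note that with the exact time constant (B.25) the attenuation of the envelope r^n after 7τ samples is exactly 140/ln(10) ≈ 60.8 dB, independent of r.

```python
import math

results = []
for r in [0.9, 0.99, 0.999]:
    tau_exact = -1.0 / math.log(r)          # Eq. (B.25)
    tau_approx = 1.0 / (1.0 - r)            # Eq. (B.27), truncated Taylor series
    # attenuation of the envelope r^n after n = 7 * tau_exact samples, in dB
    att_db = -20.0 * 7.0 * tau_exact * math.log10(r)
    results.append((r, tau_exact, tau_approx, att_db))

for r, te, ta, att in results:
    print(r, round(te, 2), round(ta, 2), round(att, 1))
```

The approximation (B.27) always slightly overestimates the exact time constant, and the gap closes as the pole approaches the unit circle.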
Index
advance time 122, 124, 125, 126, 127
asymmetric window functions 75
binomial coefficient 87, 108
binomial window function 88
CCRMA 17
Chaigne, A. 18
complementary fractional delay 86, 129,
130
CORDIS 18
Cramer's rule 84
cross-fading method 117
decimation 104, 129
deinterpolation 21, 128–134, 155
digital delay line 66
digital differentiator 60
digital waveguide 17, 26, 63
direct form I structure 104
direct form II structure 121
discrete-time allpass filter 105
discrete-time Fourier transform 66
Farrow structure 103
floor function 30, 41, 67
fractional delay 20, 21, 22, 24
fractional spatial index 141
generalized Karplus–Strong model 28
Gibbs phenomenon 74, 75
greatest integer function 67
group delay 69, 106
Horner's method 98
impulse-invariant method 40, 54, 55, 56,
57, 63
input-switching method 118, 119
inverse filtering 32
inverse interpolation 129
IRCAM 18, 19
Karplus–Strong algorithm 17, 27
October 6, 1995