
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, SUBMITTED: OCTOBER 2004, REVISED: JULY 2005, ACCEPTED: OCTOBER 2005 1

Lossless Audio Coding Using the IntMDCT and Rounding Error Shaping

Yoshikazu Yokotani, Ralf Geiger, Student Member, IEEE, Gerald Schuller, Member, IEEE,
Soontorn Oraintara, Senior Member, IEEE, and K. R. Rao, Fellow, IEEE
Abstract: In this paper, lossless audio coding using the Integer Modified Discrete Cosine Transform (IntMDCT) is discussed. The IntMDCT is constructed as an integer approximation of the MDCT using the lifting scheme and is reversible. The rounding error shape of the IntMDCT is derived. When the spectral energy of the input audio signal is concentrated at the low frequencies, the rounding error spectrum limits the lossless coding performance. A method for shaping the rounding error in the transform domain is presented. This rounding error shaping scheme manipulates the error so that it is below the spectral envelope of the signal at the high frequencies in order to improve the lossless coding performance for the signal. Examples of an error shaping filter design are presented and verified by simulations. An IntMDCT-based lossless coding implementation is carried out to illustrate the use of the error shaping filters.

Index Terms: IntMDCT, lifting scheme, lossless audio coding, multi-dimensional lifting scheme, rounding error, rounding error shaping.
I. INTRODUCTION
THE lifting scheme-based integer transform maps integers to integers and is reversible, and thus it has become a very useful tool for lossless coding applications. In addition, since the integer transform is an integer approximation of the original floating-point transform, the integer transform can be used for a lossy coding scheme. Hence, a layered coding scheme can be constructed, with a lossy core layer and an enhancement layer for lossless coding. Major advantages of implementing such a combined lossy and lossless coding scheme are 1) it can provide a bit rate scalability from lossy to lossless and 2) the lossy core layer can be compatible with conventional lossy coding schemes. For the case of audio coding, the use of the Integer Modified Discrete Cosine Transform (IntMDCT) for a combined lossy and lossless coding scheme can be found in [1], [2], [3], [4], [5]. The original transform, the MDCT, is the most common filter bank used in audio coding, and it has a very high stopband attenuation at high frequencies such that the high bands have very little energy for bandlimited signals. In addition, the MDCT was adopted by major standards such as MPEG Layer III (MP3) [6] and MPEG-4 Advanced Audio Coding (AAC) [7], and therefore the core layer generated in the combined lossy and lossless coding scheme can be decoded by the MP3 or AAC decoder. Approaches for other integer-to-integer transforms are described in [8], [9]. In this paper, enhancing the lossless coding efficiency is of interest. We construct the IntMDCT followed by an entropy coder which generates a losslessly encoded bitstream to measure the efficiency.
In order to construct an integer transform using the lifting scheme, traditionally, the original transform is factorized into a product of Givens rotations [10]. In the case of the MDCT, based on [11], the structure is decomposed into the window and time domain aliasing (WTDA) operation and the DCT of type-IV (DCT-IV) operation [12]. The former operation can be directly realized via Givens rotations, while in the latter one, the DCT-IV matrix can be factorized into a product of sparse orthogonal matrices. The structure in [13], [14] adopts a direct factorization using the discrete cosine and sine transforms of type-II (DCT-II and DST-II) [15], [16], whereas the one in [3], [4] is based on the employment of the fast Fourier transform (FFT) [11], [17]. In both methods, once the DCT-IV matrix is factorized into Givens rotations, each rotation is replaced by the conventional three-step lifting scheme [18]. Each lifting step has a scalar multiplication followed by a rounding operation. Every rounding operation introduces a rounding error which accumulates in the transform domain. The accumulated error is interpreted as the approximation error of the IntMDCT. Even though the error can be cancelled out by the inverse transform, the accumulated error spreads in the transform domain. Since all the spectral lines have to be encoded, one can imagine that the error spectrum has a negative impact on the lossless coding efficiency. The impact is more critical when a large transform size is used, since many lifting steps are needed. In the case of the MPEG-4 AAC-based scheme [1], [4], [5], the number of subbands of the IntMDCT is 1024, which is much higher than the number used in previous works [8], [9], [13]. The impact on the coding efficiency is experimentally evaluated in [4].
To increase the approximation accuracy, the multi-dimensional lifting (MDL) scheme was recently proposed [19]. This technique is based on a generalized factorization of a matrix scaling operation [20]. The MDL scheme can increase the accuracy of an integer approximation of the DCT-IV by reducing the number of lifting stages. As a consequence, the overall level of the approximation error of the IntMDCT is lowered significantly, and correspondingly the lossless coding efficiency is improved [21]. Furthermore, when an input audio signal is stereo, the MDL scheme-based stereo IntMDCT is more suitable to use, since it has fewer rounding operations than the case where the MDL scheme-based IntMDCT is processed separately for each channel [19]. Similar approaches to improving the accuracy followed in [22], [23]. These approaches treated the MDCT computation for each channel separately. Hence, they need more rounding operations than the stereo IntMDCT.
When the spectral energy of the input signal is concentrated at low frequency bands, it is possible to improve the coding efficiency by attenuating the error spectrum at high frequency bands, where the error spectrum is dominant. In this paper, we will shape the error spectrum towards the low frequency bands based on a noise shaping scheme, which has been used to shape the quantization noise towards the high frequency bands to make the noise as minimally audible as possible [24], [25]. The conventional noise shaping scheme negatively feeds back the quantization noise after highpass filtering, while the rounding error shaping scheme positively feeds back the rounding error after lowpass filtering in order to achieve the goal. However, as will be seen in Section III, the error shaping is applied within the MDL scheme. Therefore, it is desirable to investigate whether the above approach can shape the rounding error as expected, and thus a model to estimate the rounding error is necessary.
In this paper, a rounding error model of the stereo IntMDCT is first discussed. Based on the model, it is shown that we can shape the error with a lowpass filter designed in the odd-DFT (ODFT) [26] domain. Furthermore, an experimental test shows an improvement in lossless coding efficiency when the spectral energy of an input signal is mainly located at low frequencies.
This paper is organized as follows: in Section II, the conventional lifting scheme and the MDL scheme are introduced. Based on the schemes, the structure of the stereo IntMDCT is described. In Section III, a model of the rounding error is derived from a generalized lifting step model. Then, rounding error shaping based on the model is discussed in Section IV. In Section V, several error shaping filters are designed and incorporated into the stereo IntMDCT-based lossless audio codec to illustrate the coding efficiency improvement due to the error shaping. Section VI concludes this paper.
II. STEREO INTMDCT
A. Notations
In this paper, boldface lower and upper case letters indicate vectors and matrices, respectively. Table I lists common notations used in this paper.
TABLE I
LIST OF NOTATIONS

Notation              Meaning
I_N                   N x N identity matrix
0_N                   N x N zero matrix
e                     rounding error in the time domain
E                     rounding error in the DCT-IV domain
E[.]                  expectation operator
subscripts L and R    signals in the left and right channels
subscript w           signal associated with the result of the WTDA operation
subscript c           signal associated with the result of the DCT-IV operation
subscript h           signal convolved by a filter h
subscript o           signal in the ODFT domain

In this paper, all the arithmetic operations are floating-point, and we assume that the rounding error injected by a rounding operation is white and uniformly distributed between -0.5 and 0.5, hence with variance 1/12.
B. Conventional Three Step Lifting Scheme
The conventional three-step lifting scheme [18] is a factorization of a Givens rotation matrix into a product of three lower and upper triangular matrices as follows:

$$
\begin{bmatrix} \cos\alpha & \sin\alpha \\ -\sin\alpha & \cos\alpha \end{bmatrix}
=
\begin{bmatrix} 1 & \tan\frac{\alpha}{2} \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 \\ -\sin\alpha & 1 \end{bmatrix}
\begin{bmatrix} 1 & \tan\frac{\alpha}{2} \\ 0 & 1 \end{bmatrix},
\qquad (1)
$$

where α is the rotation angle. Note that multiplying by a triangular matrix in the lifting factorization above can be realized by a scalar multiplication followed by a scalar addition. The inverse is obtained by carrying out a subtraction instead of an addition. The invertibility holds even if a rounding operation is applied after each scalar multiplication in order to make (1) an integer-to-integer mapping.
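As an illustration of this three-step lifting with rounding, the following Python sketch (not part of the paper) applies a rounded Givens rotation to a pair of integers and inverts it exactly by reversing the steps with subtractions; the angle and sample values are arbitrary.

```python
# Illustrative sketch: a Givens rotation realized as three integer lifting
# steps, as in (1).  Each step multiplies one coordinate by a scalar, rounds,
# and adds the result to the other coordinate, so the mapping is
# integer-to-integer and exactly invertible.
import math

def lifting_rotate(x, y, alpha):
    """Integer approximation of (x, y) -> [cos a, sin a; -sin a, cos a](x, y)."""
    t = math.tan(alpha / 2.0)
    s = math.sin(alpha)
    x = x + int(round(t * y))   # first lifting step   [1  t; 0 1]
    y = y - int(round(s * x))   # second lifting step  [1  0; -s 1]
    x = x + int(round(t * y))   # third lifting step   [1  t; 0 1]
    return x, y

def lifting_rotate_inv(x, y, alpha):
    """Exact inverse: same steps in reverse order with subtraction."""
    t = math.tan(alpha / 2.0)
    s = math.sin(alpha)
    x = x - int(round(t * y))
    y = y + int(round(s * x))
    x = x - int(round(t * y))
    return x, y

if __name__ == "__main__":
    a, b = 1234, -567
    alpha = math.pi / 7
    u, v = lifting_rotate(a, b, alpha)
    assert lifting_rotate_inv(u, v, alpha) == (a, b)   # perfect reconstruction
```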
C. MDL Scheme
The MDL scheme is a multi-dimensional extension of the above triangular factorization [20]. Consider the following lifting decomposition of a 2 x 2 scaling matrix with determinant one:

$$
\begin{bmatrix} d & 0 \\ 0 & d^{-1} \end{bmatrix}
=
\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}
\begin{bmatrix} 1 & d^{-1} \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 \\ -d & 1 \end{bmatrix}
\begin{bmatrix} 1 & d^{-1} \\ 0 & 1 \end{bmatrix},
$$

where d is a non-zero scalar and d d^{-1} = 1. This decomposition provides the basic idea for the MDL approach. The above factorization still holds when all the scalar quantities are replaced by matrices as follows:

$$
\begin{bmatrix} T & 0 \\ 0 & T^{-1} \end{bmatrix}
=
\begin{bmatrix} 0 & -I \\ I & 0 \end{bmatrix}
\begin{bmatrix} I & T^{-1} \\ 0 & I \end{bmatrix}
\begin{bmatrix} I & 0 \\ -T & I \end{bmatrix}
\begin{bmatrix} I & T^{-1} \\ 0 & I \end{bmatrix},
\qquad (2)
$$

where T is an N x N non-singular matrix. Consider a lower triangular block matrix of the form

$$
\begin{bmatrix} I & 0 \\ A & I \end{bmatrix}.
$$
Similar to the scalar case, an invertible integer approximation of the computation of (2) can be obtained by applying the lifting scheme to each one of these block matrices. The first half of the integer-valued inputs is multiplied by the matrix A and then rounded to integers before being added to the second half of the inputs. The inverse of the block matrix is obtained by changing A to -A. Hence, the computation of (2) is invertible. Similar to the conventional lifting scheme, the invertibility still holds even if a rounding operation is introduced after each matrix multiplication. Note that no special restriction applies to the matrix A, e.g. it can be singular.
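The following Python sketch (illustrative, not the paper's implementation) shows a single MDL block-lifting step of the form [I 0; A I] with rounding, and its exact inverse obtained by subtracting the same rounded product; the matrix A is an arbitrary, possibly singular, real matrix.

```python
# Illustrative sketch: one multi-dimensional lifting step.  The first half of
# an integer input vector is multiplied by an arbitrary real matrix A, rounded
# to integers, and added to the second half; subtracting the same rounded
# product inverts the step exactly, regardless of whether A is singular.
import numpy as np

def mdl_step(u, v, A):
    """[u; v] -> [u; v + round(A u)]   (block matrix [I 0; A I])."""
    return u, v + np.round(A @ u).astype(np.int64)

def mdl_step_inv(u, v, A):
    return u, v - np.round(A @ u).astype(np.int64)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    M = 8
    A = rng.standard_normal((M, M))
    u = rng.integers(-1000, 1000, M)
    v = rng.integers(-1000, 1000, M)
    u2, v2 = mdl_step(u, v, A)
    u3, v3 = mdl_step_inv(u2, v2, A)
    assert np.array_equal(u3, u) and np.array_equal(v3, v)   # exact inversion
```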
D. Stereo IntMDCT by Conventional Lifting and MDL Schemes
In this subsection, we describe the structure of the stereo IntMDCT using the conventional lifting and MDL schemes. The stereo IntMDCT transforms 2N stereo audio samples of the i-th frame, x_L[n] and x_R[n] for n = 0, ..., N-1, into the transform coefficients X_L[k] + E_L[k] and X_R[k] + E_R[k] for k = 0, ..., N-1. X_L[k] and X_R[k] are the floating-point MDCT coefficients, and E_L[k] and E_R[k] are the accumulated rounding errors. Fig. 1 depicts the structure of the stereo IntMDCT. Fig. 1 (a) illustrates the WTDA operation in the left channel; the right channel has the identical structure. Fig. 1 (b) shows the sign changing and flipping operations and the DCT-IV operation.
First, the WTDA operation is discussed. The WTDA operation includes the windowing, here with a sine window [7], and the time domain aliasing. It is followed by a DCT-IV to complete the MDCT filter bank. The WTDA operation is twofold. The first step is to perform the following N/2 Givens rotation operations on x_L[n]. Let the corresponding output be x_wL[n] for n = 0, ..., N-1. Then, x_wL[n] is given by

$$
\begin{bmatrix}
x_{wL}(\tfrac{N}{2}) \\ \vdots \\ x_{wL}(N-1) \\ x_{wL}(\tfrac{N}{2}-1) \\ \vdots \\ x_{wL}(0)
\end{bmatrix}
=
\begin{bmatrix} C & S \\ -S & C \end{bmatrix}
\begin{bmatrix}
x_L(0) \\ \vdots \\ x_L(\tfrac{N}{2}-1) \\ x_L(N-1) \\ \vdots \\ x_L(\tfrac{N}{2})
\end{bmatrix},
\qquad (3)
$$

where C and S are N/2 x N/2 diagonal matrices whose (m, m)-th elements are given by

$$ C_{(m,m)} = \cos\alpha(m), \qquad (4) $$

$$ S_{(m,m)} = \sin\alpha(m), \qquad (5) $$

and α(m) = π/(2N) (m + 0.5) for m = 0, ..., N/2 - 1. The second step is to concatenate the second half of x_wL[n] from the previous (i-1)-th frame and the first half of x_wL[n] from the current i-th frame. The concatenated x_wL[n] for n = 0, ..., N-1 is the output of the WTDA operation, as illustrated in Fig. 1 (a).
In order to obtain an integer approximation of (3), first, the rotation matrix on the right-hand side of (3) is decomposed via the lifting scheme (1) in the following way:

$$
\begin{bmatrix} C & S \\ -S & C \end{bmatrix}
=
\begin{bmatrix} I_{N/2} & T \\ 0_{N/2} & I_{N/2} \end{bmatrix}
\begin{bmatrix} I_{N/2} & 0_{N/2} \\ -S & I_{N/2} \end{bmatrix}
\begin{bmatrix} I_{N/2} & T \\ 0_{N/2} & I_{N/2} \end{bmatrix},
\qquad (6)
$$

where I_{N/2} and 0_{N/2} are N/2 x N/2 identity and zero matrices, respectively, and T is an N/2 x N/2 diagonal matrix whose (m, m)-th element is given by

$$ T_{(m,m)} = \tan\frac{\alpha(m)}{2}, \qquad (7) $$

for m = 0, ..., N/2 - 1. Rounding operations are introduced after the multiplications by T and -S, as illustrated in Fig. 1 (a). Let the resulting accumulated rounding error associated with x_wL[n] be e_wL[n] for n = 0, ..., N-1. For the WTDA operation in the right channel, replace x_L[n] and x_wL[n] in (3) by x_R[n] and x_wR[n], respectively, with the resulting error e_wR[n].
Secondly, an integer approximation of the DCT-IV operation performed on the WTDA outputs is obtained. Let the inputs for the DCT-IV operation in both channels be x'_wL[n] + e'_wL[n] and x'_wR[n] + e'_wR[n] for n = 0, ..., N-1. They are given by

$$ x'_{wL}[n] = x_{wL}[N-1-n], \qquad (8) $$

$$ x'_{wR}[n] = x_{wR}[N-1-n], \qquad (9) $$

$$ e'_{wL}[n] = e_{wL}[N-1-n], \qquad (10) $$

$$ e'_{wR}[n] = e_{wR}[N-1-n]. \qquad (11) $$
These inputs are processed simultaneously by the following block of two N x N DCT-IV matrices:

$$
\begin{bmatrix} C_{IV} & 0 \\ 0 & C_{IV} \end{bmatrix},
\qquad (12)
$$

where C_IV is the N x N DCT-IV matrix for each of the two channels, whose (k, n)-th element is given by

$$ C_{IV,(k,n)} = \sqrt{\frac{2}{N}} \cos\left[\frac{\pi}{N}\left(n + \frac{1}{2}\right)\left(k + \frac{1}{2}\right)\right]. \qquad (13) $$

Since C_IV C_IV = I, according to (2), the matrix in (12) can be factorized as

$$
\begin{bmatrix} C_{IV} & 0 \\ 0 & C_{IV} \end{bmatrix}
=
\begin{bmatrix} 0 & -I \\ I & 0 \end{bmatrix}
\begin{bmatrix} I & C_{IV} \\ 0 & I \end{bmatrix}
\begin{bmatrix} I & 0 \\ -C_{IV} & I \end{bmatrix}
\begin{bmatrix} I & C_{IV} \\ 0 & I \end{bmatrix}.
\qquad (14)
$$

Thus, apart from permutations and multiplications with -1, the application of the DCT-IV to the two blocks of signals can be performed with three MDL steps. The integer approximation of the DCT-IV operation is obtained by applying rounding operations after each DCT-IV matrix multiplication. This process is illustrated in Fig. 1 (b). The total number of rounding steps is 3N for each block of the channel pair (2N samples), i.e. 3N/2 rounding steps per channel. This is much fewer than the number of rounding steps needed for the conventional three-step lifting case, and therefore the overall level of the rounding error can be significantly reduced [19].
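A minimal Python sketch of this stereo DCT-IV computation is given below. It follows the factorization reconstructed in (14), with the C_IV matrix built from (13); the exact sign and flipping conventions of Fig. 1 (b) may differ, and the block size and test signals are arbitrary.

```python
# Illustrative sketch: applying the DCT-IV to two channels at once with three
# multi-dimensional lifting steps, as in (14).  Only 3N roundings of length-N
# vectors are needed for the channel pair, and the final permutation with sign
# change is lossless.
import numpy as np

def dct_iv_matrix(N):
    n = np.arange(N)
    return np.sqrt(2.0 / N) * np.cos(np.pi / N * np.outer(n + 0.5, n + 0.5))

def stereo_int_dct_iv(x1, x2, C):
    u, v = x1.astype(np.int64), x2.astype(np.int64)
    u = u + np.round(C @ v).astype(np.int64)   # 1st MDL step,  A =  C_IV
    v = v - np.round(C @ u).astype(np.int64)   # 2nd MDL step,  B = -C_IV
    u = u + np.round(C @ v).astype(np.int64)   # 3rd MDL step,  A =  C_IV
    return -v, u                               # permutation and sign change [0 -I; I 0]

if __name__ == "__main__":
    N = 64
    C = dct_iv_matrix(N)
    rng = np.random.default_rng(1)
    x1 = rng.integers(-2**15, 2**15, N)
    x2 = rng.integers(-2**15, 2**15, N)
    X1, X2 = stereo_int_dct_iv(x1, x2, C)
    # X1 and X2 approximate C_IV @ x1 and C_IV @ x2 up to the rounding error.
    print(np.max(np.abs(X1 - C @ x1)), np.max(np.abs(X2 - C @ x2)))
```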
III. ROUNDING ERROR OF THE STEREO INTMDCT
In this section, we discuss the rounding error of the stereo IntMDCT. First, we derive the rounding error of a generalized lifting step model. Based on this result, the rounding errors of the WTDA and DCT-IV operations are then obtained. For the case of the DCT-IV operation, the initial error is set to the one obtained from the WTDA operation. Thus, the resulting rounding error of the DCT-IV operation is equivalent to the rounding error of the stereo IntMDCT.
A. Rounding Error of the Generalized Lifting Step Model
One can see that (6) and (14) contain the following form:

$$
\begin{bmatrix} I_M & A \\ 0_M & I_M \end{bmatrix}
\begin{bmatrix} I_M & 0_M \\ B & I_M \end{bmatrix}
\begin{bmatrix} I_M & A \\ 0_M & I_M \end{bmatrix},
\qquad (15)
$$

where I_M and 0_M are M x M identity and zero matrices, respectively, and A and B are M x M matrices. For the case of (6), A = T, B = -S, and M = N/2. For the case of (14), A = C_IV, B = -C_IV, and M = N. Therefore, it is possible to unify the derivation of the rounding errors for these two cases by the use of (15). We call (15) a generalized lifting step model in this paper.
Fig. 1. Structure of the stereo IntMDCT. [·] symbolizes a rounding operation. (a) The conventional lifting structure for the WTDA operation in the left channel, with lifting multipliers T, -S, and T applied to the inputs of the (i-1)-th and i-th frames. (b) The MDL structure for the DCT-IV operation, with stage multipliers C_IV, -C_IV, and C_IV, preceded by flipping and sign changing operations.
Fig. 2. Generalized lifting step model, with inputs u_0 + e_u0 and v_0 + e_v0, matrix multiplications A, B, A, injected rounding errors e_1, e_2, e_3, and outputs u + e_u and v + e_v. [·] symbolizes a rounding operation.
Based on this model, we derive the accumulated rounding error. Fig. 2 illustrates the generalized lifting step model with rounding operations. In Fig. 2, all the vectors are M x 1 column vectors. u_0 and v_0 are the inputs, and e_u0 and e_v0 are their initial rounding errors. The corresponding outputs are u and v. Let e_u and e_v be the rounding errors accumulated at the output. e_1, e_2, and e_3 denote the rounding errors injected by the rounding operations after the matrix multiplications. In Fig. 2, the following equations hold:

$$ e_u = (I_M + AB)(e_{u0} + e_1) + A(2I_M + BA)e_{v0} + Ae_2 + e_3, \qquad (16) $$

$$ e_v = B(e_{u0} + e_1) + (I_M + BA)e_{v0} + e_2. \qquad (17) $$
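The following Python sketch (illustrative) runs the generalized lifting step of Fig. 2 once with explicit roundings, records the injected errors e_1, e_2, e_3, and confirms numerically that the accumulated output errors match (16) and (17); the matrices and inputs are arbitrary.

```python
# Illustrative check of (16) and (17): run the three lifting steps of Fig. 2
# once, record the individual rounding errors, and compare the accumulated
# output errors against the closed-form expressions.
import numpy as np

rng = np.random.default_rng(2)
M = 16
A = rng.standard_normal((M, M))
B = rng.standard_normal((M, M))
I = np.eye(M)

u0 = 100.0 * rng.standard_normal(M)        # error-free inputs
v0 = 100.0 * rng.standard_normal(M)
eu0 = rng.uniform(-0.5, 0.5, M)            # initial rounding errors
ev0 = rng.uniform(-0.5, 0.5, M)

# Exact chain (no rounding) on the error-free inputs.
u_exact = u0 + A @ v0
v_exact = v0 + B @ u_exact
u_exact = u_exact + A @ v_exact

# Rounded chain on the error-carrying inputs, recording e1, e2, e3.
p1 = A @ (v0 + ev0);  e1 = np.round(p1) - p1
u_r = (u0 + eu0) + np.round(p1)
p2 = B @ u_r;         e2 = np.round(p2) - p2
v_r = (v0 + ev0) + np.round(p2)
p3 = A @ v_r;         e3 = np.round(p3) - p3
u_r = u_r + np.round(p3)

eu_pred = (I + A @ B) @ (eu0 + e1) + A @ (2 * I + B @ A) @ ev0 + A @ e2 + e3   # (16)
ev_pred = B @ (eu0 + e1) + (I + B @ A) @ ev0 + e2                              # (17)
assert np.allclose(u_r - u_exact, eu_pred) and np.allclose(v_r - v_exact, ev_pred)
```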
B. Rounding Error of the WTDA Operation
In this subsection, we derive the rounding error of the WTDA operation. From Fig. 1 (a), the n-th element of the error in the left channel, e_wL[n], can be obtained as (18) by substituting A = T, B = -S, M = N/2, e_u0[n] = 0, and e_v0[n] = 0 for n = 0, ..., N/2 - 1 into (16) and (17). Note that e_wR[n] has the identical representation. Since the variance of e_1, e_2, and e_3 is 1/12 and they are uncorrelated, the variances E[e_wL^2[n]] and E[e_wR^2[n]] are given by (19).

$$
e_{wL}[n] =
\begin{cases}
e_v\!\left[\frac{N}{2}-1-n\right] = -\sin\alpha\!\left(\frac{N}{2}-1-n\right) e_1\!\left[\frac{N}{2}-1-n\right] + e_2\!\left[\frac{N}{2}-1-n\right], & n = 0, \dots, \frac{N}{2}-1, \\
e_u\!\left[n-\frac{N}{2}\right] = \cos\alpha\!\left(n-\frac{N}{2}\right) e_1\!\left[n-\frac{N}{2}\right] + \tan\frac{\alpha\left(n-\frac{N}{2}\right)}{2}\, e_2\!\left[n-\frac{N}{2}\right] + e_3\!\left[n-\frac{N}{2}\right], & n = \frac{N}{2}, \dots, N-1.
\end{cases}
\quad (18)
$$

$$
E[e_{wL}^2[n]] = E[e_{wR}^2[n]] =
\begin{cases}
\frac{1}{12}\left(\sin^2\alpha\!\left(\frac{N}{2}-1-n\right) + 1\right), & n = 0, \dots, \frac{N}{2}-1, \\
\frac{1}{12}\left(\cos^2\alpha\!\left(n-\frac{N}{2}\right) + \tan^2\frac{\alpha\left(n-\frac{N}{2}\right)}{2} + 1\right), & n = \frac{N}{2}, \dots, N-1.
\end{cases}
\quad (19)
$$
C. Rounding Error of the DCT-IV Operation
In this subsection, we derive the rounding error of the DCT-IV operation via the MDL steps. Note that the accumulated rounding error after the DCT-IV operation is the rounding error of the stereo IntMDCT. The k-th elements of the rounding errors, E_L[k] and E_R[k], can be obtained by substituting A = C_IV, B = -C_IV, M = N, e_u0 = e'_wL, and e_v0 = e'_wR into (16) and (17):

$$ E_L[k] = C^k_{IV}(e'_{wL} + e_1) - e_2[k], \qquad (20) $$

$$ E_R[k] = C^k_{IV}(e'_{wR} + e_2) + e_3[k], \qquad (21) $$

where C^k_IV is the k-th row vector of C_IV for k = 0, ..., N-1.
Due to the statistics of the rounding error terms in (20) and (21), E[E_L^2[k]] and E[E_R^2[k]] can be approximated by

$$ E[E_L^2[k]] \approx \sigma_{e_w}^2 + \frac{1}{6}, \qquad (22) $$

$$ E[E_R^2[k]] \approx \sigma_{e_w}^2 + \frac{1}{6}, \qquad (23) $$

where

$$ \sigma_{e_w}^2 = \frac{1}{N}\sum_{n=0}^{N-1} E[e_{wL}^2[n]] = \frac{1}{N}\sum_{n=0}^{N-1} E[e_{wR}^2[n]]. \qquad (24) $$

The derivation is described in Appendix I. σ_ew^2 is the average of the variance of the rounding error of the WTDA operation. The numerical estimation of the overall approximation error shows that it is indeed practically flat across frequency. This also hints that we can gain performance from error shaping.
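As a numerical illustration, the short Python sketch below evaluates (19) and (24) and the resulting flat level σ_ew^2 + 1/6 of (22)-(23); N = 1024 is assumed here, as in the experiments of Section V.

```python
# Numerical sketch (assumption: N = 1024): evaluate (19) and (24) to get the
# average WTDA error variance and the flat IntMDCT error variance of (22)-(23).
import numpy as np

N = 1024
m = np.arange(N // 2)
alpha = np.pi / (2 * N) * (m + 0.5)

var_first_half = (np.sin(alpha) ** 2 + 1.0) / 12.0                             # n = 0 .. N/2-1
var_second_half = (np.cos(alpha) ** 2 + np.tan(alpha / 2) ** 2 + 1.0) / 12.0   # n = N/2 .. N-1

sigma2_ew = (var_first_half.sum() + var_second_half.sum()) / N                  # (24)
print("sigma_ew^2       =", sigma2_ew)
print("flat level (22)  =", sigma2_ew + 1.0 / 6.0)
```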
IV. ROUNDING ERROR SHAPING
In this section, the rounding error is re-calculated after rounding error shaping is applied to the rounding operations of the stereo IntMDCT. The conventional noise shaping scheme is in general applied to an audio signal in the time domain, and the noise shaping effect is observed in the frequency domain [24]. On the other hand, the stereo IntMDCT has its internal signals for the DCT-IV computation (Fig. 1 (b)) in either the time domain or the DCT-IV domain. (20) shows that the rounding error of the stereo IntMDCT in the left channel is given by e_2[k] and the DCT-IV coefficients of both e'_wL and e_1. Similarly, for the right channel, as shown by (21), it is described by e_3[k] and the DCT-IV coefficients of both e'_wR and e_2. Thus, it is possible to employ the conventional noise shaping scheme individually for the rounding operations in order to see the effect of shaping on each rounding error separately. In this section, we assume that a fixed error shaping filter is used for all the rounding operations except for the ones in the third MDL step. This is because (21) shows that e_3[k] is simply added in the transform domain, so a shaping effect could not be observed at the output. First, we discuss the general case. Fig. 3 shows a block diagram of the rounding error shaping scheme applied to a rounding operation in a lifting step. Note that the conventional noise shaping [24] has a negative noise feedback loop.
In Fig. 3, X(z), Y(z), and E(z) are the input, the output, and the rounding error injected by the rounding operation in the z-domain, respectively. Y(z) is given by

$$ Y(z) = aX(z) + \left(1 + H'(z)z^{-1}\right)E(z) = aX(z) + H(z)E(z), \qquad (25) $$

Fig. 3. Block diagram of rounding error shaping for a lifting step, with input X(z), scalar multiplier a, injected rounding error E(z), error feedback through H'(z) and a delay z^{-1}, and output Y(z). [·] symbolizes a rounding operation.

where H(z) = 1 + H'(z)z^{-1} = 1 + Σ_{n=1}^{M} h[n] z^{-n} is a causal error shaping filter of order M, and a is a scalar constant. When this scheme is applied to the rounding operations of the WTDA operation, a is replaced by the matrices T and -S, while it becomes C_IV and -C_IV in the case of the DCT-IV operation. (25) shows that the rounding error E(z) can be shaped by H(z). The filtered rounding error, e_h[n] for n = 0, ..., N-1, can be computed by

$$ e_h[n] = \sum_{m=0}^{M} h[m]\, e[n-m], \qquad (26) $$

where M ≤ N - 1. If both h[m] and e[n-m] are real, the DCT-IV coefficients of e_h[n], E_h[k], can be approximated by

$$ E_h[k] \approx \sqrt{\frac{2}{N}}\, \mathrm{Re}\left\{ W^k H_o[k] E_o[k] \right\}, \qquad (27) $$

where k = 0, ..., N-1, W^k = e^{-j\frac{\pi}{2N}(k+\frac{1}{2})}, and E_o[k] and H_o[k] are the 2N-point ODFT [26] coefficients of e[n] and h[n], respectively. Re(x) takes the real part of a complex number x. The derivation of (27) is given in Appendix II.
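The following Python sketch (illustrative) implements the error-feedback rounding of Fig. 3 for a sequence of samples: the rounding error injected at each step is fed back positively through the taps h[1..M], so the error reaching the output is shaped by H(z) as in (25)-(26). The single feedback tap used in the example is a toy value, not one of the filters designed in Section V.

```python
# Illustrative sketch of error-feedback rounding: y = round(w + feedback),
# where the feedback is the positively fed back, filtered history of the
# injected rounding errors, so that y - w is the error shaped by
# H(z) = 1 + h[1] z^-1 + ... + h[M] z^-M.
import numpy as np

def shaped_round(w, h):
    """Round the samples of w with rounding-error shaping by H(z) = 1 + sum h[m] z^-m."""
    h = np.asarray(h, dtype=float)          # feedback taps h[1], ..., h[M]
    mem = np.zeros(len(h))                  # past errors e[n-1], ..., e[n-M]
    y = np.empty(len(w))
    for n, wn in enumerate(w):
        v = wn + h @ mem                    # add positively fed back past errors
        y[n] = np.round(v)
        e = y[n] - v                        # rounding error injected at this step
        mem = np.roll(mem, 1)
        mem[0] = e
    return y

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    w = 50.0 * rng.standard_normal(4096)
    y = shaped_round(w, h=[0.9])            # toy lowpass shaper H(z) = 1 + 0.9 z^-1
    spec = np.abs(np.fft.rfft(y - w))
    print(spec[:5].mean(), spec[-5:].mean())  # shaped error: large at low, small at high frequencies
```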
We now consider applying the error shaping filter h[n] to the rounding operations in the realization of the stereo IntMDCT, except for the rounding operations in the third MDL step. In other words, the filter is used to shape e'_wL, e'_wR, e_1, and e_2 in (20) and (21). Let e'_wLh, e'_wRh, e_1h, and e_2h be the corresponding shaped error vectors, respectively. When the rounding error shaping scheme is applied, the resulting rounding errors at spectral line index k in the IntMDCT domain, E_hL[k] and E_hR[k], can be obtained from (20) and (21) and are given by

$$ E_{hL}[k] = C^k_{IV}(e'_{wLh} + e_{1h}) - e_{2h}[k], \qquad (28) $$

$$ E_{hR}[k] = C^k_{IV}(e'_{wRh} + e_{2h}) + e_3[k]. \qquad (29) $$

e'_wLh, e'_wRh, e_1h, and e_2h are uncorrelated with one another, since each one of them is, as given by (26), a linear combination of e'_wL[n-m], e'_wR[n-m], e_1[k-m], and e_2[k-m], respectively, for m = 0, ..., M.

$$
\begin{aligned}
E[E'^2_{wLh}[k]] \approx{}& \frac{2}{N}\left\{\mathrm{Re}\left(W^k H_o[k]\right)\right\}^2 E\left[\left\{\mathrm{Re}\left(E'_{wLo}[k]\right)\right\}^2\right]
+ \frac{2}{N}\left\{\mathrm{Im}\left(W^k H_o[k]\right)\right\}^2 E\left[\left\{\mathrm{Im}\left(E'_{wLo}[k]\right)\right\}^2\right] \\
&- \frac{4}{N}\,\mathrm{Re}\left(W^k H_o[k]\right)\mathrm{Im}\left(W^k H_o[k]\right)\, E\left[\mathrm{Re}\left(E'_{wLo}[k]\right)\mathrm{Im}\left(E'_{wLo}[k]\right)\right] \\
={}& \frac{2}{N}\left\{\mathrm{Re}\left(W^k H_o[k]\right)\right\}^2 \sum_{n=0}^{2N-1} E[e'^2_{wL}[n]]\, C^2_{(k,n)}
+ \frac{2}{N}\left\{\mathrm{Im}\left(W^k H_o[k]\right)\right\}^2 \sum_{m=0}^{2N-1} E[e'^2_{wL}[m]]\, S^2_{(k,m)} \\
&+ \frac{4}{N}\,\mathrm{Re}\left(W^k H_o[k]\right)\mathrm{Im}\left(W^k H_o[k]\right) \sum_{n=0}^{2N-1}\sum_{m=0}^{2N-1} E[e'_{wL}[n]\, e'_{wL}[m]]\, C_{(k,n)} S_{(k,m)},
\end{aligned}
\qquad (30)
$$
Hence, the variances of E_hL[k] and E_hR[k] are given by:

$$ E[E_{hL}^2[k]] = E[E'^2_{wLh}[k]] + E[E_{1h}^2[k]] + E[e_{2h}^2[k]], \qquad (31) $$

$$ E[E_{hR}^2[k]] = E[E'^2_{wRh}[k]] + E[E_{2h}^2[k]] + E[e_3^2[k]]. \qquad (32) $$

Due to the space limitation, only the simplification of (31) is considered. E'_wLh[k] and E_1h[k] are given by

$$ E'_{wLh}[k] \approx \sqrt{\frac{2}{N}}\, \mathrm{Re}\left\{ W^k H_o[k] E'_{wLo}[k] \right\}, \qquad (33) $$

$$ E_{1h}[k] \approx \sqrt{\frac{2}{N}}\, \mathrm{Re}\left\{ W^k H_o[k] E_{1o}[k] \right\}. \qquad (34) $$
From (33), E'_wLh[k] can be re-written as:

$$ E'_{wLh}[k] \approx \sqrt{\frac{2}{N}} \left\{ \mathrm{Re}\left(W^k H_o[k]\right)\mathrm{Re}\left(E'_{wLo}[k]\right) - \mathrm{Im}\left(W^k H_o[k]\right)\mathrm{Im}\left(E'_{wLo}[k]\right) \right\}. $$

E[E'^2_wLh[k]] can be calculated as in (30), where C_(k,n) = cos[π n(k + 1/2)/N] and S_(k,m) = sin[π m(k + 1/2)/N]. Since e'_wL[n] = e'_wL[m] = 0 for n, m > N - 1 and E[e'_wL[n] e'_wL[m]] = 0 for n ≠ m, the approximation above can be simplified by using the formulas cos²θ = (1 + cos 2θ)/2 and sin²θ = (1 - cos 2θ)/2 as follows:

$$ E[E'^2_{wLh}[k]] \approx \sigma_{e_w}^2 \left|H_o[k]\right|^2 + \Delta_w[k], \qquad (35) $$

where σ_ew^2 is given by (24) and

$$ \Delta_w[k] = \left[\left\{\mathrm{Re}\left(W^k H_o[k]\right)\right\}^2 - \left\{\mathrm{Im}\left(W^k H_o[k]\right)\right\}^2\right] \lambda_w[k] + 2\,\mathrm{Re}\left(W^k H_o[k]\right)\mathrm{Im}\left(W^k H_o[k]\right)\, \mu_w[k]. $$

λ_w[k] and μ_w[k] are averages of E[e'^2_wL[n]] modulated by C_(k,2n) and S_(k,2n), respectively. They are given by

$$ \lambda_w[k] = \frac{1}{N}\sum_{n=0}^{N-1} E[e'^2_{wL}[n]] \cos\left[\frac{\pi}{N}\, 2n\left(k + \frac{1}{2}\right)\right], \qquad (36) $$

$$ \mu_w[k] = \frac{1}{N}\sum_{n=0}^{N-1} E[e'^2_{wL}[n]] \sin\left[\frac{\pi}{N}\, 2n\left(k + \frac{1}{2}\right)\right]. \qquad (37) $$
(35) shows that, when the rounding error injected by each rounding operation during the WTDA operation in the left channel is convolved with the filter h[n], the total variance is described by σ_ew^2 multiplied by |H_o[k]|^2 plus the residual term Δ_w[k]. Likewise, E[E_1h^2[k]] in (31) can be simplified and is given by the following equation, which is similar to (35):

$$ E[E_{1h}^2[k]] \approx \frac{1}{12}\left|H_o[k]\right|^2 + \Delta_c[k], \qquad (38) $$

where Δ_c[k] is the residual term, given by

$$ \Delta_c[k] = \left[\left\{\mathrm{Re}\left(W^k H_o[k]\right)\right\}^2 - \left\{\mathrm{Im}\left(W^k H_o[k]\right)\right\}^2\right] \lambda_c[k] + 2\,\mathrm{Re}\left(W^k H_o[k]\right)\mathrm{Im}\left(W^k H_o[k]\right)\, \mu_c[k]. $$

λ_c[k] and μ_c[k] are averages of E[e_1^2[k']] modulated by C_(k,2k') and S_(k,2k'), respectively. They are given by

$$ \lambda_c[k] = \frac{1}{N}\sum_{k'=0}^{N-1} E[e_1^2[k']] \cos\left[\frac{\pi}{N}\, 2k'\left(k + \frac{1}{2}\right)\right], \qquad (39) $$

$$ \mu_c[k] = \frac{1}{N}\sum_{k'=0}^{N-1} E[e_1^2[k']] \sin\left[\frac{\pi}{N}\, 2k'\left(k + \frac{1}{2}\right)\right]. \qquad (40) $$
Finally, by substituting (35) and (38) for E[E'^2_wLh[k]] and E[E_1h^2[k]] in (31), and by taking the same simplification approach for (32), the following approximations are obtained:

$$ E[E_{hL}^2[k]] \approx \left(\sigma_{e_w}^2 + \frac{1}{12}\right)\left|H_o[k]\right|^2 + E\left[\left|h[k] * e_2[k]\right|^2\right] + \Delta_w[k] + \Delta_c[k], \qquad (41) $$

$$ E[E_{hR}^2[k]] \approx \left(\sigma_{e_w}^2 + \frac{1}{12}\right)\left|H_o[k]\right|^2 + \frac{1}{12} + \Delta_w[k] + \Delta_c[k], \qquad (42) $$

where * denotes the convolution operator. This shows that we indeed have a tool to shape the rounding error, up to the terms which are not multiplied by H_o[k].
V. FILTER DESIGN AND CODING RESULTS
In this section, we discuss an error shaping filter design based on (41) and (42) derived in the previous section, and perform a lossless coding implementation to evaluate the effect of the rounding error shaping. The input signals are 15 audio items at two sampling frequencies, 48 kHz and 96 kHz, quantized at 16 bits. These items are used in the MPEG-4 task group for lossless audio coding [27]. Let us call each set of the 15 items 48/16 and 96/16, respectively. Due to the higher sampling rate, the spectral energy of the 96/16 items is mainly at lower frequencies compared to the 48/16 items. The value of N is 1024.
In (41), the first term on the right-hand side indicates that (σ_ew^2 + 1/12) can be shaped by |H_o[k]|^2. E[|h[k] * e_2[k]|^2] is the variance of the convolution between e_2[k] and h[n]. The last two terms on the right-hand side of (41) are the residual terms of the error shaping. By using (19), E[e_1^2[k]] = E[e_2^2[k]] = 1/12, |{Re(W^k H_o[k])}^2 - {Im(W^k H_o[k])}^2| ≤ |H_o[k]|^2, and 2 Re(W^k H_o[k]) Im(W^k H_o[k]) ≤ |H_o[k]|^2, one can show that the last two terms in (41) are much smaller than the first term. Since N = 1024, the ratio of (1/N) Σ_{k=0}^{N-1} (|λ_w[k]| + |λ_c[k]|) to (σ_ew^2 + 1/12) is approximately 1.61 x 10^{-3}, while the ratio of (1/N) Σ_{k=0}^{N-1} (|μ_w[k]| + |μ_c[k]|) to (σ_ew^2 + 1/12) is approximately 4.91 x 10^{-3}. Hence they can be neglected. Similarly, the last two terms in (42) can also be neglected. The second term in (41) comes from the error shaping operation in the second MDL step. In order to simplify the simulation, this error shaping is omitted, and thus one can immediately see that E[E_hL^2[k]] and E[E_hR^2[k]] can be controlled by H_o[k] as:

$$ E[E_{hL}^2[k]] \approx \left(\sigma_{e_w}^2 + \frac{1}{12}\right)\left|H_o[k]\right|^2 + \frac{1}{12}, \qquad (43) $$

$$ E[E_{hR}^2[k]] \approx \sigma_{e_w}^2 \left|H_o[k]\right|^2 + \frac{1}{6}. \qquad (44) $$
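As an illustration, the Python sketch below evaluates (43) and (44) for a candidate shaping filter: it computes σ_ew^2 from (19) and (24), the 2N-point ODFT magnitude |H_o[k]|, and the predicted shaped variances. N = 1024 is assumed, and the first-order filter used here is a toy example, not one of H_1, ..., H_6.

```python
# Sketch: predicted shaped error variances (43) and (44) for a given filter.
import numpy as np

N = 1024
# sigma_ew^2 from (19) and (24)
m = np.arange(N // 2)
alpha = np.pi / (2 * N) * (m + 0.5)
sigma2_ew = ((np.sin(alpha) ** 2 + 1).sum()
             + (np.cos(alpha) ** 2 + np.tan(alpha / 2) ** 2 + 1).sum()) / (12.0 * N)

# 2N-point ODFT of h[n]:  H_o[k] = sum_n h[n] exp(-j*pi*n*(k + 1/2)/N)
h = np.array([1.0, 0.9])                    # toy filter H(z) = 1 + 0.9 z^-1
k = np.arange(N)
n = np.arange(len(h))
H_o = np.exp(-1j * np.pi / N * np.outer(k + 0.5, n)) @ h

E_hL = (sigma2_ew + 1.0 / 12.0) * np.abs(H_o) ** 2 + 1.0 / 12.0   # (43)
E_hR = sigma2_ew * np.abs(H_o) ** 2 + 1.0 / 6.0                    # (44)
print(E_hL[:3], E_hL[-3:])   # low vs. high spectral lines
```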
A. Error Shaping Filter Design
In this subsection, we discuss the error shaping filter design. In order to design such a filter, it is necessary to know the shape of the spectral envelope of the input signal. In this paper, we decided to use the vocal signals (SQAM44-48) of EBU-SQAM [28], with a sampling frequency of 44.1 kHz and quantized as 16-bit PCM, for the training purpose. Fig. 4 shows the average of the MDCT coefficients of these signals, and this figure is used to optimize the filter h[n]. Fig. 5 shows six different versions of h[n], optimized by controlling the passband gain, passband width, and stopband energy. The properties and frequency responses of the designed filters are shown in Table II and Fig. 5, respectively.
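The actual filters H_1, ..., H_6 were obtained by controlling the passband gain, passband width, and stopband energy; the details of that optimization are not reproduced here. As a hypothetical alternative formulation, the Python sketch below designs a monic shaping filter H(z) = 1 + h[1]z^{-1} + ... + h[M]z^{-M} by minimizing its stopband energy subject to h[0] = 1 and a prescribed DC gain, solving the KKT system of the constrained quadratic problem; the order, band edge, and gain values are chosen only for illustration.

```python
# Hypothetical design sketch (not the paper's optimization): minimize the
# stopband energy h^T R h of a monic lowpass shaping filter subject to
# h[0] = 1 and H(1) = dc_gain, via the KKT system of the constrained QP.
import numpy as np

def design_shaping_filter(M, stop_edge, dc_gain):
    """stop_edge in (0, pi); returns taps h[0..M] with h[0] = 1 and H(1) = dc_gain."""
    L = M + 1
    n = np.arange(L)
    d = n[:, None] - n[None, :]
    # R[n, m] = (1/pi) * integral over [stop_edge, pi] of cos((n - m) w) dw
    R = np.where(d == 0, (np.pi - stop_edge) / np.pi,
                 -np.sin(stop_edge * d) / (np.pi * np.where(d == 0, 1, d)))
    C = np.vstack([np.eye(L)[0], np.ones(L)])          # constraints: h[0] = 1, sum(h) = dc_gain
    b = np.array([1.0, dc_gain])
    K = np.block([[2 * R, C.T], [C, np.zeros((2, 2))]])
    sol = np.linalg.solve(K, np.concatenate([np.zeros(L), b]))
    return sol[:L]

h = design_shaping_filter(M=16, stop_edge=0.23 * np.pi, dc_gain=10.0)
w = np.linspace(0, np.pi, 512)
H = np.abs(np.exp(-1j * np.outer(w, np.arange(len(h)))) @ h)
print(h[0], H[0], H[-1])    # h[0] ~ 1, |H| ~ 10 at DC, small at high frequencies
```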
B. Lossless Coding Implementation
In this subsection, a lossless coding implementation is presented in order to test the improvement in lossless coding efficiency due to the rounding error shaping. The stereo IntMDCT is implemented with each of the six error shaping filters, which is enabled only when an input frame is non-silent. The resulting IntMDCT coefficients are compressed by an entropy coder, which was not optimized for the best absolute values, but rather to show the differences due to the error shaping.
Fig. 4. Averaged MDCT spectrum of the vocal items (SQAM44-48) in the left channel. A horizontal solid line indicates an experimentally determined minimum magnitude level within the passband of the error shaping filters; markers at normalized frequencies 0.23 and 0.4 indicate the passband edges of the filters.
TABLE II
SUMMARY OF THE FILTERS FOR 48/16 ITEMS (TOP) AND FOR 96/16 ITEMS (BOTTOM).

48/16   passband gain   stopband (x pi rad)   order M   stopband energy
H_1     4.99            0.4 - 1.0             14        108.32
H_2     10.01           0.23 - 1.0            16        428.48
H_3     19.99           0.23 - 1.0            26        348.67

96/16   passband gain   stopband (x pi rad)   order M   stopband energy
H_4     5.01            0.23 - 1.0            18        496.77
H_5     10.06           0.23 - 1.0            20        456.44
H_6     20.28           0.23 - 1.0            32        411.9
Table III shows the improvement in lossless coding efficiency for all the 15 items at each sampling frequency. The improvement can be observed at both sampling frequencies. Another important observation is that higher improvements are obtained for the 96/16 items compared to the 48/16 items, as can be seen by comparing the percentage numbers. This is because the spectral energy of the 96/16 items is mainly at the lower end of the spectrum compared to the 48/16 items, due to the higher sampling rate. To verify that these improvements are not dependent on a specific kind of entropy coder, we also compared compression numbers for an estimated entropy and obtained the same amount of improvement for our error shaping scheme. Additionally, we compare the amount of the improvement with the coding efficiency of other lossless audio codecs. One is Monkey's Audio version 3.99, one of the popular open-source lossless-only codecs based on linear prediction. Another is the MPEG-4 audio scalable lossless coding (SLS) scheme [29] based on the IntMDCT. The results were obtained with the default mode of both codecs and are shown in Table IV. From Tables III and IV, the amount of the improvement due to the error shaping can be seen to be valuable, even though the absolute value itself seems to be small.
Moreover, in order to confirm the error shaping effect visually, the averaged error variances E[E_L^2[k]] and E[E_R^2[k]] of the 15 items of 48/16, before and after rounding error shaping is applied, are shown in Fig. 6.
Fig. 5. Frequency responses |H_1|, |H_2|, |H_3| and |H_4|, |H_5|, |H_6| of the designed error shaping filters for (a) the 48/16 audio items and (b) the 96/16 audio items.
In the figure, the flat line in each channel is the error variance without the error shaping, and the L-shaped curve is the error variance shaped by the error shaping filter H_3. The solid lines in Fig. 6 are obtained from (43) and (44), which are close approximations to the actual values.
TABLE III
AVERAGED LOSSLESS CODING EFFICIENCY (BITS/SAMPLE) AND THE IMPROVEMENT (%).

filter type   48/16           filter type   96/16
no filter     7.571           no filter     5.297
H_1           7.564 (0.093)   H_4           5.241 (1.056)
H_2           7.562 (0.136)   H_5           5.231 (1.235)
H_3           7.561 (0.156)   H_6           5.227 (1.312)
VI. CONCLUSIONS
In this paper, lossless audio coding using the IntMDCT and rounding error shaping was discussed. A model for the rounding error of the stereo IntMDCT was obtained. Based on this model, it was shown that the rounding error spectrum can be shaped using an error shaping filter in the stereo IntMDCT.
Our rounding error shaping scheme was implemented in a lossless audio coder, using the stereo IntMDCT, and six different versions of the error shaping filter.
TABLE IV
CODING EFFICIENCY (BITS/SAMPLE) OF OTHER LOSSLESS AUDIO CODECS AND DIFFERENCE (%).

codec type            48/16           96/16
Monkey's Audio 3.99   7.369           5.004
MPEG-4 SLS            7.378 (0.126)   5.111 (2.147)
Fig. 6. Averaged E[E_L^2[k]] and E[E_R^2[k]] (flat lines) and E[E_hL^2[k]] and E[E_hR^2[k]] (L-shaped lines) of all the 48/16 items with H_3, for the left and right channels, with actual and theoretical values overlaid. The flat lines indicate the error variance without error shaping.
The results showed that the coding performance is indeed improved, and the improvement becomes larger at the higher sampling rate.
APPENDIX I
VARIANCE OF THE APPROXIMATION ERROR OF THE STEREO INTMDCT
From (19), e_wL and e_wR have the same variance. In addition, e_1, e_2, and e_3 have the same statistics. Without loss of generality, we only derive the left channel case. Due to the assumption made about the statistics of the rounding error variables in Section II, e'_wL[n], e_1[k], and e_2[k] are uncorrelated with one another. Hence, from (10) and (20), E[E_L^2[k]] is given by:
$$ E[E_L^2[k]] = \frac{1}{N}\sum_{n=0}^{N-1} E[e_{wL}^2[n]]\left(1 + C_{(k,n)}\right) + \frac{1}{N}\sum_{k'=0}^{N-1} E[e_1^2[k']]\left(1 + C_{(k,k')}\right) + E[e_2^2[k]], $$

where C_(k,n) = cos[2π (n + 1/2)(k + 1/2)/N]. Since E[e_1^2[k]] = E[e_2^2[k]] = 1/12 for k = 0, ..., N-1 and Σ_{k'=0}^{N-1} C_(k,k') = 0, the above equation can be simplified as:

$$ E[E_L^2[k]] = \frac{1}{N}\sum_{n=0}^{N-1} E[e_{wL}^2[n]] + \frac{1}{N}\sum_{k'=0}^{N-1} E[e_1^2[k']] + E[e_2^2[k]] + \delta[k] = \sigma_{e_w}^2 + \frac{1}{6} + \delta[k], \qquad (45) $$
where

$$ \delta[k] = \frac{1}{N}\sum_{n=0}^{N-1} E[e_{wL}^2[n]]\, C_{(k,n)}. $$

From (19), δ[k] can be neglected since it is much smaller than the other terms. For example, if N = 1024, the ratio of the average of |δ[k]| to (σ_ew^2 + 1/6) is approximately 6.9 x 10^{-4}. The condition for the right channel can be derived in the same fashion.
APPENDIX II
THE DCT-IV COEFFICIENTS OF e_h[n]
The DCT-IV coefficient of e_h[n] at spectral line index k is given by

$$
E_h[k] = \sqrt{\frac{2}{N}}\sum_{n=0}^{N-1} e_h[n] \cos\left[\frac{\pi}{N}\left(n+\frac{1}{2}\right)\left(k+\frac{1}{2}\right)\right]
= \sqrt{\frac{2}{N}}\sum_{n=0}^{N-1}\sum_{m=0}^{M} h[m]\, e[n-m]\, \mathrm{Re}\left\{ e^{-j\frac{\pi}{N}\left(n+\frac{1}{2}\right)\left(k+\frac{1}{2}\right)} \right\}.
$$

Since h[m] and e[n-m] are real, the equation above can be re-written as

$$
E_h[k] = \sqrt{\frac{2}{N}}\, \mathrm{Re}\left\{ e^{-j\frac{\pi}{2N}\left(k+\frac{1}{2}\right)} \sum_{m=0}^{M} h[m]\, e^{-j\frac{\pi}{N} m\left(k+\frac{1}{2}\right)} \sum_{n=-m}^{N-1-m} e[n]\, e^{-j\frac{\pi}{N} n\left(k+\frac{1}{2}\right)} \right\}. \qquad (46)
$$

Since e[n] is assumed to be white and stationary, its statistics do not change with a time shift by m. Thus, the ODFT of e[n] is approximately the same for any value of m. Accordingly, (46) can be approximated as follows:

$$
E_h[k] \approx \sqrt{\frac{2}{N}}\, \mathrm{Re}\left\{ e^{-j\frac{\pi}{2N}\left(k+\frac{1}{2}\right)} \sum_{m=0}^{M} h[m]\, e^{-j\frac{\pi}{N} m\left(k+\frac{1}{2}\right)} \sum_{n=0}^{N-1} e[n]\, e^{-j\frac{\pi}{N} n\left(k+\frac{1}{2}\right)} \right\}
= \sqrt{\frac{2}{N}}\, \mathrm{Re}\left\{ W^k H_o[k] E_o[k] \right\}. \qquad (47)
$$
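As a numerical check of this approximation (not part of the paper), the Python sketch below compares the exact DCT-IV of a filtered white error sequence with the right-hand side of (47) built from 2N-point ODFTs; the filter taps and block size are arbitrary.

```python
# Numerical sketch checking (47): DCT-IV of a filtered white error sequence
# versus sqrt(2/N) * Re{ W^k H_o[k] E_o[k] } built from 2N-point ODFTs.
import numpy as np

N = 256
rng = np.random.default_rng(4)
e = rng.uniform(-0.5, 0.5, N)                      # white rounding-error sequence
h = np.array([1.0, 0.9, 0.5])                      # short shaping filter (toy example)
e_h = np.convolve(h, e)[:N]                        # filtered error, as in (26)

n = np.arange(N); k = np.arange(N)
C_IV = np.sqrt(2.0 / N) * np.cos(np.pi / N * np.outer(k + 0.5, n + 0.5))
E_h_exact = C_IV @ e_h                             # exact DCT-IV of the filtered error

def odft(x, N):
    """2N-point odd-frequency DFT at bins k = 0..N-1 (x zero-padded to 2N)."""
    m = np.arange(len(x))
    return np.array([np.sum(x * np.exp(-1j * np.pi / N * m * (kk + 0.5))) for kk in range(N)])

W = np.exp(-1j * np.pi / (2 * N) * (k + 0.5))
E_h_approx = np.sqrt(2.0 / N) * np.real(W * odft(h, N) * odft(e, N))   # (47)
print(np.max(np.abs(E_h_exact - E_h_approx)))      # small compared to the error level
```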
REFERENCES
[1] R. Geiger, J. Herre, J. Koller, and K. Brandenburg, "IntMDCT - A Link between Perceptual and Lossless Audio Coding," in Proc. of IEEE International Conf. on Acoustics, Speech, and Signal Processing, May 2002, vol. 2, pp. 1813-1816.
[2] R. Geiger, J. Herre, G. Schuller, and T. Sporer, "Fine Grain Scalable Perceptual and Lossless Audio Coding based on IntMDCT," in Proc. of IEEE International Conf. on Acoustics, Speech, and Signal Processing, April 2003, vol. 5, pp. 445-448.
[3] J. Li, "A Progressive to Lossless Embedded Audio Coder (PLEAC) with Reversible Modulated Lapped Transform," in Multimedia and Expo, ICME, July 2003, vol. 3, pp. 221-224.
[4] Y. Yokotani and S. Oraintara, "Lossless Audio Compression using Integer Modified Discrete Cosine Transform," in Proc. of IEEE International Symp. on Intelligent Signal Processing and Communication Systems, December 2003.
[5] R. Yu, X. Lin, S. Rahardja, and C. C. Ko, "A Scalable Lossy to Lossless Audio Coding for MPEG-4 Lossless Audio Coding," in Proc. of IEEE International Conf. on Acoustics, Speech, and Signal Processing, May 2004, vol. 3, pp. 1004-1007.
[6] ISO/IEC JTC1/SC29/WG11 (MPEG), "ISO/IEC 11172-3: Information Technology - Coding of Moving Pictures and Associated Audio for Digital Storage Media up to about 1.5 Mbit/s - Part 3: Audio," 1993.
[7] ISO/IEC JTC1/SC29/WG11 (MPEG), "ISO/IEC 14496-3: Information Technology - Coding of Audio-Visual Objects - Part 3: Audio," 1999.
[8] J. Liang and T. D. Tran, "Fast Multiplierless Approximations of the DCT with the Lifting Scheme," IEEE Trans. on Signal Processing, vol. 49, no. 12, pp. 3032-3044, December 2001.
[9] J. Reichel, G. Menegaz, M. J. Nadenau, and M. Kunt, "Integer Wavelet Transform for Embedded Lossy to Lossless Image Compression," IEEE Trans. on Image Processing, vol. 10, no. 3, pp. 383-392, March 2001.
[10] G. H. Golub and C. F. Van Loan, Matrix Computations, Johns Hopkins University Press, third edition, 1996.
[11] H. S. Malvar, Signal Processing with Lapped Transforms, Artech House, Boston, 1992.
[12] R. Geiger, T. Sporer, J. Koller, and K. Brandenburg, "Audio Coding Based on Integer Transforms," in 111th AES Convention, December 2001, Preprint 5471.
[13] T. Krishnan and S. Oraintara, "A Fast and Lossless Forward and Inverse Structure for the MDCT in MPEG Audio Coding," in Proc. of the International Symposium on Circuits and Systems, May 2002, vol. 2, pp. 181-184.
[14] D.-Y. Huang and R. Ma, "Integer Fast Modified Cosine Transform," in Multimedia and Expo, ICME, July 2003, vol. 2, pp. 729-732.
[15] Z. Wang, "A Fast Algorithm for the Discrete Sine Transform Implemented by the Fast Cosine Transform," IEEE Trans. on Acoust., Speech, Signal Processing, vol. 30, pp. 814-815, October 1982.
[16] V. Britanak and K. R. Rao, "An Efficient Implementation of the Forward and Inverse MDCT in MPEG Audio Coding," IEEE Signal Processing Letters, vol. 8, no. 2, pp. 48-51, February 2001.
[17] P. Duhamel, Y. Mahieux, and J. P. Petit, "A Fast Algorithm for the Implementation of Filter Banks Based on Time Domain Aliasing Cancellation," in Proc. of IEEE International Conf. on Acoustics, Speech, and Signal Processing, May 1991, vol. 3, pp. 2209-2212.
[18] I. Daubechies and W. Sweldens, "Factoring Wavelet Transforms into Lifting Steps," Journal of Fourier Anal. Appl., vol. 4, no. 3, pp. 247-269, 1998.
[19] R. Geiger, Y. Yokotani, and G. Schuller, "Improved Integer Transforms for Lossless Audio Coding," in Proc. of the Asilomar Conf. on Signals, Systems, and Computers, November 2003, vol. 2, pp. 2119-2123.
[20] R. Geiger and G. Schuller, "Integer Low Delay and MDCT Filter Banks," in Proc. of the Asilomar Conf. on Signals, Systems, and Computers, September 2002, vol. 1, pp. 811-815.
[21] R. Geiger, Y. Yokotani, G. Schuller, and J. Herre, "Improved Integer Transforms Using Multi-Dimensional Lifting," in Proc. of IEEE International Conf. on Acoustics, Speech, and Signal Processing, May 2004, vol. 2, pp. 1005-1008.
[22] J. Li, "Reversible FFT and MDCT via Matrix Lifting," in Proc. of IEEE International Conf. on Acoustics, Speech, and Signal Processing, May 2004, vol. 4, pp. 173-176.
[23] H. Huang, S. Rahardja, R. Yu, and X. Lin, "A Fast Algorithm of Integer MDCT for Lossless Audio Coding," in Proc. of IEEE International Conf. on Acoustics, Speech, and Signal Processing, May 2004, vol. 4, pp. 177-180.
[24] M. Gerzon and P. G. Craven, "Optimal Noise Shaping and Dither of Digital Signals," in 87th AES Convention, October 1989, Preprint 2822.
[25] S. P. Lipshitz, J. Vanderkooy, and R. A. Wannamaker, "Minimally Audible Noise Shaping," Journal of Audio Eng. Soc., vol. 39, no. 11, pp. 836-852, Nov. 1991.
[26] A. J. S. Ferreira, "Accurate Estimation in the ODFT Domain of the Frequency, Phase and Magnitude of Stationary Sinusoids," in IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, October 2001, pp. 47-50.
[27] ISO/IEC JTC1/SC29/WG11 Moving Picture Experts Group, "Final Call for Proposals on MPEG-4 Lossless Audio Coding," October 2002, Shanghai, China, N5208.
[28] European Broadcasting Union, "SQAM - Sound Quality Assessment Material Recordings for Subjective Tests," April 1988.
[29] R. Yu, R. Geiger, S. Rahardja, J. Herre, X. Lin, and H. Huang, "MPEG-4 Scalable to Lossless Audio Coding," in 117th AES Convention, October 2004, Preprint 6183.
Yoshikazu Yokotani received the B.S. and M.S. degrees from Tokyo University of Science, Chiba, Japan, in 1995 and 1997, respectively, and the Ph.D. degree from the University of Texas at Arlington in 2004, all in electrical engineering. From May to August 2003, he was an intern at the Fraunhofer Institute for Digital Media Technology (IDMT), Ilmenau, Germany. He is now with the Fraunhofer Institute for Integrated Circuits (IIS), Erlangen, Germany, as a postdoctoral research assistant. His research interests are in the areas of audio coding, digital audio processing, and filter banks.
Ralf Geiger received the diploma (M.S.) degree in mathematics from the University of Regensburg, Germany, in 1997.
In 1998 he joined the Audio/Multimedia department at the Fraunhofer Institute for Integrated Circuits (IIS), Erlangen, Germany. From 2000 to 2004 he was with the Fraunhofer Institute for Digital Media Technology (IDMT), Ilmenau, Germany. Since 2005 he has again been with the Fraunhofer IIS, Erlangen, Germany.
He is working on the development and standardization of perceptual and lossless audio coding schemes and serves as an editor of the upcoming ISO/IEC standard MPEG-4 Scalable Lossless Coding (SLS).
Gerald D.T. Schuller is a temporary full professor at the Ilmenau University of Technology in Ilmenau, Germany, and head of the audio coding research group of the Fraunhofer Institute for Digital Media Technology, also in Ilmenau. He received the Vordiplom (B.S.) degree in mathematics from the Technical University of Clausthal, Germany, in 1984, the Vordiplom and Diplom (M.S.) degrees in electrical engineering from the Technical University of Berlin, Germany, in 1986 and 1989, respectively, and the Ph.D. degree from the University of Hanover in 1997.
He received a fellowship to study at the Massachusetts Institute of Technology, Cambridge, in 1989/90, was a Research Assistant at the Technical University of Berlin from 1990 to 1992, where he worked on speech coding, a Teaching Assistant at the Georgia Institute of Technology, Atlanta, in 1993, where he worked on low delay perfect reconstruction filter banks, and a Research Assistant at the University of Bonn, Germany, in 1994, where he worked on filter banks for vision applications and their optimization.
Before joining the Fraunhofer Institute he was a Member of Technical Staff at Bell Laboratories, Lucent Technologies, and Agere Systems, a Lucent spin-off, from 1998 to 2001, where he worked in the Multimedia Communications Research Laboratory.
Soontorn Oraintara received the B.E. degree (with first-class honors) from the King Mongkut's Institute of Technology Ladkrabang, Bangkok, Thailand, in 1995, the M.S. degree from the University of Wisconsin, Madison, in 1996, and the Ph.D. degree from Boston University, Boston, MA, in 2000, both of the latter in electrical engineering.
He joined the Department of Electrical Engineering, University of Texas at Arlington (UTA), as an Assistant Professor in July 2000. From May 1998 to April 2000, he was an intern and a consultant at the Advanced Research and Development Group, Ericsson Inc., Research Triangle Park, NC. His current research interests are in the field of digital signal processing: wavelets, filter banks, and multirate systems and their applications in data compression, signal detection and estimation, communications, image reconstruction, and regularization and noise reduction.
Dr. Oraintara received the Technology Award from Boston University for his invention on the Integer DCT (with Y. J. Chen and T. Q. Nguyen) in 1999. In 2003, he received the College of Engineering Outstanding Young Faculty Member Award from UTA. He represented Thailand in the International Mathematical Olympiad competitions and received the Honorable Mention Award in Beijing, China, in 1990 and the bronze medal in Sigtuna, Sweden, in 1991.
K. R. Rao received the Ph.D. degree in electrical engineering from The University of New Mexico, Albuquerque, in 1966. Since 1966, he has been with the University of Texas at Arlington, where he is currently a professor of electrical engineering. He, along with two other researchers, introduced the Discrete Cosine Transform in 1975, which has since become very popular in digital signal processing. He is the co-author of the books Orthogonal Transforms for Digital Signal Processing (Springer-Verlag, 1975; also recorded for the blind in Braille by the Royal Institute for the Blind), Fast Transforms: Analyses and Applications (Academic Press, 1982), and Discrete Cosine Transform: Algorithms, Advantages, Applications (Academic Press, 1990). He has edited a benchmark volume, Discrete Transforms and Their Applications (Van Nostrand Reinhold, 1985), and coedited a benchmark volume, Teleconferencing (Van Nostrand Reinhold, 1985). He is co-author of the books Techniques and Standards for Image/Video/Audio Coding (Prentice Hall, 1996), Packet Video Communications over ATM Networks (Prentice Hall, 2000), and Multimedia Communication Systems (Prentice Hall, 2002). He has coedited a handbook, The Transform and Data Compression Handbook (CRC Press, 2001). Two books are in final stages: Digital Video Image Quality and Perceptual Coding (with H. R. Wu), Taylor and Francis (Dec. 2005), and Introduction to Multimedia Communications: Applications, Middleware, Networking (with Z. S. Bojkovic and D. A. Milovanovic), Wiley (Nov. 2005). Some of his books have been translated into Japanese, Chinese, Korean, and Russian, and also published as Asian (paperback) editions. He has been an external examiner for graduate students from universities in Australia, Canada, Hong Kong, India, Singapore, Thailand, and Taiwan. He was a visiting professor at several universities, for periods of three weeks to seven and a half months, in Australia, Japan, Korea, Singapore, and Thailand. He has conducted workshops and tutorials on video/audio coding and standards worldwide. He has supervised several students at the Masters and Doctoral levels. He has published extensively in refereed journals and has been a consultant to industry, research institutes, law firms, and academia.