
ICICS-PCM 2003 15-18 December 2003 Singapore

3A3.7

Low Power Spectral Band Replication Technology for the MPEG-4 Audio Standard
CHONG Kok Seng*, Naoya TANAKA**, Toshiyuki NOMURA***, Osamu SHIMADA***, KUAH Kim Hann*, Mineo TSUSHIMA**, Yuichiro TAKAMIZAWA***, NEO Sua Hong*, Takeshi NORIMATSU**, Masahiro SERIZAWA***
* Panasonic Singapore Laboratories Pte Ltd. **Matsushita Electric Industrial Co., Ltd. *** NEC Corporation

Abstract
Spectral Band Replication (SBR) is the bandwidth extension technology of the MPEG-4 Audio Extension 1 standard. It maps the low-frequency portion of an audio signal coded at a low bitrate to the missing high-frequency region. It then uses a small amount of information embedded in the audio bitstream to shape the energy envelope and tone-to-noise ratio of the mapped signals, so that they aurally resemble the high-frequency spectrum of the original signal. This delivers a compact-disc-like listening experience. Low Power SBR (LP-SBR) is a simplified version of the SBR technology. It applies real-valued processing to all SBR modules, together with aliasing reduction tools that suppress the resulting aliasing artifacts. LP-SBR requires 40% less computation than the original SBR. This paper describes the causes of the aliasing artifacts and the principles behind the anti-aliasing solutions.

1. Introduction
The Moving Picture Experts Group (MPEG) has recently standardized the MPEG-4 Audio Extension 1 technology [1]. This new audio compression technology originates from a German company, Coding Technologies (CT). It is known as Spectral Band Replication (SBR). SBR maps the low-frequency portion of an audio signal coded at a low bitrate to the missing high-frequency region. It then uses a small amount of information embedded in the audio bitstream to shape the energy envelope and tone-to-noise ratio of the mapped signals, so that they aurally resemble the high-frequency spectrum of the original signal. This delivers a compact-disc-like listening experience. SBR is coupled with Advanced Audio Coding (AAC), which codes the low-frequency portion of the audio signal. The principle of SBR is illustrated in Figure 1.

During the Core Experiment (CE) phase, Matsushita Electric Industrial Co., Ltd. (MEI), NEC Corporation, and Panasonic Singapore Laboratories Pte Ltd. (PSL) collaborated with CT to produce a simplified version of the SBR technology. This version uses 40% less computation at the expense of a minor degradation in sound quality. The original SBR later became High Quality SBR (HQ-SBR), and the simplified version became Low Power SBR (LP-SBR) [3].

0-7803-8185-8/03/$17.00 © 2003 IEEE

[Figure 1 panels, frequency f on the horizontal axis: original spectrum of audio; reduced bandwidth of low bitrate audio; spectral band replication; energy shaping; adding tone and noise]
Figure 1: The principle of SBR

Both HQ-SBR and LP-SBR passed the criterion put forth by the MPEG Audio Subgroup. That criterion required the new Extension 1 technology to exceed the sound quality of the original AAC coding the same audio at a 25% higher bitrate. Listening tests conducted by audio experts showed that HQ-SBR and LP-SBR fulfilled the criterion at low bitrates of 24 kbps/ch and 16 kbps/ch. MPEG-4 Audio Extension 1 reached International Standard status in July 2003.

SBR is useful for applications that deliver low-bitrate audio content, such as Internet streaming and mobile streaming. HQ-SBR has already been adopted by Digital Radio Mondiale (www.drm.org) and XM Satellite Radio (www.xmradio.com). This paper describes the principles behind the low-complexity tools introduced into HQ-SBR to produce the LP-SBR system. To simplify the description, some equations have been modified from those used in the standard to suit the introductory tone of this paper. For the detailed equations, readers are referred to [1].
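The map-and-shape principle of Figure 1 can be illustrated with a toy sketch in the subband domain. It is not the standardized algorithm. The function name `sbr_sketch`, the simple modulo copy-up rule, and the list of per-subband target energies are all hypothetical simplifications. The target energies stand in for the envelope data carried in the SBR bitstream. The noise and tone addition performed by the real envelope adjuster is omitted.

```python
import math


def energy(samples):
    """Sum of squared sample values of one subband signal."""
    return sum(s * s for s in samples)


def sbr_sketch(subbands, crossover, target_energies):
    """Toy spectral band replication on real subband signals.

    subbands: list of subband signals (lists of floats); only the subbands
        below `crossover` carry decoded low-band content.
    crossover: index of the first subband to be regenerated.
    target_energies: hypothetical target energies for the regenerated
        subbands crossover, crossover + 1, ... (the caller must supply
        enough subband slots to hold them).
    """
    out = [list(band) for band in subbands]
    for i, target in enumerate(target_energies):
        k = crossover + i          # high subband being regenerated
        p = i % crossover          # low subband it is copied from (copy-up)
        mapped = subbands[p]
        e = energy(mapped)
        # Envelope adjustment: scale the copy so its energy matches the target.
        gain = math.sqrt(target / e) if e > 0 else 0.0
        out[k] = [gain * s for s in mapped]
    return out


bands = [[1.0, -1.0, 0.5], [0.2, 0.4, -0.2], [0.0] * 3, [0.0] * 3]
result = sbr_sketch(bands, crossover=2, target_energies=[0.5, 2.0])
print([round(energy(b), 6) for b in result])  # [2.25, 0.24, 0.5, 2.0]
```

The low subbands pass through untouched, and each regenerated subband ends up with exactly the energy requested for it. This mirrors the division of labour between the AAC core and the SBR tools.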

2. High Quality SBR (HQ-SBR)
Figure 2 shows a block diagram of HQ-SBR. The analysis QMF bank splits an AAC-decoded signal into multiple subbands. The output subband signals are fed to the synthesis QMF bank and to the HF generator. The HF generator uses the SBR information demultiplexed from the SBR-enhanced AAC bitstream. It produces subband signals in the higher-frequency bands by mapping the subband signals of the lower-frequency bands. The envelope adjuster then modifies the energy of the mapped signals. The envelope adjuster also injects additional noise and/or tones to modify the pitch sensation of the mapped signals. The synthesis QMF bank synthesizes the enhanced signal from the subband signals output by the analysis QMF bank and the envelope adjuster.

[Figure 2 blocks: AAC Decoded Signal, Analysis QMF Bank, HF Generator, Envelope Adjuster, SBR Bitstream, DEMUX, Synthesis QMF Bank, Enhanced Signal]
Figure 2: High Quality SBR (HQ-SBR)

3. Low Power SBR (LP-SBR)
In HQ-SBR, the QMF bank uses a complex-valued modulator, so the resulting subband signals are complex-valued. To reduce complexity, LP-SBR applies real-valued processing instead of complex-valued processing to all SBR-related modules. These are the analysis QMF bank, the HF generator (including the envelope adjuster) and the synthesis QMF bank in Figure 2.

4. Problem with Real-valued Processing: Aliasing

4.1 Aliasing Problem with Envelope Adjustment
The aliasing cancellation properties of the filter bank are retained only if the subbands are not modified independently, i.e. only if the same gain is applied to all of them. This holds for both real-valued and complex-valued systems: if different gains are applied, aliasing components appear. In the real-valued system, these components are clearly audible when the input signal contains tone-like features, because they appear in the neighbouring subbands. Figure 3(a) and Figure 3(b) illustrate this aliasing phenomenon with a tonal signal.

Figure 3(a): A tonal signal decoded by HQ-SBR
Figure 3(b): Aliasing components (arrowed) decoded by real-valued SBR

4.2 Aliasing Problem with Artificial Sine Tone Injection
To achieve the right tone-to-noise ratio, additional tones are sometimes synthesized by injecting artificial sinusoidal components into the complex-valued synthesis QMF bank. Each component comprises a real part and an imaginary part scaled to the desired level. Removing the imaginary processing from the synthesis QMF bank leads to an aliasing phenomenon similar to that of Section 4.1: extraneous sinusoids appear on both sides of the main sinusoids, as depicted in Figure 4.
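The gain dependence described in Section 4.1 can be reproduced with a deliberately tiny real-valued filter bank. As a simplifying assumption, a two-band Haar filter bank stands in for the multi-band real QMF bank of LP-SBR. Like the real QMF bank, it is critically sampled, so its aliasing cancels only when both bands receive the same gain. In this sketch, a cosine tone at DFT bin 10 of a 64-sample frame is split, scaled and resynthesized. With unequal gains, an image appears at the mirrored bin 22.

```python
import cmath
import math


def haar_analysis(x):
    """Critically sampled two-band real filter bank (Haar): low and high band."""
    s = 1 / math.sqrt(2)
    low = [s * (x[2 * n] + x[2 * n + 1]) for n in range(len(x) // 2)]
    high = [s * (x[2 * n] - x[2 * n + 1]) for n in range(len(x) // 2)]
    return low, high


def haar_synthesis(low, high):
    """Inverse of haar_analysis; reconstructs x exactly if the bands are untouched."""
    s = 1 / math.sqrt(2)
    y = []
    for lo, hi in zip(low, high):
        y.extend([s * (lo + hi), s * (lo - hi)])
    return y


def dft_magnitude(x, bin_index):
    """Magnitude of a single DFT bin of x."""
    n_total = len(x)
    return abs(sum(v * cmath.exp(-2j * math.pi * bin_index * n / n_total)
                   for n, v in enumerate(x)))


N, TONE_BIN = 64, 10
# The two-band split mirrors a tone at bin b around the band edge N/4, to bin N/2 - b.
IMAGE_BIN = N // 2 - TONE_BIN
x = [math.cos(2 * math.pi * TONE_BIN * n / N) for n in range(N)]


def process(g_low, g_high):
    """Analyse x, apply one gain per band, and resynthesize."""
    low, high = haar_analysis(x)
    return haar_synthesis([g_low * v for v in low], [g_high * v for v in high])


leak_equal = dft_magnitude(process(0.5, 0.5), IMAGE_BIN)
leak_unequal = dft_magnitude(process(1.0, 0.25), IMAGE_BIN)
print(f"image with equal gains:   {leak_equal:.2e}")
print(f"image with unequal gains: {leak_unequal:.2f}")
```

With equal gains, the image bin stays at rounding-error level. With gains of 1.0 and 0.25, it carries a clearly visible component. This is why the Aliasing Reducer works by reducing the gain differences between neighbouring subbands.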

Figure 4: Sine tone images (arrowed) on both sides of the main sinusoids

5. Anti-aliasing Measure: Aliasing Reducer
To counter the above aliasing problems, an Aliasing Reducer [4] is introduced into the real-valued SBR derived from HQ-SBR. The Aliasing Reducer performs gain equalization during envelope adjustment and sine tone image cancellation during sine tone injection. Figure 5 shows how the Aliasing Reducer is integrated into the real-valued SBR.

[Figure 5 blocks: x(n), Real Analysis QMF, Real HF Generator, Real Envelope Adjuster, Real Synthesis QMF, y(n), Bitstream, DEMUX, Aliasing Reducer, Fixed Gain Compensator]
Figure 5: Low Power SBR (LP-SBR) is the real-valued SBR with an Aliasing Reducer

5.1 Gain Equalization
This section describes how aliasing is reduced by modifying the gains that are computed from SBR bitstream elements and applied to the mapped signals. The degree of aliasing between any two neighbouring subbands can be estimated from their reflection coefficients r(p), where p is the subband index. If subband k is mapped from channel p during HF generation, channel k inherits the reflection coefficient of channel p:

$$ r(k) = r(p). $$

If the frequency of a tone component lies between the center frequencies of subbands k-1 and k, the reflection coefficients of the two channels take on the following values:

Condition 1:
  if parity(k) = odd:  $0 < r(k-1) \le 1$ and $0 < r(k) \le 1$
  if parity(k) = even: $-1 \le r(k-1) < 0$ and $-1 \le r(k) < 0$

This is illustrated in Figure 6. Odd and even subbands must be distinguished because successive QMF subbands are frequency-inverted.

[Figure 6 panels: "if k is even" and "if k is odd"; subbands k-1 and k on the subband-index axis, annotated with r < 0 and r > 0]
Figure 6: Reflection coefficients of two channels in the presence of one tone component. The dashed tone is the aliasing image of the solid tone.

Now suppose two tone components are present. The frequency of one lies between the center frequencies of subbands k-1 and k. The frequency of the other lies between the center frequencies of subbands k-2 and k-1. In that case the following characteristics of r(k) can be observed:

Condition 2:
  if parity(k) = even: $r(k-2) > 0$ and $r(k) < 0$
  if parity(k) = odd:  $r(k-2) < 0$ and $r(k) > 0$

This is illustrated in Figure 7. The level of the aliasing component increases with the difference between the gains applied to the mapped subbands. Hence, to alleviate the aliasing effect, the gains of neighbouring subbands are equalized. This reduces the gain difference between the subbands and thereby the aliasing.

For Case (a), the image in channel k-2 would be more annoying than that in channel k. The gains of channels k-1 and k-2 should therefore be subjected to more equalization than those of channels k and k-1. The opposite holds for Case (c). For Case (b), both images have the same energy, so gain equalization must be applied across all three adjacent channels. From this observation, the degree of gain equalization should depend on the reflection coefficient r(k-1) as follows:

$$ \mathrm{deg} = 1 - r(k-1)^2 $$

This degree of gain equalization is assigned only to the lower-priority channels that satisfy Condition 2. For the channels that satisfy Condition 1, full gain equalization (i.e. deg = 1.0) is applied. Based on the above principle, the degree of gain equalization for every mapped subband, deg[k], can be computed.
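The computation of the equalization degrees from Conditions 1 and 2 can be sketched as follows. This sketch rests on several assumptions, and the procedure in [1] may differ in detail. First, the reflection coefficients are given as inputs, indexed by absolute QMF subband index so that parity(k) is the parity of k itself. Second, deg[j] is taken to describe the neighbouring pair (j, j + 1). Third, a "lower-priority" pair is read as a pair that fails Condition 1 while belonging to a three-subband group that satisfies Condition 2. The function names are hypothetical.

```python
def condition1(r, k):
    """Condition 1 for subbands k-1 and k: one tone between their centres."""
    if k % 2 == 1:  # parity(k) odd
        return 0 < r[k - 1] <= 1 and 0 < r[k] <= 1
    return -1 <= r[k - 1] < 0 and -1 <= r[k] < 0


def condition2(r, k):
    """Condition 2 for subbands k-2, k-1, k: two closely spaced tones."""
    if k % 2 == 0:  # parity(k) even
        return r[k - 2] > 0 and r[k] < 0
    return r[k - 2] < 0 and r[k] > 0


def equalization_degrees(r):
    """Return deg[j] for every neighbouring pair (j, j + 1) of subbands."""
    n = len(r)
    deg = []
    for j in range(n - 1):
        if condition1(r, j + 1):
            deg.append(1.0)  # full gain equalization
            continue
        d = 0.0
        # Lower-priority pair of the triple j-1, j, j+1 (centre j = k-1) ...
        if j >= 1 and condition2(r, j + 1):
            d = max(d, 1.0 - r[j] ** 2)
        # ... or of the triple j, j+1, j+2 (centre j+1 = k-1).
        if j + 2 < n and condition2(r, j + 2):
            d = max(d, 1.0 - r[j + 1] ** 2)
        deg.append(d)
    return deg


# Case (a) of Figure 7 with k = 2 (even): r(k-2) > 0, r(k-1) > 0, r(k) < 0.
print([round(d, 3) for d in equalization_degrees([0.5, 0.3, -0.6])])  # [1.0, 0.91]
```

In this reading, the pair that satisfies Condition 1 receives full equalization. The other pair receives 1 - r(k-1)^2, which becomes 1.0 in the balanced Case (b), where r(k-1) = 0.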
[Figure 7 panels, k even, Tone 1 and Tone 2 between subbands k-2, k-1 and k: (a) r(k-2) > 0, r(k-1) > 0, r(k) < 0; (b) r(k-2) > 0, r(k-1) = 0, r(k) < 0; (c) r(k-2) > 0, r(k-1) < 0, r(k) < 0]
Figure 7: Reflection coefficients of three subbands in the presence of two closely spaced tone components. (a) Energy of Tone 1 > Energy of Tone 2; (b) Energy of Tone 1 = Energy of Tone 2; (c) Energy of Tone 1 < Energy of Tone 2.

Consecutive subbands with non-zero deg[k] are then grouped together, and gain equalization is applied to every subband in the group. The higher the degree of gain equalization, the closer the gains in the group are moved toward a common target gain, which reduces the gain differences. For all subbands within the group, the total energy of the mapped signal modified by the initial gains is computed. This total energy is divided by the total energy of the mapped (unmodified) subbands to obtain the squared target gain $G_{target}^2$. All gains within the group are then adjusted as follows to obtain the adjusted gains:

$$ G_{adj}^2(k) = a(k)\,G_{target}^2 + \bigl(1 - a(k)\bigr)\,G_{initial}^2(k), \qquad a(k) = \max\bigl(\mathrm{deg}[k],\ \mathrm{deg}[k-1]\bigr). $$

The adjusted gains in the same group are further normalized by a common factor. This keeps the total energy of the group's mapped subbands under the adjusted gains equal to their total energy under the initial gains. Figure 8 depicts the same signal as Figure 3(b) after this gain equalization has been applied. Aurally, the noise associated with the aliasing components is suppressed.

Figure 8: Aliasing components have been largely cancelled compared to Figure 3(b)

5.2 Sine Tone Image Cancellation
The tone image phenomenon can be alleviated by introducing, into the affected QMF channels, signals that approximate the negative of the extraneous sinusoids, as illustrated in Figure 9.

[Figure 9 panels along the frequency axis: main sine tone with tone images; minus tone image cancellers; becomes reduced tone images]
Figure 9: Principle of sine tone image cancellation

Figure 10 depicts the same main sinusoids as Figure 4 after sine tone image cancellation. With the tone images removed, the artificial tones sound as pure as those of HQ-SBR.
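The gain equalization of Section 5.1 can be sketched as follows. The sketch covers grouping, the target gain, interpolation toward it with weight a(k), and the energy-preserving normalization. Several choices are assumptions: deg[j] describes the neighbouring pair of subbands (j, j + 1), so a(k) = max(deg[k], deg[k-1]) collects the two pairs that contain subband k. A group is taken as a maximal run of pairs with non-zero degree. The function name and argument layout are hypothetical.

```python
import math


def equalize_gains(g_init, energies, deg):
    """Group-wise gain equalization (sketch).

    g_init: initial amplitude gains of the mapped subbands.
    energies: energies of the mapped, unmodified subbands.
    deg: deg[j] for the neighbouring pair (j, j + 1); len(deg) == len(g_init) - 1.
    """
    n = len(g_init)
    g2 = [g * g for g in g_init]  # work with squared gains
    out = list(g2)
    j = 0
    while j < n - 1:
        if deg[j] <= 0:
            j += 1
            continue
        start = j  # a group is a maximal run of pairs with non-zero degree
        while j < n - 1 and deg[j] > 0:
            j += 1
        group = range(start, j + 1)  # subbands start .. j
        e_mod = sum(g2[k] * energies[k] for k in group)
        e_raw = sum(energies[k] for k in group)
        target2 = e_mod / e_raw if e_raw > 0 else 0.0  # squared target gain
        for k in group:
            left = deg[k - 1] if k >= 1 else 0.0
            right = deg[k] if k < n - 1 else 0.0
            a = max(left, right)  # a(k) = max(deg[k], deg[k-1])
            out[k] = a * target2 + (1.0 - a) * g2[k]
        # Common factor: adjusted-gain energy of the group equals initial-gain energy.
        e_adj = sum(out[k] * energies[k] for k in group)
        scale = e_mod / e_adj if e_adj > 0 else 1.0
        for k in group:
            out[k] *= scale
    return [math.sqrt(v) for v in out]


print([round(g, 4) for g in equalize_gains([1.0, 0.5, 2.0], [1.0, 1.0, 1.0], [1.0, 0.0])])
# [0.7906, 0.7906, 2.0]
```

In the example, subbands 0 and 1 form a fully equalized group and share the gain sqrt(0.625). Subband 2 lies outside any group and keeps its initial gain.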

Figure 10: Sine tone images have been cancelled

6. Test Results
Figure 11(a)-(c) illustrate typical spectral views of an original audio signal, the same signal coded by AAC at 48 kbps, and the same signal coded by SBR at 48 kbps, respectively.

Figure 11(a): Original audio signal
Figure 11(b): AAC of the same signal coded at 48 kbps
Figure 11(c): SBR of the same signal coded at 48 kbps

Listening experts from MPEG conducted an informal verification test. It compared the sound quality of HQ-SBR, LP-SBR and the original AAC coded at a 25% higher bitrate (the MP4-AAC series). The MUSHRA listening test methodology was used. Due to space constraints, only the stereo test results are shown in Figure 12. The chart labels are as follows:
- H-Ref-Org: the original audio.
- H-Ref-7kHz: the hidden reference created by band-limiting the original to 7 kHz.
- HQ-SBR-48: HQ-SBR at 48 kbps.
- LC-SBR-32: LC-SBR (i.e. LP-SBR) at 32 kbps.
- MP4-AAC-48: MPEG-4 AAC at 48 kbps.
- BC-AAC-32: the AAC part of the SBR coding at 32 kbps.
Other codecs and bitrates can be interpreted in a similar fashion. For the full report, see [2]. The results show that HQ-SBR and LP-SBR are very close in mean score, indicating similar quality. Their confidence intervals overlap at both bitrates under test. Both clearly exceed the acceptance criterion put forth by MPEG.

[Figure 12 chart "MPEG Audio Stereo Bandwidth Enhancement MUSHRA test results": Grade (0-100) versus modules under test: H-Ref-Org, H-Ref-7, H-Ref-3, HQ-SBR-48, LC-SBR-48, HQ-SBR-32, LC-SBR-32, MP4-AAC-60, MP4-AAC-48, BC-AAC-48, BC-AAC-32]
Figure 12: Informal Verification Test Result

A formal verification test was conducted before the 65th MPEG meeting, with 49 subjects from Germany and France. Its result is similar to that of the informal verification test. The full report will only be available after the 66th MPEG meeting, to be held in October.

7. Conclusion
Owing to the Aliasing Reducer, LP-SBR operates at 40% lower complexity than HQ-SBR. According to the statistical analysis of the informal verification test, it shows no significant degradation compared to HQ-SBR. The MPEG-4 Audio Extension 1 became an International Standard at the 65th MPEG meeting in July 2003. The estimated publication date is September.

References
[1] ISO/IEC JTC1/SC29/WG11/N5570, Text of ISO/IEC 14496-3:2001/FDAM1, Bandwidth Extension.
[2] ISO/IEC JTC1/SC29/WG11/N5571, Report on Informal MPEG-4 Extension 1 (Bandwidth Extension) Verification Tests.
[3] T. Nomura et al., "A Low-Complexity Bandwidth Extension Algorithm for MPEG-4 Audio Standardization," Proc. of IEICE General Conference, D-14-8, March 2003.
[4] O. Shimada et al., "An Aliasing Reduction Method for MPEG-4 Audio Low Complexity Bandwidth Extension," Proc. of IEICE General Conference, D-14-9, March 2003.
