
Audio Compression & Synthesis Technology Overview

Adam Chang, MSEE Product Marketing Manager

Contents
- Audio Compression Technology Overview
- Audio Synthesis Technology Overview
- Speech Compression Overview
- MXIC Solution to Digital Audio & Speech Applications
  - Speech Product Offering
  - Digital Audio Product Offering
- Summary


Audio Compression Technologies


- A wide range of audio compression technologies is available, but few of them have really been commercialized.
- Owing to internet music, MPEG-1 Audio Layer 3 (so-called MP3) has become the most successful audio compression technology.
- In addition to internet music, audio compression technologies are applied to:
  - Portable solid-state audio recorders
  - Internet radio
  - DAB (Digital Audio Broadcast) systems
  - Audio accessories for portable devices (cell phone, PDA, ...)

MPEG-1/Audio Layer 3 Coding

- MPEG-1 Audio Layer 3 is now well known as MP3
- Low bit-rate application: 64 kbps for a mono channel
- Sampling frequency: 32, 44.1, or 48 kHz
- Lossy compression algorithm: 12-to-1 compression ratio
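The 12-to-1 figure can be checked with simple arithmetic. The sketch below assumes 16-bit linear PCM as the uncompressed source (an assumption; the slide does not state the source word length), and the helper names `pcm_bit_rate` and `compression_ratio` are hypothetical.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample=16, channels=1):
    """Bit rate of uncompressed linear PCM in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


def compression_ratio(sample_rate_hz, coded_bit_rate_bps,
                      bits_per_sample=16, channels=1):
    """Ratio of the uncompressed PCM bit rate to the coded bit rate."""
    return pcm_bit_rate(sample_rate_hz, bits_per_sample, channels) / coded_bit_rate_bps


# 48 kHz, 16-bit mono PCM is 768 kbps; coded at 64 kbps this is 12:1.
ratio_48k = compression_ratio(48_000, 64_000)

# 44.1 kHz mono gives roughly 11:1 under the same assumption.
ratio_44k = compression_ratio(44_100, 64_000)
```

Under these assumptions the quoted 12-to-1 ratio corresponds to the 48 kHz case; the other sampling frequencies give slightly lower ratios at the same 64 kbps.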

Audio Encoding System Overview

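[Figure: encoder block diagram. Main path: Digital Audio Input -> Filter Bank -> Bit or Noise Allocation -> Bitstream Formatting -> Encoded Bitstream. Side path: the Psychoacoustic Model analyzes the digital audio input and supplies the SMR (Signal-to-Mask Ratio) to the Bit or Noise Allocation stage.]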

Hybrid Filter Bank


- A polyphase filter bank divides the audio signal into 32 equal-width frequency subbands.
- The filter outputs are then processed with an MDCT (Modified Discrete Cosine Transform) to obtain finer frequency resolution.
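As a rough illustration of the resolution each stage provides, the sketch below computes the width of one polyphase subband and of one MDCT line. It assumes 18 MDCT lines per subband (the Layer 3 long-block case, giving 32 x 18 = 576 frequency lines); the function names are hypothetical.

```python
SUBBANDS = 32
MDCT_LINES_PER_SUBBAND = 18  # assumption: Layer 3 long blocks


def subband_width_hz(sample_rate_hz):
    """Width of one equal-width polyphase subband (Nyquist range / 32)."""
    return (sample_rate_hz / 2) / SUBBANDS


def mdct_line_width_hz(sample_rate_hz):
    """Approximate width of one MDCT frequency line inside a subband."""
    return subband_width_hz(sample_rate_hz) / MDCT_LINES_PER_SUBBAND


# Total number of frequency lines after the hybrid filter bank.
total_lines = SUBBANDS * MDCT_LINES_PER_SUBBAND

# At 48 kHz: 750 Hz per subband, about 41.7 Hz per MDCT line.
width_48k = subband_width_hz(48_000)
line_48k = mdct_line_width_hz(48_000)
```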

Psychoacoustic Model
- The incoming signal is transformed from the time domain to the frequency domain for analysis.
- The psychoacoustic model calculates the SMR (Signal-to-Mask Ratio) for each band using auditory perception effects such as simultaneous masking, temporal masking, and the absolute threshold of hearing.
- The SMR of each band directly affects the compression rate and the audio quality.
- Different psychoacoustic models are chosen according to the trade-off between audio quality and compression rate.

Noise/Bit Allocation

- Based on the SMR from the psychoacoustic model and the bit-rate restriction, the 576 frequency coefficients are grouped into scale factor bands.
- Each scale factor band performs noise (or bit) allocation by repeatedly adjusting its scale factor and the global gain until distortion is minimized.
- Non-uniform quantization and Huffman coding
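The idea of steering bits by SMR can be sketched with a much simpler scheme than the Layer 3 loops described above. The example below is a greedy bit allocation, not the actual scale-factor and global-gain iteration: it assumes each extra bit buys about 6.02 dB of signal-to-noise ratio and repeatedly gives a bit to the band whose noise is most audible. The function name and the sample SMR values are hypothetical.

```python
def allocate_bits(smr_db, total_bits, db_per_bit=6.02):
    """Greedy allocation: one bit at a time to the band with the
    highest noise-to-mask ratio (NMR = SMR - SNR)."""
    bits = [0] * len(smr_db)
    for _ in range(total_bits):
        # SNR of a band grows by roughly db_per_bit for each bit it holds.
        nmr = [smr - b * db_per_bit for smr, b in zip(smr_db, bits)]
        worst_band = max(range(len(nmr)), key=nmr.__getitem__)
        bits[worst_band] += 1
    return bits


# Three bands: strongly unmasked, mildly unmasked, fully masked.
example = allocate_bits([20.0, 5.0, -10.0], total_bits=4)
```

In this example the masked band (SMR below 0 dB) receives no bits, which is the behavior the psychoacoustic model is meant to enable.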

Audio Compression Technologies Comparison

MP3 (128 kbit/s)
  Advantages: (1) Internet music standard (2) Easy to implement as a silicon LSI
  Drawbacks: (1) Bit rate is too high

MP3PRO (64 kbit/s)
  Advantages: (1) High compression ratio (2) Extension of MP3
  Drawbacks: (1) IP by Thomson Multimedia (2) No encoder IC available

AAC (96 kbit/s)
  Advantages: (1) Excellent audio quality (2) High compression ratio
  Drawbacks: (1) Not an internet music standard (2) No encoder IC available

WMA (96 kbit/s)
  Advantages: (1) Excellent audio quality (2) High compression ratio
  Drawbacks: (1) IP by Microsoft (2) No encoder IC available

ATRAC3 (72 kbit/s)
  Advantages: (1) Excellent audio quality (2) High compression ratio
  Drawbacks: (1) IP by Sony (2) No encoder IC available

Audio Compression Technologies Brief Summary


- MP3 is the most mature technology, and its encoder is easy to implement as a silicon LSI.
- Among newly developed audio compression technologies, MP3PRO is the rising star, because:
  - It is backward compatible with MP3
  - It needs the lowest bit rate for the same audio quality as MP3
  - Its encoder is easier to implement as a silicon LSI
  - Thomson Multimedia aggressively promotes it as the new internet music standard


Audio Synthesis Technologies


- Audio synthesis is a method of producing sound without using an acoustic sound source.
- Among audio synthesis technologies, FM (Frequency Modulation) and wavetable synthesis are now the mainstream.
- Audio synthesis technologies are now widely used in applications such as:
  - Music keyboards
  - Cell phone sound generators
  - Toys
  - Melody accessories

Wavetable Synthesis Technology


- u-Law Compression
- Sound Model
- Loop
- Envelope Control
- Pitch shift
- Interpolation

u-Law Compression
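- Converts linear 16-bit samples into 8-bit codes:

  $$\hat{s} = \operatorname{sign}(s)\,\frac{\log(1 + 255\,|s|)}{\log(1 + 255)}$$

- Assume all samples are fractional values between -1 and 1. For non-negative $s$ the mapping reduces to

  $$\hat{s} = \frac{\log(1 + 255\,s)}{\log 256}$$

  and its inverse is

  $$s = \frac{256^{\hat{s}} - 1}{255}$$

- The forward mapping takes 16-bit linear samples to 8-bit u-Law codes; the inverse mapping takes 8-bit u-Law codes back to 16-bit linear samples.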
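The companding curve above can be tried out directly. This is a minimal sketch of the formulas only: it quantizes the companded value uniformly to 8 bits, which is an assumption for illustration and not the segment-and-mantissa bit layout used by telephony codecs. All function names are hypothetical.

```python
import math

MU = 255


def mulaw_compress(s):
    """Map a sample s in [-1, 1] to a companded value in [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(s)) / math.log1p(MU), s)


def mulaw_expand(y):
    """Inverse of mulaw_compress."""
    return math.copysign(((1 + MU) ** abs(y) - 1) / MU, y)


def encode_16_to_8(sample):
    """16-bit signed linear sample -> 8-bit code in 0..255 (uniform
    quantization of the companded value; illustrative only)."""
    y = mulaw_compress(sample / 32768.0)
    return int(round((y + 1) / 2 * 255))


def decode_8_to_16(code):
    """8-bit code in 0..255 -> approximate 16-bit signed linear sample."""
    y = code / 255 * 2 - 1
    return int(round(mulaw_expand(y) * 32767))
```

Because the curve is logarithmic, small samples keep a much finer step size than large ones, which is why 8 bits are enough for acceptable quality.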

A Typical Waveform of Sound


Sound Model
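- ADSR model: A (Attack), D (Decay), S (Sustain), R (Release)
- For non-percussive instruments (e.g. violin)

[Figure: ADSR envelope for a non-percussive instrument. Vertical axis: amplitude attenuation, with 0 dB at the top; horizontal axis: time, with "note on" and "note off" marked. Segments: A, D, S, R.]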

Sound Model
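- ADSR model
- For percussive instruments (e.g. piano, drum)

[Figure: ADSR envelope for a percussive instrument. Vertical axis: amplitude attenuation, with 0 dB at the top; horizontal axis: time, with "note on" and "note off" marked. Segments: A, D, S, R.]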
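An ADSR envelope is easy to express as a gain function of time. The sketch below uses straight-line segments in linear gain (the figures use a dB axis, so this is a simplification) and measures all times in samples; the function name and the parameter values in the usage lines are hypothetical.

```python
def adsr_gain(n, note_off, attack, decay, sustain, release):
    """Envelope gain (0.0 to 1.0) at sample index n.

    attack, decay and release are durations in samples (all > 0),
    sustain is a gain level, note_off is the sample index of key release.
    """
    def held_gain(t):
        # Gain while the key is held down.
        if t < attack:
            return t / attack                      # A: rise to full level
        if t < attack + decay:
            return 1.0 - (1.0 - sustain) * (t - attack) / decay  # D
        return sustain                             # S: hold

    if n < note_off:
        return held_gain(n)
    # R: fade linearly from the level reached at note off down to zero.
    start = held_gain(note_off)
    return max(0.0, start * (1.0 - (n - note_off) / release))


# Sustained, violin-like envelope.
sustained = [adsr_gain(n, 100, 10, 20, 0.5, 30) for n in range(140)]
# A percussive sound can be approximated with a sustain level of zero.
percussive = [adsr_gain(n, 100, 2, 40, 0.0, 10) for n in range(140)]
```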

Loop


Envelope Control


Pitch shift
- Use one sample, or a limited set of samples, to generate every note to be played.
- Access the stored sample memory at different rates during playback.

[Figure: two read pointers stepping through sample memory. Reading at the original rate gives some particular pitch fs; reading at twice the rate gives a pitch shifted up by one octave, 2fs.]
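Reading the table at a different rate amounts to advancing a fractional pointer by a ratio other than 1 per output sample. A minimal sketch, assuming a looped table and linear interpolation between neighboring samples (one common choice; the slides do not say which interpolation is used). The function name is hypothetical.

```python
def read_wavetable(table, ratio, n_out):
    """Read `table` cyclically, advancing the pointer by `ratio` samples
    per output sample. ratio=2.0 shifts the pitch up one octave,
    ratio=0.5 shifts it down one octave."""
    size = len(table)
    pos = 0.0
    out = []
    for _ in range(n_out):
        i = int(pos)
        frac = pos - i
        a = table[i % size]
        b = table[(i + 1) % size]
        out.append(a + (b - a) * frac)   # linear interpolation
        pos = (pos + ratio) % size       # wrap around to loop the table
    return out


table = [0, 1, 2, 3]
octave_up = read_wavetable(table, 2.0, 4)
octave_down = read_wavetable(table, 0.5, 3)
```

For equal-tempered notes the ratio for a shift of k semitones is 2 ** (k / 12), so a single stored sample can cover a range of notes.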

Interpolation


Wavetable System Implementation

[Figure: wavetable system block diagram. Blocks: MIDI IN; Micro Processor with Program ROM; Wavetable Synthesizer with Wavetable ROM; RAM; DAC; Audio Out (L) and Audio Out (R).]

FM (Frequency modulation)
- FM is the process of varying the frequency of a signal, often periodically.

FM (Frequency Modulation)
- The fundamental principle of an FM sound generator is to synthesize tones by combining a modulator signal with a carrier signal.

[Figure: FM modulation. A modulator and a sine-wave carrier, each set by parameters, are combined by FM modulation to produce the output sound. Example waveforms created: pyramidal wave, sawtooth wave, oblong wave.]

FM (Frequency Modulation)
- A device that produces a carrier or a modulator is called an operator.
- At least two operators are required to generate the sound of a musical instrument.
- For percussion instruments, at least four operators are required for decent instrumental sound quality.
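A two-operator voice, one modulator driving one carrier, can be sketched in a few lines. The example assumes the common formulation in which the modulator is added to the carrier phase; the function name, default sample rate, and parameter values are hypothetical.

```python
import math


def fm_tone(carrier_hz, modulator_hz, index, duration_s, sample_rate=8000):
    """Two-operator FM: a sine modulator varies the phase of a sine carrier.
    `index` (modulation index) controls how bright the tone is."""
    n_samples = round(duration_s * sample_rate)
    out = []
    for i in range(n_samples):
        t = i / sample_rate
        modulator = math.sin(2 * math.pi * modulator_hz * t)   # operator 1
        out.append(math.sin(2 * math.pi * carrier_hz * t + index * modulator))  # operator 2
    return out


bright = fm_tone(440.0, 220.0, index=2.0, duration_s=0.01)
pure = fm_tone(440.0, 220.0, index=0.0, duration_s=0.01)  # index 0: plain sine
```

Changing only the frequency ratio and the modulation index already produces very different timbres, which is why the parameters matter so much for FM sound quality.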

Audio Synthesis Comparison


- Theoretically, FM and wavetable synthesis can achieve the same audio quality.

Wavetable Synthesis
  Advantages: (1) Easy to implement (2) Consistent quality
  Drawbacks: (1) Cost

Frequency Modulation
  Advantages: (1) Cost
  Drawbacks: (1) Not easy to implement (2) Inconsistent quality


Speech Compression Technologies


- In the last decade, speech technologies have progressed rapidly.
- To reach low bit rates, present speech coders tend to be source-specific and hearing-specific.
- Speech compression technologies are now widely used in applications such as:
  - Digital telecommunication devices (cell phone, ISDN, DECT, SST, DAM, ...)
  - Digital voice recording accessories for cell phones, PDAs, DSCs, ...
  - Electronic language learning solutions
  - Toys

Quality Measures
- Unlike audio compression technologies, speech coding has a widely accepted quality measure called MOS (Mean Opinion Score).

MOS  Impairment scale
5    Imperceptible
4    Perceptible, but not annoying
3    Slightly annoying
2    Annoying
1    Very annoying

Major Speech Coders

Type of coder  Bit rate (kb/s)  MOS
PCM            64               4.3
ADPCM          32               4.1
GSM            13               3.8
CELP           4.8              3.3
LPC            2.4              2.6

Waveform Coding
- PCM (Pulse Code Modulation)

[Figure: PCM quantization. The analog input is sampled and each sample is assigned one of the quantization levels 0000 to 0111. Quantized output sequence: 0001 0100 0110 0110 0100 0011 0101 0110 0111 0111 0111 0101 0010 0000.]

Waveform Coding
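- ADPCM (Adaptive Differential Pulse Code Modulation)
- Analysis of speech waveforms shows a high sample-to-sample correlation. ADPCM was developed to exploit this correlation and reduce the bit rate further while preserving overall speech quality.

[Figure: ADPCM encoder block diagram. The estimate of the last input sample, X(n-1), is subtracted from the linear input signal X(n) to give the difference d(n). The encoder quantizes d(n) using the step size ss(n) and outputs the ADPCM sample L(n). A decoder in the feedback path reconstructs X(n), which is delayed by Z^-1 to form the next estimate. A step size calculation produces the adjusted step size ss(n+1), which is delayed by Z^-1 to become ss(n).]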
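The loop in the block diagram can be sketched as a toy coder. This is a simplified illustration, not a standard ADPCM format: the initial step size, the adaptation factors (1.5 and 0.9), and the step limits are hypothetical values chosen only to show the mechanism.

```python
def _adapt_step(step, code, max_mag):
    """Grow the step after large codes, shrink it after small ones."""
    step = step * 1.5 if abs(code) >= max_mag - 1 else step * 0.9
    return min(max(step, 1.0), 4096.0)


def adpcm_encode(samples, bits=4, step=16.0):
    """Encode each sample as a small signed code for the difference
    between the sample and the running estimate."""
    max_mag = 2 ** (bits - 1) - 1          # codes in -7..7 for 4 bits
    estimate = 0.0
    codes = []
    for x in samples:
        d = x - estimate                               # d(n)
        code = max(-max_mag, min(max_mag, round(d / step)))
        codes.append(code)                             # L(n)
        estimate += code * step    # same update the decoder performs
        step = _adapt_step(step, code, max_mag)        # ss(n+1)
    return codes


def adpcm_decode(codes, bits=4, step=16.0):
    """Rebuild the signal by accumulating the scaled differences."""
    max_mag = 2 ** (bits - 1) - 1
    estimate = 0.0
    out = []
    for code in codes:
        estimate += code * step
        out.append(estimate)
        step = _adapt_step(step, code, max_mag)
    return out


signal = [0, 100, 200, 300, 250, 100, -50]
codes = adpcm_encode(signal)
decoded = adpcm_decode(codes)
```

The encoder tracks the decoder's estimate rather than the true previous sample, so quantization errors do not accumulate between the two sides.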

Source Coding
- Speech is produced when air is forced from the lungs through the vocal cords and along the vocal tract.
- Voiced sounds are produced when the vocal cords vibrate open and closed, giving quasi-periodic pulses.
- Unvoiced sounds result when the excitation is a noise-like turbulence.

[Figure: speech production model. A: periodic signal; B: variable signal; C: output sound.]

Source Coding
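- LPC (Linear Predictive Coder)

[Figure: LPC synthesis model. A voiced/unvoiced control selects between a pulse generator and a white noise generator. The excitation is scaled by an amplitude parameter and passed through filter stages P1, P2, P3, ..., Pn, each set by formant frequency and bandwidth parameters, to produce the speech signal.]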
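The source-filter idea can be sketched with an all-pole filter driven by either a pulse train or noise. Unlike the figure, which uses formant stages, this example takes direct-form predictor coefficients; the function name, the pitch period, and the coefficient values are hypothetical.

```python
import random


def lpc_synthesize(coeffs, gain, n_samples, voiced, pitch_period=80, seed=0):
    """Generate speech-like samples from an excitation and an all-pole filter:
    s[n] = gain * e[n] + sum(coeffs[k] * s[n - 1 - k])."""
    rng = random.Random(seed)
    history = [0.0] * len(coeffs)          # most recent output first
    out = []
    for n in range(n_samples):
        if voiced:
            # Pulse generator: one pulse per pitch period.
            excitation = 1.0 if n % pitch_period == 0 else 0.0
        else:
            # White noise generator.
            excitation = rng.uniform(-1.0, 1.0)
        sample = gain * excitation + sum(a * h for a, h in zip(coeffs, history))
        out.append(sample)
        history = [sample] + history[:-1]
    return out


voiced_frame = lpc_synthesize([0.5], 1.0, 4, voiced=True)
unvoiced_frame = lpc_synthesize([0.5], 1.0, 10, voiced=False)
```

Only the filter coefficients, the gain, the voiced/unvoiced flag, and the pitch need to be transmitted per frame, which is how source coders reach very low bit rates.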

Hybrid Coding
- Hybrid coding is an analysis-by-synthesis approach. The encoder analyzes the input speech by synthesizing many different approximations to it, then transmits the synthesis filter parameters and the excitation of the best match to the decoder.

[Figure: analysis-by-synthesis codec. Encoder: Excitation Generation produces u(n), which drives the Synthesis Filter to give the synthesized speech ŝ(n). The error e(n) between the input speech s(n) and ŝ(n) passes through Error Weighting to give ew(n), and Error Minimization steers the excitation generation. Decoder: Excitation Generation produces u(n), which drives the Synthesis Filter to give the reproduced speech ŝ(n).]
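The analysis-by-synthesis search can be sketched as a loop over a small excitation codebook. This simplified example uses a plain squared error and omits the perceptual error weighting stage; the codebook contents, filter coefficients, and function names are hypothetical.

```python
def synthesize(excitation, coeffs):
    """All-pole synthesis filter: s[n] = u[n] + sum(coeffs[k] * s[n - 1 - k])."""
    history = [0.0] * len(coeffs)
    out = []
    for u in excitation:
        s = u + sum(a * h for a, h in zip(coeffs, history))
        out.append(s)
        history = [s] + history[:-1]
    return out


def search_codebook(target, codebook, coeffs):
    """Try every candidate excitation and keep the one whose synthesized
    output is closest to the target speech frame."""
    best_index, best_error = None, float("inf")
    for index, excitation in enumerate(codebook):
        candidate = synthesize(excitation, coeffs)
        error = sum((t - c) ** 2 for t, c in zip(target, candidate))
        if error < best_error:
            best_index, best_error = index, error
    return best_index, best_error


codebook = [[1, 0, 0, 0], [0, 1, 0, 0], [1, 0, -1, 0]]
coeffs = [0.5]
target = synthesize(codebook[2], coeffs)
best_index, best_error = search_codebook(target, codebook, coeffs)
```

The encoder then sends only the winning index and the filter parameters; the decoder repeats the `synthesize` step to reproduce the frame.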

Speech Compression Technologies Brief Summary


- Waveform coding (like ADPCM) is typically used at high bit rates and gives very good speech quality.
- Source coding (like LPC) operates at very low bit rates but tends to produce speech that sounds synthetic.
- Hybrid coding (like CELP) uses techniques from both source and waveform coding, and gives good speech quality at intermediate bit rates.

[Figure: MOS (1 to 5) versus bit rate (1, 2, 4, 8, 16, 32, 64 kbps), with curves for hybrid coding, waveform coding, and source coding.]


Speech Product Offering - I: ELL (Electronic Language Learning)

ELL System Block Diagram

[Figure: ELL system block diagram. Blocks: MCU core (6502, Z80, 8051); DSP (LRC, synthesizer); Flash ROM (program, data); SRAM (data buffer, PIM); LCD module (display); I/O and peripherals (keyboard, battery, ...); A/D (voice input, pen input); D/A (voice output); USB; memory card; PC.]

* Red blocks mark the components or technologies that MXIC can provide.

MXIC ELL Product Features


THV - True Human Voice
- What is True Human Voice?
  Record human voice -> Compress on a PC -> Store the code in ROM -> DSP decodes and plays back
- What can MXIC provide for a THV solution?
  - MXIC has a 1.2/2.0 kbps LRC (Low-Rate Coder) with excellent speech quality.
  - Over 50,000 THV words can be stored in a 64 Mb ROM using the 1.2 kbps LRC.
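The word-count claim can be sanity-checked with a short calculation. The sketch assumes 1 Mb = 2**20 bits and that the whole ROM holds coded speech (both assumptions; program and dictionary data would reduce the figure). The function name is hypothetical.

```python
def storage_seconds(rom_megabits, bit_rate_bps):
    """Seconds of coded speech that fit in the ROM (1 Mb taken as 2**20 bits)."""
    return rom_megabits * 2 ** 20 / bit_rate_bps


# 64 Mb at 1.2 kbps: about 55,900 s, roughly 15.5 hours of speech.
total_seconds = storage_seconds(64, 1_200)

# Spread over 50,000 words this leaves about 1.1 s per word.
seconds_per_word = total_seconds / 50_000
```

Roughly one second per word is a plausible length for a spoken dictionary entry, so the figure on the slide is consistent with the stated bit rate.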

MXIC ELL Product Features


THV - True Human Voice
- Why is a Sequential ROM interface so important in ED applications? Because:
  - EDs need ever larger Mask ROM density:
    - Content keeps growing
    - True Human Voice
  - The MCU needs just 20 pins to address up to 4 Gb of Sequential ROM. The lower pin count saves die size.
  - Sequential ROM is the most cost-effective.

[Figure: MXIC Sequential ROM compared with conventional ROM.]

Worldwide ED Market Size

[Figure: pie chart of market share by region. Japan 44%, China 44%, Taiwan 7%, HK 3%, Korea 2%.]

Region  Quantity (K sets)
China   4,000
Taiwan  600
HK      300
Korea   200
Japan   4,000
Total   9,100

Source: MXIC, 2001

Worldwide ED Market Size


[Figure: stacked chart of ED market size by region (Japan, Korea, HK, Taiwan, China), 1999 to 2003; vertical axis 0 to 14,000.]

Region  1999   2000   2001   2002    2003    CAGR
China   2,600  3,600  4,000  4,600   5,400   20.05%
Taiwan  500    550    600    630     660     7.19%
HK      250    280    300    320     350     8.78%
Korea   180    200    200    220     240     7.46%
Japan   3,000  3,500  4,000  4,500   5,000   13.62%
Total   6,530  8,130  9,100  10,270  11,650  15.57%

Source: MXIC, 2001
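The CAGR column can be reproduced from the first and last year of each row. A minimal sketch, assuming the standard compound annual growth rate formula over the four yearly intervals from 1999 to 2003; the function name is hypothetical.

```python
def cagr(first_value, last_value, periods):
    """Compound annual growth rate as a fraction (0.2 means 20%)."""
    return (last_value / first_value) ** (1 / periods) - 1


# China: 2,600 in 1999 to 5,400 in 2003 (four intervals).
china = cagr(2_600, 5_400, 4)

# Total market: 6,530 in 1999 to 11,650 in 2003.
total = cagr(6_530, 11_650, 4)

china_percent = round(china * 100, 2)
total_percent = round(total * 100, 2)
```

Both values agree with the table to two decimal places, which confirms that the column was computed over four intervals rather than five.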

ELL Product Road Map


[Figure: roadmap timeline from Q1/2001 to Q4/2002, labeled "MCU & Speech Processor for ED & PDA". Items shown:]
- All-in-one ED Processor (MCU + DSP + S-ROM I/F)
- MX93L551 DVR Processor with LRC
- MX93L552 DVR Processor with VR
- ARM7TDMI embedded ED Controller
- 6502 embedded ED Controller
- Z80 embedded 3-in-1 ED Controller
- LRC decoder

* Rectangles mark existing products; circles mark products under development.
* The left edge of a circle is the project start date, and the right edge is the commercial sample date.
* DVR stands for Digital Voice Recorder; VR stands for Voice Recognition.

MXIC ELL Solution Advantage


- We can provide a THV (True Human Voice) solution!
- We can provide an MCU ASSP with:
  - An effective Sequential ROM interface for program and data storage in EDs with the THV (True Human Voice) feature
- We can provide a Sequential ROM family (64 Mb to 256 Mb) for ED and E-Book.

Speech Product Offering - II: DVR (Digital Voice Recorder)

DVR System Block Diagram

[Figure: DVR system block diagram. Blocks: Micro controller; Digital Voice Recorder (DSP engine chip); Flash; LCD display; Keypad; Speaker; MIC.]

* Red blocks mark the components or technologies that MXIC can provide.

DVR (Digital Voice Recorder)


- Message management:
  - Playback, Fast Forward, Rewind
  - Forward/backward search within a specific message
  - Repeat

[Figure: message timeline showing the RW, FF, BS, FS, and Repeat controls. Labels: 00:00, 02:15, 05:10, 05:30, 200ms.]

DVR (Digital Voice Recorder)


- PSA (Playback Speed Adjustment) ranges from 50% to 200%.

[Figure: playback speed settings. 50%: slow playback; 100%: normal playback; 200%: fast playback.]

MXIC DVR Solution Advantage


- We can provide switchable speech compression rates (4.8/12.8/32 kbps) for different speech recording systems.
- We can provide flexible speech manipulations such as:
  - Folder management
  - Playback, pause, FF, RW, repeat, forward/backward search, append, PSA (Playback Speed Adjustment)
- We can provide a total system solution (MCU, DSP, Flash).

Speech Product Offering - III: DAM (Digital Answering Machine)

DAM System Block Diagram

[Figure: DAM system block diagram. Blocks: Telephone line; All-in-one DAM Controller; MCU (system control code); AFlash (voice prompt); Microphone; Speaker; Keypad; Display.]

* Red blocks mark the components or technologies that MXIC can provide.

DAM (Digital Answering Machine)


- The key success factor is an excellent speech codec:
  - Switchable compression rate: 4.8/12.8/32 kbps
  - MRC (Multi-Rate Coder): 3.6 kbps to 14.2 kbps
- A full-duplex speakerphone is a highlight of this application.
- Telecom signal processing (tone generation/detection) is also included.

[Figure: DAM Engine Chip between two PCM codecs. PCM Codec-1 serves the acoustic coupling side: speaker (SPK driver) and microphone (MIC gain). PCM Codec-2 serves the line coupling side: 4-2 wire coupling (line gain, line driver).]

Worldwide DAM Market Size

[Figure: pie chart of market share by region. North America 60%, Japan 19%, Europe 16%, Others 5%.]

Region         Quantity (M sets)
North America  22
Europe         6
Japan          7
Others         2
Total          37

Source: MXIC, 2001

Worldwide DAM Market Size

[Figure: stacked chart of DAM market size by region (Others, China, Europe, Japan, US), 1999 to 2003; vertical axis 0 to 45,000.]

Unit: K sets

Region  1999    2000    2001    2002    2003    CAGR
US      20,500  21,500  22,000  22,500  23,000  2.92%
Japan   6,800   7,000   7,000   7,100   7,200   1.44%
Europe  6,200   6,000   6,000   5,800   5,800   -1.65%
China   180     200     200     600     1,000   53.53%
Others  1,800   2,000   2,000   2,100   2,200   5.14%
Total   35,480  36,700  37,200  38,100  39,200  2.52%

Source: MXIC, 2001

DAM Product Road Map

[Figure: roadmap timeline from Q1/2002 to Q4/2003, labeled "DAM Solution". Items shown:]
- DAM processor embedded 1Mb MTP
- DAM embedded 4Mb Flash
- MX93L132A: 3V MRC DAM w/ CID/SPK
- MX93132: 5V DAM w/ CID/SPK
- MX93L108: Entry level DAM Processor
- MX93L111A: 3V MRC DAM
- MX93111: 5V DAM

* Rectangles mark existing products; circles mark products under development.
* The left edge of a circle is the project start date, and the right edge is the commercial sample date.
* MRC stands for Multi-Rate Coder, CID stands for Caller ID, and SPK stands for Speakerphone.

MXIC DAM Solution Advantages


- MXIC has different kinds of solutions for each DAM market segment.
- MXIC is the leader in the mid-range segment and a top-2 DAM IC vendor worldwide.
- MXIC provides one-stop shopping (DSP, MCU, AFlash) for DAM applications.

[Figure: DAM market segments from high-end to low-end. Labels: MRC (Multi-Rate Coder) + 8/16Mb AFlash; 12.8K/32Kbps + 64/128Mb SDRAM; MXIC.]


Audio Product Offering - I: AIR™ (Audio IC Recorder)

[Figure: AIR™ system block diagram. Blocks: Audio devices (audio input); Host controller; Audio Encoder/Decoder Processor; 16-bit Audio Codec; Power Amplifier; Speaker; Headphone; Memory (MMC, CF, SD, Memory Stick, ...); Audio ROM; Flash.]

* Red blocks mark the components or technologies that MXIC can provide.

AIR™ (Audio IC Recorder)

- AIR™, a brand-new audio product concept!
- With built-in S/PDIF, audio data can be saved directly into the MP3 player through real-time MP3 encoding. Say goodbye to the complicated PC download method!

[Figure: the conventional path CD -> Compression -> Download, compared with a direct S/PDIF connection from audio devices.]

AIR™ (Audio IC Recorder)

- Mini component systems and portable audio:
  - Upgrade conventional models to fully digital audio (MP3)
  - Align with the young generation's portable MP3 players!

[Figure: the MX92L600 Audio IC Recorder shown with mini component systems, portable audio, cassette, portable MP3 players, and memory cards.]

Audio Product Offering - II: Wavetable Sound Generator

- MIDI for sound generator:

[Figure: sound generator block diagram. Blocks: MIDI IN; Sound Generator ASSP; Micro Processor; Wavetable Synthesizer; Program ROM; Wavetable ROM; SRAM; Audio DAC.]

* Red blocks mark the components or technologies that MXIC can provide.

Digital Audio Product Road Map

[Figure: roadmap timeline from Q1/2001 to Q4/2002, labeled "MP3/AAC Player & Recorder Solution". Items shown:]
- Audio ROM derivatives
- MX92L600 MP3 Codec
- MX92L500 MP3 decoder
- Promotional Singles (8MB embedded)

* Rectangles mark existing products; circles mark products under development.
* The left edge of a circle is the project start date, and the right edge is the commercial sample date.
* DVR stands for Digital Voice Recorder; LRC stands for Low-Rate Coder.

MXIC Digital Audio Advantages


- Professional MIDI technology (General MIDI V1.0 sound set, 32-voice polyphony, and 32-part multi-timbre) provides a superior sound generator solution for mobile phone, PDA, ED, and toy applications.
- Complete solution for MP3 players and recorders
- In-house Sequential ROM, Flash, and memory card support


Summary
- Among audio compression technologies, MP3 is the most mature, while MP3PRO is seen as a future star.
- FM and wavetable synthesis are the mainstream audio synthesis technologies, and wavetable synthesis appears superior in practice.
- Different speech technologies suit different applications. Among them, hybrid coding is superior, reinforced by DSP technology.
- MXIC focuses on audio and speech technologies, and several audio and speech products were presented.

Audio Compression & Synthesis Technology Overview

Moving Toward IA, Moving with Us!

