
Real-Time Voice Streaming over IEEE 802.15.4
Tullio Facchinetti
University of Pavia 27100 - Pavia, Italy Email: tullio.facchinetti@unipv.it

Marco Ghibaudi
Scuola Superiore S. Anna 56010 - S.Giuliano Terme (PI), Italy Email: m.ghibaudi@sssup.it

Emanuele Goldoni and Alberto Savioli


University of Pavia 27100 - Pavia, Italy Email: emanuele.goldoni@unipv.it alberto.savioli@unipv.it

Abstract: Audio and video applications over wireless sensor networks have recently emerged as a promising research field. However, the limits in terms of communication bandwidth and transmission power have hindered the design of low-power embedded nodes for voice communication. In this work we describe the implementation details of an embedded system for the wireless broadcasting of audio signals over the low data-rate IEEE 802.15.4 standard, which is widely adopted to build Wireless Personal and Sensor Networks. The resulting device has been developed from scratch by combining several techniques, with the goal of obtaining the most suitable implementation on a low-cost and low-power 16-bit microcontroller. We used a real-time operating system, a well-known psychoacoustic model based on FFT signal decomposition, and the Haar wavelet transform to create a novel audio compression algorithm targeted to embedded systems with limited computational capabilities. The result is a fully functional embedded system that is able to stream voice in real time over IEEE 802.15.4 with acceptable audio quality.

I. INTRODUCTION

The need for processing units with high computing power at low prices is becoming an important factor in the development of many applications requiring embedded computing. Bio-medicine, environmental monitoring and control, signal processing, domotics, factory automation and telecommunications can benefit greatly from the introduction of new processors and components placed as close as possible to the process to be monitored. Nonetheless, the use of small processors with limited processing power and memory size is mandatory for limiting the total cost of the devices, in particular when large-scale production is planned. Therefore, the analysis of the performance of algorithms implemented on low-power processors and microcontrollers is a relevant research activity.

Similarly, data exchange between sensing elements is a key element in the design of efficient distributed monitoring systems. Reducing the energy consumption of communication devices and protocols is fundamental to increase the lifetime of battery-powered sensing nodes. The IEEE 802.15.4 wireless technology [1] has recently emerged as the option that best fits the needs of cost-effectiveness, flexibility, interoperability and low power consumption typically required by pervasive embedded applications. This standard has been specifically designed for low data-rate ubiquitous communication between low-cost and simple devices in Wireless Sensor Networks (WSNs). WSN technology is nowadays used in a wide range of applications, including industrial control, environmental monitoring and home automation. Multimedia content distribution in WSNs is an additional promising research area, since the possibility of sending audio and video streams over a dense network of small nodes would encourage the development of new applications, such as emergency services and surveillance systems. On the other hand, the limits in terms of communication bandwidth, computational power and energy budget severely affect the actual streaming capabilities of WSN nodes [2].

In this work we describe the design and implementation details of an embedded system for point-to-point wireless broadcasting of audio signals over IEEE 802.15.4. The device has been designed from scratch by combining new and existing ideas, with the goal of obtaining a satisfactory implementation on a low-cost 16-bit microcontroller. Since voice streams must be managed in a deterministic way and transmitted within a bounded amount of time, we adopted an embedded real-time operating system. Finally, we combined a technique based on wavelet analysis with a well-known psychoacoustic model to create a novel audio compression algorithm specifically targeted to embedded systems with limited computational power. Experimental results confirm that it is possible to fulfill the delay constraints of a real-time multimedia streaming service while still providing acceptable audio quality over IEEE 802.15.4.

The rest of the paper is organized as follows: in Section II we review related work, while in Section III we briefly describe the development platform used in this project. The details of the real-time implementation and of the compression algorithm adopted in the streaming system are provided in Section IV. Section V presents and discusses preliminary performance results, and conclusions follow in Section VI.

II. RELATED WORK

Real-time audio and video streaming services over wireless networks have gained much attention in recent years.
Constraints on end-to-end delays and data losses impose challenging requirements on the communication, since the wireless channel can be affected by a significant amount of noise, resulting in large losses and noticeable delays. A pioneering work by Abdelzaher et al. [11] defined a quantitative notion of the real-time capacity of a wireless network in terms of the amount of data that the network can transfer by given deadlines. A capacity bound derived from the analysis can be used as a sufficient schedulability condition for packet scheduling algorithms and for capacity planning.

978-1-4244-7755-5/10/$26.00 2010 IEEE


The impact of real-time constraints on multimedia applications is more evident in the presence of strict constraints on computing resources, memory, bandwidth and energy, which is the typical scenario for WSNs. In particular, when the IEEE 802.15.4 communication standard is used, the problem is even more difficult since, unlike other standard protocols such as Bluetooth, its specifications do not explicitly cover voice or video streaming applications. A first study by Deshpande [10] investigated the performance and feasibility of an IEEE 802.15.4 wireless network for low-bitrate audio and video streaming. The author focused on packet loss and latency in order to find a suitable operating rate range. However, the performance evaluation was carried out only by means of simulations. A first notable deployment of a WSN for voice streaming is presented in [12]. The authors created a network of 42 FireFly embedded sensor nodes scattered in a coal mine. They were able to meet the timeliness requirements of the voice applications by tightly coupling transmission scheduling with hardware-based global time synchronization. More recently, Wang et al. [13] investigated the performance of a pure ZigBee network for voice transmission using the network simulator NS2. On the other hand, the real-world capabilities of the IEEE 802.15.4 standard have been measured and analyzed in [9], [6] using commercial off-the-shelf hardware. The simulation and experimental results reported in these works show that the achievable throughput is not enough for high-quality audio streams, but suffices for a limited number of concurrent streams in the most common voice applications. Finally, two recent works have focused on efficient algorithms based on perceptual packet protection [7] or multistage DWT waveform coding [8] for reducing network overhead.

III. PLATFORM OVERVIEW

The embedded platform used to develop the streaming system is the Evidence Flex Board.
This board is equipped with the Microchip dsPIC33FJ256MC710 MicroController Unit (MCU) [3], a 16-bit processor featuring a core speed of 40 MIPS, 32 KBytes of RAM, 2 KBytes of DMA buffer RAM and 256 KBytes of program memory. The processor has high data-rate sampling capabilities, provided by 16 12-bit Analog-to-Digital channels working at up to 500,000 samples/s. The wireless communication between nodes relies on FlexiPanel's EasyBee [4] radio devices. The EasyBee module is an IEEE 802.15.4-compliant 2.4 GHz RF transceiver with an impedance-matched balun and integral antenna. The device does not require a ZigBee networking layer to implement point-to-point communication. Typical MAC and physical-layer functions, such as CRC-16 generation, clear channel assessment and signal energy detection, are provided by the integrated radio controller. The radio module is directly connected to the Flex Board's MCU through a 3-wire SPI port, and it can communicate at a raw data rate of up to 250 kbps.
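To put the 250 kbps figure in perspective, the short Python sketch below compares the rate of uncompressed 12-bit samples with that of the compressed packets, using figures reported later in Section IV (512-sample frames processed every 47 ms, packets of 6 to 93 bytes). It assumes one packet per frame and ignores MAC/PHY overhead, so the numbers are only indicative; the constant and function names are illustrative, not taken from the paper.

```python
ADC_BITS = 12               # ADC resolution (Section III)
FRAME = 512                 # samples per compressed frame (Section IV)
PERIOD_S = 0.047            # time to acquire, encode and send one frame (Section IV-C)
NOMINAL_RATE_BPS = 250_000  # raw IEEE 802.15.4 data rate quoted above


def raw_pcm_bps(bits: int = ADC_BITS, frame: int = FRAME,
                period_s: float = PERIOD_S) -> float:
    """Bit rate needed to send every 12-bit sample uncompressed."""
    return bits * frame / period_s


def compressed_bps(packet_bytes: int, period_s: float = PERIOD_S) -> float:
    """Bit rate when one packet of packet_bytes carries one frame (assumption)."""
    return 8 * packet_bytes / period_s


if __name__ == "__main__":
    raw = raw_pcm_bps()
    for size in (6, 93):  # smallest and largest packet sizes reported
        rate = compressed_bps(size)
        print(f"{size:3d} B/frame: {rate / 1000:6.1f} kbps, "
              f"{raw / rate:5.1f}x smaller than raw PCM")
    print(f"raw PCM: {raw / 1000:.1f} kbps "
          f"({raw / NOMINAL_RATE_BPS:.0%} of the nominal rate)")
```

Under these assumptions, raw PCM would need roughly 131 kbps, more than half of the nominal channel rate before any protocol overhead, while the largest compressed packet corresponds to about 16 kbps.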

Fig. 1. The communication system architecture (TX-Board: analog signal -> ADC -> MC for sampling, processing, compression and coding -> TX; RX-Board: RX -> MC for processing, decoding and decompression -> DAC -> analog signal output; the RX-Board also provides an RS-232 output of the bit stream)

The audio signal source can be either a microphone or the output of a computer's sound card. While the former has a peak-to-peak voltage spanning the range [-3, 3] V, the latter ranges over the [-5, 5] mV interval. Since the ADC integrated in the Flex Board accepts an input in the range [0.15, 3.3] V, we designed a dedicated conditioning circuit to map the source signal onto the ADC input range.

The application runs on top of the ERIKA Enterprise (Embedded Real tIme Kernel Architecture) real-time kernel [5], a modular open-source operating system. The ERIKA kernel supports several real-time scheduling algorithms, such as Rate Monotonic (RM) and Earliest Deadline First (EDF) [21], and has an extremely low memory footprint.

IV. SYSTEM DESCRIPTION

The proposed communication system consists of two elements: a transmitter and a receiver. The former, called TX-board, continuously acquires an analog audio signal and generates its compressed digital representation. The compressed data are sent over the wireless medium to the receiver board (RX-board), where the analog signal is reconstructed and sent to an audio transducer. A schematic representation of our experimental voice streaming system is shown in Figure 1.

A. Software architecture

The communication system is based on two devices with different functionality: the transmitter and the receiver. The former is responsible for acquiring, compressing and transmitting the voice stream, while the receiver must decompress the stream and finally store or reproduce the signal. Therefore, the software tasks implemented on the two boards differ significantly. The TX-board continuously acquires an analog audio signal through the Analog-to-Digital Converter (ADC) and generates a compressed digital representation of the signal using a combination of psychoacoustic analysis and wavelet decomposition. The Acquisition task uses the Direct Memory Access (DMA) feature to acquire data in the background with reduced computing overhead.
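The text does not say how the 512-element input buffer is shared between the DMA-driven Acquisition task and the Compression task. A common pattern, assumed here purely for illustration, is a double ("ping-pong") buffer: one half is filled while the other is being compressed. The Python model below is hypothetical (the class and method names are not from the paper); it only shows the hand-over logic and how a consumer that misses its deadline would appear as an overrun.

```python
from typing import List, Optional

FRAME = 512  # input buffer length used by the Acquisition task


class PingPongBuffer:
    """Hypothetical double buffer: one half is filled by the (simulated) DMA
    while the Compression task owns the other half."""

    def __init__(self, frame: int = FRAME) -> None:
        self.frame = frame
        self.buffers: List[List[int]] = [[0] * frame for _ in range(2)]
        self.busy = [False, False]  # True while the consumer owns a buffer
        self.active = 0             # buffer currently being filled
        self.index = 0              # next write position inside it
        self.overruns = 0           # times filling restarted on a busy buffer

    def on_sample(self, sample: int) -> Optional[int]:
        """Store one ADC sample; return the index of a just-completed buffer."""
        self.buffers[self.active][self.index] = sample
        self.index += 1
        if self.index < self.frame:
            return None
        full = self.active
        self.busy[full] = True      # hand the full buffer to the consumer
        self.active ^= 1            # switch to the other half
        self.index = 0
        if self.busy[self.active]:
            # The consumer has not released this half within one period:
            # a deadline miss; new samples will overwrite unprocessed data.
            self.overruns += 1
        return full

    def release(self, idx: int) -> None:
        """Called by the consumer once it has finished with buffers[idx]."""
        self.busy[idx] = False


if __name__ == "__main__":
    acq = PingPongBuffer()
    frames = 0
    for n in range(3 * FRAME):
        full = acq.on_sample(n & 0x0FFF)  # fake 12-bit samples
        if full is not None:
            frames += 1
            acq.release(full)             # consumer keeps up with the producer
    print(frames, acq.overruns)
```

In the usage example each buffer is released as soon as it is handed over, so three frames complete and no overruns are counted; holding a buffer for longer than one sampling period would increment `overruns`.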
When the input buffer (512 elements) is filled, the acquired signal is passed to the Compression task for data analysis and compression; the implemented algorithms are described in Section IV-B. When all the required compression steps have been performed, the resulting data stream is passed to the Packaging task, which encodes the stream with a Reed-Solomon code to increase communication robustness. The final packet is built by combining the encoded data stream with a suitable header. Finally, after a fixed delay from the acquisition, the transmission takes place and the packet is sent to the 802.15.4 transceiver. Figure 2 summarizes the structure of the real-time application running on the TX-board, including shared resources (variables).

Fig. 2. The structure of the real-time application running on the TX-board (tasks: Acquisition, Compression, Transmission; shared resources: Signal, Packet Header, Packet)

Following the data path, the first processing step on the RX-board is the acquisition of the raw packet and its storage in a buffer. At this stage, the Reception task concludes its duties and the Decompression task is started. The sequence number and the size of the packet are extracted from the header and used to decide whether to discard or decode the payload. Then, the audio signal is reconstructed by applying the appropriate decompression algorithm, and the output data are stored. The output data can finally be fed to a pre-amplifier and reproduced through a speaker. Optionally, the reconstructed signal can be sent by the RX-board to a personal computer through a digital line, where it can be stored and analyzed. For example, this latter scenario can be adopted to investigate the performance of the final system and to tune the behavior of the communication infrastructure. The structure of the real-time application implemented on the receiver board is shown in Figure 3.

Fig. 3. The structure of the real-time application running on the RX-board (tasks: Reception, Decompression, Storage; shared resources: Packet, Packet Header, Signal)

B. Data compression

Due to the limited bandwidth of the IEEE 802.15.4 standard, compression was crucial to transmit the audio signal over the wireless channel. We focused our attention on the well-known Perceptual Model [18], according to which the human ear perceives different frequencies in different ways. Several compression codecs developed in past years have used such a model to reduce the size of the signal. Examples of widespread algorithms based on perceptual models are MPEG-1 [14] (psychoacoustic Model I or Model II), AAC [16] and the public-domain Vorbis [15] codec. This work is based on a suitably modified MPEG-1 algorithm, adjusted to cope with the limited resources available on the selected computing platform. The MPEG Model I was used for the tonal analysis; the basic steps of this technique are described below, while a detailed description of the algorithm is given in [20].

The initial step suggested by MPEG-1 provides a normalized spectral analysis of the signal. The output of this step is a high-resolution spectral estimate of the input, in which the spectral components are expressed in terms of Sound Pressure Level (SPL) [19]. The SPL normalization provides power values adjusted to the common working range of the human ear. After this normalization, the Power Spectral Density (PSD) is calculated by means of a Fast Fourier Transform (FFT) over N = 512 consecutive samples of the input buffer x. The resulting PSD P(k) from the 512-point FFT is:

$$
P(k) = PN + 10\log_{10}\left|\sum_{n=0}^{N-1} x_n\, w(n)\, e^{-j\frac{2\pi k n}{N}}\right|^{2} + 10\log_{10}\frac{1}{N}
$$

where the Power Normalization (PN) term is fixed at 96 dB, while the window w for the n-th sample is defined as

$$
w(n) = \frac{1}{2}\left[1 - \cos\left(\frac{2\pi n}{N-1}\right)\right]
$$

and the last term normalizes the signal, thus avoiding the division of each of the components by the factor N. With the PSD values normalized with respect to the PSD reference, tonal and non-tonal masking components can be identified. The procedure used to single out the tonal components selects the local maxima that exceed neighboring components by at least 7 dB. Only the candidates characterized by a sufficient Bark distance are considered actual tonal components, while the remaining ones are discarded. The set $S_T$ of tonal components can be defined as follows:

$$
S_T = \left\{ P(k) : P(k) > P(k \pm 1),\; P(k) > P(k \pm \Delta_k) + 7\,\text{dB} \right\}
$$

where $\Delta_k$ is a coefficient that depends on the non-linear behavior of the human ear over the frequency spectrum. From the tonal components, the tonal maskers $P_{TM}(k)$ can be computed as:

$$
P_{TM}(k) = 10\log_{10}\sum_{j=-1}^{1} 10^{0.1\,P(k+j)} \quad \text{(dB)}
$$
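The PSD, tonal-component and tonal-masker steps above can be sketched in plain Python. This is a floating-point illustration, not the authors' fixed-point MCU code: it uses a direct DFT instead of an FFT for readability, applies the window and normalization terms of the PSD formula, and takes the $\Delta_k$ neighborhood ranges from the commonly cited MPEG-1 psychoacoustic model 1 values for N = 512, which is an assumption since the text does not list them. Bark-distance decimation and the ATH comparison are omitted; all function names are hypothetical.

```python
import cmath
import math
from typing import List

N = 512     # FFT length used in the text
PN = 96.0   # power normalization term, in dB


def hann(n: int, size: int = N) -> float:
    """Window w(n) = 0.5 * (1 - cos(2*pi*n / (size - 1)))."""
    return 0.5 * (1.0 - math.cos(2.0 * math.pi * n / (size - 1)))


def psd(x: List[float]) -> List[float]:
    """P(k) in dB for k = 0..N/2, following the PSD formula.

    A direct DFT is used for readability; an MCU would use an FFT.
    """
    assert len(x) == N, "expects one 512-sample frame"
    twiddle = [cmath.exp(-2j * math.pi * m / N) for m in range(N)]
    xw = [x[n] * hann(n) for n in range(N)]
    out = []
    for k in range(N // 2 + 1):
        acc = sum(xw[n] * twiddle[(k * n) % N] for n in range(N))
        power = abs(acc) ** 2 + 1e-20  # guard against log10(0)
        out.append(PN + 10.0 * math.log10(power) + 10.0 * math.log10(1.0 / N))
    return out


def delta_k(k: int) -> range:
    """Offsets Delta_k checked around bin k (assumed MPEG-1 model-1 ranges).

    The last range is cut at bin 250 so that k + 6 stays inside P.
    """
    if 2 < k < 63:
        return range(2, 3)
    if 63 <= k < 127:
        return range(2, 4)
    if 127 <= k <= 250:
        return range(2, 7)
    return range(0)


def tonal_components(p: List[float]) -> List[int]:
    """Bin indices of the set S_T."""
    tonal = []
    for k in range(1, len(p) - 1):
        offsets = delta_k(k)
        if len(offsets) == 0:
            continue
        if not (p[k] > p[k - 1] and p[k] > p[k + 1]):
            continue  # not a local maximum
        if all(p[k] > p[k - d] + 7.0 and p[k] > p[k + d] + 7.0 for d in offsets):
            tonal.append(k)
    return tonal


def tonal_masker(p: List[float], k: int) -> float:
    """P_TM(k): power sum of the peak and its two neighbors, in dB."""
    return 10.0 * math.log10(sum(10.0 ** (0.1 * p[k + j]) for j in (-1, 0, 1)))


if __name__ == "__main__":
    k0 = 64  # test tone placed exactly on bin 64
    frame = [math.cos(2.0 * math.pi * k0 * n / N) for n in range(N)]
    P = psd(frame)
    for k in tonal_components(P):
        print(k, round(P[k], 1), round(tonal_masker(P, k), 1))
```

A pure tone on bin 64 is reported as a tonal component, and its masker is slightly above the peak value because the two neighboring bins add their power to it.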

For each local maximum, the powers of its left and right neighbors are added to its own power, so that the three components yield a single significant value. The resulting value is stored in the position originally occupied by the tone, while the spectral components not used in these operations form the non-tonal components. The amount of energy that a tone must have to be detectable by the human ear depends on the frequency of the tone itself, since the sensitivity of the ear varies strongly with frequency. In the literature, this property of the human auditory system is referred to as the Absolute Threshold of Hearing (ATH) [17]. The masker computation described above does not take the ATH into account, so some of the computed components may be inaudible and therefore irrelevant. Hence, the computed values must be compared with the absolute threshold of hearing, and only the maskers satisfying the following condition are kept:

$$
P_{TM}(k) \geq ATH(k)
$$

Once the inaudible tones are removed, the filtered signal is reconstructed using the inverse FFT (IFFT). The next step is the wavelet transform, which converts the audible signal into up to 512 wavelet coefficients. The Haar transform [23] was selected for the implementation. This transform has a very simple formulation, and the basic transform can be implemented with O(n) complexity, which makes it especially suitable for resource-constrained systems. A full description of the Haar transform lies beyond the scope of this paper; more information can be found in [23].

First, a threshold is applied: the subband wavelet coefficients whose Signal-to-Mask Ratio (SMR) is below an adjustable value are set to zero, and the maximum and minimum wavelet coefficients are computed. Then, the coefficient values are ordered and the non-null subband wavelet coefficients are encoded with a number of bits that depends on the SMR value. Finally, the useful wavelet coefficients and the subband normalization coefficients are packed in a frame, and the compressed data stream is put in the output queue to be transmitted. A block diagram of the encoding process is shown in Figure 4.

Fig. 4. Diagram of the implemented encoding schema (s(t) -> FFT -> s(f) -> psychoacoustic filtering -> s'(f) -> IFFT -> s'(t) -> Haar DWT, with separate branches for non-null coefficients, which go into the packet (PKT), and null coefficients)

When a packet has been correctly received by the RX-board, the Decompression software task starts. The PacketNum value is extracted to check that packets are delivered in the correct order. In addition, a dedicated field in the frame header is used to check whether the compressed signal has significant components in at least one subband, and to fill with zeros the wavelet coefficients associated with empty bands. If the decomposition is meaningless, i.e., no subband has a significant value, the packet is discarded. Otherwise, the subband values are de-normalized using the minimum and maximum values extracted from the frame, and associated with the corresponding subbands. Then, the signal is recomposed using the inverse discrete wavelet transform (IDWT), and the result is finally stored and played.

It is worth noting that one of the strongest limitations faced while implementing the compression algorithm on the available hardware was the absence of a dedicated floating point unit. This led to unacceptable computation times for floating point operations. Therefore, we developed from scratch a dedicated version of the compression model based on integer arithmetic only, so that it could execute with acceptable latency. The details of our fixed-point version of the wavelet compression algorithm are outside the scope of this paper.

C. Real-Time scheduling

The resulting system provides acceptable performance only if some specific timing constraints are satisfied. In particular, the task set must be schedulable by the adopted scheduling algorithm, i.e., the timing requirements of each computing task must be satisfied. The first constraint concerns the total execution time of the tasks running on the TX-board. The time available to the task set is bounded by the length of the sampling period T. Given the desired audio sampling frequency $f_s$, the period is

$$
T = \frac{dim}{f_s}
$$

where dim is the number of fresh samples to be collected and processed within a sampling period, which is set to 512 in the proposed compression algorithm.

A second constraint concerns the timing of packet reception. A packet arriving after a specified deadline is discarded to avoid unnecessary processing and storage overhead. In other words, a packet is kept only if its arrival time $t_{arrival}$ satisfies

$$
t_{arrival} \leq T - t_{elab(RX)}
$$

where $t_{elab(RX)}$ is the maximum amount of time required by the RX-board to perform all the associated tasks. The last constraint concerns the time required by the RX-board to transfer the data to the external device where they are stored:

$$
t_{RX \rightarrow ExtDev} \leq T - t_{elab(RX)} - t_{arrival}
$$

When these three conditions are satisfied simultaneously, the system is guaranteed to produce correct results. Otherwise, the generated signal may contain noise, clicks, whistles or, in the worst case, meaningless information.

We determined the average ($T_{avg}$) and worst-case ($T_{max}$) execution times of all the implemented software tasks using 20 different real-world audio streams. The results obtained for the TX-board and the RX-board are shown in Tables I and II, respectively. On the TX-board, the Acquisition, Packaging and Transmission tasks have an almost constant execution time, independent of the nature of the audio signal being processed. In contrast, the computation time of the Compression task strongly depends on the signal, being proportional to the number of tones computed by the psychoacoustic model. Therefore, the largest execution time occurs when the psychoacoustic model generates the maximum number of tones. On the RX-board, the Reception task runs in almost constant time. However, the duration of the Decompression task depends on the number of significant subbands in the signal, and is therefore related to the length of the received packet, which varies between 6 and 93 bytes. The data transfer from the RX-board to an external storage system uses the serial communication line and requires the transmission of the full buffer. These operations affect the execution time of the Storage task, which shows a non-negligible variance.

The developed system uses a Fixed Priority (FP) scheduling scheme [21]. The deadline of each task was set to satisfy the system constraints described above, choosing a value greater than or equal to the longest measured execution time of the task. The final task deadlines D of the resulting real-time system are shown in Tables I and II.

TABLE I. TRANSMITTER'S TASK TIMINGS (IN MILLISECONDS)

| Task | Tavg | Tmax | D |
|---|---|---|---|
| Acquisition | 0.35 | 0.40 | 0.5 |
| Compression | 24.98 | 32.91 | 35 |
| Packaging | 2.78 | 3.35 | 4.0 |
| Transmission | 6.4 | 7.2 | 7.5 |
| Total | - | - | 47.0 |

TABLE II. RECEIVER'S TASK TIMINGS (IN MILLISECONDS)

| Task | Tavg | Tmax | D |
|---|---|---|---|
| Reception | 4.65 | 5.18 | 6.0 |
| Decompression | 9.40 | 12.18 | 14.0 |
| Storage | 3.73 | 4.16 | 5.0 |
| Total | - | - | 25.0 |

The total execution time of the tasks running on the TX-board determines a bound on the sampling rate. Since the system is able to acquire, encode and transmit 512 samples in 47 ms, the maximum sampling rate for the audio stream is about 11 kHz (512 samples / 47 ms is approximately 10.9 kHz). According to the Nyquist-Shannon sampling theorem, the sampling frequency must be at least twice the maximum frequency of the signal. As a result, signals with frequency content up to about 5.5 kHz can, in principle, be perfectly reconstructed.

V. EXPERIMENTAL EVALUATION

During our tests we mainly considered audio streams generated by the human voice. The goal of the experimental evaluation was to compare the original audio with the reconstructed one. Although the proposed system was originally designed for speech processing only, we also considered more complex signals (an audio clip of a song, discussed later in this section) in order to investigate the limits of our approach.

Fig. 5. Coding of a speech audio sample (time domain)

Fig. 6. Coding of a speech audio sample (frequency domain)
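The timing constraints of Section IV-C can be checked numerically against Tables I and II. The following Python sketch is illustrative only: it treats the sum of the TX deadlines (47 ms) as the period T, as the text does, and, as an assumption not stated in the paper, uses the sum of the RX deadlines (25 ms) as $t_{elab(RX)}$. All function names are hypothetical.

```python
# (Tmax, D) pairs in milliseconds, copied from Tables I and II.
TX_TASKS = {
    "Acquisition": (0.40, 0.5),
    "Compression": (32.91, 35.0),
    "Packaging": (3.35, 4.0),
    "Transmission": (7.2, 7.5),
}
RX_TASKS = {
    "Reception": (5.18, 6.0),
    "Decompression": (12.18, 14.0),
    "Storage": (4.16, 5.0),
}
DIM = 512  # fresh samples per period


def total_deadline_ms(tasks):
    """Processing time reserved per frame: the sum of the task deadlines D."""
    return sum(d for _, d in tasks.values())


def deadlines_cover_wcet(tasks):
    """Every deadline D must be at least the measured worst case Tmax."""
    return all(tmax <= d for tmax, d in tasks.values())


def max_sampling_rate_hz(period_ms, dim=DIM):
    """From T = dim / fs: the highest fs for which dim samples fit in T."""
    return dim / (period_ms / 1000.0)


def packet_kept(t_arrival_ms, period_ms, t_elab_rx_ms):
    """Second constraint: t_arrival <= T - t_elab(RX)."""
    return t_arrival_ms <= period_ms - t_elab_rx_ms


def transfer_fits(t_transfer_ms, period_ms, t_elab_rx_ms, t_arrival_ms):
    """Third constraint: t_RX->ExtDev <= T - t_elab(RX) - t_arrival."""
    return t_transfer_ms <= period_ms - t_elab_rx_ms - t_arrival_ms


if __name__ == "__main__":
    T = total_deadline_ms(TX_TASKS)      # 47 ms
    t_rx = total_deadline_ms(RX_TASKS)   # 25 ms, assumed to be t_elab(RX)
    fs = max_sampling_rate_hz(T)
    print(f"T = {T} ms, fs_max = {fs:.0f} Hz, Nyquist limit = {fs / 2:.0f} Hz")
    print("Deadlines cover Tmax:", deadlines_cover_wcet(TX_TASKS),
          deadlines_cover_wcet(RX_TASKS))
    print("Latest accepted arrival:", T - t_rx, "ms")
```

With these figures the bound is about 10.9 kHz, consistent with the "about 11 kHz" and 5.5 kHz values derived in Section IV-C.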

A. Speech processing

During this set of tests we considered several different sources of human voice, and we verified that our system performed similarly in all tests. We therefore report the results obtained on a sample of a female voice stream, about 2 seconds long, taken from [22]. In the time domain, pure human speech is characterized by high-amplitude segments followed by segments of nearly zero amplitude. For this reason, we compared the two speech-only signals over their non-flat portions. In Figure 5, the output of the proposed compression model is compared against the original audio clip. Our approach seems to fail in the portion of the clip spanning from about 50 ms to 300 ms. On the other hand, the remaining part of the signal shows a high level of similarity with the unprocessed signal. In Figure 6 we analyze the same signals in the frequency domain. The plot represents the normalized 4096-point Fast Fourier Transform of the two signals in the frequency range from 0 to 22,050 Hz. The audio generated using our technique is highly similar to the original signal in the lower frequency range, up to 5.5 kHz, while the difference increases at medium and high frequencies. Considering the overall performance, the proposed system provides efficient signal compression. Non-negligible errors affecting the first part of the stream have been observed regardless of the audio source. Although we have not yet identified the cause of this undesired behavior, it is likely due to a software bug in the implementation of the reconstruction algorithm.

B. General audio processing

This set of tests involved short segments of audio tracks taken from common songs. In this case, again, our technique behaved similarly over all the tested samples; therefore, we report a single representative example. The time-domain representation of the considered signal is shown in Figure 7. The output of our model shows a high level of similarity with the original signal, but significant differences are present when fast transitions occur. The relatively low audio quality generated by our system is further highlighted in Figure 8, where the spectra of the two signals are presented. As with the speech clips, these tests show the limits of our approach in the presence of signals with significant components at high frequencies. However, the lack of accuracy in the upper part of the spectrum is mainly due to the limited sampling frequency of 11,500 Hz, compared with the optimal value of 44,100 Hz. This limitation could be overcome by adopting a more powerful processing unit, allowing the Acquisition task to be scheduled at a higher frequency and thus improving the overall audio quality.

Fig. 7. Coding of a song audio clip (time domain)

Fig. 8. Coding of a song audio clip (frequency domain)

VI. CONCLUSIONS

In this work we described the design and implementation details of an embedded platform for the wireless broadcasting of audio signals over IEEE 802.15.4. The resulting system has been designed from scratch by combining a robust psychoacoustic model with an analysis and synthesis technique based on wavelets, yielding a novel audio compression algorithm specifically targeted to embedded systems with limited processing capabilities. We focused the application design on the careful evaluation of the timing constraints of the tasks running on both the receiving and the transmitting devices. A dedicated real-time operating system has been adopted to meet the timing requirements. Experimental results show that it is possible to tune the parameters of the implemented algorithms to meet the real-time constraints while providing correct audio streaming with adequate quality. The audio quality of the developed system is strongly affected by the low processing power of the adopted hardware platform. With more powerful hardware, the proposed system could easily manage the broadcast of the wireless signal with reasonable audio quality for multimedia applications.

Although the results obtained from our prototype implementation are quite promising, much future work remains. The reasons for the error affecting the first part of the stream are currently being investigated. More tests should be carried out to investigate the hardware and software limits of the system. An extensive comparison between the proposed solution and existing platforms is also needed. Finally, we plan to complete the analysis by considering the energy efficiency of our streaming system.

REFERENCES

[1] Wireless Medium Access Control (MAC) and Physical Layer (PHY) Specifications for Low-Rate Wireless Personal Area Networks (WPANs), IEEE Std 802.15.4-2006, pp. 1-305, 2006.
[2] I. F. Akyildiz, T. Melodia, and K. R. Chowdhury, "A survey on wireless multimedia sensor networks," Computer Networks, vol. 51, no. 4, pp. 921-960, March 2007.
[3] Microchip, dsPIC30F reference manual, Microchip Technology Inc., 2004.
[4] RFSolutions, http://www.rfsolutions.co.uk.
[5] Evidence ERIKA RTOS, http://www.evidence.eu.com.
[6] D. Brunelli and L. Teodorani, "Improving audio streaming over multi-hop ZigBee networks," in Proc. IEEE ISCC 2008, pp. 31-36, Jul. 2008.
[7] M. Petracca, G. Litovsky, A. Rinotti, M. Tacca, J. C. De Martin, and A. Fumagalli, "Perceptual based voice multi-hop transmission over wireless sensor networks," in Proc. IEEE ISCC 2009, pp. 19-24, Jun. 2009.
[8] Z. Chen, S. G. Kang, E. C. Choi, and J. D. Huh, "Multistage waveform coding for voice communication over ZigBee networks," in Proc. IEEE ICCE 2009, pp. 1-2, Jan. 2009.
[9] D. Brunelli, M. Maggioretti, L. Benini, and F. L. Bellifemine, "Analysis of audio streaming capability of ZigBee networks," in Proc. EWSN 2008, pp. 189-204, Jul. 2008.
[10] S. Deshpande, "Adaptive low-bitrate streaming over IEEE 802.15.4 low rate wireless personal area networks (LR-WPAN) based on link quality indication," in Proc. IWCMC 2006, pp. 863-868, 2006.
[11] T. R. Abdelzaher, S. Prabh, and R. Kiran, "On real-time capacity limits of multihop wireless sensor networks," in Proc. IEEE RTSS 2004, pp. 359-370, Dec. 2004.
[12] R. Mangharam, A. Rowe, R. Rajkumar, and R. Suzuki, "Voice over sensor networks," in Proc. IEEE RTSS 2006, pp. 291-302, Dec. 2006.
[13] C. Wang, K. Sohraby, R. Jana, L. Ji, and M. Daneshmand, "Voice communications over ZigBee networks," IEEE Communications Magazine, vol. 46, no. 1, pp. 121-127, January 2008.
[14] ISO/IEC, "Coding of moving pictures and associated audio for digital storage media at up to about 1.5 Mbit/s - Part 3: Audio," IS 11172-3, 1992.
[15] Vorbis Project, "Vorbis Specification," http://www.vorbis.com.
[16] ISO/IEC, "Generic coding of moving pictures and associated audio information - Part 7: Advanced Audio Coding (AAC)," 13818-7, 2006.
[17] H. Fletcher, "Auditory patterns," Reviews of Modern Physics, vol. 12, no. 1, pp. 47-65, January 1949.
[18] J. Zwislocki, "Analysis of some auditory characteristics," in Handbook of Mathematical Psychology, Eds. New York: Wiley, 1965.
[19] E. Zwicker and H. Fastl, Psychoacoustics: Facts and Models, Springer-Verlag, 1990.
[20] T. Painter and A. Spanias, "A review of algorithms for perceptual coding of digital audio signals," in Proc. DSP-97, pp. 179-208, Jul. 1997.
[21] G. C. Buttazzo, Hard Real-Time Computing Systems: Predictable Scheduling Algorithms and Applications. Springer, 2005.
[22] T. Spanias, "Speech Coding Demonstrations," http://www.eas.asu.edu/ speech/cod demo, 1998.
[23] C. K. Chui, An Introduction to Wavelets. Academic Press Professional, 1992.
