Iccce 06271174 Shervin

International Conference on Computer and Communication Engineering (ICCCE 2012), 3-5 July 2012, Kuala Lumpur, Malaysia
Voice Quality in Speech Watermarking Using Spread Spectrum Technique

Shervin Shokri, Mahamod Ismail and Nasharuddin Zainal
Dept. of Electrical, Electronics and System Engineering Universiti Kebangsaan Malaysia 43600 UKM Bangi, Selangor Darul Ehsan, MALAYSIA {shshokri, mahamod, nash }@eng.ukm.my
Abstract: One of the important problems in communication is sending specific data privately. In addition to cryptography, which has been used for many years, methods such as data hiding have been introduced in recent years. In data hiding, specific data is embedded in another signal, known as the host signal, which can be video, image or audio. Because of its narrow bandwidth, the speech signal is seldom used as a host, despite its popularity in communication applications such as military, phone banking and network security. A data hiding algorithm for speech signals is called speech watermarking. Using the spread spectrum technique makes the watermark more robust against attacks (i.e. watermark removal and impairment attacks). In this work, speech watermarking is implemented in simulation and evaluated on the quality of the voice (speech) using the Mean Opinion Score (MOS) from 20 respondents. The average MOS is 2.75 out of 5, which lies between fair and poor perceptual quality.
Keywords: spread spectrum, linear prediction (LP), MOS, BCH code, speech watermarking
I. INTRODUCTION
In data hiding, private data is embedded in a host signal, which can be a video, image or audio signal. Because speech is a narrow-bandwidth signal, it is seldom used as a host, despite its popularity in communication applications such as military, phone banking and network security. Speech watermarking is a technique for embedding information, such as numbers or text, into a speech signal so that it is virtually imperceptible, using perceptual methods. This approach has recently received attention. For example, in air/ground voice radio communication between a pilot and air traffic control, confidential information such as the flight number could be embedded in the speech signal; in phone banking systems, the technique could be used to transmit authentication information without being perceptible to the user. The system algorithm is based on spread spectrum, in which a narrow-band signal (the message to be transmitted) modulates a broadband carrier so that it is spread across the bandwidth of the speech signal. The watermark information can be spread over a larger set of samples by means of the chip-rate parameter. The watermark is generated with a pseudo-noise (PN) sequence that spreads its spectrum in the frequency domain. The PN sequence is used as a secret key, shared by both sides of the system, to protect the data of authorized users [1].
The paper is organized as follows. Section II gives an overview of the speech watermarking system and describes its parts: error control coding, data spreading, data embedding, the channel, the watermark decoder (whitening, synchronization, data de-spreading and decoding) and the MOS technique. Section III, Results and Discussion, describes the data and speech signal used as system input and presents the simulation results for each part of the system. Finally, Section IV presents the conclusions and future steps. This paper focuses mainly on the voice quality of the output speech at the receiver, measured with the MOS technique.
II. SYSTEM DESCRIPTION
A. System Overview
The system combines Direct-Sequence Spread Spectrum (DSSS) technology with a simple frequency masking approach. The spread spectrum approach was chosen because many studies have shown its robustness against channel noise. The block diagram of the transmitter side (encoder) is shown in Fig. 1. The encoder can be divided into three major sections. First, the payload data (message) is passed through a channel encoder, which expands the payload with redundant bits and provides error control coding (channel coding). The channel code protects the digital data sent over the channel against the errors caused by noise. Next, the watermark signal is spread over the available frequency band with the PN sequence. Finally, the watermark is embedded in the speech signal using perceptual methods: an LPC filter shapes the watermark spectrum so that it is ready to be embedded in the digitized speech signal. The digital signal is converted to analog form and passed to the channel. At the receiver side (decoder), the watermark data is extracted from the speech signal (Fig. 2). First, a whitening filter undoes the spectral masking, and the signal is then synchronized so that the watermark can be extracted. The error correction block corrects the errors caused by the channel conditions [2].
978-1-4673-0479-5/12/$31.00 2012 IEEE
Figure 1. Watermark encoder block diagram
Figure 2. Watermark decoder block diagram
B. Error Control Coding
For error control, the payload data is encoded with a BCH code. BCH codes are a class of cyclic linear block codes that offer a wide choice of block lengths, alphabet sizes, code rates and error-correction capabilities [3]. At the receiver, the BCH decoder checks the number of errors: if it is within the correction capability of the code, the errors are corrected; otherwise they are only detected. Table I lists the data bits, code bits, parity bits and correction capability for three payload sizes.
TABLE I
BCH Code for 12-, 24- and 36-Bit Data Words
Data bits (k)                12 bits   24 bits   36 bits
Code bits (n)                62 bits   63 bits   63 bits
Parity bits (n - k)          50 bits   39 bits   27 bits
Correction capability (t)    14 bits   7 bits    5 bits
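The bookkeeping behind Table I can be checked with a few lines of Python. This is only an arithmetic sketch, not a BCH encoder: the (k, n, t) triples are copied from Table I, the 80-bit data word is the figure quoted in the data spreading description, and the names `TABLE_I`, `FRAME_BITS` and `code_summary` are illustrative.

```python
# Rows of Table I as (data bits k, code bits n, correctable errors t).
TABLE_I = [(12, 62, 14), (24, 63, 7), (36, 63, 5)]

# Assumption: the 80-bit data word quoted in the data spreading description.
FRAME_BITS = 80


def code_summary(k, n, t, frame_bits=FRAME_BITS):
    """Derive parity bits, code rate and leftover synchronization bits."""
    return {
        "k": k,
        "n": n,
        "t": t,
        "parity_bits": n - k,         # redundancy added by the code
        "code_rate": k / n,           # fraction of the codeword carrying payload
        "sync_bits": frame_bits - n,  # bits left in the data word for synchronization
    }


summaries = [code_summary(k, n, t) for k, n, t in TABLE_I]
for row in summaries:
    print(
        f"k={row['k']:2d} n={row['n']} parity={row['parity_bits']} "
        f"rate={row['code_rate']:.2f} sync={row['sync_bits']}"
    )
```

Under these assumptions, the leftover bits come out as 18 for the 62-bit codeword and 17 for the 63-bit codewords, which agrees with the seventeen or eighteen synchronization bits mentioned in the text.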
C. Data Spreading
Direct-sequence spread spectrum (DSSS) is a passband modulation technique. In DSSS, the data sequence b[n] is spread over the available bandwidth by multiplying it with a pseudo-noise binary sequence c[n]:

$$ a[n] = b[n]\,c[n] \qquad (1) $$

where a[n] denotes the spread data sequence.
The number of samples in one spreading sequence should equal the length of one data word. With a sampling rate Fs = 8 kHz, a symbol rate of 100 bit/s and a data word of 80 bits (62 or 63 code bits plus 18 or 17 synchronization bits, respectively), this gives a spreading sequence of 6400 samples. The spectra of the watermark and the speech signal must be close to each other to make the synchronization robust. In addition, the spreading sequence is low-pass filtered (LPF) to match the voice channel bandwidth [2].

D. Data Embedding
The main aims of data embedding are to embed the watermark signal with maximum energy and with the least possible perceptual distortion. To obtain the watermarked signal, the spread watermark signal is embedded into the speech signal. Adding the watermark without any constraints would cause strong interference in the speech signal, and the resulting change in speech power could degrade the quality further. The process can be better controlled by masking techniques that take the temporal energy and the spectral shape of the speech into account. Many watermarking algorithms use psychoacoustic models, which are based on human auditory perception [4]-[7]. Linear prediction (LP) is used for spectral masking, which makes the spectral shape of the watermark similar to that of the speech signal. A technique called bandwidth expansion is used to prevent interference between the spectral peaks (formants) of the watermark and speech signals; it is achieved by moving all poles towards the center of the unit circle. When no voice is present, a minimal watermark level $\lambda_{\min}$ is retained to ensure transmission:

$$ y[n] = s[n] + \max\left(\lambda_{\min},\ \lambda\,P_s[n]\right) w[n], \qquad (2) $$

where
y[n]: watermarked signal before the channel
s[n]: speech signal
w[n]: watermark signal
P_s[n]: speech signal power
$\lambda$: watermark amplitude
$\lambda_{\min}$: minimal watermark level.

The speech signal power is given by

$$ P_s[n] = \frac{1}{M} \sum_{k=0}^{M-1} s^2[n+k], \qquad (3) $$

where n is the sample index, running over the total number of samples, and M is the frame length.

E. Channel
The channel is simulated as a simple model of VHF radio communication, combining an Additive White Gaussian Noise (AWGN) channel with slow amplitude fading.

Figure 3. Effect of white Gaussian noise

F. Watermark Decoder
As mentioned in the system overview, the decoder consists of several parts, which are described below. They were not implemented in this work, however, because this paper focuses mainly on the speech quality at the receiver (the output speech after the AWGN channel in the decoder block diagram, Fig. 2).
1) Whitening: The received signal is passed through a whitening filter to undo the spectral shaping. An inverse LP filter, obtained by repeating the LP analysis on the received signal, is used, and the whitening filter also applies bandwidth expansion. This time, the zeros that were moved towards the center of the unit circle are returned to their original positions. The spectral shape of the received signal is not identical to that of the original speech signal, but the two are expected to be similar because of the spectral shaping process [2].
2) Synchronization: Synchronization is performed with a matched filter in the decoder. A special synchronization sequence, known to both transmitter and receiver, is added to the payload data in the encoder. The matched filter detects this sequence to mark the start of decoding.
3) Data De-Spreading and Decoding: After synchronization, the signal is multiplied by the spreading sequence. This operation, which recovers the payload data, is called de-spreading. Decoding uses a simple integrator together with the number and length of the data bits, which are known at the decoder. At this step, the sampling rate is reduced to the binary symbol rate by downsampling [x]. Finally, the BCH decoder corrects errors as far as possible [2].

G. Mean Opinion Score (MOS) Evaluation
The output speech quality is estimated with the MOS technique of International Telecommunication Union (ITU) Recommendation P.800, which provides methods for conducting listening and conversational tests. The MOS scale adopted for evaluating the perceptual quality is shown in Table II.
TABLE II
MOS Score for Speech Quality (Listening-Quality Scale)
Quality of the Speech    Score
Excellent                5
Good                     4
Fair                     3
Poor                     2
Bad                      1

Figure 5. Watermark signal
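The transmitter and receiver chain described in this section can be illustrated with a small pure-Python sketch. It is not the authors' Matlab simulation. It covers only spreading, gain-controlled embedding, an AWGN channel and de-spreading with an integrator, and it leaves out LP spectral shaping, low-pass filtering, fading, whitening, synchronization and BCH coding. It assumes that the embedding gain is the larger of a fixed floor and a term proportional to the short-term speech power. The function names, the 160-sample frame, the gain and floor values, the PN seed and the 200 Hz tone standing in for speech are all illustrative assumptions.

```python
import math
import random

FS = 8000                        # sampling rate in Hz (from the text)
BIT_RATE = 100                   # watermark symbol rate in bit/s (from the text)
CHIPS_PER_BIT = FS // BIT_RATE   # 80 samples carry one data bit


def pn_sequence(length, seed):
    """Bipolar (+1/-1) pseudo-noise sequence; the seed acts as the shared key."""
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) for _ in range(length)]


def spread(bits, pn):
    """Hold each bit for CHIPS_PER_BIT samples and multiply it by the PN chips."""
    out = []
    for i, bit in enumerate(bits):
        symbol = 1.0 if bit else -1.0
        chips = pn[i * CHIPS_PER_BIT:(i + 1) * CHIPS_PER_BIT]
        out.extend(symbol * c for c in chips)
    return out


def frame_power(speech, n, frame_len):
    """Mean squared value of up to frame_len samples starting at index n."""
    frame = speech[n:n + frame_len]
    return sum(x * x for x in frame) / len(frame)


def embed(speech, watermark, gain, floor, frame_len=160):
    """Add the watermark with a power-dependent gain that never drops below floor."""
    return [
        s + max(floor, gain * frame_power(speech, n, frame_len)) * w
        for n, (s, w) in enumerate(zip(speech, watermark))
    ]


def awgn(signal, snr_db, rng):
    """Add white Gaussian noise at the requested signal-to-noise ratio."""
    power = sum(x * x for x in signal) / len(signal)
    sigma = math.sqrt(power / (10 ** (snr_db / 10)))
    return [x + rng.gauss(0.0, sigma) for x in signal]


def despread(received, pn, n_bits):
    """Multiply by the PN sequence again and integrate over each bit period."""
    bits = []
    for i in range(n_bits):
        lo, hi = i * CHIPS_PER_BIT, (i + 1) * CHIPS_PER_BIT
        corr = sum(r * c for r, c in zip(received[lo:hi], pn[lo:hi]))
        bits.append(1 if corr > 0 else 0)
    return bits


rng = random.Random(1)
payload = [rng.randint(0, 1) for _ in range(80)]           # one 80-bit data word
pn = pn_sequence(len(payload) * CHIPS_PER_BIT, seed=2012)  # hypothetical key
watermark = spread(payload, pn)                            # 6400 samples

# Stand-in host signal: a 200 Hz tone instead of real speech (assumption).
speech = [0.5 * math.sin(2 * math.pi * 200 * n / FS) for n in range(len(watermark))]
marked = embed(speech, watermark, gain=0.5, floor=0.01)
received = awgn(marked, snr_db=20, rng=rng)

decoded = despread(received, pn, len(payload))
errors = sum(a != b for a, b in zip(payload, decoded))
print(len(watermark), "samples,", errors, "bit errors before error correction")
```

Because the host signal is not whitened before de-spreading in this sketch, it acts as strong interference for the integrator, so some bit errors should be expected at low watermark levels. The whitening filter and the BCH code of the full system are meant to reduce and correct such errors.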
III. RESULTS AND DISCUSSION
To evaluate the voice quality obtained with the proposed watermarking algorithm, a series of simulations was carried out. A Mean Opinion Score (MOS) test according to ITU-T Recommendation P.800 was conducted to assess the perceptual quality. Such a study relies on the hearing of real listeners; in other words, it is a real experiment performed on the simulated signals. The inputs to the system are as follows:
• Data rate: the channel data rate for the watermark (100 bit/s).
• Message size: the data payload size (12 bits).
• Host signal length: the speech signal is 8.9 seconds long, i.e. 71200 samples at a sampling rate of 8000 samples per second.
• Channel SNR: the channel is simulated as a simple Additive White Gaussian Noise (AWGN) channel with the SNR set to 20 dB.
• Watermark floor: the effective minimum watermark energy in the signal (-18 dB).
• Signal-to-watermark ratio (SWR): the ratio by which the watermark is attenuated relative to the speech signal.

The speech signal was simulated using Matlab and is shown in Figure 4. The watermark signal, shaped by the LP filter, is shown in Figure 5 for a duration of 1 s. The shaped watermark is then embedded imperceptibly in the speech signal according to equation (2); Figure 6 shows the watermarked signal. The watermarked signal is transmitted over the AWGN channel, and the received signal is shown in Figure 7. The output speech quality was estimated with the MOS technique, with 20 respondents taking part in the test. The averaged votes are used to measure the quality of the watermarked speech. The MOS results are shown in Table III. The audibility of the signal was rated at three points in the system: first the input speech signal, second the watermarked signal, and third the signal received at the output.

Figure 4. Original speech signal
Figure 6. Watermarked signal
Figure 7. Watermarked signal and AWGN noise
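The averages in Table III can be recomputed from the individual votes with the Python standard library. In this sketch the votes are transcribed from Table III, and the mapping of the three vote rows to the conditions (original, watermarked, channel output) is an assumption based on the reported means. The name `mos_summary` is illustrative.

```python
from statistics import mean, stdev

# Votes of the 20 respondents, transcribed from Table III.
# Assumption: rows map to original, watermarked and channel output in this order.
votes = {
    "original": [5] * 20,
    "watermarked": [4, 4, 4, 4, 4, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 3, 4],
    "channel": [3, 3, 2, 3, 3, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2, 3],
}


def mos_summary(scores):
    """Mean opinion score with sample standard deviation and range."""
    return {
        "mos": mean(scores),
        "std": stdev(scores),   # sample (n - 1) standard deviation
        "min": min(scores),
        "max": max(scores),
    }


summary = {name: mos_summary(scores) for name, scores in votes.items()}
for name, stats in summary.items():
    print(
        f"{name:12s} MOS={stats['mos']:.2f} std={stats['std']:.3f} "
        f"range={stats['min']}-{stats['max']}"
    )
```

The means reproduce the reported 3.85 and 2.75. The sample standard deviations of the 20 votes come out near 0.37 and 0.44, somewhat below the tabulated 0.40 and 0.46, so the tabulated figures presumably rest on a different cell range or estimator.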
TABLE III
MOS Test
Respondent   Original signal   Watermarked signal   Channel (watermarked & AWGN)
1            5                 4                    3
2            5                 4                    3
3            5                 4                    2
4            5                 4                    3
5            5                 4                    3
6            5                 3                    2
7            5                 3                    2
8            5                 4                    2
9            5                 4                    3
10           5                 4                    3
11           5                 4                    3
12           5                 4                    3
13           5                 4                    3
14           5                 4                    3
15           5                 4                    3
16           5                 4                    3
17           5                 4                    3
18           5                 4                    3
19           5                 3                    2
20           5                 4                    3
Min          5                 3                    2
Max          5                 4                    3
Mean         5                 3.85                 2.75
Std          0                 0.402373908          0.46291005

IV. CONCLUSIONS
Speech watermarking using the spread spectrum technique has been successfully implemented in simulation and evaluated with the MOS. The average MOS of the received signal is close to fair perceptual quality. However, extensive studies on the effect of message length and channel properties are yet to be done. In the future, other factors, such as coding and equalization techniques for the speech signal at the transmitter, will be investigated.

ACKNOWLEDGMENT
This study is sponsored by Universiti Kebangsaan Malaysia (UKM) through the university research grant OUP-2012-182.

REFERENCES
[1] Shilpa Arora and Sabu Emmanuel, "Real time adaptive speech watermarking scheme for mobile applications," in Proc. 4th ICICS and 4th IEEE PCM Conference, vol. 2, pp. 1153-1157, Singapore, December 2003.
[2] Horst Hering, Martin Hagmüller, Andreas Kröpfl, and Gernot Kubin, "Safety and security increase for air traffic management through unnoticeable watermark aircraft identification tag transmitted with the VHF voice communication," in Proc. IEEE DASC, 2003.
[3] Marcos Faundez-Zanuy, Martin Hagmüller, and Gernot Kubin, "Speaker identification security improvement by means of speech watermarking," Pattern Recognition (Elsevier), vol. 40, pp. 3027-3034, February 2007.
[4] Ricardo A. Garcia, "Digital watermarking of audio signals using a psychoacoustic auditory model and spread spectrum theory," in Preprints of the AES 107th Convention, New York, USA, September 1999.
[5] N. Cvejic, A. Keskinarkaus, and T. Seppänen, "Audio watermarking using m-sequences and temporal masking," in Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pp. 227-230, New Paltz, NY, USA, October 2001.
[6] Petar Horvatic, Jian Zhao, and Niels J. Thorwirth, "Robust audio watermarking based on secure spread spectrum and auditory perception model," in Proc. Sixteenth Annual Working Conference on Information Security for Global Information Infrastructures, pp. 181-190, Beijing, China, Kluwer Academic Publishers, Norwell, MA, USA, August 2000.
[7] D. Kirovski and H. Malvar, "Spread-spectrum audio watermarking: requirements, applications, and limitations," in Proc. IEEE Fourth Workshop on Multimedia Signal Processing, pp. 219-224, Cannes, France, October 2001.
