
A codec is an algorithm (put simply, a sort of program), most of the time installed as software on a server or embedded within a piece of hardware (ATA, IP phone, etc.), that is used to convert voice signals (in the case of VoIP) into digital data to be transmitted over the Internet or any other network during a VoIP call. The word codec is a contraction of coder-decoder or compressor-decompressor. Codecs normally perform the following three tasks (very few do the last one):

Encoding and decoding
Compression and decompression
Encryption and decryption

Encoding and decoding
When you talk over a normal PSTN phone, your voice is transported in analog form over the phone line. With VoIP, your voice is converted into digital signals. This conversion is technically called encoding, and it is performed by a codec. When the digitized voice reaches its destination, it has to be decoded back to its original analog state so that the other party can hear and understand it.

Compression and decompression
Bandwidth is a scarce commodity. If the data to be sent is made lighter, you can send more of it in a given amount of time, and thus improve performance. To make the digitized voice less bulky, it is compressed. Compression is a complex process whereby the same data is stored using less space (fewer digital bits). During compression, the data is confined to a structure (packet) specific to the compression algorithm. The compressed data is sent over the network and, once it reaches its destination, it is decompressed back to its original state before being decoded. In most cases, however, it is not necessary to decompress the data, since the compressed data is already in a consumable state.

Types of compression
When data is compressed, it becomes lighter and performance improves. However, the algorithms that compress the most tend to decrease the quality of the compressed data. There are two types of compression: lossless and lossy. With lossless compression you lose nothing, but you can't compress that much. With lossy compression you achieve a great reduction in size, but you lose quality. You normally can't get the compressed data back to its original state with lossy compression, since quality has been sacrificed for size, but most of the time this is not necessary. A good example of lossy compression is MP3 for audio. Once you compress audio to MP3 you can't restore the original, but MP3 audio is already very good to listen to, compared to huge uncompressed audio files.

Encryption and decryption
Encryption is one of the best tools for achieving security. It is the process of changing data into a state that no one can understand. This way, even if the encrypted data is intercepted by unauthorized people, it remains confidential. Once the encrypted data reaches its destination, it is decrypted back to its original form. Compressed data is also obscured to a certain extent, since it is altered from its original state, though this is not a substitute for real encryption.

Codec     Bandwidth (kbps)   Comments
G.711     64                 Delivers precise speech transmission. Very low processor requirements. Needs at least 128 kbps for two-way.
G.722     48/56/64           Adapts to varying compressions, and bandwidth is conserved with network congestion.
G.723.1   5.3/6.3            High compression with high quality audio. Can be used with dial-up. Needs a lot of processor power.
G.726     16/24/32/40        An improved version of G.721 and G.723 (different from G.723.1).
G.729     8                  Excellent bandwidth utilization. Error tolerant. License required.
GSM       13                 High compression ratio. Free and available on many hardware and software platforms. Same encoding is used in GSM cellphones (improved versions are often used nowadays).
iLBC      15                 Robust to packet loss. Free.
Speex     2.15 / 44          Minimizes bandwidth usage by using variable bit rate.

A-law Complete Definition
The A-law algorithm is a standard companding algorithm used in European digital communications systems to optimize, that is modify, the dynamic range of an analog signal for digitizing. It is similar to the mu-law algorithm used in Japan and North America. The wide dynamic range of speech does not lend itself well to efficient linear digital encoding.

A-law encoding effectively reduces the dynamic range of the signal. This increases coding efficiency and results in a signal-to-distortion ratio that is superior to that obtained by linear encoding for a given number of bits. A-law is an audio compression scheme defined by the Consultative Committee for International Telephony and Telegraphy (CCITT) in G.711, which compresses 16-bit linear PCM data down to 8 bits of logarithmic data. The linear sample values are limited to 12 magnitude bits, and the compression is defined by an equation in which A is the compression parameter and x is the normalized value to be compressed. In practice it is a piecewise-linear approximation of a logarithmic input/output relationship, implemented with 8-bit code words. 8-bit code words give a bit rate of 64 kilobits per second.

This rate is calculated by multiplying the sampling rate by the size of the code word. The A-law algorithm breaks the dynamic range into a total of 16 segments: 8 positive and 8 negative. Each segment is twice the length of the preceding one. Within each segment, uniform quantization is used. In the 8-bit code word, the first bit (MSB) identifies the polarity, bits 2, 3 and 4 identify the segment, and the final 4 bits quantize the value within the segment.

A-law and mu-law make different trade-offs. The mu-law algorithm provides a slightly larger dynamic range than A-law, at the cost of worse proportional distortion for small signals. A-law is equivalent to 13-bit uniform PCM. On an international connection, A-law is used if at least one of the countries involved uses it. The A-law algorithm has been applied extensively.

Both a-law and u-law are companders for the G.711 voice codec

A-law Compander formula (from Cisco's Waveform Coding Techniques )
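As a rough illustration of the companding curve, the following Python sketch implements the continuous A-law compression function and its inverse. It assumes the commonly quoted value A = 87.6 and samples normalized to the range -1 to 1; the function names are hypothetical, and this is not the bit-exact segmented 8-bit encoder that G.711 specifies.

```python
import math

A = 87.6  # assumed compression parameter (value commonly quoted for A-law)


def alaw_compress(x: float, a: float = A) -> float:
    """Compress a normalized sample x in [-1, 1] with the continuous A-law curve."""
    ax = abs(x)
    denom = 1.0 + math.log(a)
    if ax < 1.0 / a:
        y = a * ax / denom                   # linear region for small signals
    else:
        y = (1.0 + math.log(a * ax)) / denom  # logarithmic region
    return math.copysign(y, x)


def alaw_expand(y: float, a: float = A) -> float:
    """Invert alaw_compress for a companded value y in [-1, 1]."""
    ay = abs(y)
    denom = 1.0 + math.log(a)
    if ay < 1.0 / denom:
        x = ay * denom / a
    else:
        x = math.exp(ay * denom - 1.0) / a
    return math.copysign(x, y)
```

Small inputs are boosted before quantization (for example, 0.01 maps to roughly 0.16), which is how the scheme spends more quantization levels on quiet signals.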

G.703/G.704 Complete Definition

G.703 is an ITU-T standard for transmitting voice or data over digital carriers such as T1 and E1. G.703 provides specifications for pulse code modulation (PCM). G.703 also specifies E0 (64 kbps). For information about E0 audio see G.711.

G.703 is typically transported over balanced 120 ohm twisted pair cables terminated in RJ48C jacks. However, some telephone companies use unbalanced dual 75 ohm coaxial cables, which G.703 also allows.

G.704 defines the synchronous frame structures used at the 1544, 6312, 2048, 8448 and 44 736 kbps hierarchical levels.

G.711 defines two main compression algorithms: the u-law algorithm (used in North America and Japan) and the a-law algorithm (used in Europe and the rest of the world). Both are logarithmic, but a-law was specifically designed to be simpler for a computer to process. The standard also defines a sequence of repeating code values which defines the power level of 0 dB.

The u-law and a-law algorithms encode 14-bit and 13-bit signed linear PCM samples (respectively) to logarithmic 8-bit samples. Thus, the G.711 encoder creates a 64 kbps bitstream for a signal sampled at 8 kHz.

G.711, also known as Pulse Code Modulation (PCM), is a very commonly used waveform codec. G.711 uses a sampling rate of 8,000 samples per second, with a tolerance on that rate of 50 parts per million (ppm). Non-uniform quantization with 8 bits is used to represent each sample, resulting in a 64 kbit/s bit rate. There are two slightly different versions: u-law, which is used primarily in North America, and a-law, which is in use in most other countries outside North America. G.711 u-law tends to give more resolution to higher range signals, while G.711 a-law provides more quantization levels at lower signal levels. When using u-law G.711 in networks where suppression of the all 0 character signal is required, the character signal corresponding to negative input values between decision values numbers 127 and 128 should be 00000010 and the value at the decoder output is 7519. The corresponding decoder output value number is 125.

G.726 is an ITU-T ADPCM speech codec standard covering the transmission of voice at rates of 16, 24, 32, and 40 kbps. It was introduced to supersede both G.721, which covered ADPCM at 32 kbps, and G.723, which described ADPCM for 24 and 40 kbps. G.726 also introduced a new 16 kbps rate. The four bit rates associated with G.726 are often referred to by the number of bits per sample, which is 2, 3, 4, and 5 bits respectively.
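The relationship between bits per sample and bit rate quoted for G.711 and G.726 is simple arithmetic, sketched below in Python. It assumes the 8 kHz telephony sampling rate for both codecs; the function name is hypothetical.

```python
SAMPLE_RATE_HZ = 8000  # telephony sampling rate, assumed for G.711 and G.726


def bit_rate_kbps(bits_per_sample: int, sample_rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """Bit rate in kbps = samples per second x bits per sample / 1000."""
    return bits_per_sample * sample_rate_hz / 1000


# G.726 modes (2 to 5 bits per sample) and G.711 (8 bits per sample)
rates = {bits: bit_rate_kbps(bits) for bits in (2, 3, 4, 5, 8)}
```

With these inputs, `rates` maps 2, 3, 4 and 5 bits to 16, 24, 32 and 40 kbps, and 8 bits to the 64 kbps of G.711.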

The most commonly used mode is 32 kbps, since this is half the rate of G.711, thus increasing the usable network capacity by 100%. It is primarily used on international trunks in the phone network. It also is the standard codec used in DECT wireless phone systems and is used on some Canon cameras.

G.728 is an ITU-T standard for speech coding operating at 16 kbps. The technology used is Low Delay Code Excited Linear Prediction (LD-CELP). The delay of the codec is only 5 samples (0.625 ms). The linear prediction is calculated backwards with a 50th order LPC filter. The excitation is generated with gain-scaled VQ. The standard was finished in 1992 in the form of algorithm-exact floating point code. In 1994 a bit-exact fixed point codec was released. G.728 passes low bit rate modem signals up to 2400 bit/s. Network signaling also goes through. The complexity of the codec is 30 MIPS. 2 kilobytes of RAM is needed for codebooks.

G.729 is an ITU-T algorithm for voice encoding that produces an 80-bit voice frame every 10 msec (a bit rate of 8 kbps). The codec works in blocks of 10 msec, so a packet can carry speech whose duration is any multiple of 10 msec.
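The frame arithmetic can be checked with a short Python sketch. The constants come from the figures above (80 bits every 10 msec); the function names are hypothetical.

```python
FRAME_BITS = 80  # bits produced per G.729 frame
FRAME_MS = 10    # duration of speech covered by one frame, in msec


def g729_bit_rate_kbps() -> float:
    """Bits per millisecond is numerically equal to kbit/s."""
    return FRAME_BITS / FRAME_MS


def g729_payload_bytes(packet_ms: int) -> int:
    """Payload size for a packet carrying packet_ms of speech (multiple of 10)."""
    if packet_ms <= 0 or packet_ms % FRAME_MS != 0:
        raise ValueError("packet duration must be a positive multiple of 10 msec")
    frames = packet_ms // FRAME_MS
    return frames * FRAME_BITS // 8
```

A 20 msec packet therefore carries two frames, or 20 bytes of payload.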

G.729 is an audio data compression algorithm for voice that compresses voice audio in chunks of 10 milliseconds. Music and tones such as DTMF or fax tones cannot be transported reliably with this codec, so G.711 or out-of-band methods should be used to transport these signals.

G.729 is mostly used in Voice over IP applications for its low bandwidth requirement. Standard G.729 operates at 8 kbit/s, but there are extensions which also provide 6.4 kbit/s and 11.8 kbit/s rates for marginally worse and better speech quality respectively. Also very common is G.729a, which is compatible with G.729 but requires less computation. This lower complexity is not free, since speech quality is marginally worsened.

Annex B of G.729 is a silence compression scheme. It has a VAD module which detects voice activity, that is, whether a frame is speech or non-speech. It also includes a DTX module which decides when to update the background noise parameters for non-speech (noisy) frames. The frames transmitted to update the background noise parameters are called SID frames. Annex B also provides a comfort noise generator (CNG), because if transmission simply stopped during non-speech periods, the other side might assume that the link had been cut.

Recently, G.729 has been extended to provide support for wideband speech and audio coding, i.e., the transmitted acoustic frequency range is extended to 50 Hz - 7 kHz. The respective extension to G.729 is referred to as G.729.1. The G.729.1 coder is hierarchically organized: its bit rate and the obtained quality are adjustable by simple bitstream truncation.

Pulse-code modulation (PCM) is a digital representation of an analog signal where the magnitude of the signal is sampled regularly at uniform intervals, then quantized to a series of symbols in a numeric (usually binary) code. PCM has been used in digital telephone systems and is also the standard form for digital audio in computers and the compact disc red book format. It is also standard in digital video, for example, using ITU-R BT.601. However, straight PCM is not typically used for video in standard definition consumer applications such as DVD or DVR because the bit rate required is far too high. Very frequently, PCM encoding facilitates digital transmission from one point to another (within a given system, or geographically) in serial form.

PCM is the form of modulation in which the information signal is sampled at regular intervals and a series of pulses in coded form is transmitted, representing the amplitude of the information signal at each sampling time. For T1 applications, it is a method of converting successive (every 125 us) analog samples of a voice waveform to successive 8-bit codes, each transmitted in an 8-bit timeslot of a T1 frame. In "robbed bit" frames, only the most significant 7 bits are used to encode the sample. The total bit rate for such a channel is (8000 samples/sec) x (8 bits/sample) = 64000 bits/sec.

A codec's theoretical bandwidth usage grows once the UDP/IP headers are added:

Codec     BR          NEB
G.711     64 Kbps     87.2 Kbps
G.729     8 Kbps      31.2 Kbps
G.723.1   6.4 Kbps    21.9 Kbps
G.723.1   5.3 Kbps    20.8 Kbps
G.726     32 Kbps     55.2 Kbps
G.726     24 Kbps     47.2 Kbps
G.728     16 Kbps     31.5 Kbps
iLBC      15 Kbps     27.7 Kbps

BR = Bit rate
NEB = Nominal Ethernet Bandwidth (one direction)
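Several of the NEB figures can be reproduced with a small Python sketch. The packetization interval and header sizes are not stated in the table, so the values here are assumptions: 20 msec of speech per packet, 40 bytes of IP/UDP/RTP headers and 18 bytes of Ethernet overhead. The function name is hypothetical.

```python
IP_UDP_RTP_HEADER_BYTES = 40  # assumed: 20 IP + 8 UDP + 12 RTP
ETHERNET_OVERHEAD_BYTES = 18  # assumed: Ethernet header plus frame check sequence


def ethernet_bandwidth_kbps(codec_kbps: float, packet_ms: float = 20.0) -> float:
    """One-way bandwidth on Ethernet for a codec bit rate and packet interval."""
    payload_bytes = codec_kbps * packet_ms / 8  # kbps x msec = bits per packet
    packet_bytes = payload_bytes + IP_UDP_RTP_HEADER_BYTES + ETHERNET_OVERHEAD_BYTES
    packets_per_second = 1000.0 / packet_ms
    return packet_bytes * 8 * packets_per_second / 1000.0
```

Under these assumptions the function gives 87.2 kbps for G.711 and 31.2 kbps for G.729, matching the table. The G.723.1 rows appear to correspond to a 30 msec packet interval instead (`ethernet_bandwidth_kbps(6.4, 30)` is about 21.9). The overhead is fixed per packet, which is why low bit rate codecs see the largest relative expansion.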

MOS (Mean Opinion Score) for VoIP testing and what it means for your call quality

MOS, or Mean Opinion Score, gives VoIP testing a numeric indication of the perceived quality of received voice after it has been compressed with a codec and transmitted. The score is the result of underlying network attributes that act upon data flow. It is useful for predicting call quality and is a good VoIP test tool for identifying issues that can affect your conversations.

Mean opinion score (MOS)

MOS   Quality     Impairment
5     Excellent   Imperceptible
4     Good        Perceptible but not annoying
3     Fair        Slightly annoying
2     Poor        Annoying
1     Bad         Very annoying

User satisfaction          MOS
Maximum for G.711 codec    4.4
Very satisfied             4.3-5.0
Satisfied                  4.0-4.3
Some users satisfied       3.6-4.0
Many users dissatisfied    3.1-3.6
All users dissatisfied     2.6-3.1
Not recommended            1.0-2.6

As an example, the following are mean opinion scores for one implementation of different codecs:

Codec           Data rate [kbit/s]   Mean opinion score (MOS)
G.711 (ISDN)    64                   4.1
iLBC            15.2                 4.14
AMR             12.2                 4.14
G.729           8                    3.92
G.723.1 r63     6.3                  3.9
GSM EFR         12.2                 3.8
G.726 ADPCM     32                   3.85
G.729a          8                    3.7
G.723.1 r53     5.3                  3.65
G.728           16                   3.61
GSM FR          12.2                 3.5

What affects your VoIP MOS test score
First, it is important to understand that MOS (Mean Opinion Score) is a relative scale built upon many factors that can affect voice quality. VoIP measurements are collected by testing the one-way delay (latency) of the connection, packet loss (with a metric that includes the number of consecutive packets lost), and the amount of jitter (the variation in the time it takes packets to arrive). Calculations then combine these measurements into a factor that can be used to estimate a MOS score.
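One widely used way to turn such a factor into a MOS estimate is the R-factor to MOS conversion from the ITU-T G.107 E-model. The text here does not say which calculation it has in mind, so the Python sketch below is an assumption for illustration only. It covers just the final conversion step; deriving R from delay, loss and jitter is outside its scope, and the function name is hypothetical.

```python
def r_factor_to_mos(r: float) -> float:
    """Estimate MOS from an E-model R-factor (conversion taken from ITU-T G.107)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    mos = 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6
    return max(1.0, mos)  # keep very low R values on the 1 to 4.5 scale
```

With the E-model's default R of about 93.2, the function returns roughly 4.41, which is consistent with the 4.4 maximum listed for G.711.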

Propagation delay
Propagation delay is the time required for a digital signal to travel end-to-end across the entire network. The greater the distance, the greater the propagation delay. Additionally, the data has to travel through network routers, switches and other devices such as firewalls, each adding its own (transport) delay.

Packetization delay
Packetization delay is the time required to digitize the signal with the codec in use, for sending over the Internet, and to decode it at the far end. A more compressed codec like G.729 has a higher packetization delay than a non-compressed codec like G.711.

Jitter buffer
Jitter buffer delay is the delay introduced by the ATA device when it holds one or more datagrams to compensate for variations in arrival times. In VoIP, a jitter buffer is an area where voice packets can be collected, stored, and then sent to the processor at more evenly spaced intervals. Variation in packet arrival time (jitter) is usually the result of network congestion or route changes. The jitter buffer, which is located in the receiving end device of the voice connection, intentionally delays the packets so that received voice is presented correctly with less distortion.

The following items can all affect call quality: bandwidth, hardware, jitter, latency, and packet loss.
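The jitter buffer behaviour described above can be sketched in Python as a fixed playout delay. This is a simplified model under stated assumptions, not how any particular ATA implements it: the first packet in the list is taken as the timing reference, every packet carries the same amount of speech, and the function name and parameters are hypothetical.

```python
from typing import List, Tuple


def schedule_playout(
    packets: List[Tuple[int, float]],
    buffer_ms: float,
    frame_ms: float = 20.0,
) -> Tuple[List[Tuple[int, float]], List[int]]:
    """Assign evenly spaced playout times to (sequence number, arrival msec) pairs.

    Returns the packets that can be played with their playout times, and the
    sequence numbers of packets that arrived too late to be played.
    """
    if not packets:
        return [], []
    first_seq, first_arrival = packets[0]
    base = first_arrival + buffer_ms  # playout time of the reference packet
    played: List[Tuple[int, float]] = []
    late: List[int] = []
    for seq, arrival in packets:
        playout = base + (seq - first_seq) * frame_ms
        if arrival <= playout:
            played.append((seq, playout))
        else:
            late.append(seq)  # missed its slot, so it is discarded
    played.sort()
    return played, late


arrivals = [(0, 100.0), (1, 135.0), (2, 138.0), (3, 190.0)]
deep = schedule_playout(arrivals, buffer_ms=40.0)
shallow = schedule_playout(arrivals, buffer_ms=10.0)
```

With a 40 msec buffer all four packets are played at 20 msec intervals, while a 10 msec buffer discards packets 1 and 3 as late. This is the trade-off a jitter buffer makes: a deeper buffer loses fewer packets but adds more delay.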