
Codecs: Encoding/decoding images and sounds

Adrian Mackenzie
a.mackenzie@lancaster.ac.uk
October 2006
Codecs (coder-decoder) perform encoding and decoding on video, speech, music and text.
They scale, reorder, decompose and reconstitute perceptible images and sounds so that
they can get through information networks and electronic media. Codecs are intimately
associated with changes in the "spectral density," the distribution of energy, of sound and
image in electronic media.
Codecs pose numerous analytical problems for software scholars. They are mathematically
deep. Coming to grips with them may entail lengthy immersion in technical details.
Although they are responsible for displaying many moving images on screens today,
codecs themselves often hide in hardware and lower-level code. They come to light
occasionally, usually in the form of an error message saying that something is missing: the
right codec has not been installed. Despite or perhaps because of their convoluted
obscurity, they catalyze new relations between people, things, spaces and times in events
and forms.
Patent pools and codec floods
Video codecs such as MPEG-1, MPEG-4, H.261, H.263, the important H.264, theora, dirac,
DivX, XviD, MJPEG, WMV, RealVideo, etc. are strewn across networked electronic media.
Roughly a hundred different audio and video codecs are currently available, some in
multiple implementations. Because codecs often borrow techniques and strategies of
processing sound and image from each other, they have tangled genealogies.
Leaving aside the tangle of relations between different codecs and video technologies,
even one codec, the well-established and uncontentious MPEG-2 coding standard, is
extraordinarily complex. MPEG-2 (a.k.a. H.262) designates a well-established set of
encoding and decoding procedures for digital audio and video formalised as a standard
(ISO/IEC 13818) in the mid 1990s. The standards for MPEG-2 are widely described. Many
diagrams, definitions and explanations of coding and decoding the bitstream are available
in print and online (Smith, 2000; Wikipedia, 2006). Open-source software implementations
of the standard offer another way to examine how it works in practice. For instance, ffmpeg
'is a complete solution to record, convert and stream audio and video' (ffmpeg, 2006). It
handles many different video and audio codecs, and is widely used by many other video
and audio projects (VLC, mplayer, etc.).
Economically, MPEG-2 is a mosaic of intellectual property claims (almost 700 patents held
by entertainment, telecommunications, government, academic and military owners
according to Wikipedia (2006)). The large patent pool attests to the economic
significance of MPEG-2 codecs. As the basis of commercial DVDs, the transmission format
for satellite and cable digital television (DVB and ATSC), as the platform for HDTV as well
as the foundation for many internet streaming formats such as RealMedia and Windows
Media, MPEG-2 forms a primary technical component of contemporary audiovisual
culture. It figures in geo-political codec wars (e.g. China's AVS codec versus the
increasingly popular H.264 versus other versions such as Microsoft Windows VC-1 /
Windows Media 9).
Many salient events in the development of information and digital cultures (for instance,
MP3-based file-swapping, or JPEG-based photography) derive from the same technological
lineage as MPEG-2 (lossy compression using transforms). At an embodied level, what appears
on screen or what we hear is coloured by the techniques of 'lossy compression' that
MPEG-2 epitomizes. Codecs affect at a deep level contemporary sensations of movement,
color, light and time.
Trading space and time in transforms
The MPEG standard is complex. Digital signal processing textbooks caution against trying
to program to it at home (which immediately suggests the desirability of trying to). They
suggest getting someone else's implementation of the standard (Smith, 2000, 225). Where
does this complexity come from? The purpose of MPEG-2 as set out in the standards
document is generic:
This Part of this specification was developed in response to the growing need for a generic
coding method of moving pictures and of associated sound for various applications such
as digital storage media, television broadcasting and communication. The use of this
specification means that motion video can be manipulated as a form of computer data.
(ISO/IEC 13818-2: 1995 (E), vi)
How does a 'generic coding method' end up being so complex that 'it is one of the most
complicated algorithms in DSP [digital signal processing]' (Smith, 2000, 225)? MPEG-2
defines a bitstream that directly addresses the complicated psychophysical and
technocultural processes of seeing. Codecs put more pictures, more often, in more places.
MPEG-2 moves sounds and images further and faster in media networks than they would
otherwise travel. It deeply reorganises relations within and between images and sounds,
between things and experience.
However, to do that, video codecs trade between space and time at many scales.
Algorithmically, MPEG-2 combines several distinct compression techniques (converting
signals from time domain to frequency domain using Discrete Cosine Transforms,
quantisation, Huffman and Run Length Encoding, block motion compensation), timing
and multiplexing mechanisms, retrieval and sequencing techniques, many borrowed from
the earlier, low-bitrate standard, MPEG-1 (ISO/IEC 11172-1:1993). This tradeoff impacts
heavily on images. It changes what becomes of them. Only a sample of the many processes
in the codecs can be discussed here. I will concentrate on what happens at the lowest
levels of the picture, the block (8 x 8 pixels). The key areas of interest for the purposes of
seeing the trade-offs are the encoding and decoding sections of the software.
Digital video typically arrives at the codec as a series of frames. Each frame comprises
arrays of pixel-level luminance and chrominance values. Each frame then undergoes
several phases of encoding and decoding. These phases probe and re-structure the
image quite deeply, almost to the pixel level. Digital video pictures are composed of 2D
arrays of pixels that have much spatial redundancy (that is, many adjacent pixels will be
very similar). One priority is to express the spatial distribution of luminance (the
brightness or amount of light emitted) and chrominance (the two signals that encode color
information) as efficiently as possible. The so-called I-Picture or Intra-Picture coding, the
first phase of encoding, is based on spectral analysis. Many video and audio codecs today
rely on Fourier Transforms or, because it is easier to program and concentrates the energy
of the signal into a smaller number of coefficients, on a particular variant of the Fourier
Transform, the Discrete Cosine Transform (DCT). Fourier Transform or spectral analysis
methods encompass a very wide range of computational problems. These methods
decompose signals that vary over time or space into a spectrum of component frequencies
that can be summed together to reconstitute the original signal. Nearly all video codecs
transform spatially extended images into sets of frequencies. This allows them to isolate
those components of a sound or image that are most perceptually relevant to human eyes
and ears. (More recent codecs such as the British Broadcasting Corporation's open-source
Dirac are gradually replacing Fourier-based transforms with 'wavelet'-based transforms
because they take less computation on the whole.)
There is something quite counter-intuitive in transform compression. The notion of the
transform is mathematical: it is an operation that takes an arbitrary signal and analyses it
as a series of sinusoids of different frequencies and amplitudes. Added together, these sine
or cosine waves re-constitute the original signal. For the coding of a given picture, the
I-Picture results from division of the luminance and chrominance arrays into 8 x 8 blocks.
The DCT applied to each of these spatial blocks in isolation produces a series of
coefficients (or multipliers) of different frequency cosine waves that range from low to
high frequencies. The cosine wave coefficients represent amplitudes of different
frequency cosine waves. The coefficients sketch distributions of light in the image. This
means that the luminance and chrominance values of an image are compressed,
transmitted/stored and decompressed without ever sending any information about individual
pixels. The codec discards low value coefficients that describe individual pixel differences.
It keeps the high value coefficients that express more energetic components of the signal. It
subjects these coefficients to further compression using quantisation (that is, reducing
them to a subset of discrete values) and 'entropy encoding' (that is, using standard
compression techniques such as Huffman coding). When the block is decoded (for
instance, during display of a video frame on screen), the coefficients are re-attached to
corresponding cosine waves, and these are summed together to re-constitute arrays of
values for luminance and chrominance comprising the block.
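
The passage from block to coefficients and back can be made concrete with a small sketch. It is not taken from any MPEG-2 implementation: the function names, the sample block and the single quantisation step (STEP) are all assumptions made for illustration, and a real encoder weights each frequency differently rather than using one step for all of them. The sketch applies an orthonormal 8 x 8 DCT, rounds the coefficients to multiples of the step, and sums the cosine waves back together.

```python
import math

N = 8  # block edge length; the text describes 8 x 8 blocks


def _scale(k):
    # Normalisation factor that makes the transform orthonormal.
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)


def _basis(position, frequency):
    # Cosine wave of the given frequency, sampled at the given position.
    return math.cos((2 * position + 1) * frequency * math.pi / (2 * N))


def dct_2d(block):
    """Turn an N x N block of sample values into N x N cosine-wave amplitudes."""
    return [[_scale(u) * _scale(v) * sum(block[x][y] * _basis(x, u) * _basis(y, v)
                                         for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]


def idct_2d(coeffs):
    """Sum the weighted cosine waves back into an N x N block of sample values."""
    return [[sum(_scale(u) * _scale(v) * coeffs[u][v] * _basis(x, u) * _basis(y, v)
                 for u in range(N) for v in range(N))
             for y in range(N)] for x in range(N)]


def quantise(coeffs, step):
    # Reduce each coefficient to a whole number of steps (this is the lossy part).
    return [[round(c / step) for c in row] for row in coeffs]


def dequantise(levels, step):
    return [[level * step for level in row] for row in levels]


def max_abs_diff(a, b):
    return max(abs(a[i][j] - b[i][j]) for i in range(N) for j in range(N))


STEP = 16  # hypothetical uniform quantiser step, chosen only for this example

# A smooth gradient: neighbouring samples are similar, as in most picture blocks.
block = [[16 * x + 2 * y for y in range(N)] for x in range(N)]

coeffs = dct_2d(block)
levels = quantise(coeffs, STEP)
restored = idct_2d(dequantise(levels, STEP))
nonzero = sum(1 for row in levels for level in row if level != 0)

print("non-zero coefficients kept:", nonzero, "of", N * N)
print("largest sample error after decoding:", max_abs_diff(block, restored))
```

For a smooth block like this one, most quantised coefficients come out as zero, which is what the later entropy-coding stage exploits. Without the quantise step the round trip is exact up to floating-point error; the loss comes from the rounding, not from the transform itself.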
In the source code for codecs such as theora, the density of matrix or array manipulation
stands out in the transform phase of the encoding. Thousands of blocks in each picture
undergo DCT. In contrast to film's use of linear sequences of frames, or television and
video's interlacing of scan-lines to compose images, transforms such as DCT subject
images to highly intricate reorderings. Since blocks themselves are not fragments of
pictures, but rather distributions of luminosity and chrominance, they are put into the bit
stream (the data that flows out of the encoder) in sometimes quite strange order, an
order that bears little direct relation to the displayed image.
Motion prediction - forward and backwards in time
Software has long been understood as closely linked to ideation or thought, particularly
mathematical thought. Despite the mathematical technicalities of the transform
compression, the thinking present in software cannot be reduced to mathematical thought,
or at least, mathematical thought as it is usually conceived. Codecs perhaps bear a closer
relation to cinematic thought and memory. In their handling of images, they deviate
radically from normal understandings of representation. Video codecs are very
preoccupied with the relations between pictures ('frames'). Indeed, just as pictures
themselves are individually analysed as distributions of luminance and chrominance
values, video codecs relate pictures to each other in terms of motion vectors.
MPEG video never flickers. It sometimes gets a bit 'blocky.' This is because the boundaries
between pictures are not fixed in the same way they are in film frames or even in
television with its 'interlaced' scanned fields. P (forward prediction) and B (backward
prediction) pictures come after the transform-encoded I-picture in an MPEG-2 bitstream.
However, this succession is cinematic. The working assumption behind the production of
P and B pictures is that nothing much happens across successive frames that can't be
understood as macroblocks (usually 4 blocks together) undergoing linear geometric
transformations (translation, rotation, skewing, etc.). The fact that nothing much happens
between frames apart from geometric transformation is used as the basis of the
inter-picture compression. Intra-picture compression of the space of the image is the first
major component of MPEG-2. Motion prediction between frames, or time compression, is the
second.
Inter-picture compression relies on forward and backwards correlations between the
unencoded frames. It calculates motion vectors for every part of the image. For each
frame, the MPEG-2 codec analyzes which parts have moved in comparison to the previous
or later frame. It only transmits lists of motion vectors describing the movement of blocks
in relation to a keyframe or reference picture. This fundamentally alters the character of
frames (which is why the MPEG standard calls frames 'pictures'). We have already seen
that rather than the raw pixel being the elementary material of the image, the block
becomes the elementary component. In motion prediction, the frame is no longer the
elementary component of movement, but an object to be cut up and sorted into sets of
motion vectors describing relative movements of blocks. The 'picture' after encoding is
nothing but a series of vectors describing what happens to blocks. At the decoding end, an
MPEG decoder turns the streams of vectors back into arrangements of blocks moving
between frames.
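
How a codec might arrive at one such vector can be sketched as an exhaustive block-matching search. This is an illustrative simplification rather than the procedure of any particular MPEG-2 encoder: the helper names, the 4 x 4 block size, the tiny search window and the synthetic frames are all assumptions, and real encoders work on larger macroblocks with finer precision and much faster search strategies. The vector convention used here (pointing from the block in the current frame to its best match in the reference frame) is also an assumption.

```python
def make_frame(x0, y0, width=12, height=12, size=4):
    """Hypothetical test frame: a dark background with one textured square."""
    frame = [[0] * width for _ in range(height)]
    for j in range(size):
        for i in range(size):
            frame[y0 + j][x0 + i] = 100 + 10 * j + i
    return frame


def sad(frame_a, ax, ay, frame_b, bx, by, size):
    """Sum of absolute differences between two size x size blocks."""
    return sum(abs(frame_a[ay + j][ax + i] - frame_b[by + j][bx + i])
               for j in range(size) for i in range(size))


def find_motion_vector(reference, current, bx, by, size=4, search=2):
    """Search the reference frame around (bx, by) for the block that best
    matches the block at (bx, by) in the current frame.

    Returns ((dx, dy), cost), where (dx, dy) is the offset of the best match.
    """
    height, width = len(reference), len(reference[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = bx + dx, by + dy
            # Skip candidate positions that fall outside the reference frame.
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue
            cost = sad(current, bx, by, reference, rx, ry, size)
            if best is None or cost < best[1]:
                best = ((dx, dy), cost)
    return best


# The square sits at (3, 3) in the reference frame and at (5, 4) in the current one.
reference = make_frame(3, 3)
current = make_frame(5, 4)

vector, cost = find_motion_vector(reference, current, bx=5, by=4)
print("motion vector:", vector, "matching cost:", cost)
```

When the match is perfect, as in this synthetic case, the encoder needs to send only the vector for that block. When it is imperfect, a codec would also have to encode the remaining difference between the block and its match.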
From complicated to composite
All this is a bit complicated. Motion prediction takes time. The ratio of intra-frame and
inter-frame pictures in a given bitstream depends on where the encoding is done and the
bandwidth of the expected transmission channel (DVD, 3G cellphone, satellite digital TV,
HDTV, the internet, etc.). In an MPEG data-stream, the precise mixture of different
frame-types (I, P-forward and B-backward) is defined at encoding time in the Group of Pictures
(GOP) structure. It is typically 12 or 15 pictures in a sequence such as
I_BB_P_BB_P_BB_P_BB_P_BB_. One intra-coded picture is followed by a dozen or so
block motion-compensated pictures. Their order in the bitstream does not correlate
directly with the order of display. The combination of forward-prediction and
backward-prediction found in the GOP means that the MPEG bitstream effectively treats images as
a massive doubly-linked list (Knuth, 1997, 280). The ratio of different frame types to each
other affects the encoding time because motion compensation is much slower to encode
than the highly optimised block transforms. Codecs must make direct tradeoffs between
computational time and space. The tradeoffs sometimes result in artifacts visible on screen,
such as blocking and mosaic effects. At times, motion prediction does not work. A
change in camera shot, the effect of an edit, might mean that no blocks are shared between
adjacent frames. In that case, the codec falls back on intra-frame encoding.
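
The gap between display order and bitstream order can be illustrated with a short sketch. It assumes, as a simplification, that every B picture depends on the nearest reference picture (I or P) on either side of it, so that a B picture cannot be sent until the later of the two has gone out. The function name and the 12-picture sequence are hypothetical and chosen only for the example.

```python
def coded_order(display_types):
    """Given picture types in display order, return picture indices in the
    order they would be placed in the bitstream under the assumption above."""
    order = []
    pending_b = []  # B pictures waiting for the reference picture that follows them
    for index, kind in enumerate(display_types):
        if kind == "B":
            pending_b.append(index)
        else:
            # An I or P picture goes out first, then the B pictures that precede
            # it on screen, since they could not be decoded without it.
            order.append(index)
            order.extend(pending_b)
            pending_b = []
    # Trailing B pictures would wait for the next group's I picture; in this
    # sketch they are simply appended at the end.
    order.extend(pending_b)
    return order


gop = list("IBBPBBPBBPBB")  # hypothetical group of 12 pictures in display order
order = coded_order(gop)
coded = "".join(gop[i] for i in order)

print("display order:", "".join(gop))
print("coded order:  ", coded)
print("indices:      ", order)
```

A decoder has to undo this shuffle, holding each reference picture back until the B pictures that are displayed before it have been shown.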
Many of the complications of the codecs arise because they link very different scales of
technological infrastructure, markets and embodied cultural practice. Codecs connect
network bandwidth constraints (a commercially marketed service), conventions of
spectatorship, embodied cognition, and media-historical forms. Codecs respond to the
economic-technical capping of bandwidth in telecommunications markets. Their teeming
patent pools reflect high estimates of their value. The 'generic method' of encoding and
decoding images for transmission relates very closely to the constraints and conditions of
telecommunications and media networks. As a convention, the MPEG-2 standard cites
explicitly or implicitly a great number of physical quantities ranging from screen
dimensions, resolution and colour models through to network and transmission
infrastructures to the clock rates and memory sizes of semiconductor and data storage
technologies. Yet the codec must propagate light, colour and sound on screen within
calibrated psycho-perceptual limits. Analysing such intersections requires ways of
articulating diverse realities. Software like codecs might offer places to begin
understanding what happens when passages between different scales and orders
multiply.
References
ffmpeg (2006) FFMPEG Multimedia System, http://ffmpeg.sourceforge.net/index.php,
[accessed 4 Feb 2006]
ISO/IEC 13818-1 (1995) Information technology - Generic coding of moving pictures
and associated audio information: Systems
ISO/IEC 13818-2 (1995) Information technology - Generic coding of moving pictures and
associated audio information: Video
Kittler, F. (1993) Draculas Vermächtnis: Technische Schriften, Reclam Verlag, Leipzig, pp.
182-207.
Knuth, D. E. (1997) The Art of Computer Programming, Addison-Wesley, Reading, Mass.
Smith, S. W. (2000) The Scientist and Engineer's Guide to Digital Signal Processing,
California Technical Publishing.
Wikipedia (2006) MPEG-2, http://en.wikipedia.org/wiki/MPEG-2, [accessed 12 Jan 2006]
