
Computer Assisted Language Learning

1999, Vol. 12, No. 1, pp. 3–28

0958-8221/99/1201-0003$15.00
Swets & Zeitlinger

Towards an Aesthetics of Multimedia


Frank L. Borchardt
Department of German, Duke University

ABSTRACT
The considerable contribution of the cognitive sciences to the development of the human–computer interface opens the argument of the following article. The wide diversity of the disciplines contributing to this development acts as a reproach to the history and criticism of media for their failure to contribute their share. The body of this paper presents a concise history of media innovations as an introduction to the problems presented by the human–computer interface in the most recent of media innovations. Both the failures and the successes of the history and criticism of media provide prototypes for the investigation of analogous problems in the most modern of multimedia. The constituents of multimedia (vision and audition) are examined separately, and, finally, the problems of judging combinations of elements are outlined.

1. INTRODUCTION: COGNITIVE SCIENCES AND THE INTERFACE


Theoretical design, information design, technical design, graphic
design, communication design, all these are coming together. I
think the successful design shops are those that are rising to the
challenge of taking a broader perspective of what design is about.
Bart Marable, quoted in Kuchinskas (1998)

Cognitive sciences may generally be described as the psychology of knowing, but only if that discipline is understood as an equal partner in a dynamic combination with adjacent disciplines such as linguistics, computer science and statistics, biology, and physics. Learning theory and perception have joined up with phonetics and semantics, speech recognition and computer vision, neurology, optics and acoustics to explore what knowing is and how human beings know. The great classics of the interface, of human–computer interaction, and of presentational and technical design emerge directly from developments in these disciplines (Andriole, 1995; Cox, 1993; Gardiner & Christie, 1987; Helander, 1988; Kukulska-Hulme, 1998; Salvendy, 1997; Salvendy & Smith, 1993; Shneiderman, 1987; Treu, 1994).

Correspondence: Frank L. Borchardt, Department of German, Box 90256, Duke University,
Durham, North Carolina, 27708-0256, USA. Tel: +919 660 3161. Fax: + 919 660 3166. E-mail:
frankbo@acpub.duke.edu.
Manuscript submitted: November 1998
Accepted for publication: January 1999

2. THE CLASSIC SCREENS


The dominance of the cognitive sciences in the discussion of the human–computer interface has had fertile consequences, many of them sensible and useful, some, however, less so. A kind of consensus has arisen around the studies of the last ten years, and it has had a direct impact on the way screens are designed, especially by academic users or by commercial producers aiming at the academic market. A remarkable uniformity characterizes these screens. One corner, occupying no less than 30 per cent of the surface, is dedicated to the text, which might be video, graphics or words in print. The surrounding space is routinely divided into smaller rectangular frames performing auxiliary functions, each commenting in one way or another on the principal frame. Related information appears in clusters, the visual arrangement underscoring the relatedness of the information. While this is surely one effective way to organize a screen, it is just as surely not the only one (Fig. 1). The disproportionate frequency with which this design has been adopted (with variations) suggests that the observations of the classic authorities have become normative in certain circles.
Furthermore, the unconditional dominance of metaphors drawn from the Gutenberg universe in instructional software designed by or for academic users points to a missing corrective: the history and criticism of media have not been sufficiently taken into account.

3. THE MISSING HUMANITIES


What is remarkably and explicitly absent from the great cognitive sciences mix is the historic discipline concerned more specifically and exclusively than all others with human knowledge: epistemology, the philosophy of human knowing. Epistemology, like the cognitive sciences, has sister disciplines, among them the philosophy of beauty and the history of ideas, and niece and nephew disciplines in the history of taste and the history and criticism of media. The disciplines concerned with knowing and the senses have, over time, come to be associated with "aesthetics," a relatively new term, eighteenth-century in origin, unknown to the Greeks who provided its stem, aisth-, for "feeling" or "sense," the same stem we all know from "anaesthetic." It is in the sense implied here that this article uses the title term "aesthetics": that is, the study of sensory events in historical and critical environments.

Figure 1. Classic multimedia screen.

"Historical" means studying the genealogy of sensory events and understanding how they develop over time. "Diachronic" is what this historical approach used to be called in humanistic research, to distinguish the approach through time from the "synchronic." A synchronic approach to an event largely disregards time and concentrates instead on the constituents of the event. To say that an eighteenth-century English country house was built with the golden section of the ancient Greeks in mind is a diachronic observation. To study the effect of the golden section on structures built on its axiom, that is, on an outline in which a shorter segment stands to the next, longer segment as the longer stands to the sum of the two (a:b::b:(a+b), roughly 3:5::5:8), is synchronic. The history of the idea vanishes as the student wrestles with the question of proportionality in architecture.
The position in which the human–computer interface finds itself at this moment seems, by sheer coincidence, to resemble the state of affairs in the discussion of a new art form from around the year 1700, namely grand opera (Flaherty, 1978).
4. MULTIMEDIA: DIACHRONIC
4.1. The predecessor: Drama
Unsung, unchanted, plain spoken plays were almost certainly the exception in
the history of drama before AD 1475 and became more prevalent thereafter
purely on the basis of ignorance of how plays actually were supposed to be
performed. Partially by virtue of this misunderstanding, spoken drama
achieved success and standing and established itself as an autonomous form
of expression. Indeed, plain spoken drama gradually became the dominant
form of verbal performance art across Europe. When the founders of grand
opera transformed the spoken text to song and added musical accompaniment,
they almost certainly had no idea that they were generating a new art form.
Quite the contrary. They almost certainly believed that they were reviving the musically performed drama of Greek antiquity, and doing so with total accuracy. As they saw it, all that opera did was restore music to drama, from which it had been dropped through negligence. By virtue of that misunderstanding, another accident of history, opera became a new art form.
4.2. Criticism of opera
4.2.1. Negative
Observing these developments around 1700 were certain critics who had studied at the feet of the great French and Italian critics of the previous century, who themselves had studied at the feet of the Greek masters of antiquity. There were also those who had studied the literary masters of the preceding century, mostly French, mostly in drama, at the height of a great poetic flowering in France. In general, this party of critics found opera an abomination. Opera defied the norms laid out by classical drama. It mixed media: not only music and drama, but also dance, architecture, painting and mime. Opera defied simplicity and order. It was complex and asymmetric. Opera was impure, and, because it was impure, it was bad art.
4.2.2. Empirical
Then there were those who observed that opera was immensely popular in all classes of society, that opera was great fun to attend, and that there were good operas and bad operas. This party of critics suggested that the norms laid down by other art forms (today we would say media) were not necessarily applicable to this new art form. Furthermore, they suggested that appeal to authority, deductive application of pre-existing models, and rule-based observation contributed little or nothing useful to explaining the success of the new art form. What they suggested instead was that one needed to look at the actual operas first, and to do so on their own terms. Then the thoughtful critic might derive some rules, guidelines and dynamics from the art form itself. The rules, however, had to be suggested by the object of study, the art form itself, not by its cousins or its ancestors. The social signal that this enterprise (looking at the performances themselves) might be worth the effort eventually arrived some time in the eighteenth century, when opera achieved the widespread approval of people known otherwise for good taste and sound judgement. They turned their backs on the deductive rules of the previous century; generalizations about the art form were to be acquired by induction. Experience was to become the norm.
4.2.2.1. An apostle of empirical criticism: Gotthold Ephraim Lessing
One of the great applications of this principle was written down by the eighteenth-century German playwright, theologian and critic Gotthold Ephraim Lessing. In an essay called Laocoön, Lessing draws a distinction between literature, on the one hand, and two- and three-dimensional art forms, on the other. He concludes that the proper dimension of sculpture, for example, is space. The principal reality of sculpture, painting and architecture is situation, place, disposition in space. The role of time in these art forms is restricted to the moment, the appropriate moment, the "pregnant moment" (or what you will), but the moment nonetheless. Otherwise plastic art (art that can be touched) belongs contiguously in space. Poetry, by contrast, is located sequentially in time (Lessing, 1766).
Lessing reached these insights by observing the great literary epics of the past. He noted the static, spatial, contiguous description of Aeneas's shield in the Aeneid of Virgil (VIII, 626–728), which grows cold and tedious from the constantly recurring "here is," and "there is," and "nearby stands," and "not far from there is seen." He contrasted this static and colourless occurrence of sculpture in poetry with the dynamic representation of the shaping of Achilles's shield in the Iliad of Homer (XVIII, 478–608).
These, among countless other examples, brought Lessing to such generalizations. Having made them, Lessing directs the reader to contemplate the very materials of the contrasting art forms. It is clear that poetry reaches the eye or ear of the audience serially and reproduces events that are successive in time; it is just as clear that three-dimensional art consists of objects that are coexistent in space and reaches the eye of the observer in a flash, however often the observer may wish to repeat the experience of the instantaneous reception of an impression in time. In modern terms, it would be safe to say that poetry is, if not four-dimensional, at least at home in the fourth dimension, whereas art made of materials which occupy space is, in and of itself, three-dimensional.
To be sure, no experience determined by matter can escape space and time.
Nonetheless, certain objects change very, very slowly over time and suggest
some form of endurance or permanence; and other objects, just as material
and real, do not unveil themselves in one location, instantly, but in any or
many locations, serially, one state after another, over an extended period of
time. Lessing would maintain that it is no more possible for sculpture, however grandly conceived, however magnificent in proportion, to perform the
same function as a poetic narrative than it would be for a poem to be substituted, in and of itself, for a temple to the gods. A poem and a temple are two
thoroughly different objects: the rules which govern the one cannot possibly
be the same as those which govern the other, however related their purposes
may or may not be.
A distinction as crass as that between poetry and architecture may seem simple-minded indeed. But there are many instances where people cross the distinctions by calling, say, a sculpture "poetry in stone," leaving the listener perhaps impressed but surely baffled at the same time, or by demanding that poetry draw pictures (ut pictura poesis). The distinctions do, however, need to be made and remembered, even if later, in mixed media or multimedia, they have again to become blurred. The distinctions have a basis in experience, in straightforward but informed observation. The physical constituents of an art form in its actuality differ from one art form to the other. Carrara marble is not iambic pentameter. Physical constituents, furthermore, help to determine what the suitable and congenial forums will be for the development of the art form. Statues of Carrara marble belong in public spaces and in buildings public and private, and are determined by space; poems belong to occasions, some public, many more private, and unravel over time. Let us call the physical constituents the matériel. The matériel determines the dimension of the art form.
4.3. Criticism of modern media
The quandary of eighteenth-century aesthetics in the face of opera provides an analogy to the present condition of multimedia: that is to say, a new form has arisen, and the inclination to employ the conventions of previous forms in the new form seems irresistible. The solution to the quandary also applies directly to multimedia: inductive method (observation) and intensive study of the constituent materials, the matériel. This approach led the predecessors to understand the difference between marble and words, and the consequences of such differences for the arts; it must do the same for modern multimedia.
4.3.1. Modern media and their antecedents: Marshall McLuhan
4.3.1.1. The Gutenberg workshop
The analogy to present circumstances is incomplete without acknowledging
the contribution of Marshall McLuhan to the modern understanding of the
internal aesthetic mechanisms of media and their external history. McLuhan
was among the first to observe that new media incline to imitate the preceding
dominant communicative form (McLuhan, 1962). This was the case with
printed books, to begin with, which were the closest possible imitation of manuscripts written at the same time. Until the development of Roman type (completed by about 1483), all fonts were more or less successful imitations of
manuscript handwriting. In a certain sense, the Gutenberg Bible is an exception to this rule. The font designed for the forty-two line Bible may have been
inspired by the finest contemporary hand, but the elegance of the characters
vastly transcended that available in all but the most festive, ceremonial manuscripts of the time. Even the Mainz Catholicon (an encyclopaedic dictionary
which generally qualifies as the second book ever printed, at least with an identifying colophon) descended from the lofty level of the forty-two line Bible
and imitated utilitarian contemporary handwriting.
4.3.1.2. Lotus Organizer
This rule (the new imitates the old) applies to modern technology, sometimes with frightening perfection: consider Lotus Organizer, with electronic rings for the electronic holes in the electronic paper edged with electronic tabs. Here is a technology denying that it is what it is and pretending, as comprehensively as possible, to be its non-threatening predecessor.
4.3.2. The predecessors of instructional multimedia
Lessing and McLuhan provide most of the tools necessary for a critique of
multimedia in its present condition and for the prospects of multimedia in the
foreseeable future. Multimedia, however, also has a past, one as influential on
its successors as any ancestor technology. Those with a mission to convey
knowledge of all kinds understood how desirable it was to employ more than
one medium in the task. The earliest were perhaps word and song; later came
text and graphic. When the publishers of modern language textbooks were obliged to provide audiotape with their printed materials, they were already engaged in multimedia presentation. Packages that have had to include on-line drill-and-practice, videotape or videodisk, as well as audiotape or CD-ROM, must all the more be considered multimedia. But even these complicated packages are not without precedent.
As early as the seventeenth century, certain visionaries who imagined that modern language education would one day become a necessity experimented with illustrations in their printed works, which were dominated by words: for example, graphics which might help to teach the Cyrillic alphabet to Westerners who knew only Roman fonts. The Janua Linguarum (1631) and Orbis Sensualium Pictus (1658) of John Comenius are the ancestors of illustrated language textbooks, even of flash cards. The great pedagogical insight of Comenius was certainly informed by a sense of the difficulty of learning languages. Comenius, a Moravian from Bohemia, had to face the problem of trilinguality. What is worse, the three languages with which he had to deal (German, Latin and Czech) each belonged to a different language family. Illustrations were one way to bridge the gap (Comenius, 1672). There is clearly a sense in which all of these predecessors can legitimately be called multimedia, but for present purposes let "multimedia" mean the successor of this noble tradition, specifically the one which employs the electronic digital mode.


5. MULTIMEDIA: SYNCHRONIC
5.1. The sensory domain of multimedia
5.1.1. The senses and time
Modern multimedia involve (so far) two of the four senses: hearing and seeing. Touch is involved only in conventional inputting by keyboard and mouse, and taste is involved only in an applied sense. The elements which make up multimedia are fundamentally four: text, graphics, video and audio, of which text, graphics and video remain largely the object of seeing, and audio alone the object of hearing. Three of these four (text, graphics and video) may be presented freeze-frame, that is, all at once, as well as over time; audio, by contrast, must be presented over time and cannot be presented any other way.
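The classification in the preceding paragraph can be restated as a small lookup table, which makes the asymmetry of audio explicit. This is a hypothetical encoding added for illustration, not the author's; the names `Sense`, `ELEMENTS`, `can_freeze_frame` and `elements_for` are invented here.

```python
from enum import Enum

class Sense(Enum):
    SIGHT = "seeing"
    HEARING = "hearing"

# The four elements of multimedia as classified in the text, each mapped to
# (sense addressed, can it be presented freeze-frame?). All four can be
# presented over time; only audio is restricted to presentation over time.
ELEMENTS = {
    "text": (Sense.SIGHT, True),
    "graphics": (Sense.SIGHT, True),
    "video": (Sense.SIGHT, True),
    "audio": (Sense.HEARING, False),
}

def can_freeze_frame(element: str) -> bool:
    """True if the element can be shown all at once, outside of time."""
    return ELEMENTS[element][1]

def elements_for(sense: Sense) -> list[str]:
    """List the elements that address the given sense, in table order."""
    return [name for name, (s, _) in ELEMENTS.items() if s is sense]

if __name__ == "__main__":
    print(elements_for(Sense.SIGHT))      # ['text', 'graphics', 'video']
    print(elements_for(Sense.HEARING))    # ['audio']
    print([e for e in ELEMENTS if not can_freeze_frame(e)])  # ['audio']
```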
5.1.2. Hearing
Of the four components, that which appeals to the sense of hearing is probably the least developed so far. However, certain advances are being made in
sound, as much by practitioners as theoreticians. Frankly, this development is
an instance of practice catching up with (long available) technology. As far as
speech recognition is concerned, there are many programs which are beginning to do good work both as a substitute for keyboard inputting and as a shortcut for navigating around programs (LaRocca, 1994).
5.1.3. The components of multimedia audio
The components of multimedia audio are basically three: speech, music and sound effects. Each of these may find room in an application as either natural or synthesized. The era in which analogue sound was provided on separate analogue equipment such as the InstaVox (directed by computer but not directly available to the computer as digital data) is probably long past. The digitization of natural speech, music and sound effects remains slightly problematical, in that digital recording of satisfactory quality requires substantial computational resources, both in processing and in storage. Until these stumbling blocks are removed, home-brew applications using digitized natural audio will probably remain somewhat restricted. The pace of progress in the power and capacity of processing and storage, and the accompanying decline in cost, lead one to imagine that these particular stumbling blocks are not long for this world. Computer-originated sound (that is, synthesized speech, music and sound effects), on the other hand, is highly economical in respect to processing and storage, although it still takes professionals to generate it in the first place.
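The storage cost of digitized natural audio can be estimated with simple arithmetic. The sketch below assumes uncompressed PCM at CD quality (44,100 samples per second, 16 bits per sample, two channels) and, for comparison, an assumed low-fidelity mono setting; these parameters are illustrative assumptions, not figures from the article, and `pcm_bytes` is a hypothetical helper.

```python
def pcm_bytes(seconds: float, sample_rate: int = 44_100,
              bits_per_sample: int = 16, channels: int = 2) -> int:
    """Bytes needed for uncompressed PCM audio (assumes bits_per_sample is a multiple of 8)."""
    bytes_per_second = sample_rate * (bits_per_sample // 8) * channels
    return int(seconds * bytes_per_second)

if __name__ == "__main__":
    # One second of CD-quality stereo: 44,100 samples * 2 bytes * 2 channels.
    print(pcm_bytes(1))    # 176400
    # One minute comes to roughly 10.6 million bytes.
    print(pcm_bytes(60))   # 10584000
    # An assumed low-fidelity speech setting (11,025 Hz, 8-bit, mono) is far cheaper.
    print(pcm_bytes(60, sample_rate=11_025, bits_per_sample=8, channels=1))  # 661500
```

Lowering the sampling rate, sample size and channel count trades fidelity for space, which is one way designers of the period could keep recorded speech within reach.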
Speech recording (i.e., digitization) is available at low cost. For many interactive purposes, speech recording can be used to simulate human interactions.
The quality of this exploitation rests entirely in the imagination of the designer. There are, as yet, far too few instances of creative use of this function. To
be sure, this has to do with the relatively recent employment of audio cards in
the configuration of most computers. As soon as designers are aware that most
of their clientele can easily record speech, then recorded speech will start to
play a greater role.
5.1.4. Scarcity of audio
The disparity between technology and the acceptance of technology is not adequate by itself to explain the fact that the spoken, played and heard aspects of multimedia remain among the least commonly employed. The inclination to downplay audio seems to be an expression of a neo-puritanical, purist penchant. It is no accident that expendable and merely decorative features of a program are called "bells and whistles." There is, to be sure, a great abundance of terrible implementations of audio in multimedia events. This seems especially to be the case on the Internet, where the joy of invention or discovery occasionally tempts home page designers to regale the innocent surfer with veritable concerts of irrelevant electronic tinkling. This does not by any means excuse the high-handed dismissal of all audio where such dismissal is commonly found, in guidelines or rules of thumb. These reveal themselves as artefacts of their times and locate themselves squarely in the multimedia era's equivalent of silent movies (see Section 6). In those days, too, there were pundits who declared that audiences would not be able to tolerate the overabundance of stimulation. Regardless of the consensus and the authority of today's pundits, audio is not only important now, but can only grow more important as time moves on.
5.1.5. Human-to-human, computer-to-human
Audio in the shape of digitized sound (speech and music) is altogether essential for countless Internet events imitating radio and television services. Likewise, those events which imitate a classroom have already found good use for audio, to enhance listening and writing skills and to accompany video or slide presentations. Countless music lovers burn their own CDs from digitized music available on the Web, regardless of the copyright consequences. Web-based telephony and teleconferencing are more or less a reality; they presume satisfactory rates and quality of audio transmission at present, and improvement in both over time. Designers of taste and good sense already employ soundtracks of satisfying utility to enhance or complement their programs. These successful implementations will inevitably develop, that is, grow better and more widespread.
5.1.6. Human–computer conversation
Nowadays, audio can also be used for some purposes that employ speech recognition. In this respect, the future holds one dramatic inevitability: that people will one day communicate with computers in the way they do with other human beings rather than the way they do now, by means of keyboard and mouse. That aspect of the future of audio alone should warn the haughty that they cannot afford to disparage or dismiss the audio component of multimedia.
5.2. Audio as a test case
Audio provides the opportunity for study of the nature of the whole new multimedia art form. Furthermore, it does so at a more primitive stage of development than that of the other components. At this stage it may be a bit easier to
gain an overview of the elements and structures that constitute audio and echo
through the visual components as well. The single most distinctive quality of
audio is that it unravels over time. It always passes through time. It is never
susceptible to freeze-frame, as are all the other components of multimedia.
Manuscript or printed musical notation may, of course, appear all at once, in
one place, as a static paper page, but notation is no more music than a screenplay is a movie. As soon as audio is recognized as an intrinsic and inseparable component of multimedia, the operating metaphors which govern the
realization of multimedia on computer can no longer remain purely spatial.
The line, the page, the book, the bookshelf, the library are all very well, but
they fall short of exhausting the fundamentally dynamic, temporal nature of
multimedia (Borchardt, 1998).
5.2.1. An audio tool chest
One possible list of factors which might help shape observations about sound includes: volume (loud or soft), pitch (high or low), timbre (resonant or thin), attack and decay, rhythm (rate of recurrence of emphasis), duration, velocity (rate of presentation of acoustic content; e.g., of melody), acceleration (whether the rate is constant, increasing or decreasing), iteration (whether portions of acoustic content are repeated, and how often), periodicity (recurrence of the whole, and at what frequency), familiarity (or not; past orientation), and predictability (or not; future orientation).
It is possible to distribute these factors across the three kinds of acoustic
presentation likely to occur in multimedia performances: speech, music and
sound effects (see Fig. 2). Overall, the factors can be separated by their specific appropriateness to the sense of hearing, on the one hand (volume, pitch,
timbre), and the way they deal with time, on the other (rhythm, duration,
velocity, acceleration, attack, decay, iteration, periodicity, familiarity and
predictability). Of these, most have to do with time passing by some calibration marker in the present (rhythm, duration, velocity, acceleration,
attack, decay); several can be extended from the present into future time
indefinitely (iteration and periodicity); one brings the past into present
expression (familiarity); and one employs the present to await the future
(predictability).
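The grouping in the preceding paragraph can be restated as data, which makes the counts easy to check: three sensory factors and ten temporal ones, six of them tied to the present. The orientation labels below paraphrase the text; the dictionary layout and the names `FACTORS` and `group_by_orientation` are hypothetical.

```python
from collections import defaultdict

# Each audio factor from the tool chest, tagged with its relation to hearing or to time.
# The tags paraphrase the article's grouping; the data layout is an illustration only.
FACTORS = {
    "volume": "sensory", "pitch": "sensory", "timbre": "sensory",
    "rhythm": "present", "duration": "present", "velocity": "present",
    "acceleration": "present", "attack": "present", "decay": "present",
    "iteration": "extends into future", "periodicity": "extends into future",
    "familiarity": "past into present",
    "predictability": "present awaiting future",
}

def group_by_orientation(factors: dict[str, str]) -> dict[str, list[str]]:
    """Invert the factor table into orientation -> list of factors."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name, orientation in factors.items():
        groups[orientation].append(name)
    return dict(groups)

if __name__ == "__main__":
    for orientation, names in group_by_orientation(FACTORS).items():
        print(f"{orientation:24} {len(names)}: {', '.join(names)}")
```

Keeping the schedule as data rather than prose makes it easy to revise, which suits the provisional status the author claims for it.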
5.2.2. Surgeon General's warning
Surgeon General's warning: This kind of schedule is not meant by any standard to be prescriptive; quite the contrary. It is meant at best to be suggestive; actually, it is meant (1) to be accepted as it is by those satisfied that it is exhaustive and makes all the real and needed distinctions, (2) to be repudiated, thrown out and begun all over from scratch by those who disagree with one, several or every presupposition behind it, and (3) to be adapted, abbreviated and expanded by those looking for instruments tailored to their own convictions and particular tasks. Frankly, every time the author studies these charts, he changes the descriptions (though, more rarely, the descriptors). The final purpose of such a schedule is to encourage critical thought about the medium under scrutiny.
The explorer should experiment with the number of qualities per feature.
Three has been chosen here in order to avoid binary and opposite categorizations, and to suggest that the categories can slide across a large spectrum.
Some researchers may find binary distinctions more useful (indeed, more computerizable), though the result would probably require much larger tables. Choosing any number other than two, however, also has a drawback: not all categorizations will fill three slots, and some may require more. There is always the threat that such a schedule will prove to be Procrustean.
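The choice between two and three qualities per feature can be illustrated by quantizing one continuous judgment into ordered labels. In this hypothetical sketch a listener's rating of volume is normalized to the range 0 to 1 (an assumption; the article prescribes no numeric scale), and the same rating is placed on a binary scale and on a three-slot scale. The function `quantize` and the label lists are invented for illustration.

```python
def quantize(value: float, labels: list[str]) -> str:
    """Map a value in [0, 1] onto len(labels) equal-width, ordered bins."""
    if not 0.0 <= value <= 1.0:
        raise ValueError("value must lie between 0 and 1")
    index = min(int(value * len(labels)), len(labels) - 1)  # 1.0 falls in the top bin
    return labels[index]

BINARY = ["soft", "loud"]
THREE_SLOT = ["soft", "moderate", "loud"]  # the middle slot avoids a forced opposition

if __name__ == "__main__":
    for v in (0.1, 0.45, 0.55, 0.9):
        print(v, quantize(v, BINARY), quantize(v, THREE_SLOT))
```

Ratings of 0.45 and 0.55 fall on opposite sides of the binary scale but share the middle slot of the three-slot scale, which is one way to read the point that the categories "slide across a large spectrum."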

Figure 2. Audio scheme.

Nonetheless, some activity such as this is probably essential to any creative critique of the medium. With such an instrument it should be possible for a viewer (or listener) to make the critical distinctions necessary for understanding why a certain effect is pleasing or disturbing, supportive of the greater whole or distracting. "I like it" or "I don't like it" is not enough. Even statistical studies are inclined to leave a great deal to be desired, in part because of the high degree of subjectivity involved in both examiners and test groups, as well as the unimaginably complex and numerous variables that accompany perception. Only experiments of the most intensely focused and restricted kind promise outcomes applicable to other situations.
5.3. Vision
5.3.1. A vision tool chest
5.3.1.1. Basic distinctions: Freeze frame vs. over time, legibility, etc.
A similar instrument for visual representation on screen would have to take into account graphics, video and text, all in spatial, static or freeze-frame mode, as well as in temporal, dynamic or over-time mode (see Fig. 3). The first questions a viewer is likely to ask automatically of the elements of a screen have to do with legibility. Are these fonts large and clear enough to be read at a comfortable distance? Are these graphics immediately communicative, or do they require physical (or mental) squinting of the eyes? Does this video provide information or entertainment directly, as a feast for the eyes, or is it merely talking heads, at best radio with visuals? On the assumption that screen messages mean to persuade, instruct or delight, what role does location on the screen play in the accomplishment of such ends? Or proportion, absolute and relative? And, in general, what are the qualities of the components of the screen with which a designer can design?
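Absolute and relative proportion can be made concrete by measuring frames against the screen. The sketch below uses a hypothetical 640 × 480 screen and frame sizes invented for illustration; it reports each frame's share of the screen (absolute proportion) and its size relative to the largest frame (relative proportion), the kind of measurement that could test the 30 per cent figure for the classic screen described in Section 2. The `Frame` class and `proportions` function are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Frame:
    name: str
    width: int   # pixels
    height: int  # pixels

    @property
    def area(self) -> int:
        return self.width * self.height

def proportions(screen: Frame, frames: list[Frame]) -> dict[str, tuple[float, float]]:
    """Return, per frame, (share of screen area, size relative to the largest frame)."""
    largest = max(f.area for f in frames)
    return {f.name: (f.area / screen.area, f.area / largest) for f in frames}

if __name__ == "__main__":
    screen = Frame("screen", 640, 480)  # assumed screen size
    frames = [Frame("principal", 400, 300), Frame("notes", 240, 300), Frame("toolbar", 640, 60)]
    for name, (absolute, relative) in proportions(screen, frames).items():
        print(f"{name:10} {absolute:6.1%} of screen, {relative:6.1%} of largest frame")
```

With these assumed sizes the principal frame takes about 39 per cent of the screen, comfortably above the "no less than 30 per cent" of the classic layout.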
5.3.1.2. Surface and angle
It would be legitimate to ask how different an effect might be in full-screen mode as distinguished from framed on the surface of the screen or removed behind the surface of the screen by a window. Screen objects can appear across the surface of the screen, as text normally does in a word processor or graphics in a paint program. The mental test of surface would be to see whether a user might be tempted to use the screen as a piece of paper and go after a typo with a handheld blue pen. For text and graphics, full-screen mode will usually locate objects on the surface of the screen, even though special purposes (like 3-D screen savers) might defy that generalization, more for the fun of it than anything else. Video, on the other hand, whether full-screen or windowed, will almost always have a greater or lesser remove from the viewer: that is, the distance of the camera lens from the object being pictured. This phenomenon is very like the decision a portrait painter has to make when placing the subject in space on the canvas. How far from the surface of the canvas does the artist want the subject to be? What message, if any, would either the artist or the patron wish to send by such a decision? Likewise, the angle of view almost always conveys a message: looking up (worshipfully), looking down (condescendingly). In the single-view perspective of traditional painting, photography and video, depth is sacrificed as soon as the view exceeds the window of legibility: that is, the range of angles from which the perspective may be read. Outside that window (looking at a video sideways, for example), the surface becomes two-dimensional and all the information falls flat on the surface of the screen (or canvas). What can a designer do with that knowledge? If those who teach the eager learner how to look at a painting (and not just to stare at it), and to see what is there, are the historians and critics of art, then these have a lot to teach the designer of screens.

Figure 3. Vision scheme.

Figure 4. Open audio scheme.

Figure 5. Open vision scheme.
5.3.1.3. Contrast and texture
And what about colour, contrast and texture? It is simply too early in the history of multimedia to make confident generalizations about them. There are a few basics dictated by technology (e.g., white text on a textured background may prove impossible for browsers to print in text mode, because suppressing the background could result in white-on-white printing), but these technical glitches are relatively few. What is required is the study of effective screens and a determination from experience (and not from authority) of what works and what does not.
5.4. Animation
This principle applies most vigorously to the temporal, dynamic, "over-time" aspects of providing multimedia to the sense of sight. The use of animation on any kind of computer risks the wrath of the same neo-puritans who despise audio. The ultimate reason for this kind of dogmatism is an unwillingness to abandon the Gutenberg metaphor and, with it, to admit that computers differ from books in one critical dimension: time. That most of the major players on the World Wide Web have implemented animated logos (e.g., both Netscape and Microsoft Internet Explorer) gives hope that some animation is being found acceptable by those with a serious economic stake in the future development of this new medium.
5.4.1. An animation tool chest
5.4.1.1. Space and depth
The questions an art historian asks of an Old Master painting need to be asked of computer visuals as well, not only when they are stable or static but also when they are in motion. The critical observer needs to be able to distinguish whether the action hugs the surface of the screen, tries to protrude, falls back into the middle distance, or disappears into the vanishing point. Likewise, the critical observer needs to be able to distinguish whether an action is framed but still on the surface, or windowed and removed behind a proscenium, and be prepared to tell the difference between a frame, a window and a proscenium. Finally, the critical observer needs to be willing to hypothesize about the effects of the various boundaries and depths available on a screen.
5.4.1.2. Add time to make sequence
Since animation takes place, by definition, over time, the problem of sequence has to be faced. The critical observer needs to study how often an action is presented. If it is presented only once, straight through from beginning through middle to end, the critical observer may need to consider how that differs from presentation with repetition. The consequences need to be articulated of whether an action can be interrupted or reversed, whether it can be played again in whole or in part, whether it must be played again and, if so, how often and at what intervals, and whether and how the user can intervene. The ultimate question is just how these possibilities affect the psyche of the Internet surfer or of the multimedia client.
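The presentation options just listed can be made concrete as a handful of parameters. The following Python sketch is purely illustrative: the `PlaybackPolicy` record and the `frame_order` function are hypothetical names, not part of any authoring system discussed here, and "intervals" are modelled crudely as blank frames between passes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class PlaybackPolicy:
    """Hypothetical record of how an animated action may be presented."""
    repeats: int = 1                 # total number of passes through the action
    reverse_on_repeat: bool = False  # play every second pass backwards
    pause_frames: int = 0            # blank frames (None) between passes
    segment: Optional[range] = None  # play only part of the action
    interruptible: bool = True       # may the user stop playback early?


def frame_order(n_frames: int, policy: PlaybackPolicy,
                stop_after: Optional[int] = None) -> List[Optional[int]]:
    """List the frame indices a viewer sees under `policy` (None marks a pause).

    `stop_after` simulates a user intervention after that many frames;
    it is ignored when the policy does not allow interruption.
    """
    if policy.segment is not None:
        frames = list(policy.segment)
    else:
        frames = list(range(n_frames))
    order: List[Optional[int]] = []
    for i in range(policy.repeats):
        if i > 0:
            order.extend([None] * policy.pause_frames)
        backwards = policy.reverse_on_repeat and i % 2 == 1
        order.extend(reversed(frames) if backwards else frames)
    if stop_after is not None and policy.interruptible:
        order = order[:stop_after]
    return order


print(frame_order(3, PlaybackPolicy(repeats=2, reverse_on_repeat=True, pause_frames=1)))
# prints [0, 1, 2, None, 2, 1, 0]
```

Writing the options down this way makes each one an explicit design decision rather than an accident of the authoring tool.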
Time and Tense
With time comes also tense; in this case, past, present and future. The multimedia developer would do well to know whether a certain screen action depends entirely on some historic precedent, is merely influenced by the known and familiar, or comes wholly as a surprise. The student of the screen needs to know whether an action (1) lays out its own rules by explanation or exposition, and is thus self- or endo-referential, (2) appeals to knowledge, experience, or assumptions common in the community, and is thereby exo-referential, or (3) experiments with the largely or wholly novel. The developer needs to be able to anticipate whether (1) a viewer can tell what's coming next because the action is well known, (2) the viewer will recognize the action as merely resembling available models because it follows general outlines, or (3) the viewer will be astonished by an action that is entirely new and unanticipated.
5.4.2. Animation and text
The modern computer is a universal mediator. If anything anyone can imagine by way of stimulus, image, action or information can be mediated, the computer can be made to mediate it. By virtue of that potential, the computer can also be made to resemble a book. This does not mean that the book is the most appropriate medium for it to emulate. A movie could also be made to project the pages of a book on the silver screen, a page at a time, with plenty of opportunity to read before the page is turned. That it is possible does not mean that such an implementation of cinema makes much sense. In the long term, computers are no more effective page turners than motion picture film.
The same problem of the medium's matériel intrudes here as intruded between poetry and sculpture for Lessing. On the paper page, placing the data and maintaining its shape requires only a single intervention by a scribal pen or the impression of inked type. On the video screen, the phosphors have to be bombarded dozens of times a second to keep light there, and are ready to disappear into darkness just as often, awaiting only a power failure for a well-deserved rest. The same power failure may render the paper page's information as invisible as the CRT's, but it does not change or eliminate the disposition of the data on the page. Light emitted by bombarded phosphors in a cathode ray tube (or its successor) is materially different from photons reflected off a passive page. Poetry is not sculpture.
5.4.2.1. Kinetic text
The single element in which this fundamental reality of the new medium is hardest to accept is text. It would be better if a new word were invented alongside "read" to describe the process of taking information off the screen. The vast majority of regular computer users do not use the screen in the same way that they use the printed page. When the screen begins to look too much like the printed page, with a critical excess of information, the common practice is to print a hard-copy version of the text. Every time this happens around the world, a user recognizes the difference between electronic and print media and acts on that difference. Experiments need to be undertaken to explore alternatives to the present-day conventions for representing text on screen.
The toys that turn up on the Web in a search for "animated text" or "text animation" provide countless instances of tortured text (growing, shrinking, twirling horizontally or vertically, twisting, morphing, changing colour, etc.), any one of which might usefully liven up a terminally boring, static screen. They do not, however, represent genuine alternatives to Gutenberg-page screen presentation. Let us distinguish "animated text" (or "text animation") from genuine alternatives to the static Gutenberg page by calling the latter "dynamic text", and call the larger category, which includes both animated and dynamic text, "kinetic text".

Figure 6. Basic text presentation alternatives for a time-sensitive medium (all user-tailorable):
Horizontal scrolling (one line, right-to-left; tickertape, New York Times Building)
Vertical scrolling (north-south, whole text; WordStar ^QZ, ^QR)
Front-to-back scrolling (3-D, "Once upon a time, in a galaxy far far away . . ."; whole text)
Teletype (left-to-right, a unit at a time; user-determined limit on on-screen retention of old information)
Reading Dynamics (frame or window moving across text at user-controllable speed and width; visibility of remaining text and whole-text scrolling user-determined)
RSVP, Rapid Serial Visual Presentation (a word at a time, large font, from one screen area)
5.4.2.2. Dynamic text
The tickertape (and the analogous Java applet) lets information flow from right to left across the screen, usually at the bottom. Because it presents continuous text, this application provides an alternative to animated text, which typically presents short, repetitive, phrase-length language. Because it presents text in motion, the tickertape also provides a text presentation mode that is an alternative to Gutenberg. The same could be said for user-controllable vertical scrolling, which older users might remember from the WordStar word processing program (^QZ [up], ^QR [down], + n [1-9] to control speed). Likewise, a front-to-back 3-D journey would suit a dynamic, non-print medium, as in Star Wars: "Once upon a time, in a galaxy far, far away . . ."
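The difference between the two scrolling modes can be sketched as a sequence of screen states. In this minimal Python sketch (the function names, the fixed character width and the `step` speed setting are assumptions for illustration), a tickertape is a fixed-width window sliding over padded text, and vertical scrolling is a window of `rows` lines advancing by a user-chosen number of lines per tick; timing is left to whatever display loop consumes the frames.

```python
from typing import List


def tickertape_frames(text: str, width: int) -> List[str]:
    """One-line ticker: the text enters at the right edge and leaves at the left."""
    padded = " " * width + text + " " * width
    # Each frame is the visible `width` characters; sliding the start index
    # rightwards makes the text appear to move right-to-left.
    return [padded[i:i + width] for i in range(len(text) + width + 1)]


def vertical_scroll_frames(lines: List[str], rows: int, step: int = 1) -> List[List[str]]:
    """Successive screenfuls of `rows` lines, advancing `step` lines per tick.

    `step` stands in for a user-controlled speed setting (an assumption).
    """
    if step < 1:
        raise ValueError("step must be at least 1")
    last_start = max(len(lines) - rows, 0)
    return [lines[i:i + rows] for i in range(0, last_start + 1, step)]


for frame in tickertape_frames("NEWS", 3):
    print(repr(frame))
```

Both functions only decide what is visible; the pace at which frames are shown is the remaining, user-controllable variable.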
A personal favourite of this author is teletype, that is, user-controllable, left-to-right text presentation, a feature he was responsible for including in the text-based version of CALIS (Computer Assisted Language Instructional System). The normal process would be one letter at a time at a pace comparable to the user's normal reading speed. Variations could include one word or several at a time. Decisions would then need to be made about how much information to retain on the screen (a few lines or many) and whether to scroll off one line, several, or a screenful at a time. One commercially proven strategy, Reading Dynamics, moves a frame of one, two or three words across a text at progressively greater speeds as the learner grows accustomed to the limitation. In a modern incarnation of the idea, decisions would have to be reached as to whether the remainder of the text should be completely invisible, greyed out or normal, how much text should appear on the screen at one time, and how much should be retained after initial processing. Even more radical is a strategy called RSVP (Rapid Serial Visual Presentation) (Rubin & Turano, 1992, 1994), in which words in (large) user-controllable fonts are presented one at a time at user-controllable speeds greatly exceeding normal reading speeds. It seems that, with a little practice (forty-five minutes might suffice), reading speeds could be dramatically increased, with improved retention (http://www.vallier.com/tenax/cornix.html [8 January 1999]).
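Each of these strategies reduces to a rule for what is visible at each tick. The Python sketch below is one possible reading of them, with hypothetical names and made-up default values. `teletype_frames` reveals a letter or a word at a time. `reading_window_frames` moves a window of one to three words across the text and renders the remainder as hidden, greyed (shown here as dots) or normal. `accelerating_durations` shortens each frame's display time in the spirit of Reading Dynamics, and `rsvp_schedule` assigns each word an onset at a chosen rate in words per minute. None of the defaults is taken from Rubin and Turano's experiments.

```python
from typing import Callable, Dict, List, Tuple


def teletype_frames(text: str, by_word: bool = False) -> List[str]:
    """Reveal the text left to right, one letter (or one word) per tick."""
    if by_word:
        words = text.split()
        return [" ".join(words[:i]) for i in range(1, len(words) + 1)]
    return [text[:i] for i in range(1, len(text) + 1)]


def reading_window_frames(words: List[str], span: int = 2,
                          rest: str = "hidden") -> List[str]:
    """Move a `span`-word window across the text, one window-width per tick.

    Words outside the window are blanked ("hidden"), replaced by dots as a
    textual stand-in for grey type ("greyed"), or left as they are ("normal").
    """
    render_rest: Dict[str, Callable[[str], str]] = {
        "hidden": lambda w: " " * len(w),
        "greyed": lambda w: "." * len(w),
        "normal": lambda w: w,
    }
    outside = render_rest[rest]
    return [
        " ".join(w if start <= i < start + span else outside(w)
                 for i, w in enumerate(words))
        for start in range(0, len(words), span)
    ]


def accelerating_durations(n_frames: int, start_ms: int = 600,
                           factor: float = 0.9, floor_ms: int = 150) -> List[int]:
    """Display time per frame, shrinking as the reader grows accustomed."""
    return [max(floor_ms, round(start_ms * factor ** i)) for i in range(n_frames)]


def rsvp_schedule(text: str, wpm: int = 600) -> List[Tuple[int, str]]:
    """(onset in ms, word) pairs: one word at a time, at `wpm` words per minute."""
    ms_per_word = round(60_000 / wpm)
    return [(i * ms_per_word, word) for i, word in enumerate(text.split())]


print(reading_window_frames("the cat sat".split(), span=2, rest="greyed"))
# prints ['the cat ...', '... ... sat']
```

Keeping every parameter (window width, treatment of the remaining text, rate) as an argument mirrors the author's point that all of these modes should be user-tailorable.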
5.5. Video
Critical understanding of the nature of video and of the mechanics which govern its presentation is gradually making its way to the Web, though that understanding still depends heavily on the antecedent technology, cinema. Video-analysis engines recognize the dividers that mark the end of a shot or a scene, and they do so by comparing adjacent frames and looking for segmentation events:
1. Cuts and blank frames (breaks in the video stream caused by switching to another camera or editing to another image)
2. Fades and dissolves (where a transition takes place over a number of frames)
3. Camera movement (where a pan, tilt, dolly, truck, arc, or zoom [8 January 1999] may point to a change of scene)
4. Salient frames (where, for some reason other than camera movement, a large proportion of the image in one segment differs from the surrounding frames)
Video-analysis software permits access to any segment in any order and can
select key frames from each scene and string them together to make a storyboard. Any resulting video can then be accessed by a standard Web browser
and annotated as necessary (Kaehms, 1998).
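The first segmentation event in the list, the hard cut, can be illustrated with one common technique: comparing the grey-level histograms of adjacent frames and treating a large difference as a shot boundary. This Python sketch is an assumption-laden illustration, not the method of any particular video-analysis engine cited above. Frames are plain lists of 8-bit grey values, the 16-bin histogram and the 0.5 threshold are arbitrary choices, blank frames are detected by low mean brightness, and fades, dissolves and camera movement are not handled. The first non-blank frame of each segment serves as its key frame, the raw material for a storyboard.

```python
from typing import List, Sequence

Frame = Sequence[Sequence[int]]  # rows of 8-bit grey levels, 0-255


def grey_histogram(frame: Frame, bins: int = 16) -> List[float]:
    """Normalized histogram of grey levels in one frame."""
    counts = [0] * bins
    total = 0
    for row in frame:
        for value in row:
            counts[value * bins // 256] += 1
            total += 1
    return [c / total for c in counts]


def histogram_distance(a: Frame, b: Frame, bins: int = 16) -> float:
    """Half the L1 distance between histograms: 0 = identical, 1 = disjoint."""
    ha, hb = grey_histogram(a, bins), grey_histogram(b, bins)
    return 0.5 * sum(abs(x - y) for x, y in zip(ha, hb))


def is_blank(frame: Frame, level: int = 16) -> bool:
    """Treat a frame whose mean grey level is below `level` as blank."""
    values = [v for row in frame for v in row]
    return sum(values) / len(values) < level


def segment_starts(frames: List[Frame], cut_threshold: float = 0.5) -> List[int]:
    """Indices where a new segment begins: a hard cut or a switch to/from blank."""
    starts = [0] if frames else []
    for i in range(1, len(frames)):
        blank_change = is_blank(frames[i]) != is_blank(frames[i - 1])
        cut = histogram_distance(frames[i - 1], frames[i]) > cut_threshold
        if blank_change or cut:
            starts.append(i)
    return starts


def storyboard(frames: List[Frame], cut_threshold: float = 0.5) -> List[int]:
    """Key frames: the first frame of each segment, skipping blank segments."""
    return [i for i in segment_starts(frames, cut_threshold) if not is_blank(frames[i])]


bright = [[200] * 4 for _ in range(4)]
dark = [[0] * 4 for _ in range(4)]
grey = [[100] * 4 for _ in range(4)]
print(segment_starts([bright, bright, dark, grey, grey]))  # [0, 2, 3]
print(storyboard([bright, bright, dark, grey, grey]))      # [0, 3]
```

Histogram comparison ignores where things are in the frame, which makes it tolerant of small motion but blind to cuts between visually similar shots; gradual transitions would need a comparison over a longer span of frames.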
The segmentation events are the fundamental techniques of scene shifting
in cinema and appear to be equally applicable to video. They represent the
internal dynamics of these media and account for many (but not all) of the basic
dynamics that distinguish cinema and video from live performance. Making
these basic tools of video editing available to the Web would seem to be a great
step forward in allowing media-cognizant and media-appropriate screen and
site design. In and of themselves, however, they do not deal with audio.

6. SOME OBSTACLES
The use of colour in general, and of colour combined with text in particular,
is routinely subject to intense negative criticism, though bold, innovative, and
effective use of colour on the World Wide Web has begun to counter those
attacks. Colour was possible on the Gutenberg page, but as it was expensive
and complicated it was never really extended to text, except for the red of
rubrics. When the features used for presenting information on the electronic screen depart to any great degree from the Gutenberg model, the designer
is risking widespread misunderstanding and disapprobation. It is amazing to
observe how much caution is invoked when treating the electronic screen as
anything other than paper. The blink function in HTML, for example, is virtually never described without the advice not to use it, because it is irritating.
It happens also to be the first element a learner of HTML confronts which cannot be reproduced by the Gutenberg page.
The observation was made above that the inclination to employ the conventions of previous forms in the new form seems irresistible (Section 4.3).
To the extent that those features which depart from the Gutenberg page, animation and audio, are downplayed in the development of multimedia applications (that is to say, employed not at all, naively or accidentally, or badly and without thought or effort), the present state of the art may be said to be in a phase more or less comparable to silent cinema: not really multimedia, or just barely, by virtue of text and moving picture. One day a more inclusive metaphor, probably derived from the general category of modern cinema, will emerge to provide models for fully integrated and effective multimedia. In the meantime, before such a metaphor has been found and some integration of the constituent media has been achieved, perhaps a glance at the history of the arts will provide a precedent: the Gesamtkunstwerk, or total work of art, as Richard Wagner grandiosely envisioned his rewriting of the rules of the opera stage of his time (Wagner, 1850). In one sense, this brings us back to our opening argument.
Until preconceptions from the antecedent technologies are abandoned, and until the Wagnerian or a comparable metaphor can address the complex interrelations of the constituent media, it will probably be necessary to investigate the components one at a time (e.g., video independent of audio). When designers are ready to acknowledge the full set of consequences of dealing with a new medium, one might dare to propose standards for (spoken and heard) speech accompanied by (seen) dynamic text. There is, however, little sense in proposing standards for the coalescence of language heard and seen in one multimedia experience if no one has yet accepted the inevitability of dynamic text to begin with. The pairing of speech and text should not have to be so problematic. Adult foreign language learners would certainly be grateful for a dynamic visual (textual) representation of natural foreign speech, especially if it is spoken fast, and especially if that speed elides the beginnings and ends of words. If only for segmentation and articulation, such a pairing of speech and text would be a godsend. Link dynamic text and natural speech with a talking head and one has a highly desirable, fully fledged instance of an immensely useful multimedia application.
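The pairing of speech and dynamic text described above amounts to revealing each word as it is spoken. The Python sketch below assumes that word onsets and offsets in milliseconds are already available, for instance from a forced aligner (a hypothetical input; the timings shown are invented); `TimedWord` and `caption_at` are illustrative names. At any moment it returns the text revealed so far and the word currently being spoken, which a display could highlight to make word boundaries visible even when fast speech elides them.

```python
from bisect import bisect_right
from typing import List, NamedTuple, Optional, Tuple


class TimedWord(NamedTuple):
    word: str
    start_ms: int  # onset of the word in the recording
    end_ms: int    # offset of the word in the recording


def caption_at(words: List[TimedWord], t_ms: int) -> Tuple[str, Optional[str]]:
    """Return (text revealed so far, word being spoken at t_ms or None).

    Assumes `words` is sorted by onset, as an aligner would produce it.
    """
    onsets = [w.start_ms for w in words]
    n = bisect_right(onsets, t_ms)  # number of words whose onset has passed
    revealed = " ".join(w.word for w in words[:n])
    current = None
    if n and words[n - 1].start_ms <= t_ms < words[n - 1].end_ms:
        current = words[n - 1].word
    return revealed, current


# Invented timings for a short German phrase.
utterance = [TimedWord("wie", 0, 150), TimedWord("geht", 150, 400), TimedWord("es", 400, 520)]
print(caption_at(utterance, 200))  # ('wie geht', 'geht')
```

The hard part in practice is obtaining reliable word timings for natural, fast speech; once they exist, driving the text from them is simple.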
To achieve such an obvious good, one would have to abandon the Gutenberg metaphor and accept a new one, such as television's silent captioning for the hearing impaired. Incidentally, there is probably a great deal to be learned for ordinary, day-to-day applications from studying the high-technology needs of users with disabilities, where the burden of the technological past is explicitly under attack (Edwards, 1995; Williamson et al., 1986).
The critical component of interactivity (that is, of user intervention and dialogue with the computer) has been only lightly touched on here, in respect of speech recognition (Section 5.1.5). Multimedia does not, in and of itself, necessarily imply interactivity. It would, however, be a sorry outcome indeed if the full flowering of the medium did not find a great deal of room for user input beyond point-and-click and the occasional ASCII essay, the dominant modes of user interaction at present.

7. CONCLUSION
Whatever direction multimedia takes as technology races ahead, the next attempt to fathom its internal workings is going to have to fold the components together and determine how they interact, change and adapt because of their internal context and interaction. To achieve that goal, designers and users alike would do well to acquire the habit of critical observation: first, by abandoning the advice of the pundits, however well-meaning; second, by diving directly into experience; and third, by assembling the intellectual, critical tools for deciphering the mystery of a wonderful screen or a powerful application. It would be a good start to make a list of sites one is convinced, through personal experience, are well designed. Then it would be well to identify what qualities the successful sites have in common. Some serious reflection on the nature of the matériel would be in order, adducing logic, analogy and experience to the act of reflection; perhaps then a designer might want to construct instruments like those proposed here (Figs. 2 and 3), but better, to help in visualizing distinctions and similarities (Figs. 4 and 5). Perhaps the most important distinctions that can be made are those which accurately assign certain practices to the taste of the time or the taste of the designer or the user, and other practices to the internal dynamics of the medium, determined by the matériel. A strong bridge between the two, the subject perceiving the medium, penetrating to its dynamics and awakening them from dormancy, is what will make multimedia irresistible.

REFERENCES
Andriole, S.J. (1995) Cognitive Systems Engineering for User-Computer Interface Design,
Prototyping, and Evaluation. Hillsdale, NJ: Lawrence Erlbaum.
Borchardt, F.L. (1998) On the history and aesthetics of screen design (or, why do most screens
put learners to sleep?), in K. Cameron (ed.) Multimedia CALL: Theory and Practice.
Exeter: Elm Bank Publications, pp.310.
Comenius, J.A. (1672, reprinted 1967) Orbis Sensualium Pictus, ed. James Bowen. Sydney:
Sydney University Press.
Cox, K. & Walker, D. (1993) User Interface Design. New York: Prentice Hall.
Edwards, A.D.N. (ed.) (1995) Extra-ordinary Human-Computer Interaction: Interfaces for
Users with Disabilities. Cambridge series on human-computer interaction, 7.
Cambridge: Cambridge University Press.
Flaherty, G. (1978) Opera in the Development of German Critical Thought. Princeton: Princeton
University Press.
Gardiner, M.M. & Christie, B. (eds) (1987). Applying Cognitive Psychology to User-Interface
Design. New York: Wiley.
Helander, M. (ed.) (1988) Handbook of Human-Computer Interaction. Amsterdam: North-Holland.
Kaehms, B. (1998) Parsing video, Web Techniques 3 (12): 66.
Kuchinskas, S. (1998) Raising the roof on web design: A look below the surface reveals
another realm, Web Techniques 3 (7): 435.
Kukulska-Hulme, A. (1998) Language and Communication: Essential Concepts for User
Interface and Documentation Design. New York: Oxford University Press.
LaRocca, S. (1994) A voice interactive multimedia language laboratory, in Proceedings of the
1994 CALICO Annual Symposium Human Factors, Durham, NC. pp.1457.
Laurel, B. (ed.) (1991) The Art of Human-Computer Interface Design. Reading, MA: Addison-Wesley.
Lessing, G.E. (1766, reprinted 1965) Laocoon: An Essay upon the Limits of Painting and Poetry,
trans. Ellen Frothingham. New York: Noonday Press.
McLuhan, M. (1962) The Gutenberg Galaxy: The Making of Typographic Man. Toronto:
University of Toronto Press.
Rubin, G.S. & Turano, K. (1992) Reading without saccadic eye movements, Vision Research
32 (5): 895-902.
Rubin, G.S. & Turano, K. (1994) Low vision reading with sequential word presentation, Vision
Research 34 (13): 1723-33.
Salvendy, G. (1997) Handbook of Human Factors and Ergonomics. New York: J. Wiley.
Salvendy, G. & Smith, M. (1993) Human-Computer Interaction: Software and Hardware
Interfaces. Amsterdam: Elsevier.
Shneiderman, B. (1987) Designing the User Interface. Reading, MA: Addison-Wesley.
Treu, S. (1994) User Interface Design: A Structured Approach. New York: Plenum.
Wagner, R. (1850) Das Kunstwerk der Zukunft, in W. Golther (ed.) Gesammelte Schriften und
Dichtungen 3. Berlin: Bong, pp.42177.
Williamson, N.L., Muter, P. & Kruk, R.S. (1986) Computerized presentation of text for the
visually handicapped, in E. Hjelmquist & L.G. Nilson (eds) Communications and
Handicap: Aspects of Psychological Compensation and Technical Aids. Amsterdam:
Elsevier.