
ABSTRACT

Anytime, anywhere Internet access has become the goal of current technology
vendors. The World Wide Web (WWW) has seen tremendous growth in recent years
and has become the primary source of information all over the world.
Computing power today is increasingly moving away from the desktop computer
to wireless access media such as mobile phones and to mobile computing
devices such as PDAs and tablet PCs. The challenge presented to the present
Internet world is to make the enormous web content accessible to such users
as well as to visually impaired users. The existing web infrastructure was
designed for traditional desktop browsers and not for handheld devices. The
data on the web is stored in HTML (Hypertext Markup Language) format, which
is not suited for devices that have little processing power, limited screen
size, and constrained memory and input capabilities. Voice has always been an
accepted medium of user interaction, and it greatly simplifies the input
process. The development of interactive voice browsers, which use improved
speech recognition and efficient Text-to-Speech (TTS) engines, has made it
possible for mobile users to access the Internet. This can be done using
VoiceXML (VXML), a standard markup language.
Just as a web browser renders HTML documents visually, a VoiceXML interpreter
renders VoiceXML documents aurally. Since the documents are rendered aurally,
they can be heard over the phone. Applications using VXML are a cost-effective
and convenient way to interact with a software application without human
intermediaries or expensive computing devices.

Chapter 1
INTRODUCTION
1.1 Overview
VoiceXML is a derivative of the World Wide Web Consortium's (W3C) XML, the
Extensible Markup Language. VoiceXML is designed for creating audio dialogs
that combine synthesized speech, digitized audio, recognition of spoken and
DTMF key input, recording of spoken input, and telephony. Its major goal is to
bring web-based development and content delivery to interactive voice response
applications.

1.2 What is XML?


XML = Extensible Markup Language: a flexible way to create common information
formats and to share both the format and the data on the web.
Example: computer makers might agree on a standard or common way to describe
the information about laptop computers (processor speed, memory size, and so
on) and then describe that product information format with XML.
Benefit: easy information interchange.
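As a minimal sketch of what such an agreed format might look like (the element
and attribute names here are invented purely for illustration), a laptop
description in XML could be:

<?xml version="1.0"?>
<laptop>
  <!-- Product information in the agreed, shared format -->
  <model>Example 3000</model>
  <processor speed="1.2 GHz"/>
  <memory size="512 MB"/>
</laptop>

Any vendor or tool that understands this shared format can then exchange such
documents without custom converters.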

1.3 What is VoiceXML?


VoiceXML = Voice eXtensible Markup Language: a dialog markup language.
HTML assumes a graphical web browser with keyboard, display and mouse;
VoiceXML extends the web to voice-based devices, with audio input and output.
VoiceXML can be thought of as a markup language for voice, just as HTML is a
markup language for text.
VoiceXML is used extensively for speech recognition and application
development.
Since VoiceXML is XML, VoiceXML documents must adhere to the basic XML rules.

1.4 Evolution of VoiceXML


VXML (Voice eXtensible Markup Language) was developed jointly by four major
companies: AT&T, Lucent Technologies, Motorola and IBM.
The evolution of VXML can be explained using the graphical representation
shown below.

1.5 Graphical Browser V/s Voice Browser

Since VoiceXML documents contain only forms and blocks, it is sensible to
compare HTML and VoiceXML in terms of forms. Basically, forms are used in
HTML for obtaining user input. In VoiceXML, too, user interaction is provided
through forms, but the VoiceXML form is not as powerful and effective as the
HTML form, for the reasons mentioned below. Any HTML document is a single
unit, which is fetched from a URL and presented to the user as a whole, with
its full efficiency. But VoiceXML documents contain a number of forms and
dialogs, and each has to be delivered to the user through audio in a
sequential manner. This is due to the visual medium's capability to display
numerous items in parallel and the inherently sequential nature of the voice
medium. Hence, even though it is claimed that an HTML form can be converted to
its equivalent VoiceXML counterpart, the result is structurally different and
has some shortcomings.
Feature comparison (graphical browser vs. voice browser):
Language: HTML vs. VoiceXML.
Browser output: text and images laid out according to mark-up tags vs.
streaming audio and TTS spoken according to mark-up tags, plus pre-recorded
sound files.
User input: keyboard and mouse vs. DTMF and spoken voice.
Resources retrieved from the web server: HTML pages, images, Java applets and
ActiveX objects vs. VoiceXML pages, speech recognition grammars, sound files
and streaming audio.
Hyperlinking: click on 'hotspot' text or images, or submit a form vs. say a
'hotspot' word (e.g. 'help'), or submit a form.

Visual Web Model:
1. The user requests a page by opening their browser and typing in a URL.
2. The browser makes an HTTP request to a server for that HTML page.
3. The browser consumes the HTML to create a visual page that responds to user
input through the keyboard and mouse.

Voice Web Model:
1. The user dials in to a particular phone number or extension.
2. The voice browser sends an HTTP request for the VoiceXML document to a
server determined from the dialed number.
3. The voice browser renders the VoiceXML as a sequential dialog, consisting
of prompts using TTS or recorded audio; input is through speech or DTMF.

Chapter 2
VoiceXML Concepts and Features
2.1 VXML Concepts
A VoiceXML application consists of a set of documents that describe a
conversational finite state machine. The user is always in one conversational
state, or dialog, at a time. Each dialog determines the next dialog to transition
to. Transitions are specified using Uniform Resource Identifiers (URIs)
pointing to the next document and dialog to use. If a URI does not refer to a
document, the current document is assumed and, if it does not refer to a dialog,
the first dialog in the document is assumed. Execution is terminated when a
dialog does not specify a successor, or if it has an element that explicitly exits
the conversation.
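A minimal sketch of such transitions (the dialog names greeting and goodbye
are invented for illustration): the first dialog transitions to the second
using a fragment URI within the same document, and the second explicitly exits
the conversation.

<?xml version="1.0"?>
<vxml version="1.0">
  <form id="greeting">
    <block>
      Welcome to the service.
      <!-- Transition to the dialog named "goodbye" in this document -->
      <goto next="#goodbye"/>
    </block>
  </form>
  <form id="goodbye">
    <block>
      Goodbye.
      <!-- Explicitly terminate the conversation -->
      <exit/>
    </block>
  </form>
</vxml>

A next value such as "other.vxml#order" would instead load another document
and start at its dialog named order, while "other.vxml" alone would start at
that document's first dialog.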

2.2 VoiceXML Scenario: What Happens Underneath


1. The user dials the phone number of the required service; here, consider a
bookseller service.
2. The VoiceXML gateway at the Voice Service Provider (VSP) hosting the
service receives the call along with information about the call, such as the
dialed number and the dialing number.
3. The VoiceXML gateway searches a database and maps the dialed number to a
URL, which is the location of the book service's VoiceXML main page
(books.vxml) on a web server.
4. The VoiceXML gateway retrieves books.vxml and may retrieve associated
files, such as grammars or recorded audio, if specified. The associated files
may be cached on the VoiceXML gateway.
5. The VoiceXML gateway interprets the VoiceXML, stepping through books.vxml
and interacting with the user as defined by the application.
6. As necessary, additional VoiceXML and associated files are downloaded from
the web server.
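The scenario above only names the entry document; a hypothetical fragment of
what books.vxml might contain (the choices and target documents are invented
here for illustration) is:

<?xml version="1.0"?>
<vxml version="1.0">
  <menu>
    <prompt>Welcome to the bookseller. Say search, orders, or help.</prompt>
    <!-- Each choice transitions to another VoiceXML document on the web server -->
    <choice next="search.vxml">search</choice>
    <choice next="orders.vxml">orders</choice>
    <!-- This choice throws the help event instead of transitioning -->
    <choice event="help">help</choice>
  </menu>
</vxml>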

2.3 VoiceXML Features


Just as a user can interact with an HTML page, he or she can also interact
with a VXML page.
The notable features of VXML are:
1. Recognition of spoken/DTMF (Dual-Tone Multi-Frequency) input. Here DTMF
refers to pressing telephone keys.
2. Assigning spoken input to variables in the document and making decisions
based on the values assigned to those variables.
3. Playing synthesized speech and audio files with the help of a
Text-to-Speech (TTS) converter.
4. Linking to other documents or to other areas of the same document, as an
HTML file would do.
Using voice as a potential input and output medium opens up many new
possibilities, not only for mobile access to information but also for access
for the disabled. Aural interfaces require only a regular telephone to access
information stored in databases. Today one can find telephones almost
everywhere, and mobile phones are far more portable and accessible than
computers. Therefore the support of voice browsers was a logical step towards
realizing the vision of ubiquitous information: information for everyone,
everywhere. Even in the case of WAP-enabled mobile phones, a voice interface
may be a much more convenient means of accessing information. Navigation by
voice is far more pleasant and faster than the use of touch-tone input or
entering information using the small keypads of mobile WAP phones. Also, in
some situations, voice output may be preferred over visual output. For
example, a person may perform a manual task while simultaneously receiving
information via a voice interface. Think of an employee driving to his office
by car: he can listen to the news on the company's web portal site while his
eyes are concentrating on the traffic. With respect to the disabled,
voice-enabled applications are valuable to users who can use neither their
hands for keyboard input nor their eyes to process visual output. Further,
voice interfaces require no special instruction or experience. They also allow
new forms of human-computer interaction based on a combination of visual and
voice interfaces. We can build applications which are either fully based on
voice or which use speech technology to augment existing graphical user
interfaces.

2.4 Goals of VoiceXML


VoiceXML's main goal is to bring the full power of web development and content
delivery to voice response applications, and to free the authors of such
applications from low-level programming and resource management. It enables
integration of voice services with data services using the familiar
client-server paradigm. A voice service is viewed as a sequence of interaction
dialogs between a user and an implementation platform. The dialogs are
provided by document servers, which may be external to the implementation
platform. Document servers maintain overall service logic, perform database
and legacy system operations, and produce dialogs. A VoiceXML document
specifies each interaction dialog to be conducted by a VoiceXML interpreter.
User input affects dialog interpretation and is collected into requests
submitted to a document server. The document server may reply with another
VoiceXML document to continue the user's session with other dialogs.
VoiceXML is a markup language that:
Minimizes client/server interactions by specifying multiple interactions per
document.
Shields application authors from low-level and platform-specific details.
Separates user interaction code (in VoiceXML) from service logic (for example,
CGI scripts).
Promotes service portability across implementation platforms: VoiceXML is a
common language for content providers, tool providers, and platform providers.
Is easy to use for simple interactions, yet provides language features to
support complex dialogs.
While VoiceXML strives to accommodate the requirements of a majority of voice
response services, services with stringent requirements may best be served by
dedicated applications that employ a finer level of control.

Chapter 3
VoiceXML Architecture and Language Features
3.1 Evolution of VXML Architecture


3.2 Architecture

The user first contacts the web server, requesting VXML pages. This request is
directed to the VoiceXML interpreter context for initial interaction, such as
recognizing the call. It is then passed to the VXML interpreter, which takes
care of the dialog to be played; this may involve getting inputs from the
user. At this point certain grammars (rules to recognize input, discussed
later) may be active to validate the input and to switch to another sub-dialog
based on the input. The VoiceXML interpreter context also has certain active
grammars which may be listening for phrases from the user that would take the
user to a different level, such as exiting from the web page.

VoiceXML is the most important standard for speech applications to come out of
W3C activity. VoiceXML is a standard for the development of speech-based
telephony applications. It supports creation of voice dialogs of various
types. A VoiceXML application (organized as documents) can be viewed as a
finite-state machine (FSM), where the user must be in one of the states
(corresponding to dialogs). Transitions in this FSM are represented by URI
references to other dialogs, which may be within the same VoiceXML document,
or by links to other documents. VoiceXML provides support for three kinds of
dialog elements: forms, menus, and sub-dialogs. Forms provide information to
the user, collect the resulting user input, and interpret the meaning of the
input using XML grammars. Menus allow the user to navigate through a series of
alternative choices and to transition to other dialogs, which may be in the
same or in different documents. Sub-dialogs allow the user to transition to
another dialog and return when finished (like a function call). A sub-dialog
might confirm a user's action or handle specific tasks.

3.2.1 VXML INTERPRETER

A document server (e.g. a web server) processes requests from a client
application, the VoiceXML interpreter, through the VoiceXML interpreter
context. The server produces VoiceXML documents in reply, which are processed
by the VoiceXML interpreter. The VoiceXML interpreter context may monitor user
inputs in parallel with the VoiceXML interpreter. For example, one VoiceXML
interpreter context may always listen for a special escape phrase that takes
the user to a high-level personal assistant, and another may listen for escape
phrases that alter user preferences such as volume or text-to-speech
characteristics. The implementation platform is controlled by the VoiceXML
interpreter context and by the VoiceXML interpreter. For instance, in an
interactive voice response application, the VoiceXML interpreter context may
be responsible for detecting an incoming call, acquiring the initial VoiceXML
document, and answering the call, while the VoiceXML interpreter conducts the
dialog after the answer. The implementation platform generates events in
response to user actions (e.g. spoken or character input received, disconnect)
and system events (e.g. timer expiration). Some of these events are acted upon
by the VoiceXML interpreter itself, as specified by the VoiceXML document,
while others are acted upon by the VoiceXML interpreter context.
There are 47 elements (or tags) in the VoiceXML format, compared to 91 for
HTML. VoiceXML is capable of something as simple as delivering content over
the phone or as complex as a full-fledged e-commerce application. No
introduction is complete without a Hello World example.
Example 1.
<?xml version="1.0"?>
<vxml version="1.0">
<form>
<block>Hello World!</block>
</form>
</vxml>
The example above will synthesize "Hello World!" and then exit.

3.3 VoiceXML as a Language


VoiceXML can be thought of as a markup language for voice, like HTML is
for text. The language describes the human-machine interaction provided by
voice response systems, which includes:
Output of synthesized speech (text-to-speech).
Output of audio files.
Recognition of spoken input.
Recognition of DTMF input (touch tone).
Recording of spoken input.
Telephony features such as call transfer and disconnect.

3.3.1 Dialogs & Sub-dialogs


Dialogs are basically sets of executable commands in VoiceXML. Forms are used
to obtain inputs from the user through voice or DTMF. A menu provides the user
with a list of options and transitions to a Uniform Resource Identifier (URI)
based on the choice. A sub-dialog is just like a function call in any
programming language: a sub-dialog is called from a dialog, control is shifted
to the sub-dialog, and once the sub-dialog is finished, control returns to the
dialog that invoked it.
There are two kinds of dialogs: forms and menus. Forms define
an interaction that collects values for a set of field item variables. Each field
may specify a grammar that defines the allowable inputs for that field. If a
form-level grammar is present, it can be used to fill several fields from one
utterance.
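As a sketch of a form-level grammar filling several fields at once (the
grammar file transfer.grxml, the servlet URL and the field names are assumed
here), a single utterance such as "transfer fifty dollars from savings" could
fill both amount and account:

<form id="transfer">
  <!-- Form-level grammar whose result properties match the field names -->
  <grammar src="transfer.grxml" type="application/srgs+xml"/>
  <field name="amount">
    <prompt>How much would you like to transfer?</prompt>
  </field>
  <field name="account">
    <prompt>From which account?</prompt>
  </field>
  <filled mode="all">
    <submit next="/servlet/transfer" namelist="amount account"/>
  </filled>
</form>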
Fields are the major building blocks of forms. A field declares a
variable and specifies the prompts, grammars, DTMF sequences, help
messages, and other event handlers that are used to obtain it. Each field
declares a VoiceXML field item variable in the form dialog scope. These may
be submitted once the form is filled, or copied into other variables. For
example:
<form id="balance_info">
  <block>Welcome to the account balance inquiry service.</block>
  <field name="account" type="digits">
    <prompt>What account number?</prompt>
    <catch event="help">
      Please speak the account number for which you want the balance.
    </catch>
  </field>
  <field name="pin" type="digits">
    <prompt>Your PIN?</prompt>
  </field>
  <block>
    <submit next="/servlet/balance" namelist="account pin"/>
  </block>
</form>
Each field has its own speech and/or DTMF grammars, specified
explicitly using <grammar> and <dtmf> elements, or implicitly using the type
attribute. The type attribute is used for standard built-in grammars, like digits,
boolean, or number. The type attribute also governs how that field value is
spoken by the speech synthesizer.
Each field can have one or more prompts. If there is one, it is
repeatedly used to prompt the user for the value until one is provided. If there
are many, they must be given count attributes. These determine which prompt
to use on each attempt.
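A minimal sketch of such tapered prompts using count attributes (the prompt
wording is invented for illustration): the short prompt is used first, and the
more detailed one from the third attempt onwards.

<field name="account" type="digits">
  <prompt count="1">What account number?</prompt>
  <prompt count="3">Please say or key in your account number, one digit at a time.</prompt>
</field>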

16

A menu presents the user with a choice of options and then transitions to
another dialog based on that choice.
<menu>
<prompt>Welcome to Big Buck's Bank. Say one of: <enumerate/></prompt>
<choice next="/servlet/account.vxml">
Account
</choice>
<choice next="http://www.ft.com/news.vxml">
News
</choice>
<choice next="/servlet/operator.vxml">
Operator
</choice>
<noinput>Please say one of <enumerate/></noinput>
</menu>
A sub-dialog is like a function call, in that it provides a
mechanism for invoking a new interaction, and returning to the original form.
Local data, grammars, and state information are saved and are available upon
returning to the calling document. Sub-dialogs can be used, for example, to
create a confirmation sequence that may require a database query; to create a
set of components that may be shared among documents in a single
application; or to create a reusable library of dialogs shared among many
applications.
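A minimal sketch of a sub-dialog used as a confirmation sequence (the document
confirm.vxml, its field name, and the servlet URL are assumed here for
illustration):

<!-- In the calling document -->
<form id="order">
  <subdialog name="confirmation" src="confirm.vxml">
    <filled>
      <if cond="confirmation.answer">
        <submit next="/servlet/placeOrder"/>
      <else/>
        <goto next="#order"/>
      </if>
    </filled>
  </subdialog>
</form>

<!-- In confirm.vxml -->
<form id="confirm">
  <field name="answer" type="boolean">
    <prompt>Shall I place the order? Say yes or no.</prompt>
    <filled>
      <!-- Return the collected value to the calling dialog -->
      <return namelist="answer"/>
    </filled>
  </field>
</form>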


3.3.2 Sessions
A session begins when the user starts to interact with a VoiceXML interpreter
context, continues as documents are loaded and processed, and ends when
requested by the user, a document, or the interpreter context.

3.3.3 Application

An application is a set of documents sharing the same application root
document. Whenever the user interacts with a document in an application, its
application root document is also loaded. The application root document
remains loaded while the user is transitioning between other documents in the
same application, and it is unloaded when the user transitions to a document
that is not in the application.
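A minimal sketch of this structure (the file names app-root.vxml and leaf.vxml
are invented): the leaf document names its application root, whose grammars
and variables remain available while the user moves between documents of the
application.

<?xml version="1.0"?>
<vxml version="1.0">
  <!-- app-root.vxml: the application root document -->
  <!-- Application-scoped variable shared by all documents in the application -->
  <var name="caller_name"/>
  <link event="help">
    <grammar type="application/x-jsgf">
      [please] help [me]
    </grammar>
  </link>
</vxml>

<?xml version="1.0"?>
<vxml version="1.0" application="app-root.vxml">
  <!-- leaf.vxml: a document belonging to the same application -->
  <form id="main">
    <block>The help grammar from the root document is active here too.</block>
  </form>
</vxml>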

3.3.4 Grammars
Each dialog has one or more speech and/or DTMF grammars associated with it. In
machine-directed applications, each dialog's grammars are active only when the
user is in that dialog. In mixed-initiative applications, where the user and
the machine alternate in determining what to do next, some of the dialogs are
flagged to make their grammars active (i.e., listened for) even when the user
is in another dialog in the same document, or in another loaded document in
the same application. In this situation, if the user says something matching
another dialog's active grammars, execution transitions to that other dialog,
with the user's utterance treated as if it were said in that dialog. Mixed
initiative adds flexibility and power to voice applications.
<link event="help">
<grammar type="application/x-jsgf">
[please] help [me] [please] |
[please] I (need|want) help [please]
</grammar>
</link>

3.3.5 Events
VoiceXML provides a form-filling mechanism for handling "normal" user
input. In addition, VoiceXML defines a mechanism for handling events not
covered by the form mechanism.
Events are thrown by the platform under a variety of circumstances, such as
when the user does not respond, doesn't respond intelligibly, requests help, etc.
The interpreter also throws events if it finds a semantic error in a VoiceXML
document. Events are caught by catch elements or their syntactic shorthand.
Each element in which an event can occur may specify catch elements. Catch
elements are also inherited from enclosing elements "as if by copy." In this
way, common event handling behavior can be specified at any level, and it
applies to all lower levels.
<catch event="help">
Please speak the account number for which you
want the balance.
</catch>

3.3.6 Links
A link supports mixed initiative. It specifies a grammar that is active
whenever the user is in the scope of the link. If user input matches the
link's grammar, control transfers to the link's destination URI. A link can
also be used to throw an event instead of transitioning to a destination URI.
<link next="/servlet/account.vxml">
<grammar type="application/x-jsgf">
account | Account balance inquiry
</grammar>
<dtmf>1</dtmf>
</link>

3.3.7 Form Interpretation Algorithm


The form interpretation algorithm (FIA) drives the interaction between the
user and a VoiceXML form or menu. A menu can be viewed as a form containing a
single field whose grammar and whose <filled> action are constructed from the
<choice> elements.

The FIA must handle:
Form initialization.
Prompting, including the management of the prompt counters needed for prompt
tapering.
Grammar activation and deactivation at the form and form-item levels.
Entering the form with an utterance that matched one of the form's
document-scoped grammars while the user was visiting a different form or menu.
Leaving the form because the user matched another form's, menu's, or link's
document-scoped grammar.
Processing multiple field fills from one utterance, including the execution of
the relevant <filled> actions.
Selecting the next form item to visit, and then processing that form item.
Choosing the correct catch element to handle any events thrown while
processing a form item.

Chapter 4
Practical Applications of VoiceXML

4.1 Absentee System Application


4.1.1 Introduction
This section describes a VoiceXML absentee system that enables students to
telephone in an absence from class, which is then recorded in a university
database. The application is suitable for any reasonably sized organization as
a cost-effective and convenient way to record employee absences by having them
interact directly with a computer over the telephone.
The Absentee System application was developed primarily for Pace University
students to report class absences, but it can be modified to work for any
department or organization. The application was developed at Pace University
using PHP with a MySQL database.

4.1.2 Features
The VoiceXML Absentee System has been designed to receive and keep
records of absentee calls from students, faculty, and university staff. Two
interfaces were created for this application.
1) a web interface that provides enrollment in the service and access to
information about the absences, and
2) a phone interface (VXML) through which users call in to record their
absence.
The user must first enroll via the web prior to using the phone service. The
user enters some pertinent information and creates a unique userId and
password, which are subsequently used to enter the Absentee System via a
telephone.

4.1.3 Working

All users will provide the following information when enrolling on the web: a
name, an email address, a unique userId and a password. Faculty may be asked
to enter the course Ids and the semester they are teaching those courses,
which means that faculty may update this information every semester. Staff
members may be asked to enter the campus they work at, the department they
work in and whether or not they are managers. Faculty and staff provide this
additional information so that they are able to view their own students' or
staff members' absentee records, rather than viewing all students' or staff
members' absentee records.
Each user will be categorized into one of four login types: student, staff,
faculty or administrator. The administrator will have access to all
information provided by the system, as well as the ability to create an
additional administrator user id and password. Access to some information will
be granted to particular users, such as instructors and employers, who will be
able to view the absence records for their courses or departments,
respectively.
When the user calls the system he or she will be asked to enter the user id
and password. Upon successfully entering the system, the user will go through
a series of questions. The login information determines which category the
user is in, and the appropriate questions are asked. If the user is a student
or instructor then the system will ask for the user's course Id for the class
that will be missed and the date the class will be missed. If the user is a
member of the university staff then the system will ask for the day that the
user will be absent. All the information obtained by the system is stored in a
database, which authorized users can access via the web. The information can
be viewed on the web.

4.1.4 PARTIAL CODE

The portions of the absentee application that capture the course and the date
of absence are shown below (partial VXML code from the absentee application):
<?xml version="1.0" ?>
<vxml application="http://resources.tellme.com/lib/universals.vxml">
<!-- Form 1: collect the course id and the date of absence -->
<form id="crsdate" anchor="false">
  <var name="username" />
  <block>
    <assign name="username" expr="''" />
  </block>
  <block>
    <prompt> Hello </prompt>
  </block>
  <field name="courseid" timeoutondtmf="false" confirm="no" bargein="true"
         magicword="false" phoneticpruning="false">
    <prompt>Please say the course i d</prompt>
    <grammar type="application/x-gsl" mode="voice">
      <![CDATA[
      [
        (Char_speak:d1 Char_speak:d2 Dig_speak:d3 Dig_speak:d4 Dig_speak:d5)
          {<option strcat($d1 strcat($d2 strcat($d3 strcat($d4 $d5))))>}
        (cancel) {<option "cancel">}
      ]
      Char_speak
      [
        c {return(c)}
        s {return(s)}
      ]
      Dig_speak
      [
        one {return(1)}
        two {return(2)}
        three {return(3)}
        four {return(4)}
        five {return(5)}
        six {return(6)}
        seven {return(7)}
        eight {return(8)}
        nine {return(9)}
        [zero oh] {return(0)}
      ]
      ]]>
    </grammar>
    <filled>
      <prompt>
        You said
        <value expr="courseid" />
      </prompt>
    </filled>
    <catch event="nomatch" count="10">The course i d is invalid. Please say
      the course i d.</catch>
    <noinput>
      I did not understand the course i d.
      <reprompt order="curr" />
    </noinput>
  </field>
  <field name="date_absent" timeoutondtmf="false" confirm="no" bargein="true"
         magicword="false" phoneticpruning="false">
    <prompt>What is the date you will be absent?</prompt>
    <grammar type="application/x-gsl" mode="voice">
      <![CDATA[
      TELLME_DATE
      ]]>
    </grammar>
    <catch event="nomatch" count="10">I am sorry, I can not understand. Please
      repeat the date you will be absent.</catch>
    <noinput>
      I did not understand the date.
      <reprompt order="curr" />
    </noinput>
    <filled>
      <goto next="#finddate" method="get" />
    </filled>
  </field>
</form>
<!-- Form 2: parse the recognized date, confirm it, and submit the record -->
<form id="finddate" anchor="false">
  <block>
    <script>
      <![CDATA[
      var date1 = vxmldata.get("date_absent");
      var myDate = '';
      var myYear = '';
      var myMonth = '';
      var myDaten = '';
      var mySpecial = '';
      var myMonthc = '';
      // Split the grammar result (name=value pairs separated by '^')
      // into its month, date, year and special_date parts.
      function ParseGrammar2(sGramResult) {
        myDate = '';
        var cMonth = new Array('january','february','march','april','may',
          'june','july','august','september','october','november','december');
        var arrNames = [];
        var arrValues = [];
        var arrNamesValues = sGramResult.split('^');
        for (var i = 0; i < arrNamesValues.length; i++) {
          var arrNameValuePair = arrNamesValues[i].split('=');
          arrNames[i] = arrNameValuePair[0];
          arrValues[i] = arrNameValuePair[1];
          if (arrNames[i] == 'month') { myMonthc = arrValues[i]; }
          if (arrNames[i] == 'date') { myDaten = arrValues[i]; }
          if (arrNames[i] == 'year') { myYear = arrValues[i]; }
          if (arrNames[i] == 'special_date') { mySpecial = arrValues[i]; }
        }
        var i;
        for (i = 0; i < cMonth.length; i++) {
          if (cMonth[i] == myMonthc) break;
        }
        myMonth = i + 1;
        myDate = myYear + '-' + myMonth + '-' + myDaten;
      } // eo function
      ParseGrammar2(date1);
      ]]>
    </script>
  </block>
  <block>
    You said,
    <value expr="myMonthc" />
    ,
    <value expr="myDaten" />
  </block>
  <field name="yesno" timeoutondtmf="false" confirm="no" bargein="true"
         magicword="false" phoneticpruning="false">
    <prompt>Is this the date you said? Say 'yes' or 'no'.</prompt>
    <grammar type="application/x-gsl" mode="voice">
      <![CDATA[
      YES_NO
      ]]>
    </grammar>
    <catch event="nomatch" count="3">I am sorry, I can not understand. Please
      repeat your response: yes or no.</catch>
    <filled>
      <prompt>
        you said
        <value expr="yesno" />
      </prompt>
      <if cond="yesno != 'yes'">
        <prompt>Please re-enter the course and date of absence</prompt>
        <goto next="#crsdate" method="get" />
      <else />
        <submit next="saveAbsenteeRecord.php" method="post"
                namelist="courseid myYear myMonth myDaten username" />
      </if>
    </filled>
  </field>
  <block>
    <submit next="saveAbsenteeRecord.php" method="post"
            namelist="courseid myYear myMonth myDaten username" />
  </block>
</form>
</vxml>

4.2 THE PITTSBURGH BUSLINE


4.2.1 Introduction
The Pittsburgh Busline application is a telephone-based system that provides
schedule information about buses traveling in and out of Pittsburgh's
University neighborhood. The Busline systems were primarily developed
using an early implementation of VoiceXML 1.0. Bus schedule information is
a relatively simple domain, requiring only three pieces of information from the
user: the departure location, the bus route or routes that the user is interested
in, and the direction or destination of travel.

4.2.2 Working And Sample Dialog


The two Pittsburgh Busline systems were developed concurrently and
independently, and each uses a different order in which to solicit
information from the user. System A asks first for location, then direction, and
finally route, whereas system B asks for location, route, and then direction.
Figures 1 and 2 show the call flow for the two systems.


System A development focused on novices, aiming for a helpful, informative
system. This system has extensive help information, so that the user needs
minimal domain knowledge to get information. This was partly in response to
informal feedback from users during the early developmental stages.
Specifically, once the system has the location and direction, it can provide
the novice user with all the possible bus routes that travel in the direction
specified. At the same time, it was also designed to take the least possible
amount of time to complete the task, so it allows expert users to just say the
information (even before being prompted for it) and to flow quickly through
the dialog. This proved particularly useful for this task, since most people
calling the system had an immediate need for the information (for example,
hoping to catch the very next bus). See Fig. A for a sample dialog.


Figure A: Sample dialog using Busline A.

System B, on the other hand, was developed with expert users in mind. Since
the focus of this system was to provide information as quickly and
easily as possible to someone familiar with the domain (i.e. a frequent bus
rider), it was designed to eventually make use of a user profile database. The
system expects the user to be familiar with the city bus system, and thus be
able to provide the correct bus number(s) for the desired destination. As a
result, system B has less help information but allows the user to specify
multiple bus routes in a single query, instead of cycling through the dialog
multiple times. The system also allows the user to barge-in; an example is
marked by * in Fig. B, which shows a short basic dialog (i.e. without any
requests for help, undo, etc.).


Figure B: Sample Busline B dialog.

In both Busline applications, a dialog manager (VoiceXML code) is responsible
for filling all three slots from the user, then building a query
with those values, and finally initiating a backend retrieval through a CGI
invocation.
Both systems allow the user to correct an utterance that was
misunderstood, start over from the beginning, request help, have the last
question repeated, and make multiple queries in the same call. Both systems
also used recorded prompts instead of the platform's built-in TTS system,
except for dynamically generated query responses, which could not be
prerecorded.


4.2.3 Conclusion
Since the Busline domain supports a single task, it seemed natural to
implement the systems using a system-initiative policy, prompting the user for
each of the required slots. Even had it been possible to create a more
open-ended interaction, it seems that for this particular type of task, having
the computer drive the dialog worked quite well. It could also be said that
this is the most efficient way to fill the three slots needed to retrieve the
information: as with most simple information retrieval tasks, a significant
determiner of success is the user's ability to remember what the system needs
to know. Transferring this responsibility to the system increases its
usability.


Chapter 5
Commercial Applications of VXML

Information retrieval applications: Output tends to be pre-recorded
information, and voice input is often constrained to a few navigation commands
and limited data entry (e.g., "previous" and "next" to control the data flow).
Information retrieval applications can provide news, sports, traffic, weather,
and stock information, as well as more specialized information (e.g.,
intranet-based company news). Voice output could be used extensively in such
applications, for instance to give driving directions.
Electronic commerce: Customer service applications such as account status (see
our earlier example), package tracking, and call centers are well suited.
Financial applications for banking, stock quotes and trading seem feasible,
too.
Telephone services: Voice dialing and telephone conference room management can
be voice-enabled using VoiceXML. An organization can make available a voice
web site with company information, news, upcoming events, and an address book.
The address book could be used for voice dialing people in that organization.
Unified messaging applications: E-mail messages can be read over the phone,
outgoing e-mail can be recorded (and in the future transcribed) over the
phone, and voice-oriented address information can be synchronized with
personal organizers and e-mail systems. Pager messages can be originated from
the phone, or routed to the phone.
Intranet applications: Applications such as inventory control, supply chain
management, and human resource services can be voice-enabled with VoiceXML,
since the security mechanisms of the web apply there, too.
There are many other areas where voice services will be used. While all
VoiceXML services will benefit visually impaired people, it may be that other
VoiceXML services will be specially created for this community.
Even though this voice recognition technique is still limited to
English-speaking countries, it is expected to bring sweeping changes to the
paradigm of customer marketing and multilingual support in the future.
Voice-enabled applications will grow by leaps and bounds in the next couple of
months, and any service that can be requested through an HTML form could also
be made available through VoiceXML. If a clean distinction between logic and
presentation exists in the scripts and servlets of a web-based application,
these might even be reusable to power voice applications by changing only the
presentation layer from HTML to VoiceXML. Good application architecture pays
off sometimes.


Chapter 6
CONCLUSION

VXML will have profound impacts, changing the way we use the phone - and
perhaps the design of phones themselves - as well as changing the nature and
evolution of the Web. By making it easier to program Web applications for
voice access, VXML can bring high efficiency to call center and intranet
development. Applications using VXML provide a cost-effective and convenient
way to interact with a software application without human intermediaries or
expensive computing devices.
Traditional Interactive Voice Response (IVR) systems have been
expensive to deploy and maintain because they require a mastery of
proprietary tools and technologies, expensive hardware and professionals
trained on specific software and hardware. VoiceXML enables effective
exploration of dialog system design. Commercial VoiceXML development
environments offer a relatively easy entry point that allows diverse dialog
systems to be built.


Bibliography

1) VoiceXML Absentee System, http://csis.pace.edu/csis/masplas/plo.pdf
2) Mixed-Initiative Interaction = Mixed Computation,
http://perez.cs.vt.edu/publications/2002/miimc.pdf
3) Aural Interfaces to Databases Based on VoiceXML,
http://www.globis.ethz.ch/publications/docs/2002a-sngh-vdb.pdf
4) BeVocal, http://cafe.bevocal.com
5) VoiceXML 2.0, http://www.w3.org/TR/VoiceXML2.0
6) Tellme Studio, http://studio.tellme.com
7) Periphonics, http://nortelnetworks.com/products/04/oscar


Appendix A
GLOSSARY OF TERMS

Application
A collection of VoiceXML documents that are tagged with the same application
name attribute.
Dialog
An interaction with the user specified in a VoiceXML document. Types of
dialogs include forms and menus.
Event
A notification thrown by the implementation platform, VoiceXML
interpreter context, VoiceXML interpreter, or VoiceXML code. Events include
exceptional conditions (semantic errors), normal errors (user did not say
something recognizable), normal events (user wants to exit), and user defined
events.
Form
A dialog that interacts with the user in a highly flexible fashion with the
computer and the user sharing the initiative.
Link
A set of grammars that when matched by something the user says or keys in,
either transitions to a new dialog or document or throws an event in the current
form item.
Menu
A dialog that presents the user with a set of choices and takes action on the
selected one.
Session
A connection between a user and an implementation platform, e.g. a telephone
call to a voice response system. One session may involve the interpretation of
more than one VoiceXML document.


Sub-dialog
A VoiceXML dialog (or document) invoked from the current dialog in a
manner analogous to function calls.
User
A person whose interaction with an implementation platform is controlled by a
VoiceXML interpreter.
URI
Uniform Resource Identifier.
URL
Uniform Resource Locator.
VoiceXML document
An XML document conforming to the VoiceXML specification.
VoiceXML interpreter
A computer program that interprets a VoiceXML document to control an
implementation platform for the purpose of conducting an interaction with a
user.
VoiceXML interpreter context
A computer program that uses a VoiceXML interpreter to interpret a VoiceXML
document and that may also interact with the implementation platform
independently of the VoiceXML interpreter.
W3C
World Wide Web Consortium


