Anytime, anywhere Internet access has become the goal of current
technology vendors. The Internet has seen significant
growth in recent years and has become the primary source of information
all over the world. Computing power is increasingly moving away from
the desktop computer to wireless devices such as mobile phones,
PDAs and tablet PCs. The challenge
for the present Internet world is to make the enormous body of web content
accessible to these users as well as to visually impaired users. The existing web
infrastructure was designed for traditional desktop browsers, not for handheld devices. Web data is stored in HTML (HyperText Markup
Language), a format that is poorly suited to devices with limited processing
power, screen size, memory and input capabilities. Voice
has always been an accepted medium of user interaction, and it greatly
simplifies the input process. The development of interactive voice browsers,
which use improved voice recognition and efficient text-to-speech (TTS) engines, has made it possible for mobile users to access the
Internet. This can be done using VoiceXML (VXML), a standard markup
language.
Just as a web browser renders HTML documents visually, a
VoiceXML interpreter renders VoiceXML documents aurally. Since the
documents are rendered aurally they can be heard over the phone.
Applications built with VXML are considered a cost-effective and convenient way
for callers to interact with software, with no human operator or
expensive computing device required.
Chapter 1
INTRODUCTION
1.1 Overview:
VoiceXML is based on the World Wide Web Consortium's (W3C) XML, the Extensible Markup Language. VoiceXML is designed for creating audio
dialogs that combine digitized audio, speech recognition, DTMF key
input, recorded or synthetic speech, and telephony. Its major goal is to bring
web-based development and content delivery to interactive voice response
applications.
                   Graphical Browser             Voice Browser
Language           HTML                          VoiceXML
Browser output     Text and images laid out      Streaming audio and TTS spoken
                   according to mark-up tags     according to mark-up tags
User input         Keyboard and mouse            DTMF and spoken voice
Resources          HTML pages, images,           VoiceXML pages, speech files
retrieved from     sound files, objects          and streaming audio
Web server
Hyperlinking       Click on 'hotspot' text or    Say 'hotspot' word (e.g. 'help'),
                   images, or submit form        or submit form
dialed number.
The voice browser renders the
mouse.
Chapter 2
VoiceXML Concepts and Features.
2.1 VXML Concepts
A VoiceXML application consists of a set of documents that describe a
conversational finite state machine. The user is always in one conversational
state, or dialog, at a time. Each dialog determines the next dialog to transition
to. Transitions are specified using Uniform Resource Identifiers (URIs)
pointing to the next document and dialog to use. If a URI does not refer to a
document, the current document is assumed and, if it does not refer to a dialog,
the first dialog in the document is assumed. Execution is terminated when a
dialog does not specify a successor, or if it has an element that explicitly exits
the conversation.
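The defaulting rules for transitions described above can be sketched in a few lines of JavaScript. This is only an illustration under stated assumptions, not how any particular interpreter is implemented: the function name resolveTransition, the dialogsByDocument table and the document names are all hypothetical.

```javascript
// Hypothetical helper: split a transition target such as
// "account.vxml#balance" into the document and dialog to move to.
function resolveTransition(target, currentDocument, dialogsByDocument) {
  const hashIndex = target.indexOf("#");
  const docPart = hashIndex === -1 ? target : target.slice(0, hashIndex);
  const dialogPart = hashIndex === -1 ? "" : target.slice(hashIndex + 1);

  // No document in the URI: the current document is assumed.
  const documentName = docPart === "" ? currentDocument : docPart;

  // No dialog in the URI: the first dialog of that document is assumed.
  const dialogs = dialogsByDocument[documentName] || [];
  const dialogName = dialogPart === "" ? dialogs[0] : dialogPart;

  return { document: documentName, dialog: dialogName };
}

// Assumed example data: two documents and the ids of their dialogs.
const dialogsByDocument = {
  "absent.vxml": ["crsdate", "finddate"],
  "account.vxml": ["login", "balance"],
};

resolveTransition("#finddate", "absent.vxml", dialogsByDocument);
// -> { document: "absent.vxml", dialog: "finddate" }
resolveTransition("account.vxml", "absent.vxml", dialogsByDocument);
// -> { document: "account.vxml", dialog: "login" }
```

A target with neither part filled in by the author therefore never leaves the interpreter without a destination: the missing half is always taken from the current document or from the first dialog.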
mobile phones, and the use of a voice interface may be a much more
convenient means of accessing information. Navigation by voice is far
more pleasant and faster than touch-tone input or entering
information on the small keypads of mobile WAP phones. In some
situations, voice output may also be preferred over visual output. For example, a
person may perform a manual task while simultaneously receiving
information via a voice interface: an employee driving to the
office can listen to the news on the company's web portal while
keeping his or her eyes on the traffic. For disabled users, voice-enabled applications are valuable to those who cannot use either their hands
for keyboard input or their eyes to process visual output. Further, voice
interfaces require no special instruction or experience. They also allow new
forms of human-computer interaction based on a combination of visual and
voice interfaces. Applications can be built either fully on
voice or with speech technology augmenting an existing graphical user interface.
User input affects dialog interpretation and is collected into requests submitted
to a document server. The document server may reply with another VoiceXML
document to continue the user's session with other dialogs.
VoiceXML is a markup language that:
- Minimizes client/server interactions by specifying multiple interactions
  per document.
- Shields application authors from low-level and platform-specific
  details.
- Separates user interaction code (in VoiceXML) from service logic (CGI
  scripts).
- Promotes service portability across implementation platforms.
- Is a common language for content providers, tool providers,
  and platform providers.
- Is easy to use for simple interactions, and yet provides language features
  to support complex dialogs.
While VoiceXML strives to accommodate the requirements of a
majority of voice response services, services with stringent requirements may
best be served by dedicated applications that employ a finer level of control.
Chapter 3
VoiceXML Architecture and Language Features
3.1 Evolution of VXML Architecture
3.2 Architecture
The user first contacts the web server, requesting VXML pages. This
request is directed to the VoiceXML interpreter context for initial interaction,
such as recognizing the call. It is then passed to the VoiceXML interpreter, which
handles the dialog to be played; this may involve getting input from the
user. At this point certain
grammars (rules for recognizing input, discussed later) may be active to validate
the input and to switch to another sub-dialog based on it. The
VoiceXML interpreter context also has its own active grammars, which
listen for phrases from the user that would take the user to a different
level, such as exiting from the web page.
3.3.2 Sessions
A session begins when the user starts to interact with a VoiceXML interpreter
context, continues as documents are loaded and processed, and ends when
requested by the user, a document, or the interpreter context.
3.3.3 Application
3.3.4 Grammars
Each dialog has one or more speech and/or DTMF grammars associated with
it. In machine-directed applications, each dialog's grammars are active only
when the user is in that dialog. In mixed-initiative applications, where the user
and the machine alternate in determining what to do next, some of the dialogs
are flagged to make their grammars active (i.e., listened for) even when the
user is in another dialog in the same document, or in another loaded document
in the same application. In this situation, if the user says something matching
another dialog's active grammars, execution transitions to that other dialog, with
the user's utterance treated as if it were said in that dialog. Mixed initiative adds
flexibility and power to voice applications.
<link event="help">
<grammar type="application/x-jsgf">
[please] help [me] [please] |
[please] I (need|want) help [please]
</grammar>
</link>
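The difference between machine-directed and mixed-initiative grammar activation can be sketched as a small routing function. This is a simplified model, not the interpreter's actual algorithm: the dialog objects, the activeElsewhere flag, exact phrase matching, and the rule that the current dialog is tried first are all assumptions made for illustration.

```javascript
// Hypothetical model: each dialog lists the phrases its grammar accepts and
// whether its grammars are flagged to stay active outside the dialog.
function route(dialogs, currentId, utterance) {
  const spoken = utterance.trim().toLowerCase();

  // A dialog's grammars are listened for if the user is in that dialog,
  // or if the dialog is flagged as active elsewhere.
  const active = dialogs.filter(
    (d) => d.id === currentId || d.activeElsewhere
  );

  // Assumption: the current dialog gets the first chance to match.
  active.sort(
    (a, b) => Number(b.id === currentId) - Number(a.id === currentId)
  );

  const match = active.find((d) => d.phrases.includes(spoken));
  return match ? match.id : null; // null stands in for a "nomatch" event
}

const dialogs = [
  { id: "balance", phrases: ["balance"], activeElsewhere: false },
  { id: "help", phrases: ["help", "help me"], activeElsewhere: true },
  { id: "transfer", phrases: ["transfer"], activeElsewhere: false },
];

route(dialogs, "balance", "Help me");  // -> "help"
route(dialogs, "balance", "transfer"); // -> null
```

In the second call the "transfer" grammar is not flagged, so it is not listened for while the user is in the "balance" dialog.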
3.3.5 Events
VoiceXML provides a form-filling mechanism for handling "normal" user
input. In addition, VoiceXML defines a mechanism for handling events not
covered by the form mechanism.
Events are thrown by the platform under a variety of circumstances, such as
when the user does not respond, doesn't respond intelligibly, requests help, etc.
The interpreter also throws events if it finds a semantic error in a VoiceXML
document. Events are caught by catch elements or their syntactic shorthand.
Each element in which an event can occur may specify catch elements. Catch
elements are also inherited from enclosing elements "as if by copy." In this
way, common event handling behavior can be specified at any level, and it
applies to all lower levels.
<catch event="help">
Please speak the account number for which you
want the balance.
</catch>
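The "as if by copy" inheritance of catch elements amounts to looking for a handler in the current element first and then in each enclosing element. The sketch below illustrates only that idea; the element objects, the catches map and the findCatch name are hypothetical, and details such as the count attribute are left out.

```javascript
// Hypothetical element tree: each node may carry a `catches` map from event
// name to handler text, plus a `parent` link to its enclosing element.
function findCatch(element, eventName) {
  for (let node = element; node; node = node.parent) {
    const catches = node.catches || {};
    if (Object.prototype.hasOwnProperty.call(catches, eventName)) {
      return catches[eventName]; // nearest enclosing handler wins
    }
  }
  return null; // no handler at any level
}

// Assumed example: a field inside a form inside a document.
const documentScope = {
  catches: { help: "Please speak the account number." },
  parent: null,
};
const form = { catches: {}, parent: documentScope };
const field = {
  catches: { nomatch: "Sorry, I did not understand." },
  parent: form,
};

findCatch(field, "nomatch"); // -> "Sorry, I did not understand."
findCatch(field, "help");    // -> "Please speak the account number."
```

The help handler is written once at document level and still applies inside the field, which is the behavior the inheritance rule is meant to provide.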
3.3.6 Links
A link supports mixed initiative. It specifies a grammar that is active
whenever the user is in the scope of the link. If user input matches the link
grammar, control transfers to the link destination URI. A <link> can be used either to throw
an event or to go to a destination URI.
<link next="/servlet/account.vxml">
<grammar type="application/x-jsgf">
account | Account balance inquiry
</grammar>
<dtmf>1</dtmf>
</link>
single field whose grammar and whose <filled> action are constructed from
the <choice> elements.
Chapter 4
Practical Applications of VoiceXML.
4.1.2 Features
The VoiceXML Absentee System is designed to receive and keep
records of absentee calls from students, faculty, and university staff. Two
interfaces were created for this application:
1) A web interface that provides enrollment in the service and access to
absence information, and
2) A phone interface (VXML) that users call to record their absence.
The user must first enroll via the web before using the phone service.
During enrollment the user enters some pertinent information and creates a unique
userId and password, which are later used to enter the Absentee System via a
telephone.
4.1.3 Working
All users provide the following information when enrolling on the web: a
name, an email address, a unique userId and a password. Faculty may be
asked to enter their course Ids and the semester in which they teach those courses,
which means that faculty may need to update this information every semester. Staff
members may be asked to enter the campus and department
they work in and whether or not they are managers. Faculty and
staff provide this additional information so that they can view the absentee
records of their own students or staff members, rather than those of all
students or staff members.
Each user is categorized into one of four login types:
student, staff, faculty or administrator. The administrator
has access to all information provided by the system and can create an
additional administrator user id and password. Other users are granted access
to part of the information: instructors and employers can
view the absence records for their courses or departments, respectively.
When the user calls the system, he or she is asked to enter the user
id and password. After logging in successfully, the user goes
through a series of questions. The login information determines which
category the user is in, and the appropriate questions are asked. If the user
is a student or instructor, the system asks for the course Id of
the class that will be missed and the date it will be missed. If the user is a
member of the university staff, the system asks for the day on which the user
will be absent. All the information obtained by the system is stored in a
database, which authorized users can access and view via the web.
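The way the login category selects the phone questions can be summarized in a short sketch. It restates only what the text above says; the function name questionsFor and the question labels are hypothetical, and returning an empty list for an administrator is an assumption, since the text does not describe what an administrator is asked.

```javascript
// Hypothetical mapping from login category to the questions asked by phone.
function questionsFor(category) {
  switch (category) {
    case "student":
    case "faculty":
      // Students and instructors report a missed class.
      return ["course id", "date of absence"];
    case "staff":
      // Staff members report only the day they will be absent.
      return ["date of absence"];
    case "administrator":
      // Assumption: not described in the text, so no questions here.
      return [];
    default:
      throw new Error("Unknown login category: " + category);
  }
}

questionsFor("student"); // -> ["course id", "date of absence"]
questionsFor("staff");   // -> ["date of absence"]
```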
</prompt>
</filled>
<catch event="nomatch" count="10">The course i d is invalid Please say the
course i d.</catch>
<noinput>
I did not understand the course i d.
<reprompt order="curr" />
</noinput>
</field>
<field
name="date_absent"
timeoutondtmf="false"
confirm="no"
bargein="true"
magicword="false" phoneticpruning="false">
<prompt>What is the date you will be absent?</prompt>
<grammar type="application/x-gsl" mode="voice">
<![CDATA[
TELLME_DATE
]]>
</grammar>
<catch event="nomatch" count="10">I am sorry I can not understand. Please
repeat the date you will be absent.</catch>
<noinput>
I did not understand the date.
<reprompt order="curr" />
</noinput>
<filled>
<goto next="#finddate" method="get" />
</filled>
</field>
</form>
<form id="finddate" anchor="false">
<block>
<script>
<![CDATA[
var date1 = vxmldata.get("date_absent");
var myDate='';
var myYear='';
var myMonth='';
var myDaten='';
var mySpecial='';
var myMonthc = '';
// Parse a grammar result made of name=value pairs separated by '^'
// and fill in the date variables declared above.
function ParseGrammar2(sGramResult) {
  myDate = '';
  var cMonth = new Array('january', 'february', 'march', 'april', 'may',
      'june', 'july', 'august', 'september', 'october', 'november',
      'december');
  var arrNames = [];
  var arrValues = [];
  // Split the result into its name=value pairs.
  var arrNamesValues = sGramResult.split('^');
  for (var i = 0; i < arrNamesValues.length; i++) {
    var arrNameValuePair = arrNamesValues[i].split('=');
    arrNames[i] = arrNameValuePair[0];
    arrValues[i] = arrNameValuePair[1];
    if (arrNames[i] == 'month') { myMonthc = arrValues[i]; }
    if (arrNames[i] == 'date') { myDaten = arrValues[i]; }
    if (arrNames[i] == 'year') { myYear = arrValues[i]; }
    if (arrNames[i] == 'special_date') { mySpecial = arrValues[i]; }
  }
  // Convert the spoken month name to its number (january = 1).
  var m;
  for (m = 0; m < cMonth.length; m++) {
    if (cMonth[m] == myMonthc) break;
  }
  myMonth = m + 1;
  myDate = myYear + '-' + myMonth + '-' + myDaten;
} // end of function
ParseGrammar2(date1);
]]>
</script>
</block>
<block>
You said,
<value expr="myMonthc" />
,
<value expr="myDaten" />
</block>
<field name="yesno" timeoutondtmf="false" confirm="no" bargein="true"
magicword="false" phoneticpruning="false">
<prompt>Is this the date you said? Say 'yes' or 'no'.</prompt>
<grammar type="application/x-gsl" mode="voice">
<![CDATA[
YES_NO
]]>
</grammar>
<catch event="nomatch" count="3">I am sorry I can not understand. Please
repeat your response. yes or no.</catch>
<filled>
<prompt>
you said
<value expr="yesno" />
</prompt>
<if cond="yesno != 'yes'">
<prompt>Please re-enter the course and date of absence.</prompt>
<goto next="#crsdate" method="get" />
<else />
<submit next="saveAbsenteeRecord.php" method="post"
namelist="courseid myYear myMonth myDaten username"
/>
</if>
</filled>
</field>
<block>
<submit next="saveAbsenteeRecord.php" method="post" namelist="courseid
myYear myMonth myDaten username" />
</block>
</form>
</vxml>
4.2.3 Conclusion
Since the Busline domain supports a single task, it seemed natural to
implement the systems using a system-initiative policy, prompting the user for
each of the required slots. Even had it been possible to create a more open-ended interaction, it seems that for this particular type of task, having the
computer drive the dialog worked quite well. It could also be said that this is
the most efficient way to fill the three slots needed to retrieve the information: as with
most simple information retrieval tasks, a significant determiner of success is
the user's ability to remember what the system needs to know. Transferring
this responsibility to the system increases its usability.
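A system-initiative policy of this kind reduces to asking for the first slot that is still empty, in a fixed order. The sketch below shows that loop step; the three slot names and prompts are hypothetical, since the text says only that there are three slots.

```javascript
// System-initiative slot filling: the system always asks for the first
// slot that has not been filled yet.
function nextPrompt(slots, filled) {
  for (const slot of slots) {
    if (!Object.prototype.hasOwnProperty.call(filled, slot.name)) {
      return slot.prompt;
    }
  }
  return null; // every slot is filled; the query can be sent
}

// Assumed slots for a bus information task (names are illustrative only).
const busSlots = [
  { name: "origin", prompt: "Where are you leaving from?" },
  { name: "destination", prompt: "Where are you going?" },
  { name: "time", prompt: "When do you want to travel?" },
];

nextPrompt(busSlots, {});                      // -> "Where are you leaving from?"
nextPrompt(busSlots, { origin: "downtown" });  // -> "Where are you going?"
```

Because the system tracks which slots are missing, the user never has to remember what information is still required.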
Chapter 5
Commercial Applications of VXML
Chapter 6
CONCLUSION
VXML will have profound impacts, changing the way we use the phone - and
perhaps the design of phones themselves - as well as changing the nature and
evolution of the Web. By making it easier to program Web applications for
voice access, VXML can bring high efficiency to call center and intranet
development. Applications built with VXML are considered a cost-effective and
convenient way for callers to interact with software, with no human operator or
expensive computing device required.
Traditional Interactive Voice Response (IVR) systems have been
expensive to deploy and maintain because they require a mastery of
proprietary tools and technologies, expensive hardware and professionals
trained on specific software and hardware. VoiceXML enables effective
exploration of dialog system design. Commercial VoiceXML development
environments offer a relatively easy entry point that allows diverse dialog
systems to be built.
Appendix A
GLOSSARY OF TERMS
Application
A collection of VoiceXML documents that are tagged with the same application
name attribute.
Dialog
An interaction with the user specified in a VoiceXML document. Types of
dialogs include forms and menus.
Event
A notification thrown by the implementation platform, VoiceXML
interpreter context, VoiceXML interpreter, or VoiceXML code. Events include
exceptional conditions (semantic errors), normal errors (user did not say
something recognizable), normal events (user wants to exit), and user defined
events.
Form
A dialog that interacts with the user in a highly flexible fashion with the
computer and the user sharing the initiative.
Link
A set of grammars that when matched by something the user says or keys in,
either transitions to a new dialog or document or throws an event in the current
form item.
Menu
A dialog that presents the user with a set of choices and takes action on the
selected one.
Session
A connection between a user and an implementation platform, e.g. a telephone
call to a voice response system. One session may involve the interpretation of
more than one VoiceXML document.
Sub-dialog
A VoiceXML dialog (or document) invoked from the current dialog in a
manner analogous to function calls.
User
A person whose interaction with an implementation platform is controlled by a
VoiceXML interpreter.
URI
Uniform Resource Identifier.
URL
Uniform Resource Locator.
VoiceXML document
An XML document conforming to the VoiceXML specification.
VoiceXML interpreter
A computer program that interprets a VoiceXML document to control an
implementation platform for the purpose of conducting an interaction with a
user.
VoiceXML interpreter context
A computer program that uses a VoiceXML interpreter to interpret a VoiceXML
document and that may also interact with the implementation platform
independently of the VoiceXML interpreter.
W3C
World Wide Web Consortium