VERITAS
Virtual and Augmented Environments and Realistic User Interactions
To achieve Embedded Accessibility DesignS
247765
Table of Contents
Version History Table............................................................................................i
List of Figures......................................................................................................vi
List of Tables........................................................................................................ix
List of Code Snippets..........................................................................................xi
List of Abbreviations...........................................................................................xiii
Executive Summary.............................................................................................1
1 Introduction........................................................................................................2
1.1 Defining Multimodal Interaction............................................................................2
1.1.1 Multimodal interaction: a human-centered view..................................................2
1.1.2 Multimodal interaction: a system-centered view..................................................3
1.2 Modelling Multimodal Interaction.........................................................................5
1.2.1 Bernsen's taxonomy...............................................................................................5
1.2.2 Architecture of multimodal user interfaces...........................................................6
1.2.3 Fusion of input modalities....................................................................................7
1.2.3.1 PAC-Amodeus................................................................................................... 7
1.2.3.2 Open Agent Architecture....................................................................................7
1.2.3.3 Multimodal Architecture proposed by Obrenovic et al........................................8
1.3 Multimodal Applications.....................................................................................11
1.3.1 Ambient spaces..................................................................................................12
1.3.2 Mobile/wearable.................................................................................................12
1.3.3 Virtual environments..........................................................................................13
1.3.4 Art.......................................................................................................................13
1.3.5 Users with disabilities.........................................................................................13
1.3.5.1 Users with disabilities - automotive................................................................14
1.3.5.2 Users with disabilities - smart living spaces....................................................14
1.3.5.3 Users with disabilities - workplace design.......................................................15
1.3.5.4 Users with disabilities - infotainment...............................................................15
1.3.5.5 Users with disabilities - personal healthcare and well-being...........................16
1.3.6 Public and private spaces..................................................................................16
1.3.7 Other..................................................................................................................16
2 Assistive Technologies (AT).............................................................................17
2.1 Assistive Technologies for visually impaired users.............................................17
2.1.1 Electronic reading devices.................................................................................17
2.1.2 Lasercane..........................................................................................................17
2.1.3 Braille and refreshable braille............................................................................18
2.1.4 Screen magnifiers..............................................................................................18
2.1.5 Screen readers...................................................................................................18
2.1.6 Text browsers.....................................................................................................18
2.1.7 Voice browsers...................................................................................................19
2.1.8 Haptic devices....................................................................................................19
2.2 Assistive Technologies for hearing impaired users.............................................19
2.2.1 Hearing aids.......................................................................................................19
2.2.2 Visual System for Telephone and Mobile Phone Use........................................20
2.2.3 Visual notification devices..................................................................................21
2.2.4 Gesture recognition............................................................................................21
2.3 Assistive Technologies for motor impaired users...............................................22
List of Figures
Figure 1: Design space for multimodal systems [9]......................................................................4
Figure 2: An architecture of multimodal user interfaces. Adapted from [16]..................................6
Figure 3: Obrenovic et al. framework [21]......................................................................8
Figure 4: Multimodal Interaction Models architecture overview diagram....................................44
Figure 5: Generic structure of a Multimodal Interaction Model...................................................45
Figure 6: Multimodal Interaction Model "grasp door" example....................................................45
Figure 7: Enabling relationship - Indicative explanatory case.....................................46
Figure 8: Enabling relationship with information passing - Indicative explanatory case.........47
Figure 9: Choice relationship - Indicative explanatory case.......................................47
Figure 10: Concurrency relationship - Indicative explanatory case.............................47
Figure 11: Concurrency with information passing relationship - Indicative explanatory case.....48
Figure 12: Order independency - Indicative explanatory case..................................48
Figure 13: Disabling relationship - Indicative explanatory case................................48
Figure 14: Suspend/Resume relationship - Indicative explanatory case....................49
Figure 15: Walk Multimodal Interaction Model relationships.....................................................50
Figure 16: See Multimodal Interaction Model relationships......................................................51
Figure 17: Hear Multimodal Interaction Model relationships.....................................................52
Figure 18: Grasp: Door handle Multimodal Interaction Model relationships.............................53
Figure 19: Pull (hand): Door handle Multimodal Interaction Model relationships......................54
Figure 20: Walk: To car seat Multimodal Interaction Model relationships.................................55
Figure 21: Grasp (right hand): Steering wheel Multimodal Interaction Model relationships......56
Figure 22: Grasp (hand): Door handle Multimodal Interaction Model relationships..................58
Figure 23: Pull (hand): Window bar Multimodal Interaction Model relationships......................59
Figure 24: Sit: Bed Multimodal Interaction Model relationships................................................60
Figure 25: Pull (hand): Toilet flush valve Multimodal Interaction Model relationships...............62
Figure 26: Stand up (knee, back): Toilet Multimodal Interaction Model relationships...............63
Figure 27: Grasp (hand): Mouse Multimodal Interaction Model relationships...........................65
Figure 28: See: Computer screen Multimodal Interaction Model relationships.........................66
Figure 29: Push (hand): Medical device button Multimodal Interaction Model relationships.......67
Figure 30: Read: Message on touch screen Multimodal Interaction Model relationships.........69
Figure 31: Press: OK button on the touch screen Multimodal Interaction Model relationships. 70
Figure 32: Multimodal Interfaces Manager architecture..............................................................71
Figure 33: Data input and output in Modality Compensation and Replacement Module.............72
Figure 34: The Multimodal Toolset Manager and its modules. Some modules are used for
getting input from users, while others for passing information to them. The modality type of
each tool is also indicated..................................................................................................... 73
Figure 35: Data flow concerning the Speech Recognition Module. Both real-time and pre-
recorded speech-audio is supported and converted to text by the module............................74
Figure 71: Grasp (hand): Faucet controls Multimodal Interaction Model relationships...........128
Figure 72: Grasp (hand): Hob gas control knob Multimodal Interaction Model relationships..129
Figure 73: Push (hand): Stove knob Multimodal Interaction Model relationships...................130
Figure 74: Pull (hand): Washing machine porthole handle Multimodal Interaction Model
relationships........................................................................................................................ 131
Figure 75: Turn (hand): Dishwasher knob Multimodal Interaction Model relationships..........132
Figure 76: Push (hand): Hood button Multimodal Interaction Model relationships.................133
Figure 77: Pull (hand): Oven door handle Multimodal Interaction Model relationships...........134
Figure 78: Twist (hand): Faucet control Multimodal Interaction Model relationships..............135
Figure 79: Stand up (knee, back): Toilet Multimodal Interaction Model relationships.............137
Figure 80: Push (hand): Mouse Multimodal Interaction Model relationships..........................138
Figure 81: Move (hand): mouse Multimodal Interaction Model relationships..........................139
Figure 82: Press (hand): Keyboard key Multimodal Interaction Model relationships..............140
List of Tables
Table 1: Different senses and their corresponding modalities [5]..................................................2
Table 2: Interaction modalities described using the Obrenovic et al. framework [21].....................9
Table 3: Disabilities and their constraints (from [21])..................................................................10
Table 4: Constraints introduced by driving a car (from [97])........................................................11
Table 5: Walk Multimodal Interaction Model definition..............................................................49
Table 6: See Multimodal Interaction Model definition..............................................................51
Table 7: Hear Multimodal Interaction Model definition..............................................................51
Table 8: Grasp: Door handle Multimodal Interaction Model definition.......................................52
Table 9: Pull (hand): Door handle Multimodal Interaction Model definition...............................53
Table 10: Walk: To car seat Multimodal Interaction Model definition.........................................54
Table 11: Grasp (right hand): Steering wheel Multimodal Interaction Model definition.............56
Table 12: Grasp (hand): Door handle Multimodal Interaction Model definition.........................58
Table 13: Pull (hand): Window bar Multimodal Interaction Model definition..............................59
Table 14: Sit: Bed Multimodal Interaction Model definition.......................................................60
Table 15: Pull (hand): Toilet flush valve Multimodal Interaction Model definition......................62
Table 16: Stand up (knee, back): Toilet Multimodal Interaction Model definition.......................63
Table 17: Grasp (hand): Mouse Multimodal Interaction Model definition..................................65
Table 18: See: Computer screen Multimodal Interaction Model definition................................66
Table 19: Push (hand): Medical device button Multimodal Interaction Model definition............67
Table 20: Read: Message on touch screen Multimodal Interaction Model definition................68
Table 21: Press: OK button on the touch screen Multimodal Interaction Model definition........69
Table 22: Multimodal Toolset Manager's modules and their modality requirements from the
target users........................................................................................................................... 74
Table 23: Features of the chosen speech recognition software, CMU Sphinx............................75
Table 24: Features of the speech synthesizers used by the Speech Synthesis Module...............77
Table 25: Sit: Car seat Multimodal Interaction Model definition................................................92
Table 26: Swing (legs): Inside car Multimodal Interaction Model definition...............................93
Table 27: Grasp (hand): Interior door handle Multimodal Interaction Model definition..............95
Table 28: Pull (left hand): Interior door handle Multimodal Interaction Model definition............96
Table 29: Push (right hand): Lock button Multimodal Interaction Model definition....................97
Table 30: Press (right hand): Eject button on belt buckle Multimodal Interaction Model
definition................................................................................................................................ 98
Table 31: Grasp (right hand): Interior door handle Multimodal Interaction Model definition....100
Table 32: Push (left hand): Interior door side Multimodal Interaction Model definition............101
Table 33: Pull down (hands): Sun shield Multimodal Interaction Model definition..................102
Table 34: Grasp (hand): Steering wheel Multimodal Interaction Model definition...................103
Table 35: Push (left foot): Gear pedal Multimodal Interaction Model definition.......................104
Table 36: Push (right foot): Accelerator pedal Multimodal Interaction Model definition...........105
Table 37: Push (right foot): Brake pedal Multimodal Interaction Model definition...................107
Table 38: Push (thumb): Parking brake release button Multimodal Interaction Model definition.
............................................................................................................................................ 108
Table 39: Pull (hand): Hand brake Multimodal Interaction Model definition............................109
Table 40: Grasp (hand): Light switch Multimodal Interaction Model definition........................110
Table 41: Turn (hand): Light switch Multimodal Interaction Model definition...........................112
Table 42: Move up/down (hand): Direction indicator Multimodal Interaction Model definition. 113
Table 43: Grasp (hand): Radio knob Multimodal Interaction Model definition.........................115
Table 44: Turn (hand): Radio knob Multimodal Interaction Model definition...........................116
Table 45: Push (hand): Radio button Multimodal Interaction Model definition........................117
Table 46: Push (hand): Window button Multimodal Interaction Model definition.....................118
Table 47: Grasp (hand): Window handle Multimodal Interaction Model definition...................119
Table 48: Turn (hand): Window handle Multimodal Interaction Model definition.....................120
Table 49: Turn (hand): Rear mirror Multimodal Interaction Model definition...........................121
Table 50: Push (hand): Rear mirror Multimodal Interaction Model definition..........................122
Table 51: Grasp (right hand): Gear handle Multimodal Interaction Model definition...............123
Table 52: Push (right hand): Gear handle Multimodal Interaction Model definition.................124
Table 53: Push (hand): Navigation system buttons Multimodal Interaction Model definition.. 125
Table 54: Push (right foot): Rear brake pedal Multimodal Interaction Model definition...........126
Table 55: Listen: Navigation system audio cues Multimodal Interaction Model definition.......127
Table 56: Grasp (hand): Faucet controls Multimodal Interaction Model definition..................128
Table 57: Grasp (hand): Hob gas control knob Multimodal Interaction Model definition.........129
Table 58: Push (hand): Stove knob Multimodal Interaction Model definition..........................130
Table 59: Pull (hand): Washing machine porthole handle Multimodal Interaction Model
definition.............................................................................................................................. 131
Table 60: Turn (hand): Dishwasher knob Multimodal Interaction Model definition..................132
Table 61: Push (hand): Hood button Multimodal Interaction Model definition.........................133
Table 62: Pull (hand): Oven door handle Multimodal Interaction Model definition..................134
Table 63: Twist (hand): Faucet control Multimodal Interaction Model definition......................135
Table 64: Sit (knee, back): On toilet Multimodal Interaction Model definition..........................136
Table 65: Push (hand): Mouse Multimodal Interaction Model definition..................................138
Table 66: Move (hand): mouse Multimodal Interaction Model definition.................................139
Table 67: Press (hand): Keyboard key Multimodal Interaction Model definition.....................140
CodeSnippet 26: Push (left hand): Interior door side Multimodal Interaction Model (UsiXML
source code)........................................................................................................................ 102
CodeSnippet 27: Pull down (hands): Sun shield Multimodal Interaction Model (UsiXML source
code)................................................................................................................................... 103
CodeSnippet 28: Grasp (hand): Steering wheel Multimodal Interaction Model (UsiXML source
code)................................................................................................................................... 104
CodeSnippet 29: Push (left foot): Gear pedal Multimodal Interaction Model (UsiXML source
code)................................................................................................................................... 105
CodeSnippet 30: Push (right foot): Accelerator pedal Multimodal Interaction Model (UsiXML
source code)........................................................................................................................ 106
CodeSnippet 31: Push (right foot): Brake pedal Multimodal Interaction Model (UsiXML source
code)................................................................................................................................... 108
CodeSnippet 32: Push (thumb): Parking brake release button Multimodal Interaction Model
(UsiXML source code)......................................................................................................... 109
CodeSnippet 33: Pull (hand): Hand brake Multimodal Interaction Model (UsiXML source code).
............................................................................................................................................. 110
CodeSnippet 34: Grasp (hand): Light switch Multimodal Interaction Model (UsiXML source
code).................................................................................................................................... 111
CodeSnippet 35: Turn (hand): Light switch Multimodal Interaction Model (UsiXML source
code).................................................................................................................................... 113
CodeSnippet 36: Move up/down (hand): Direction indicator Multimodal Interaction Model
(UsiXML source code)......................................................................................................... 114
CodeSnippet 37: Grasp (hand): Radio knob Multimodal Interaction Model (UsiXML source
code).................................................................................................................................... 115
CodeSnippet 38: Turn (hand): Radio knob Multimodal Interaction Model (UsiXML source
code).................................................................................................................................... 116
CodeSnippet 39: Push (hand): Radio button Multimodal Interaction Model (UsiXML source
code).................................................................................................................................... 117
CodeSnippet 40: Push (hand): Window button Multimodal Interaction Model (UsiXML source
code).................................................................................................................................... 118
CodeSnippet 41: Grasp (hand): Window handle Multimodal Interaction Model (UsiXML source
code).................................................................................................................................... 119
CodeSnippet 42: Turn (hand): Window handle Multimodal Interaction Model (UsiXML source
code)................................................................................................................................... 120
CodeSnippet 43: Turn (hand): Rear mirror Multimodal Interaction Model (UsiXML source
code)................................................................................................................................... 121
CodeSnippet 44: Push (hand): Rear mirror Multimodal Interaction Model (UsiXML source
code)................................................................................................................................... 122
CodeSnippet 45: Grasp (right hand): Gear handle Multimodal Interaction Model (UsiXML
source code)........................................................................................................................ 123
CodeSnippet 46: Push (right hand): Gear handle Multimodal Interaction Model (UsiXML source
code)................................................................................................................................... 124
CodeSnippet 47: Push (hand): Navigation system buttons Multimodal Interaction Model
(UsiXML source code)......................................................................................................... 126
CodeSnippet 48: Push (right foot): Rear brake pedal Multimodal Interaction Model (UsiXML
source code)........................................................................................................................ 127
CodeSnippet 49: Listen: Navigation system audio cues Multimodal Interaction Model (UsiXML
source code)........................................................................................................................ 128
CodeSnippet 50: Grasp (hand): Faucet controls Multimodal Interaction Model (UsiXML source
code)................................................................................................................................... 129
CodeSnippet 51: Grasp (hand): Hob gas control knob Multimodal Interaction Model (UsiXML
source code)........................................................................................................................ 130
CodeSnippet 52: Push (hand): Stove knob Multimodal Interaction Model (UsiXML source
code)................................................................................................................................... 131
CodeSnippet 53: Pull (hand): Washing machine porthole handle Multimodal Interaction Model
(UsiXML source code)......................................................................................................... 132
CodeSnippet 54: Turn (hand): Dishwasher knob Multimodal Interaction Model (UsiXML source
code)................................................................................................................................... 133
CodeSnippet 55: Push (hand): Hood button Multimodal Interaction Model (UsiXML source
code)................................................................................................................................... 134
CodeSnippet 56: Pull (hand): Oven door handle Multimodal Interaction Model (UsiXML source
code)................................................................................................................................... 135
CodeSnippet 57: Twist (hand): Faucet control Multimodal Interaction Model (UsiXML source
code)................................................................................................................................... 136
CodeSnippet 58: Sit (knee, back): On toilet Multimodal Interaction Model (UsiXML source
code)................................................................................................................................... 138
CodeSnippet 59: Push (hand): Mouse Multimodal Interaction Model (UsiXML source code).139
CodeSnippet 60: Move (hand): mouse Multimodal Interaction Model (UsiXML source code).
............................................................................................................................................ 140
CodeSnippet 61: Press (hand): Keyboard key Multimodal Interaction Model (UsiXML source
code)................................................................................................................................... 141
List of Abbreviations
Abbreviation Explanation
AT Assistive Technologies
GPS Global Positioning System
PDA Personal digital assistant
SP Sub-Project
UsiXML USer Interface eXtensible Markup Language
Executive Summary
This manuscript presents the research and development of tools, and their respective
data structures, for the design of multimodal interfaces tailored to the demands and
expectations of people with special needs, i.e. people with disabilities and
impairments. The focus is on categories of users with physical or cognitive
impairments (blind and low-vision users, motor-impaired users, users with mild
cognitive impairment, speech-impaired users and hearing-impaired users), including
older people. The generated models are able to simulate the various steps of the
interaction process for mono- and multimodal interfaces and to link them to the
sensorial capabilities of the users.
In Section 1, the concept of multimodal interaction is introduced and a detailed
analysis of the interaction mechanisms of current HCI solutions (e.g. touch, voice
control, speech output, gesture, GUIs) is presented. This analysis is needed to assess
the usability gaps of each of the considered HCI solutions for the selected user
groups.
In Section 2, widely used assistive devices are identified and analysed in terms of
several parameters, such as their input and output modalities, mobility and robustness.
The state of the art of multimodal interfaces for older people and people with
disabilities (in the visual, hearing, motor and cognitive domains) is also reviewed.
In Section 3, two lists are defined: one describing the interfaces, and one describing
the various (impaired) user groups together with the interfaces that can be used to
address their deficiencies.
In Section 4, the manuscript presents the specifications of the Multimodal Interaction
Models, which include combinations of interfacing modalities most suited for the target
user groups and connect the Virtual User Models, developed in SP1, to the Task
Models and the virtual prototype to be tested. Solution cases for five application
domains are presented: automotive, smart living places, workplace/office, infotainment
and personal healthcare.
Section 5 describes the implementation of the Multimodal Interfaces Manager. The
manager's architecture is analysed and the features of its two main components are
presented.
1 Introduction
If we observe the modalities from a neurobiological point of view [6][7], we can divide
them into seven groups:
internal chemical (blood oxygen, glucose, pH),
external chemical (taste, smell),
The different dimensions of the design space define eight possible classes of systems.
By definition [9], multimodal systems require the value Meaning in the Levels of
abstraction. Thus, there are four distinct classes of multimodal systems: exclusive,
alternate, concurrent and synergistic. Use of modalities refers to the temporal
dimension. Parallel operation can be achieved at different levels of abstraction [10].
The most important level is the task level, the level at which the user interacts with the
system; operation must appear concurrent at this level for the user to perceive the
system as parallel. Low-level (physical-level) concurrency is not a requirement for a
multimodal system, but it needs to be considered in the implementation: most current
computing systems can give the impression of concurrent processing even though the
processing is in fact sequential.
Fusion is the most demanding criterion in the design space [11]. Here, Combined
means that different modalities are combined into synergistic higher-level input tokens,
while Independent means that the interface contains parallel but unlinked modalities.
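To make this classification concrete, the following illustrative sketch (added here for explanation; it is not part of the original deliverable) maps the two remaining dimensions onto the four classes, assuming the commonly cited correspondence between the sequential/parallel and independent/combined values and the class names used above:

# Illustrative sketch only: classifying a multimodal system in the design space
# of [9], assuming the usual mapping of "Use of modalities" (sequential/parallel)
# and "Fusion" (independent/combined) onto the four classes named in the text.

from enum import Enum

class Use(Enum):
    SEQUENTIAL = "sequential"
    PARALLEL = "parallel"

class Fusion(Enum):
    INDEPENDENT = "independent"
    COMBINED = "combined"

# (use of modalities, fusion) -> class of multimodal system (Levels of abstraction = Meaning)
CLASSES = {
    (Use.SEQUENTIAL, Fusion.INDEPENDENT): "exclusive",
    (Use.SEQUENTIAL, Fusion.COMBINED):    "alternate",
    (Use.PARALLEL,   Fusion.INDEPENDENT): "concurrent",
    (Use.PARALLEL,   Fusion.COMBINED):    "synergistic",
}

def classify(use: Use, fusion: Fusion) -> str:
    return CLASSES[(use, fusion)]

# Example: speech and pen used at the same time and fused into a single command.
print(classify(Use.PARALLEL, Fusion.COMBINED))  # -> "synergistic"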
The synergistic use of modalities implies the fusion of data from different modelling
techniques. Nigay and Coutaz [9] have identified three levels of fusion: lexical,
syntactic and semantic. They can be mapped to the three conceptual design levels
defined in [12]:
Lexical fusion corresponds to the conceptual Binding level. It happens when
hardware primitives are bound to software events. An example of lexical fusion
is selecting multiple objects when the shift key is down.
Syntactic fusion corresponds to the conceptual Sequencing level. It involves the
combination of data to obtain a complete command. Sequencing of events is
important at this level. An example of syntactic fusion is synchronizing speech
and pen input in a map selection task.
Semantic fusion corresponds to the conceptual Functional level. It specifies the
detailed functionality of the interface: what information is needed for each
operation on the object, how to handle the errors, and what the results of an
operation are. Semantic fusion defines meanings but not the sequence of
actions or the devices with which the actions are conducted. An example of
semantic fusion is a flight route selection task that requires at least two airports
as its input through either touch or speech and draws the route on a map.
Still, defining multimodal systems as systems that make use of multiple input or output
modalities does not describe the properties of the actual systems very well. There are two
The presented architecture contains many models for different processes in the
system. Each of these models can be refined to fulfil the requirements of a given
system. Specifically, user and discourse models are highly important in a multimodal
interface. There can be one or more user models in a given system. If there are several
user models, the system can also be described as an adaptable user interface. The
discourse model handles the user interaction at a high level, and uses media analysis
and media design processes to understand what the user wants and to present the
information with the appropriate output channels. The author of [19] discusses user and
discourse models for multimodal communication when describing an intelligent multimodal
1.2.3.1 PAC-Amodeus
Nigay and Coutaz [9][17] present a fusion mechanism that is used with their PAC-
Amodeus software architecture model. They use a melting pot metaphor to model a
multimodal input event. PAC agents [10] act in this multiagent system and are
responsible for handling these events, in a system similar to Bolt's "Put-That-There"
[20]. In their 1995 paper Nigay and Coutaz explain the fusion mechanism in detail and
give the metrics and rules that they use in the fusion process. They divide fusion into
three classes. Microtemporal fusion is used to combine related informational units
produced in a parallel or pseudo-parallel manner. It is performed when the structural
parts of the input melting pots are complementary and when their time intervals
overlap. Macrotemporal fusion is used to combine related information units that are
produced or processed sequentially. It is performed when the structural parts of the
input melting pots are complementary and their time intervals belong to the same
temporal window but do not overlap. Contextual fusion is used to combine related
information units without attention to temporal constraints. Their algorithm favours
parallelism and is thus based on an eager strategy: it continuously attempts to combine
input data. This may possibly lead to incorrect fusion and requires an undo mechanism
to be implemented.
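The temporal criterion described above can be illustrated with a short sketch (added here for explanation only; the MeltingPot structure and the 500 ms window are assumptions made for the example, not values taken from [9][17]):

# Illustrative sketch of the temporal criterion used in melting-pot fusion.
# The data structure and the temporal window size are assumptions for this example.

from dataclasses import dataclass
from typing import Dict, Optional

TEMPORAL_WINDOW_MS = 500  # assumed size of the macrotemporal window

@dataclass
class MeltingPot:
    slots: Dict[str, object]  # structural parts, e.g. {"action": "put", "location": None}
    t_start: float            # ms
    t_end: float              # ms

def complementary(a: MeltingPot, b: MeltingPot) -> bool:
    """True if the filled slots of the two pots do not conflict."""
    filled_a = {k for k, v in a.slots.items() if v is not None}
    filled_b = {k for k, v in b.slots.items() if v is not None}
    return not (filled_a & filled_b)

def fusion_type(a: MeltingPot, b: MeltingPot) -> Optional[str]:
    """Classify the fusion of two complementary melting pots by their timing."""
    if not complementary(a, b):
        return None
    if a.t_start <= b.t_end and b.t_start <= a.t_end:
        return "microtemporal"  # time intervals overlap (parallel or pseudo-parallel input)
    gap = min(abs(a.t_start - b.t_end), abs(b.t_start - a.t_end))
    if gap <= TEMPORAL_WINDOW_MS:
        return "macrotemporal"  # sequential, but within the same temporal window
    return None  # only contextual fusion, which ignores timing, could still apply

# Example: speech "put that there" overlapping a pointing gesture.
speech = MeltingPot({"action": "put", "location": None}, t_start=0, t_end=800)
gesture = MeltingPot({"action": None, "location": (120, 45)}, t_start=300, t_end=400)
print(fusion_type(speech, gesture))  # -> "microtemporal"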
and reduce the need for error correction. It should be noted, however, that multiple
modalities alone do not bring benefits to the interface: the use of multiple modalities
may be ineffective or even disadvantageous. In this context, Oviatt [11][25] has
presented the common misconceptions (myths) of multimodal interfaces, most of them
related to the use of speech as an input modality.
The types of modalities used, as well as the integration models vary widely from
application to application. The literature on applications that use multimodal interaction
is vast and could well deserve a survey of its own [26][27][28][29][30][31][32].
Therefore, we do not attempt a complete survey of multimodal applications. Instead we
give a general overview of some of the major areas by focusing on specific application
areas in which interesting progress has been made. In particular, we focus on the
areas below.
1.3.2 Mobile/wearable
The recent drop in costs of hardware has led to an explosion in the availability of
mobile computing devices. One of the major challenges is that while devices such as
PDAs and mobile phones have become smaller and more powerful, there has been
little progress in developing effective interfaces to access the increased computational
and media resources available in such devices. Mobile devices, as well as wearable
devices, constitute a very important area of opportunity for research in multimodal
applications, because natural interaction with such devices can be crucial in
overcoming the limitations of current interfaces. Several researchers have recognized
this, and many projects exist on mobile and wearable multimodal applications. The
authors of [50] integrate pen and speech input for PDA interaction. The use of
computer vision, however, is also being explored in projects such as [51], in which a
tourist can take photographs of a site to obtain additional information about the site. In
[52], the authors present two techniques (head tilt and gesture with audio feedback) to
control a mobile device. The authors of [53] use MMHCI to augment human memory:
RFID tags are used in combination with a head mounted display and a camera to
capture video and information of all the objects the user touches. The authors of [54]
combine eye tracking with video, head tracking, and hand motion information. The
authors of [55] use eye tracking to understand eye-hand coordination in natural tasks,
and in [56] eye tracking is used in a video blogging application.
1.3.4 Art
Perhaps one of the most exciting application areas of multimodal applications is art.
Vision techniques can be used to allow audience participation [60] and influence a
performance. In [61], the authors use multiple modalities (video, audio, pressure
sensors) to output different emotional states for Ada, an intelligent space that
responds to multimodal input from its visitors. In [62], a wearable camera pointing at
the wearer's mouth interprets mouth gestures to generate MIDI sounds (so a musician
can play other instruments while generating sounds by moving his mouth). In [63], limb
movements are tracked to generate music. Multimodal applications can also be used in
museums to augment exhibitions [64].
interpret facial gestures for wheelchair navigation. The authors of [68] introduce a
system for presenting digital pictures non-visually (multimodal output), and the
techniques in [69] can be used for interaction using only eye blinks and eyebrow
movements. Some of the approaches in other application areas (e.g., [70]) could also
be beneficial for people with disabilities; multimodal applications have great potential
in making computers and other resources accessible to people with disabilities.
For a particular type of user activity and location, appropriate multimodal interfaces
were designed, incorporating visual, speech-audio, appliance and environmental
activity to drive I/O actions. [82] also describes a smart home with a difference: here
users employ a multimodal VR application (modes employed: visual, acoustic, tactile,
haptic) to check and change online the status of real devices (e.g. lights, window
blinds, heating) in the smart home. [83] describes the AutonomaMente project, which
has developed a highly customizable application based on multimodal communication
(speech, icons, text) to support autonomous living of persons with cognitive disabilities
in special apartments fitted with domotic sensors. Last but not least, the issue of
context awareness and multimodality in a smart home is also the subject of [84].
team of two highly realistic 3D agents present product items in an entertaining and
attractive way. The infotainment arena is one (possibly the only) VERITAS application
scenario where research has been done on the use of olfaction; thus, the
enhancement of multimedia infotainment content with olfactory data is the subject of
[89].
1.3.7 Other
Other applications include biometrics [100][101], surveillance, remote collaboration
[102], gaming and entertainment [103], education, and robotics ([104] gives a
comprehensive review of socially active robots). Multimodal applications can also play
an important role in safety-critical applications (e.g., medicine, military [105][106], etc.)
and in situations in which a lot of information from multiple sources has to be viewed in
short periods of time. A good example of this is crisis management [107].
2.1.2 Lasercane
The LaserCane has three laser beam channels projected from the cane to detect
upward, forward, and downward obstacles. When there is an obstacle, it reflects the
laser beam, which is then detected by the receiver on the cane. The reflected signal
results in the LaserCane producing vibration or sounds to warn the user of the
obstruction. Users have the option to inactivate the sounds and use only vibration for
warnings. The LaserCane is used similarly to the standard long cane and operates with
gain and the MTO switch. The MTO (microphone/telecoil/off) switch is used to select
different modes, according to hearing tasks and occasions: The Microphone (M) mode
is used for general communication occasions, the telecoil (T) operation mode is used
for telephone and induction system use, and the off (O) mode is used for battery saving
[111]. These additional functions are more effective for persons with severe to profound
hearing loss [112].
In-the-Ear Hearing Aids: In-the-ear (ITE) hearing aids fill the outer part of the ear with
all parts contained in a custom-made shell. ITE hearing aids can be further categorized
into: a) ITE aids, b) partially in-the-canal (ITC) aids, and c) complete-in-the-canal (CIC)
aids [111]. ITE aids are usually equipped with a telecoil, whereas ITC and CIC aids normally
have no telecoil inside due to their small size and the possibility of normal telephone
use. The small size of the ITC and CIC hearing aids makes battery replacement
difficult. The limited battery size also restricts amplification power. However, these two
styles of hearing aid sit deeper in the canal, permitting higher sound pressure levels
(SPLs) [111]. The size of all ITE hearing aids results in a greater possibility of feedback
due to the close distance between the microphone and speaker. Sophisticated designs
for ITE hearing aids are necessary to avoid feedback [111]. The controls of these
devices are also difficult to adjust; using a remote control system or automatic volume
control helps solve this problem [111].
Bone-Conduction Hearing Aids: Bone-conduction hearing aids are designed for
individuals with conductive hearing loss or occlusion of the ear canal that cannot be
treated surgically. The only difference between bone-conduction hearing aids and the
above-mentioned hearing aids is the receiver. The bone conduction aids can be BW or
BTE style with receivers (vibrators fitted in a headband) that transmit vibration to the
skull. The vibration is picked up directly by the cochlea.
problems with using a mobile phone with a TDD have been reported. Kozma-Spytek
[114] further indicated that using a TDD with a mobile phone does not provide better
portability and convenience for its users. An interactive text pager provides an easier
way for people with hearing impairment to communicate with text messaging. This
device has a small QWERTY keyboard for thumb input, and it enables users to send
text messages to another person's pager, mobile phone, or computer. Additional
functions available for this device include, for example, emailing, sending faxes, TDD
chat, and instant messaging. Two-way text messaging is now available on most mobile
phones and from most service providers. Sending text messages through mobile
phones is very popular among mainstream users; however, using a numeric keypad for text
input is not time-efficient. Newly designed text-messaging mobile phones (e.g., the
T-Mobile Sidekick) now provide a compact QWERTY keyboard with a large screen. These
phones allow better internet access and instant messaging. In addition to text
messaging, sending email and instant messaging are other popular ways to
communicate through text. Online instant messaging requires both users to be online
and using the same instant messaging program at the same time. However, text-
messaging mobile phones free their users from the constraints of time and place.
hand tracking [122], and the selection of suitable features [116]. After the parameters
are computed, the gestures represented by them need to be classified and interpreted
based on the accepted model and based on some grammar rules that reflect the
internal syntax of gestural commands. The grammar may also encode the interaction of
gestures with other communication modes such as speech, gaze, or facial expressions.
As an alternative to modeling, some authors have explored the use of combinations of
simple 2D motion based detectors for gesture recognition [123].
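As a concrete, entirely illustrative example of such grammar rules (the gesture classes, speech tokens and rules below are invented for this sketch and do not come from the cited work), a recognized hand gesture can be interpreted together with an optional co-occurring speech token:

# Illustrative sketch: a tiny grammar interpreting a gesture/speech pair.

from typing import Optional

# (gesture class, speech token) -> interpreted command; None means no speech is required.
GRAMMAR = {
    ("point", "select"):   "select-object-at-pointed-location",
    ("point", "move"):     "move-object-to-pointed-location",
    ("open_palm", "stop"): "cancel-current-action",
    ("swipe_left", None):  "previous-page",   # purely gestural command
    ("swipe_right", None): "next-page",
}

def interpret(gesture: str, speech: Optional[str]) -> Optional[str]:
    """Return the command encoded by the gesture/speech pair, or None if the
    combination is not covered by the grammar (ambiguous or invalid)."""
    return GRAMMAR.get((gesture, speech))

print(interpret("point", "move"))     # -> "move-object-to-pointed-location"
print(interpret("swipe_left", None))  # -> "previous-page"
print(interpret("point", None))       # -> None: pointing alone is ambiguous here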
In any case, to fully exploit the potential of gestures in an application that supports
multimodal interaction, the class of possible recognized gestures should be as broad
as possible and ideally any gesture performed by the user should be unambiguously
interpretable by the interface. However, most of the gesture-based HCI systems allow
only symbolic commands based on hand posture or 3D pointing. This is due to the
complexity associated with gesture analysis and the desire to build real-time interfaces.
Also, most of the systems accommodate only single-hand gestures. Yet, human
gestures, especially communicative gestures, naturally employ actions of both hands.
However, if two-handed gestures are to be allowed, several ambiguous situations may
appear (e.g., occlusion of the hands, intentional vs. unintentional movements) and the processing
time will likely increase.
Walker designs include rigid, folding, or wheeled (two, three, or four wheels) [128].
Rigid walkers require lifting to move forward. Wheeled walkers are simply pushed
forward. Folding walkers make transport easier. Most two-wheeled walkers have
automatic brakes and some have an auto-glide feature that skims the surface. Three-
and four-wheeled walkers have hand brakes and even though they are heavier, they
require less strength and energy. Today's walkers include features such as seats, trays,
baskets, and platform arm supports [125].
Walkers are a popular choice of mobility AT because they are readily available, easy to
use, robust, and relatively inexpensive when compared to other alternatives, such as
wheelchairs. They are typically used by people who have difficulties with walking,
balancing, agility, and/or standing for prolonged periods of time. These impairments are
a result of various factors and diseases, such as Parkinson's disease, multiple
sclerosis, arthritis, or the normal aging process. Walkers are used for both rehabilitation
and compensation of impairment(s), with the majority of use focusing on compensation.
There are numerous designs of walkers, with most of them falling into one of three
categories: 1) walking frames; 2) two-wheeled walkers; and 3) four-wheeled walkers
[129].
2.3.2 Wheelchairs
Wheelchairs are used for more severe mobility impairments. In general, wheelchairs
have seats, backs, footrests, and casters. The presence of other familiar features such
as push-handles, wheel locks, and large rear wheels with push-rims depend on the
purpose and specific use of the chair [130]. As with walkers, the goal of this type of AT
is typically rehabilitation or compensation for a particular impairment or disability.
People who use wheelchairs are usually unable to walk, or have difficulty walking or
standing due to various neurological dysfunctions or musculoskeletal diseases or
difficulties (e.g. muscular weakness). Common impairments that often require the use
of a wheelchair include spinal cord injuries, hemiplegia and other types of paralysis,
multiple sclerosis, cerebral palsy, arthritis, and lower limb amputation [131]. There are
three categories of wheelchairs: 1) dependent mobility (wheelchairs that are propelled
by an attendant); 2) independent manual mobility (manually propelled by the user); and
3) independent powered mobility (motor-propelled) [131].
2.3.3 Reachers
Reachers are helpful, low-cost devices for older adults designed to pick up small or
large objects (e.g., cans, pans, dishes, books, CDs), and can be used in a variety of
activities such as dressing, cooking, and gardening [131][132]. These devices can be
used to reach for items stored on high shelves, preventing the more risky approach of
climbing on a chair, stool, or ladder. Reachers extend the range of motion of a person
with disabilities (e.g., low back problems, arthritis, hypertension, stroke) [133]. Older
adults use reachers to pick up remote controls and to take cups and dishes in and out
of cabinets [131][132]. Reachers can be purchased from department stores, ordered
from catalogs or web sites, or prescribed in a medical rehabilitation setting. A study
[133] of reacher use by older adults found that they preferred lightweight reachers with
adjustable length, a lock system for grip, lever action trigger, forearm and wrist support,
life-time guarantee, and one-hand use. One of the most important criteria for reachers
used by elders is that they are lightweight. Self-closing or locking mechanisms are also
important because they eliminate the need to grasp the handle for a prolonged period
of time [133].
systems less accurate, although increasing computational power and lower costs mean
that more computationally intensive algorithms can be run in real time. As an
alternative, in [135], the authors propose using a single high-resolution image of one
eye to improve accuracy. On the other hand, infra-red-based systems usually use only
one camera, but the use of two cameras has been proposed to further increase
accuracy [136].
Although most research on non-wearable systems has focused on desktop users, the
ubiquity of computing devices has allowed for application in other domains in which the
user is stationary (e.g., [75][136]). For example, the authors of [75] monitor driver visual
attention using a single, non-wearable camera placed on a car's dashboard to track
face features and for gaze detection.
Wearable eye trackers have also been investigated mostly for desktop applications (or
for users that do not walk wearing the device). Also, because of advances in hardware
(e.g., reduction in size and weight) and lower costs, researchers have been able to
investigate uses in novel applications (eye tracking while users walk). For example, in
[54], eye tracking data are combined with video from the user's perspective, head
directions, and hand motions to learn words from natural interactions with users; the
authors of [55] use a wearable eye tracker to understand hand-eye coordination in
natural tasks, and the authors of [56] use a wearable eye tracker to detect eye contact
and record video for blogging.
The main issues in developing gaze tracking systems are intrusiveness, speed,
robustness, and accuracy. The type of hardware and algorithms necessary, however,
depend highly on the level of analysis desired. Gaze analysis can be performed at
three different levels [70]: a) highly detailed low-level micro-events, b) low-level
intentional events, and c) coarse-level goal-based events. Micro-events include micro-
saccades, jitter, nystagmus, and brief fixations, which are studied for their physiological
and psychological relevance by vision scientists and psychologists. Low-level
intentional events are the smallest coherent units of movement that the user is aware
of during visual activity, which include sustained fixations and revisits. Although most of
the work on HCI has focused on coarse-level goal-based events (e.g., using gaze as a
pointer [137]), it is easy to foresee the importance of analysis at lower levels,
particularly to infer the user's cognitive state in affective interfaces (e.g., [138]). Within
this context, an important issue often overlooked is how to interpret eye-tracking data.
In other words, as the user moves his eyes during interaction, the system must decide
what the movements mean in order to react accordingly. We move our eyes 2-3 times
per second, so a system may have to process large amounts of data within a short
time, a task that is not trivial even if processing does not occur in real-time. One way to
interpret eye tracking data is to cluster fixation points and assume, for instance, that
clusters correspond to areas of interest. Clustering of fixation points is only one option,
however, and as the authors of [139] discuss, it can be difficult to determine the
clustering algorithm parameters. Other options include obtaining statistics on measures
such as number of eye movements, saccades, distances between fixations, order of
fixations, and so on.
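To illustrate the clustering idea mentioned above (an added sketch with invented thresholds, not a description of any cited system), the following code collapses consecutive gaze samples into fixation points and then groups nearby fixation points into candidate areas of interest:

# Illustrative sketch: grouping gaze samples into fixations and fixation points
# into candidate areas of interest. Thresholds and data format are assumptions.

from typing import List, Tuple

Point = Tuple[float, float]

FIXATION_RADIUS_PX = 30.0  # max spread of samples within one fixation (assumed)
CLUSTER_RADIUS_PX = 80.0   # max distance between fixations in one area of interest (assumed)

def dist(a: Point, b: Point) -> float:
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5

def detect_fixations(samples: List[Point]) -> List[Point]:
    """Collapse consecutive gaze samples that stay within FIXATION_RADIUS_PX
    of their running centroid into a single fixation point (the centroid)."""
    fixations, group = [], []
    for s in samples:
        if not group:
            group = [s]
            continue
        cx = sum(p[0] for p in group) / len(group)
        cy = sum(p[1] for p in group) / len(group)
        if dist(s, (cx, cy)) <= FIXATION_RADIUS_PX:
            group.append(s)
        else:
            fixations.append((cx, cy))
            group = [s]
    if group:
        cx = sum(p[0] for p in group) / len(group)
        cy = sum(p[1] for p in group) / len(group)
        fixations.append((cx, cy))
    return fixations

def cluster_fixations(fixations: List[Point]) -> List[List[Point]]:
    """Greedily group fixation points that lie close together; each resulting
    cluster is treated as a candidate area of interest."""
    clusters: List[List[Point]] = []
    for f in fixations:
        for c in clusters:
            if any(dist(f, other) <= CLUSTER_RADIUS_PX for other in c):
                c.append(f)
                break
        else:
            clusters.append([f])
    return clusters

gaze = [(100, 100), (104, 98), (99, 103), (400, 250), (402, 251), (120, 110)]
print(cluster_fixations(detect_fixations(gaze)))  # two clusters: around (100,100) and (400,250)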
Examples: Button activated doors, support bars inside car, leg lifters,
lever/switch/button to eject/release belt tongue from buckle, lever/switch/button to open
door, gear control on steering wheel, accelerator lever, ring accelerator, brake radial
lever, floor mounted braking levers, floor mounted accelerator levers, light switches on
steering wheel, headrest controls, voice activated doors, voice controlled system to
eject/release belt tongue from buckle, voice control gear system, voice control brake
system, light voice controlled system, voice controlled radio, voice controlled window,
voice activated mirror, voice controlled navigation system, etc.
Various assistive devices are also used in the workplace and in smart living places.
Examples: foot faucet controls, foot flush valve, sensor enabled faucet control, voice
controlled doors/windows, support bars, etc.
prototype of Activity Compass that recorded the GPS reading into a PDA [141]. The
Activity Compass monitors the activity paths stored on the server and infers which of
them it believes are in progress, based on the time, the user's location, and the path
followed so far. Visual guidance (directional arrows) is then provided on the PDA
screen; the arrows guide the user along the correct route to his/her most likely
destination. The technologies used in the Activity Compass are a hand-held computer,
a GPS receiver, and wireless communication. The most
recent version of Activity Compass uses only a smartphone and GPS receiver.
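A rough, purely hypothetical sketch of this kind of path matching is given below; none of the names, data or scoring choices come from [141]. It scores stored activity paths against the user's recent GPS positions and reports the destination of the best-matching path:

# Hypothetical sketch of destination prediction from stored activity paths.
# Path data and the matching score are invented for illustration only.

from typing import Dict, List, Tuple

LatLon = Tuple[float, float]

def point_to_path_distance(p: LatLon, path: List[LatLon]) -> float:
    """Distance (in degrees, crude) from a point to the nearest path vertex."""
    return min(((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5 for q in path)

def best_destination(recent_fixes: List[LatLon],
                     stored_paths: Dict[str, List[LatLon]]) -> str:
    """Pick the stored path whose vertices lie closest, on average, to the
    user's recent GPS fixes, and return its destination label."""
    scores = {
        destination: sum(point_to_path_distance(p, path) for p in recent_fixes) / len(recent_fixes)
        for destination, path in stored_paths.items()
    }
    return min(scores, key=scores.get)

paths = {
    "grocery store": [(47.651, -122.341), (47.653, -122.340), (47.655, -122.338)],
    "bus stop": [(47.651, -122.341), (47.649, -122.344), (47.647, -122.347)],
}
fixes = [(47.6512, -122.3408), (47.6528, -122.3401)]
print(best_destination(fixes, paths))  # -> "grocery store"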
as the interfaces from the previous section that can be used to address the
deficiencies.
3.2.1 Elderly
Aging is associated with decreases in muscle mass and strength. These decreases
may be partially due to losses of motor neurons. By the age of 70, these losses occur
in both proximal and distal muscles. In biceps brachii and brachialis, old adults show
decreased strength (by 1/3) correlated with a reduction in the number of motor units (by
1/2). Old adults show evidence that remaining motor units may become larger as motor
units innervate collateral muscle fibers [143].
Concerning walking gait, when confronted with an unexpected slip or trip while walking,
old adults have a less effective balance strategy than young adults: smaller
and slower postural muscle responses, altered temporal and spatial organization of the
postural response, agonist-antagonist muscle coactivation and greater upper trunk
instability. Comparing control and slip conditions, after the perturbation, young adults
have a longer stride length, a longer stride duration, and the same walk velocity
whereas old adults have a shorter stride length, the same stride duration, and a lower
walk velocity [144].
For the knee extensors, old adults produce less torque during dynamic or isometric
maximal voluntary contractions than young adults. The mechanisms controlling fatigue
in the elderly during isometric contractions are not the same as those that influence
fatigue during dynamic contractions, while young adults keep the same strategy. The
knee extensors of healthy old adults fatigue less during isometric contractions than do
those of young adults who had similar levels of habitual physical activity [145].
Old adults exhibit reductions in manual dexterity, which are observed through changes in
fingertip force when gripping and/or lifting [146].
There are many diseases, disorders, and age-related changes that may affect the eyes
and surrounding structures. As the eye ages certain changes occur that can be
attributed solely to the aging process. Most of these anatomic and physiologic
processes follow a gradual decline. With aging, the quality of vision worsens due to
reasons independent of diseases of the aging eye. While there are many changes of
significance in the nondiseased eye, the most functionally important changes seem to
be a reduction in pupil size and the loss of accommodation or focusing capability
(presbyopia [147]).
Hearing loss is one of the most common conditions affecting older adults. One in three
people older than 60 and half of those older than 85 have hearing loss.
A degree of memory loss is a normal part of aging.
Modalities affected: motor (speed, dexterity, fatigue resistance, gait), vision, hearing,
cognitive (memory loss).
Interfaces that address this user group:
If vision deficiencies are present: Speech recognition interfaces, Speech
synthesis interfaces, Screen reading interfaces, Voice browsing interfaces,
Audio playback interfaces, Alternative keyboards/switches interfaces, Haptic
interfaces.
3.2.5 Osteoarthritis
Osteoarthritis (OA) is a group of mechanical abnormalities involving degradation of
joints [155] including articular cartilage and subchondral bone. Symptoms may include
joint pain, tenderness, stiffness, locking, and sometimes an effusion. The main
symptom is pain, causing loss of ability and often stiffness. OA commonly affects the
hands, feet, spine, and the large weight bearing joints, such as the hips and knees,
although in theory, any joint in the body can be affected. As OA progresses, the
affected joints appear larger, are stiff and painful, and usually feel better with gentle use
but worse with excessive or prolonged use, thus distinguishing it from rheumatoid
arthritis.
Modalities affected: motor (range of motion, gait).
Interfaces that address this user group: Speech recognition interfaces,
Alternative keyboards/switches interfaces, Gaze/Eye tracking interfaces, Facial
expression recognition interfaces, Gesture recognition based interfaces.
3.2.6 Gonarthritis
Gonarthritis, or knee arthritis, results in mechanical pain, in other words, pain that
increases when walking, particularly going up or down stairs. Cracking or instability of
the knee may be noted (the knee seems to give way). Frequently, in addition to
persistent mechanical pain there may be cycles of intense pain accompanied by
inflammation of the joint with effusion. In the most advanced stages there may be a
decrease in the range of knee mobility (patients are not able to fully extend or bend the
knee) [156].
Modalities affected: motor (pain while walking, limited knee range of motion).
Interfaces that address this user group: Speech recognition interfaces,
Alternative keyboards/switches interfaces, Gaze/Eye tracking interfaces, Facial
expression recognition interfaces, Gesture recognition based interfaces.
3.2.7 Coxarthritis
Hip arthritis typically affects patients over 50 years of age. It is more common in people
who are overweight. The most common symptoms of hip arthritis are pain with
activities, limited range of motion, stiffness of the hip, walking with a limp [157].
Modalities affected: motor (gait, hip stiffness, limited hip range of motion).
Interfaces that address this user group: Speech recognition interfaces,
Alternative keyboards/switches interfaces, Gaze/Eye tracking interfaces, Facial
expression recognition interfaces, Gesture recognition based interfaces.
3.2.10 Hemiparesis
Hemiparesis is weakness on one side of the body. It is less severe than hemiplegia,
the total paralysis of the arm, leg, and trunk on one side of the body. Thus, the patient
can move the impaired side of the body, but with reduced muscular strength.
Depending on the type of hemiparesis diagnosed, different bodily functions can be
affected. People with hemiparesis often have difficulties maintaining their balance due
to limb weaknesses leading to an inability to properly shift body weight. This makes
performing everyday activities such as dressing, eating, grabbing objects, or using the
bathroom more difficult. Hemiparesis with origin in the lower section of the brain
creates a condition known as ataxia, a loss of both gross and fine motor skills, often
manifesting as staggering and stumbling [162].
Right-sided hemiparesis involves injury to the left side of the person's brain, which is
the side of the brain controlling speech and language. People who have this type of
hemiparesis often experience difficulty with talking and understanding what others say
[162].
In addition to problems understanding or using speech, persons with right-sided
hemiparesis often have difficulty distinguishing left from right. When asked to turn left
or right, or to raise a left or right limb, many of those affected with right-sided
hemiparesis will either turn or raise a limb in the wrong direction, or simply not follow
the command at all due to an inability to process the request [162].
Left-sided hemiparesis involves injury to the right side of the person's brain, which
controls learning processes, certain types of behavior, and non-verbal communication.
Injury to this area of a person's brain may also cause people to talk excessively, have
short attention spans, and have problems with short-term memory [162].
Modalities affected: motor (reduced muscular strength, loss of balance).
Interfaces that address to this user group: Speech recognition interfaces,
Alternative keyboards/switches interfaces, Gaze/Eye tracking interfaces, Facial
expression recognition interfaces, Gesture recognition based interfaces.
3.2.11 Hemiplegia
Hemiplegia is total paralysis of the arm, leg, and trunk on the same side of the body
[163]. Hemiplegia is more severe than hemiparesis, in which one half of the body is
weakened rather than paralyzed. Hemiplegia is not an uncommon medical disorder. In elderly
individuals, strokes are the most common cause of hemiplegia. In children, the majority
of cases of hemiplegia have no identifiable cause and occur with a frequency of about
one in every thousand births. Experts indicate that the majority of cases of hemiplegia
that occur up to the age of two should be considered to be cerebral palsy until proven
otherwise.
Problems associated with hemiplegia may include: difficulty with gait and with balance
while standing or walking; difficulty with motor activities like holding, grasping or
pinching; increasing stiffness of muscles; muscle spasms; difficulty with speech;
difficulty swallowing food; and significant delay in achieving developmental milestones
such as standing, smiling, crawling or speaking. The majority of children who develop
hemiplegia also have abnormal mental development, and behavior problems such as anxiety,
anger, irritability, and lack of concentration or comprehension may occur.
Modalities affected: motor (paralysis, gait, facial expressions), speech
(sometimes).
Interfaces that address this user group: Speech recognition interfaces (when
no speech deficiency is present), Alternative keyboards/switches interfaces,
Gaze/Eye tracking interfaces, Facial expression recognition interfaces, Gesture
recognition based interfaces.
3.2.15 Glaucoma
Glaucoma is an eye disease in which the optic nerve is damaged in a characteristic
pattern. This can permanently damage vision in the affected eye(s) and lead to
blindness if left untreated [170]. Glaucoma signs are gradually progressive visual field
loss, and optic nerve changes.
Modalities affected: vision (visual field loss).
Interfaces that address this user group: Speech recognition interfaces, Speech
synthesis interfaces, Screen reading interfaces, Voice browsing interfaces, Audio
playback interfaces, Alternative keyboards/switches interfaces, Haptic interfaces.
3.2.17 Cataract
Cataract is a clouding that develops in the crystalline lens of the eye or in its envelope
(lens capsule), varying in degree from slight to complete opacity and obstructing the
passage of light. As a cataract becomes more opaque, clear vision is compromised. A
loss of visual acuity is noted. Contrast sensitivity is also lost, so that contours, shadows
and color vision are less vivid. Veiling glare can be a problem as light is scattered by
the cataract into the eye [172].
3.2.19 Otitis
Otitis is a general term for inflammation or infection of the ear. It is subdivided into otitis
externa, media and interna. Otitis externa, external otitis, or "swimmer's ear" involves
the outer ear and ear canal. In external otitis, the ear hurts when touched or pulled.
When enough swelling and discharge is present in the ear canal to block the opening,
external otitis may cause temporary conductive hearing loss.
Otitis media or middle ear infection involves the middle ear. In otitis media, the ear is
infected or clogged with fluid behind the ear drum, in the normally air-filled middle-ear
space. Children with recurrent episodes of acute otitis media and those suffering from
otitis media with effusion or chronic otitis media have higher risks of developing
conductive and sensorineural hearing loss [174].
Otitis interna is an inflammation of the inner ear and is usually considered synonymous
with labyrinthitis. It results in severe vertigo lasting for one or more days. In rare cases,
hearing loss accompanies the vertigo in labyrinthitis.
Modalities affected: hearing.
Interfaces that address this user group: Visual notification interfaces, Sign
language synthesis interfaces, Augmented reality interfaces.
3.2.20 Otosclerosis
Otosclerosis is an abnormal growth of bone near the middle ear that can result in
conductive and/or sensorineural hearing loss. The primary form of hearing loss in
otosclerosis is conductive hearing loss (CHL), whereby sounds reach the ear drum but are
incompletely transferred via the ossicular chain in the middle ear, and thus partly fail
to reach the inner ear (cochlea).
3.2.23 Presbycusis
Presbycusis (age-related hearing loss) is the cumulative effect of aging on hearing.
Also known as presbyacusis, it is defined as a progressive bilateral symmetrical age-
related sensorineural hearing loss [178]. The hearing loss is most marked at higher
frequencies. Hearing loss that accumulates with age but is caused by factors other
than normal aging is not presbycusis, although differentiating the individual effects of
multiple causes of hearing loss can be difficult.
Modalities affected: hearing.
Interfaces that address this user group: Visual notification interfaces, Sign
language synthesis interfaces, Augmented reality interfaces.
3.2.24 Stuttering
Stuttering (alalia syllabaris), also known as stammering (alalia literalis or anarthria
literalis), is a speech disorder in which the flow of speech is disrupted by involuntary
repetitions and prolongations of sounds, syllables, words or phrases, and by involuntary
silent pauses or blocks in which the stutterer is unable to produce sounds [179]. It
affects approximately 1% of the adult population. The term stuttering is most commonly
associated with involuntary sound repetition, but it also encompasses the abnormal
hesitation or pausing before speech, referred to by stutterers as blocks, and the
prolongation of certain sounds, usually vowels and semivowels.
Modalities affected: speech.
Interfaces that address this user group: Gaze/Eye tracking interfaces, Facial
expression recognition interfaces, Gesture recognition based interfaces.
3.2.25 Cluttering
Cluttering (tachyphemia) is a speech disorder and a communication disorder
characterized by speech that is difficult for listeners to understand due to rapid
speaking rate, erratic rhythm, poor syntax or grammar, and words or groups of words
unrelated to the sentence. Cluttering has in the past been viewed as a fluency disorder
[180].
Modalities affected: speech.
Interfaces that address this user group: Gaze/Eye tracking interfaces, Facial
expression recognition interfaces, Gesture recognition based interfaces.
3.2.26 Muteness
Muteness (mutism) is complete inability to speak. Those who are physically mute may
have problems with the parts of the human body required for speech (the throat, vocal
cords, lungs, mouth, or tongue, etc.). Being mute is often associated with deafness as
people who have been unable to hear from birth may not be able to articulate words
correctly (see Deaf-mute). A person can be born mute, or become mute later in life as a
result of injury or disease [181].
Modalities affected: speech.
Interfaces that address this user group: Gaze/Eye tracking interfaces, Facial
expression recognition interfaces, Gesture recognition based interfaces.
3.2.27 Dysarthria
Dysarthria is a weakness or paralysis of speech muscles caused by damage to the
nerves and/or brain. Dysarthria is often caused by strokes, Parkinson's disease, ALS,
head or neck injuries, surgical accident, or cerebral palsy [182].
Modalities affected: speech.
Interfaces that address this user group: Gaze/Eye tracking interfaces, Facial
expression recognition interfaces, Gesture recognition based interfaces.
3.2.28 Dementia
Dementia is a serious loss of global cognitive ability in a previously unimpaired person,
beyond what might be expected from normal aging. Dementia is not a single disease,
but rather a non-specific illness syndrome (i.e., set of signs and symptoms) in which
affected areas of cognition may be memory, attention, language, and problem solving
[183].
Modalities affected: cognitive (attention and memory loss).
Interfaces that address this user group: Audio playback interfaces, Pen based
interfaces, Alternative keyboards/switches interfaces, Augmented reality interfaces.
4.1 Overview
The Interaction Models have to cover three basic dimensions concerning the
interaction between the Virtual User and the virtual prototype to be tested:
Multiple Users: denotes the support for users with different disabilities.
Multiple Modalities: identifies the need to support different interaction styles in
different situations.
Multiple Assistive Devices: reflects the need to support multiple interaction
resources and assistive devices.
The multimodal interaction models will be flexible enough to allow adjustment of the
level of sensory capability needed for the interaction, in order to evaluate the
usability level and the interaction effectiveness in relation to the level and type of
impairment of the virtual user.
As depicted in Figure 4, the Multimodal Interaction Models will describe the execution
of the primitive tasks using different modalities and assistive devices, with respect to
the different target groups.
Task Models describe how a complex task can be analyzed into primitive tasks, in an
abstract way, without taking into account alternative ways of tasks execution using
different modalities and/or assistive devices.
The Multimodal Interaction Models intend to fill this gap. More specifically, the
Multimodal Interaction Models will describe the alternative ways of executing a primitive
task, using different modalities and/or assistive devices, with respect to the
disabilities of the target user groups. Then, the Task Models in conjunction with the
Multimodal Interaction Models and the Virtual User Models will be used by the Modality
Compensation and Replacement Module included in the Simulation Platform. The
Modality Compensation and Replacement Module will utilize the characteristics of the
simulated Virtual User Model, in order to convert, whenever possible, modalities that
are not perceived, due to a specific disability, from one sensory channel into another
normally perceived communication channel (e.g. aural information could be
dynamically transformed into text or sign language for hearing impaired users).
Additionally, it will decide if assistive devices should be used during the simulation
process performed by the Simulation Platform.
For every primitive task (with regard to each application sector) that supports different
ways of successful execution, a multimodal interaction model has to be defined.
UsiXML [2] will be used for the definition of the multimodal interaction models, similarly
to the definition of the Task and Simulation Models.
The execution order of the alternative tasks can be defined by the following temporal
operators [186]:
Enabling: specifies that a target task cannot begin until the source task has finished
(Figure 7).
Order independency: specifies that two tasks are independent of the order of
execution (Figure 12).
4.3.1 Walk
The definition of the walk generic Multimodal Interaction Model is presented in Table
5. A visual representation of the relationships is given through Figure 15. The source
code is contained in CodeSnippet 2.
Task | Modality | Task object | Disability | Alternative task(s) | Alternative modality | Alternative task object / assistive device
Walk | Motor | - | Wheelchair users | Roll (hands) | Motor | Wheelchair
Walk | Motor | - | Lower limb impaired | Grasp | Motor | Support bar
Walk | Motor | - | Lower limb impaired | Shuffle (feet) | Motor | -
4.3.2 See
The definition of the see generic Multimodal Interaction Model is presented in Table 6.
A visual representation of the relationships is given through Figure 16. The source code
is contained in CodeSnippet 3.
4.3.3 Hear
The definition of the hear generic Multimodal Interaction Model is presented in Table
7. A visual representation of the relationships is given through Figure 17. The source
code is contained in CodeSnippet 4.
Task | Modality | Task object | Disability | Alternative task(s) | Alternative modality | Alternative task object / assistive device
Hear | Audition | Audio cues | Hearing impaired | See | Vision | Visual cues
Figure 19: Pull (hand): Door handle Multimodal Interaction Model relationships.
Figure 21: Grasp (right hand): Steering wheel Multimodal Interaction Model
relationships.
Figure 22: Grasp (hand): Door handle Multimodal Interaction Model relationships.
Figure 23: Pull (hand): Window bar Multimodal Interaction Model relationships.
Table 15: Pull (hand): Toilet flush valve Multimodal Interaction Model definition.
Figure 25: Pull (hand): Toilet flush valve Multimodal Interaction Model
relationships.
Figure 26: Stand up (knee, back): Toilet Multimodal Interaction Model relationships.
Figure 29: Push (hand): Medical device button Multimodal Interaction Model
relationships.
Table 20: Read: Message on touch screen Multimodal Interaction Model definition.
Table 21: Press: OK button on the touch screen Multimodal Interaction Model
definition.
Figure 31: Press: OK button on the touch screen Multimodal Interaction Model
relationships.
4.9 Conclusions
The current section provided the specifications of the multimodal interaction models,
which include combinations of interfacing modalities most suited for the target user
groups and connect the Virtual User Models, developed in SP1, to the Task Models and
the virtual prototype to be tested.
The multimodal interaction models describe the alternative ways of executing a primitive
task with respect to the different target user groups, the replacement modalities
and the usage of assistive devices for each application sector (automotive, smart living
spaces, office workplace, infotainment, personal healthcare). The replacement of
modalities and the use of assistive devices aim to improve the quality of the user
interaction and make the product more accessible, taking into account the virtual user's
disabilities.
5.1 Architecture
The architecture of the Multimodal Interfaces Manager is presented in Figure 32. As
depicted there, the Multimodal Interfaces Manager consists of two main components:
the Modality Compensation and Replacement Module, which is responsible
for managing the multimodal interaction models and providing alternative tasks,
modalities and assistive devices.
the Multimodal Toolset Manager, which is the provider of various input/output
tools to the user in order to enhance his/her interaction based on his/her
capabilities.
Figure 32: Architecture of the Multimodal Interfaces Manager and its connections to the Core Simulation Platform and the Immersive Simulation Runtime Engine.
Figure 33: Data input and output in Modality Compensation and Replacement Module.
The selection of an appropriate Multimodal Interaction Model depends on three factors:
the primitive task to be executed, the application area, and the deficiencies of the
virtual user. The first two factors can be found in the Task Model and Simulation Model
files. These two files can be provided to the Modality Compensation and Replacement
Module by the Interaction Adaptor. The third factor is specified in the loaded Virtual
User Model (again provided by the Interaction Adaptor). Having knowledge of the task,
application area and virtual user deficiencies, the compensation module selects the
appropriate Multimodal Interaction Model from its pool.
Then, it analyses the various primitive tasks and provides alternative ones that are
more appropriate to the virtual user. The new task definition may be based on a different
modality, or on the same modality but with the usage of an assistive device, or on a
combination of the two. The whole input/output data flow is shown in Figure 33.
Figure 34: The Multimodal Toolset Manager and its modules. Some modules are used for
getting input from users, while others are used for passing information to them. The
modality type of each tool is also indicated.
The implementation, the modalities and the characteristics of these tools are discussed
in paragraphs 5.3.2 to 5.3.5. The modality requirements of the tools are presented
in Table 22. It must be noted that the Multimodal Toolset Manager is implemented in
such a way as to support extra modules, thus increasing its capabilities.
Table 22: Multimodal Toolset Manager's modules and their modality requirements from
the target users.
Figure 35: Data flow concerning the Speech Recognition Module. Both real-time and
pre-recorded speech-audio is supported and converted to text by the module.
The speech recognition implementation is based on the CMU Sphinx toolkit [187]. The
Sphinx architecture is based on two models: a) the language model, which contains the
dictionary and grammar, and b) the acoustic model, which contains the respective audio
information. In its current state, the Speech Recognition Module supports English,
French and Russian. However, the Sphinx framework supports training of new language
and acoustic models. Table 23 summarises the features of the CMU Sphinx toolkit.
Table 23: Features of the chosen speech recognition software, CMU Sphinx.
The output of the Speech Recognition Module is a text string and is sent to the
Immersive or Core Simulation Platform for further processing.
Figure 36: Data flow concerning the Speech Synthesis Module. Text from the Immersive or Core Simulation Platform is converted to speech audio for the immersed user.
The Speech Synthesis Module uses two speech synthesis engines: eSpeak [193] and
Festival [194]. Generally, in terms of processing speed, eSpeak provides faster output
with low CPU usage. The eSpeak speech is clear and can be used at high speeds, but it
is not as natural or smooth as larger synthesizers that are based on human speech
recordings. This is the reason why another speech synthesizer, the Festival Speech
Synthesis System, is included in the Speech Synthesis Module.
Festival is a general multilingual speech synthesis system originally developed at the
University of Edinburgh. Substantial contributions have also been provided by Carnegie
Mellon University and other sites. It is distributed under a free software license similar
to the BSD License. It offers a full text to speech system with various APIs, as well as
an environment for development and research of speech synthesis techniques. It is
written in C++ with a Scheme-like command interpreter for general customization and
extension. Festival is designed to support multiple languages, and comes with support
for English (British and American pronunciation), Welsh, and Spanish. Voice packages
exist for several other languages, such as Castilian Spanish, Czech, Finnish, Hindi,
Italian, Marathi, Polish, Russian and Telugu.
A feature comparison between the two speech synthesizers is presented in Table 24.
Table 24: Features of the speech synthesizers used by the Speech Synthesis Module.
Figure 37: User and Simulation platform exchange information via the haptic device
and the Haptics Module.
The Haptics Module makes use of the CHAI3D API [196] in order to manipulate various
haptic devices. CHAI 3D is an open source set of C++ libraries for computer haptics,
visualization and interactive real-time simulation. CHAI 3D supports several
commercially-available three-, six- and seven-degree-of-freedom haptic devices, and
makes it simple to support new custom force feedback devices. Multiple haptic devices
can be connected to the same computer in order to support two-handed interaction.
The Haptics Module has been tested with, and fully supports, Sensable's PHANTOM
Desktop [197] and PHANTOM Omni [198] haptic devices, and the Novint Falcon
device [199].
Figure 38: Data flow concerning the Sign Language Synthesis Module. Text from the Immersive or Core Simulation Platform is converted into sign language animation for the immersed user.
The processing is based on the ATLAS [200] (Automatic TransLAtion into Sign
language) tool and is currently at a demo stage (testing use only). The input text must
carry special annotation (expressed in XML). The output is a 3D rendered virtual actor
who performs the sign language gestures. In the current phase, only the Italian
language is supported.
Figure 39: Data flow concerning the Symbolic Module. Text, graphics and sound alerts are generated for the immersed user on behalf of the Immersive or Core Simulation Platform.
Future Work
Despite its unique features, the Multimodal Interface Toolset in its current state cannot
be characterised as a complete product.
The most important missing element is the integration with the VERITAS simulation
framework; thus, future work will concentrate on this field. The automatic selection
of different modalities, task tools or assistive devices will offer virtual or immersed
users a more accessible environment in which to test product designs tailored to them.
The selection of the most appropriate Multimodal Interaction Model will be based on
prioritization rules, and this automatic task translator tool will be part of the
Modality Compensation and Replacement Module.
In the future, the Core Simulation Platform (A2.1.1) will support a simulation cascade
mode, i.e. sequential testing of various virtual user models on different product
designs. A similar approach is needed for the Multimodal Interface Toolset, where
cascades over different modalities and different assistive devices will be tested
automatically.
Additionally, future plans include the implementation of a Multimodal Interaction Models
parser for the Modality Compensation and Replacement Module, which will parse the
various UsiXML files and convert them into machine-friendly code.
Also, an investigation will be made concerning the support of mobile phone touch-
screens and, if necessary, a corresponding input module for the Multimodal Toolset
Manager will be implemented.
References
[1] Charwat, H. J. (1992) Lexicon der Mensch-Maschine-Kommunikation.
Oldenbourg.
[2] MacDonald, J. and McGurk, H. (1978) Visual influences on speech perception
process. Perception and Psychophysics, 24 (3), 253-257.
[3] McGurk, H. and MacDonald, J. (1976). Hearing lips and seeing voices. Nature,
264, 746-748.
[4] Parke, F.I. and Waters, K. (1996). Computer Facial Animation. A K Peters.
[5] Silbernagel, D. (1979) Taschenatlas der Physiologie. Thieme.
[6] Kandel, E. R. and Schwartz, J. R. (1981) Principles of Neural Sciences. Elsevier
Science Publishers (North Holland).
[7] Shepherd, G. M. (1988) Neurobiology, 2nd edition. Oxford University Press.
[8] Chatty, S. (1994), Extending a graphical toolkit for two-handed interaction. ACM
UIST 94 Symposium on User Interface Software and Technology, ACM Press,
195-204.
[9] Nigay, L. and Coutaz, J. (1993) A design space for multimodal systems:
concurrent processing and data fusion. Human Factors in Computing Systems,
INTERCHI 93 Conference Proceedings, ACM Press, 172-178.
[10] Coutaz, J. (1987) PAC: An object-oriented model for dialog design. Proceedings
of INTERACT 87: The IFIP Conference on Human Computer Interaction, 431-
436.
[11] Huang, X. and Oviatt, S. (2006) Toward Adaptive Information Fusion in Multimodal
Systems, in S. Renals and S. Bengio (Eds): MLMI 2005, LNCS 3869, 15-27.
[12] Foley, J.D., van Dam, A., Feiner, S.K. and Hughes, J.F. (1990) Computer
Graphics: principles and practice, 2nd edition. Addison-Wesley.
[13] Shneiderman, B. (1982) The future of interactive systems and the emergence of
direct manipulation. Behaviour and Information Technology 1 (3), 237-256.
[14] Shneiderman, B. (1983) Direct manipulation: a step beyond programming
languages. IEEE Computer, 16 (8), 57-69.
[15] Bernsen, N.O. The Structure of the Design Space, in Computers, Communication
and Usability: Design Issues, Research, and Methods for Integrated Services, P.F.
Byerley, P.J. Barnard, and J. May, eds., North Holland, Amsterdam, 1993, pp. 221-244.
[16] Maybury, M. T. and Wahlster, W. (1998) Readings in Intelligent User Interfaces.
Morgan Kaufmann Publishers.
[17] Nigay, L. and Coutaz, J. (1995) A generic platform for addressing the multimodal
challenge. Human Factors in Computing Systems, CHI 95 Conference
Proceedings, ACM Press, 98-105.
[18] Moran, D.B., Cheyer, A.J., Julia, L.E., Martin, D.L., and Park, S. (1997) Multimodal
user interfaces in the Open Agent Architecture. Proceedings of the 1997
International Conference on Intelligent User Interfaces (IUI 97), ACM Press, 61-
68.
[19] Wahlster, W. (1991) User and discourse models for multimodal communication.
In: Intelligent User Interfaces, J. W. Sullivan and S. W. Tyler (Eds.), ACM Press,
45-67.
[20] Bolt, R. (1980) Put-That-There: voice and gesture at the graphics interface,
Computer Graphics, 14, 3, 262-270.
[21] Obrenovic, Z., Abascal, J. and Starcevic, D. (2007) Universal accessibility as a
multimodal design issue. Commun. ACM 50, 5, 83-88.
[22] Burzagli, L., Emiliani, P.L., and Gabbanini, F. (2009). Design for All in action: An
example of analysis and implementation, Expert Systems with Applications, 36,
985-994.
[23] Doyle, J., Bertolotto, M., and Wilson, D. (2008) Multimodal Interaction
Improving Usability and Efficiency in a Mobile GIS Context, 1st Int. Conf. on
Advances in Computer-Human Interaction, 63-68.
[24] Oviatt, S.L. (1999) Mutual Disambiguation of recognition errors in a multimodal
architecture. Proceedings of the SIGCHI conference on Human factors in
computing systems: the CHI is the limit (CHI '99). ACM, New York, NY, USA, 576-
583.
[25] Oviatt, S.L. (1999) Ten myths of multimodal interaction, Communications of the
ACM 42, 11, 74-81.
[26] Cobb, S.V.G and Sharkey, P.M. (2007) A Decade of Research and Development
in Disability, Virtual Reality and Associated Technologies: Review of ICDVRAT
1996-2006, The International Journal of Virtual Reality, 6(2): 51-68.
[27] Edwards, A. D. N. (2002). Multimodal interaction and people with disabilities. (in)
Multimodality in Language and Speech Systems. B. Granström, D. House and I.
Karlsson, (Eds.). Dordrecht, Kluwer, pp. 73-92.
[28] Emiliani, P. L. and Stephanidis C. (2005) Universal access to ambient intelligence
environments: Opportunities and challenges for people with disabilities, IBM
Systems Journal, 44, 3, 605-619.
[29] B. Kisacanin, V. Pavlovic and T.S. Huang, Editors, Real-Time Vision for Human
Computer Interaction , Springer-Verlag (2005).
[30] Jaimes, A. and Sebe, N. (2007) Multimodal human-computer interaction: A survey,
Computer Vision and Image Understanding, 108, 116-134.
[31] Oviatt, S.L. (2001). Designing robust multimodal systems for universal access. In
Proceedings of the 2001 EC/NSF workshop on Universal accessibility of
ubiquitous computing: providing for the elderly (WUAUC'01). ACM, New York, NY,
USA, 71-74.
[32] Sharma, R., Pavlovic, V., and Huang T.S. (1998) Toward Multimodal Human
Computer Interface, Proceedings of the IEEE, 86, 5, 853-869.
[33] Argyropoulos, S., Moustakas, K., Karpov, A.A., Aran, O., Tzovaras, D., Tsakiris, T.,
Varni, G., and Kwon, B. (2008). Multimodal user interface for the communication
of the disabled, Journal on Multimodal User Interfaces, 2, 105-116.
[34] Intille, S., Larson, K. , Beaudin, J., Nawyn, J. , Tapia, E., and Kaushik, P. (2004) A
living laboratory for the design and evaluation of ubiquitous computing
technologies, ACM Conference on Human Factors in Computing Systems (CHI),
1941-1944.
[35] McCowan, L., Gatica-Perez, D., Bengio, S., Lathoud, G., Barnard, M. and Zhang,
D. (2005) Automatic analysis of multimodal group actions in meetings, IEEE
Transactions on PAMI 27, 3, 305-317.
[36] Gatica-Perez, D. (2006) Analyzing group interactions in conversations: a survey,
IEEE International Conference on Multisensor Fusion and Integration for
Intelligent Systems, 41-46.
[37] Pentland, A. (2005) Socially aware computation and communication, IEEE
Computer 38, 3, 33-40.
[38] Meyer S. and Rakotonirainy, A. (2003) A Survey of research on context-aware
homes, Australasian Information Security Workshop Conference on ACSW
Frontiers.
[39] Cheyer, A. and Julia, L. (1998) MVIEWS: multimodal tools for the video analyst,
Conference on Intelligent User Interfaces (IUI), ACM, New York, NY, USA, 55-62.
[40] Bradbury, J.S., Shell, J.S. and Knowles, C.B. (2003) Hands on cooking: towards
an attentive kitchen, ACM Conference Human Factors in Computing Systems
(CHI), 996-997.
[41] Chen, D., Malkin, R., and Yang, J. (2004). Multimodal detection of human
interaction events in a nursing home environment. In Proceedings of the 6th
international conference on Multimodal interfaces (ICMI '04). ACM, New York, NY,
USA, 82-89.
[42] Lauruska V. and Serafinavicius, P. (2003) Smart home system for physically
disabled persons with verbal communication difficulties, Assistive Technology
Research Series (AAATE), 579-583.
[43] Adler, A., Eisenstein, J., Oltmans, M., Guttentag, L., and Davis, R. (2004) Building
the design studio of the future, AAAI Fall Symposium on Making Pen-Based
Interaction Intelligent and Natural (2004).
[44] Ben-Arie, J., Wang, Z., Pandit, P. and Rajaram, S. (2002) Human activity
recognition using multidimensional indexing, IEEE Transactions on PAMI 24, 8,
2002, 1091-1104.
[45] Bobick A.F. and Davis, J. (2001) The recognition of human movement using
temporal templates, IEEE Transactions on PAMI 23, 3, 257-267.
[46] Pentland, A. (2000), Looking at people, Communications of the ACM 43, 3, 35-44.
[47] Chen, L.S., Travis Rose, R., Parrill, F., Han, X., Tu, J., Huang, Z., Harper, M.,
Quek, F., McNeill, D., Tuttle, R., and Huang, T.S. (2005) VACE multimodal
meeting corpus, MLMI.
[48] Garg, A., Naphade, M. and Huang, T.S. (2003) Modeling video using input/output
Markov models with application to multi-modal event detection, Handbook of
Video Databases: Design and Applications.
[49] Hu, W., Tan, T., Wang, L. and Maybank, S. (2004) A survey on visual
surveillance of object motion and behaviors, IEEE Transactions on Systems, Man,
and Cybernetics 34, 3, 334 352.
[50] Dusan, S., Gadbois, G.J. and Flanagan, J. (2003) Multimodal interaction on
PDAs integrating speech and pen inputs, Eurospeech.
[51] Fritz, G., Seifert, C., Luley, P., Paletta L., and Almer, A. (2004) Mobile vision for
ambient learning in urban environments, International Conference on Mobile
Learning (MLEARN).
[52] Brewster, S., Lumsden, J., Bell, M., Hall, M. and Tasker, S. (2003). Multimodal
'eyes-free' interaction techniques for wearable devices. In Proceedings of the
SIGCHI conference on Human factors in computing systems (CHI '03). ACM, New
York, NY, USA, 473-480.
[53] Kono, Y., Kawamura, T., Ueoka, T., Murata S., and Kidode, M. (2004) Real world
objects as media for augmenting human memory, Workshop on Multi-User and
Ubiquitous User Interfaces (MU3I), 37-42.
[54] Yu, C. and Ballard, D.H. (2004) A multimodal learning interface for grounding
spoken language in sensorimotor experience, ACM Transactions on Applied
Perception, 1, 1, 57-80.
[55] Pelz, J.B. (2004) Portable eye-tracking in natural behavior, Journal of Vision 4,11.
[56] Dickie, C., Vertegaal, R., Fono, D., Sohn, C., Chen, D., Cheng, D., Shell J.S., and
Aoudeh, O. Augmenting and sharing memory with eye, Blog in CARPE (2004).
[57] Nijholt, A. and Heylen, D. (2002) Multimodal communication in inhabited virtual
environments, International Journal of Speech Technology 5, 343-354.
[58] Malkawi, A.M. and Srinivasan, R.S. (2004) Multimodal humancomputer
interaction for immersive visualization: integrating speech-gesture recognition
and augmented reality for indoor environments, International Association of
Science and Technology for Development Conference on Computer Graphics
and Imaging.
[59] Paggio, P. and Jongejan, B. (2005). Multimodal communication in the virtual farm
of the staging Project. In: O. Stock and M. Zancanaro, Editors, Multimodal
Intelligent Information Presentation, Kluwer Academic Publishers, Dordrecht, 27-46.
[60] Maynes-Aminzade, D., Pausch, R. and Seitz, S. (2002) Techniques for interactive
audience participation, ICMI, 15-20.
[61] Wassermann, K.C., Eng, K., Verschure, P.F.M.J. Manzolli, and J. (2003). Live
soundscape composition based on synthetic emotions, IEEE Multimedia
Magazine 10, 4, 82-90.
[62] Lyons, M.J., Haehnel, M., and Tetsutani, N. (2003) Designing, playing, and
performing, with a vision-based mouth Interface, Conference on New Interfaces
for Musical Expression, 116-121.
[63] Paradiso, J. and Sparacino, F. (1997) Optical tracking for music and dance
performance, Optical 3-D Measurement Techniques IV, A. Gruen, H. Kahmen,
eds., 11-18.
[127] Mann, W. C., Goodall, S., Justiss, M. D., and Tomita, M. (2002). Dissatisfaction
and non-use of assistive devices among frail elders. Assistive Technology 14(2),
130-139.
[128] Canes and Walkers. In: Helpful Products for Older Persons (booklet series).
University at Buffalo, NY: Center for Assistive Technology, Rehabilitation
Engineering Research Center on Aging.
[129] Fernie, G. (1997). Assistive Devices. Handbook of Human Factors and the Older
Adult. A. D. Fisk and N. Rogers. London, Academic Press: 289-310.
[130] Blesedell-Crepeau, E., Cohn, E. S. et al. (2003). Willard and Spackman's
Occupational Therapy. Philadelphia, PA, Lippincott, Williams & Wilkins.
[131] Cook, A. M. and Hussey, S. M. (2002). Assistive Technologies: Principles and
Practice. Toronto, Mosby.
[132] Mann, W. C. (2002). Assistive devices and home modifications. In: Encyclopedia
of Aging, E. D. J., et al., eds. New York: Macmillan.
[133] Chen, L.K.P., Mann, W. C., Tomita, M. and Burford, T. (1998). An evaluation of
reachers for use by older persons with disabilities. Assistive Technology 10(2),
113-125.
[134] Moriyama, T., Kanade, T., Xiao, J. and Cohn, J., Meticulously Detailed Eye
Region Model and Its Application to Analysis of Facial Images, IEEE Trans. on
PAMI, 28(5):738-752, 2006.
[135] Wang, J.G., Sung, E. and Venkateswarlu, R., Eye gaze estimation from a single
image of one eye, ICCV, pp. 136-143, 2003.
[136] Ruddaraju, R., Haro, A., Nagel, K., Tran, Q., Essa, I., Abowd, G. and Mynatt, E.,
Perceptual user inter-faces using vision-based eye tracking, ICMI, 2003.
[137] Sibert, L.E. and Jacob, R.J.K., Evaluation of eye gaze interaction, ACM Conf.
Human Factors in Computing Systems (CHI), pp. 281-288, 2000.
[138] Heishman, R., Duric, Z. and Wechsler, H., Using eye region biometrics to reveal
affective and cogni-tive states, CVPR Workshop on Face Processing in Video,
2004.
[139] Santella, A. and DeCarlo, D., Robust clustering of eye movement recordings for
quantification of visual interest, Eye Tracking Research and Applications (ETRA),
pp. 27-34, 2004.
[140] Parnes, R. B. (2003). GPS technology and Alzheimer's disease: Novel use for an
existing technology. Retrieved August 15, 2004, from
http://www.cs.washington.edu/assistcog/NewsArticles/HealthGate/GPS
%20Technology%20and%20Alzheimers%20Disease%20Novel%20Use%20for
%20an%20Existing%20Technology%20CHOICE%20For%20HealthGate.htm
[141] Patterson, D. J., Etzioni, O., and Kautz, H. (2002). The activity compass.
Presented at UbiCog 02: First International Workshop on Ubiquitous Computing
for Cognitive Aids, Göteborg, Sweden.
[142] Sharon Oviatt, Multimodal Interfaces, Handbook of Human-Computer
Interaction, (ed. by J. Jacko & A. Sears), Lawrence Erlbaum: New Jersey, 2002.
[143] Doherty T.J., Vandervoort A.A., Taylor A.W., and Brown W.F., Effects of motor unit
[162] Thomas C. Weiss, Hemiparesis Facts and Information, 2010, Disabled World,
Neurological Disorders.
[163] Hemiplegia Treatments - Definition, symptoms and treatments,
http://www.hemiplegiatreatment.net/
[164] Donnan GA, Fisher M, Macleod M, Davis SM (May 2008). "Stroke". Lancet 371
(9624): 1612-23. doi:10.1016/S0140-6736(08)60694-7.
[165] Stanford Hospital & Clinics, Cardiovascular Diseases: Effects of Stroke.
[166] Davis FA, Bergen D, Schauf C, McDonald I, Deutsch W (November 1976).
"Movement phosphenes in optic neuritis: a new clinical sign". Neurology 26 (11):
1100-4.
[167] Page NG, Bolger JP, Sanders MD, January 1982, "Auditory evoked phosphenes
in optic nerve disease". J. Neurol. Neurosurg. Psychiatr. 45 (1): 7-12.
[168] Compston A, Coles A., October 2008, "Multiple sclerosis". Lancet 372 (9648):
1502-17.
[169] AgingEye Times, "Macular Degeneration types and risk factors". Agingeye.net.
[170] Merck Manual, Home Edition, "Glaucoma", Merck.com.
[171] Better Health Channel, "Colour blindness",
http://www.betterhealth.vic.gov.au/bhcv2/bhcarticles.nsf/pages/Colour_blindness
[172] World Health Organization, Priority eye diseases, Prevention of Blindness and
Visual Impairment
[173] Kertes PJ, Johnson TM, ed. (2007). Evidence Based Eye Care. Philadelphia, PA:
Lippincott Williams & Wilkins. ISBN 0-7817-6964-7.
[174] Da Costa SS; Rosito, Letícia Petersen Schmidt; Dornelles, Cristina (February
2009). "Sensorineural hearing loss in patients with chronic otitis media". Eur Arch
Otorhinolaryngol 266 (2): 221-4. doi:10.1007/s00405-008-0739-0.
[175] Dorland's Medical Dictionary, "Otosclerosis", http://en.wikipedia.org/wiki/Dorland
%27s_Medical_Dictionary.
[176] Occupational Safety and Health Standards (OSHA), Occupational noise
exposure, OSHA 29,
http://www.osha.gov/pls/oshaweb/owadisp.show_document?
p_table=STANDARDS&p_id=9735
[177] Kral A, O'Donoghue GM. Profound Deafness in Childhood. New England J
Medicine 2010: 363; 1438-50.
[178] D.W. Robinson and G.J. Sutton "Age effect in hearing - a comparative analysis of
published threshold data." Audiology 1979; 18(4): 320-334.
[179] World Health Organization ICD-10 F95.8 Stuttering
[180] Daly, David A.; Burnett, Michelle L. (1999). Curlee, Richard F.. ed. Stuttering and
Related Disorders of Fluency. New York: Thieme. p. 222. ISBN 0-86577-764-0.
[181] Leonard, Laurence B. (1998). Children with specific language impairment.
Cambridge, Mass: The MIT Press. ISBN 0-262-62136-3.
[182] O'Sullivan, S. B., & Schmitz, T. J. (2007). Physical rehabilitation. (5th ed.).
A. Automotive Area
Figure 41: Swing (legs): Inside car Multimodal Interaction Model relationships.
Figure 42: Grasp (hand): Interior door handle Multimodal Interaction Model
relationships.
Figure 43: Pull (left hand): Interior door handle Multimodal Interaction Model
relationships.
Table 29: Push (right hand): Lock button Multimodal Interaction Model definition.
Figure 44: Push (right hand): Lock button Multimodal Interaction Model relationships.
Figure 45: Press (right hand): Eject button on belt buckle Multimodal Interaction
Model relationships.
Figure 46: Grasp (right hand): Interior door handle Multimodal Interaction Model
relationships.
Table 32: Push (left hand): Interior door side Multimodal Interaction Model definition.
Figure 47: Push (left hand): Interior door side Multimodal Interaction Model
relationships.
Table 33: Pull down (hands): Sun shield Multimodal Interaction Model definition.
Figure 48: Pull down (hands): Sun shield Multimodal Interaction Model relationships.
Table 34: Grasp (hand): Steering wheel Multimodal Interaction Model definition.
Figure 49: Grasp (hand): Steering wheel Multimodal Interaction Model relationships.
Table 35: Push (left foot): Gear pedal Multimodal Interaction Model definition.
Figure 50: Push (left foot): Gear pedal Multimodal Interaction Model relationships.
Figure 51: Push (right foot): Accelerator pedal Multimodal Interaction Model
relationships.
Figure 52: Push (right foot): Brake pedal Multimodal Interaction Model relationships.
Table 38: Push (thumb): Parking brake release button Multimodal Interaction Model
definition.
Figure 53: Push (thumb): Parking brake release button Multimodal Interaction Model
relationships.
Table 39: Pull (hand): Hand brake Multimodal Interaction Model definition.
Figure 54: Pull (hand): Hand brake Multimodal Interaction Model relationships.
Figure 55: Grasp (hand): Light switch Multimodal Interaction Model relationships.
Figure 56: Turn (hand): Light switch Multimodal Interaction Model relationships.
Figure 57: Move up/down (hand): Direction indicator Multimodal Interaction Model
relationships.
Table 43: Grasp (hand): Radio knob Multimodal Interaction Model definition.
Figure 58: Grasp (hand): Radio knob Multimodal Interaction Model relationships.
Table 44: Turn (hand): Radio knob Multimodal Interaction Model definition.
Figure 59: Turn (hand): Radio knob Multimodal Interaction Model relationships.
Table 45: Push (hand): Radio button Multimodal Interaction Model definition.
Figure 60: Push (hand): Radio button Multimodal Interaction Model relationships.
Table 46: Push (hand): Window button Multimodal Interaction Model definition.
Table 47: Grasp (hand): Window handle Multimodal Interaction Model definition.
Table 48: Turn (hand): Window handle Multimodal Interaction Model definition.
Table 49: Turn (hand): Rear mirror Multimodal Interaction Model definition.
Table 50: Push (hand): Rear mirror Multimodal Interaction Model definition.
Table 51: Grasp (right hand): Gear handle Multimodal Interaction Model definition.
Figure 66: Grasp (right hand): Gear handle Multimodal Interaction Model
relationships.
Table 52: Push (right hand): Gear handle Multimodal Interaction Model definition.
Figure 67: Push (right hand): Gear handle Multimodal Interaction Model relationships.
Figure 68: Push (hand): Navigation system buttons Multimodal Interaction Model
relationships.
Figure 69: Push (right foot): Rear brake pedal Multimodal Interaction Model
relationships.
Table 55: Listen: Navigation system audio cues Multimodal Interaction Model
definition.
Figure 70: Listen: Navigation system audio cues Multimodal Interaction Model
relationships.
Table 56: Grasp (hand): Faucet controls Multimodal Interaction Model definition.
Figure 71: Grasp (hand): Faucet controls Multimodal Interaction Model relationships.
Figure 72: Grasp (hand): Hob gas control knob Multimodal Interaction Model
relationships.
Figure 73: Push (hand): Stove knob Multimodal Interaction Model relationships.
Table 59: Pull (hand): Washing machine porthole handle Multimodal Interaction
Model definition.
Figure 74: Pull (hand): Washing machine porthole handle Multimodal Interaction
Model relationships.
Figure 75: Turn (hand): Dishwasher knob Multimodal Interaction Model relationships.
Table 61: Push (hand): Hood button Multimodal Interaction Model definition.
Figure 76: Push (hand): Hood button Multimodal Interaction Model relationships.
Table 62: Pull (hand): Oven door handle Multimodal Interaction Model definition.
Figure 77: Pull (hand): Oven door handle Multimodal Interaction Model relationships.
C. Workplace Office
Table 63: Twist (hand): Faucet control Multimodal Interaction Model definition.
Figure 79: Stand up (knee, back): Toilet Multimodal Interaction Model relationships.
<enabling>
<source sourceId="st0task6"/>
<target targetId="st0task7"/>
</enabling>
</taskmodel>
CodeSnippet 58: Sit (knee, back): On toilet Multimodal Interaction Model (UsiXML
source code).
D. Infotainment
Figure 82: Press (hand): Keyboard key Multimodal Interaction Model relationships.