

Intelligent Navigation of Autonomous Vehicles in an Automated

Highway System: Learning Methods and Interacting Vehicles Approach

by

Cem Ünsal

Dissertation submitted to the Faculty of the Virginia Polytechnic Institute and State

University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

in

ELECTRICAL ENGINEERING

Joseph A. Ball

William T. Baumann

Pushkin Kachroo

Hugh F. VanLandingham

Blacksburg, Virginia


Intelligent Navigation of Autonomous Vehicles in an Automated Highway System:

Learning Methods and Interacting Vehicles Approach

by

Cem Ünsal

The Bradley Department of Electrical Engineering

(ABSTRACT)

One of today's most serious social, economic, and environmental problems is traffic

congestion. In addition to the financial cost of the problem, the number of traffic-related injuries

and casualties is very high. A recently considered approach to increase safety while reducing

congestion and improving driving conditions is Automated Highway Systems (AHS). The AHS

will evolve from the present highway system to an intelligent vehicle/highway system that will

incorporate communication, vehicle control and traffic management techniques to provide safe,

fast and more efficient surface transportation. A key factor in AHS deployment is intelligent

vehicle control. While the technology to safely maneuver the vehicles exists, the problem of

making intelligent decisions to improve a single vehicle's travel time and safety while optimizing

the overall traffic flow is still a stumbling block.

We propose an artificial intelligence technique called stochastic learning automata to design an intelligent vehicle path controller. Using the information obtained by on-board sensors

and local communication modules, two automata are capable of learning the best possible (lateral

and longitudinal) actions to avoid collisions. This learning method is capable of adapting to the

automata environment resulting from the unmodeled physical environment. Simulations for

simultaneous lateral and longitudinal control of an autonomous vehicle provide encouraging

results. Although the learning approach taken is capable of providing a safe decision, optimization

of the overall traffic flow is also possible by studying the interaction of the vehicles.

The design of the adaptive vehicle path planner based on local information is then carried

onto the interaction of multiple intelligent vehicles. By analyzing the situations consisting of

conflicting desired vehicle paths, we extend our design with additional decision structures. The

analysis of the situations and the design of the additional structures are made possible by the study

of the interacting reward-penalty mechanisms in individual vehicles. The definition of the physical

environment of a vehicle as a series of discrete state transitions associated with a stationary

automata environment is the key to this analysis and to the design of the intelligent vehicle path

controller.

This work was supported in part by the Center for Transportation Research and Virginia DOT under the Smart Road project, by the General Motors ITS Fellowship program, and by the Naval Research Laboratory under grant no. N000114-93-1-G022.


To all the noble men and the knights fighting the demons

of a binary world in their quest for artificial intelligence.


Acknowledgments

It has been a long ride since I received my M.S. degree. Many friends, local, cyber and dear, many

engineering students, EE, ME, or CE, many professors, and others have seen me working through

my dissertation. All of them, directly or indirectly, touched what is written in these pages.

First of all, I have to express my gratitude to my advisors, Dr. John S. Bay, most probably the best advisor this campus has ever seen and a legend in thesis/dissertation editing, and Dr. Pushkin Kachroo, percussionist extraordinaire (Isn't that what P.E. stands for?). They were

both genuinely understanding, and very supportive of me. I learned a great deal from them about

control engineering (among other fields) and about academic life. Without their guidance, their

advice, and their help, this dissertation would not have been completed.

Other members of my dissertation committee, Dr. William T. Baumann, Dr. Hugh F.

VanLandingham, and Dr. Joseph A. Ball, who were also my professors at some point during my

five and a half years at Tech, were kind enough to listen to me babbling about an intelligent

controller.

On multiple occasions, I had the chance of talking about my research (and my life) to the

other graduate students in the Machine Intelligence Laboratory. Doug, John, Mamun, and Ferat

were some of the faces I got used to seeing almost every day. Also, Paul and Kevin forced me

every Wednesday afternoon to have my breakfast at Arnold's. I do have a feeling that I will miss

that very much.

During my assistantship at the Center for Transportation Research, I had the privilege of working with fellow students, researchers, and the center's staff. Lisa, Terry and Christina were there

to answer my unnecessary questions. Kaan Özbay was helpful in getting me started at CTR, and

I am grateful for his confidence in me. My only regret about CTR is not having been able to know

Ashwin Amanna better; he is a true gentleman.

Local Turks, Simge, Erkan, Bahadır, Levent, Oya, Bahar, Kutsal, and others, suffered

indirectly from my research efforts. Sorry guys!

My parents, Aydın and Ayhan, watched me from a distance while I worked toward my

degree, missing most of the last three years of my life. The completion of this dissertation will

mean "seeing more of me" to them, I hope. I believe that my sister Gn will not make the

mistakes I have made in becoming an electrical engineer. Watching her catch up with me

makes me confused and very proud.

Probably, the people who heard the most about my dissertation and graduate life are the

members of KM (Kelek Muhabbet). This e-mail distribution list came to life sometime in 1992 and

grew to a much larger and diverse group. Some of the members are my close friends from the

early years of my academic venture, some of them are people whom I met on the Net, some of

them I have not seen yet. They also watched me getting tired of this lovely town of Blacksburg;

some even visited me here more than once. I could not, cannot and do not want to imagine going

a week without receiving e-mail from KM.


I am also grateful to Berrak Pınar for helping me tremendously during the last two hectic

weeks of my dissertation work.

Finally, to the person who borrowed the Proceedings of the 1995 Intelligent Vehicles

Symposium, and has refused to return it to the library since July 1996: You have been acknowledged!

This work is dedicated to all scientists and engineers who work on artificial intelligence,

and know exactly what I went through.


Table of Contents

Abstract ii

Acknowledgments v

Table of Contents vii

List of Figures x

List of Tables xiv

1 Introduction 1

1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

1.1.1 ITS and AHS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

1.1.2 Intelligent Vehicle Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

1.2 Context . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

1.2.1 Different Approaches to Intelligent Vehicle Control . . . . . . . . . 4

1.2.2 Learning Automata as an Intelligent Controller . . . . . . . . . . . . . 6

1.3 Scope and Structure of Dissertation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

2.1 AHS Program Phases and the National Automated Highway Systems

Consortium . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

2.2 Vehicle Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

2.2.1 Lateral Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

2.2.2 Longitudinal Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

2.2.3 Combined Lateral and Longitudinal Control . . . . . . . . . . . . . . . 15

2.3 Hierarchical Control Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

2.4 Other AHS Issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

2.4.1 Sensors and Communication . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

2.4.2 Safety and Fault Tolerance . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

2.4.3 Human Factors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

2.5 An Experimentation and Evaluation Framework for AHS . . . . . . . . . . . . . 19

2.5.1 FLASH Laboratory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

2.5.2 Smart Road . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

2.6 A Simulation Tool for AHS Systems: DYNAVIMTS . . . . . . . . . . . . . . . . . 24

2.6.1 Software System Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . 24

2.6.2 Components of DYNAVIMTS . . . . . . . . . . . . . . . . . . . . . . . . . 25

3.1 Earlier Works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

3.2 The Environment and the Automaton . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

3.3 The Stochastic Automaton . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

3.4 Variable Structure Automata and Its Performance Evaluation . . . . . . . . . . 34


3.4.1 Norms of Behavior . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

3.4.2 Variable Structure Automata . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

3.5 Reinforcement Schemes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

3.5.1 Linear Reinforcement Schemes . . . . . . . . . . . . . . . . . . . . . . . . . 38

3.5.2 Nonlinear Learning Algorithms: Absolutely Expedient Schemes 39

3.6 Extensions of the Basic Automata-Environment Model . . . . . . . . . . . . . . 41

3.6.1 S-Model Environments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

3.6.2 Nonstationary Environments . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

3.6.3 Multi-Teacher Environments . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

3.6.4 Interconnected Automata . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

4.1 The model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

4.2 Sensor/Teacher Modules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49

4.3 Nonlinear Combination of Multiple Teacher Signals . . . . . . . . . . . . . . . . 52

4.4 The algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54

4.5 Simulation Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55

4.6 Discussion of the Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60

5.1 More Complex Sensors and Sensor Modules . . . . . . . . . . . . . . . . . . . . . . 62

5.1.1 Front Sensor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63

5.1.2 Side Sensors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

5.2 Additional Sensors and Inter-vehicle Communications . . . . . . . . . . . . . . . 72

5.2.1 Lane Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72

5.2.2 Pinch Condition Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73

5.3 Higher Level Additions: More Intelligent Teachers . . . . . . . . . . . . . . . . . 77

5.3.1 Lane flag . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77

5.3.2 Speed flag . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78

5.3.3 Additional rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79

5.4 Results and Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81

5.4.1 Overall Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82

5.4.2 Effect of Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83

5.4.3 Encountered Problems and . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90

5.4.3.1 Environment Model . . . . . . . . . . . . . . . . . . . . . . . . . . 90

5.4.3.2 Information Content and Other Issues . . . . . . . . . . . . . 90

6.1 A Linear Reinforcement Scheme: Linear Reward-Penalty with Unequal

Parameters, LR-P . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93

6.1.1 Proof of convergence for linear reward-penalty scheme LR-P . . . 100

6.1.2 Notes on the ideal environment and penalty probabilities . . . . 109

6.2 A Nonlinear Reinforcement Scheme: NLH . . . . . . . . . . . . . . . . . . . . . . . . 110


6.2.1 Function H and the Conditions for Absolute Expediency . . . . . . 111

6.2.2 Comparison of NLH with General Absolutely Expedient Scheme 114

7.1 Interconnected Automata and Games . . . . . . . . . . . . . . . . . . . . . . . . . . . 116

7.2 Interacting Automata and Vehicles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118

7.2.1 Interacting Automata in an Autonomous Vehicle . . . . . . . . . . . 118

7.2.2 Interacting Vehicles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120

7.3 States of the Switching Automata Environment . . . . . . . . . . . . . . . . . . . . 122

7.4 Highway Scenarios as State Transitions . . . . . . . . . . . . . . . . . . . . . . . . . . 124

7.4.1 Scenario 1: Two Vehicles on a Three-Lane Highway . . . . . . . . 128

7.4.2 Scenario 2: Three Vehicles on a Three-Lane Highway . . . . . . . . 131

7.4.3 Scenario 3: Four Vehicles on a Three-Lane Highway . . . . . . . . . 134

7.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138

8 Conclusion 139

8.1 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139

8.1.1 AHS as a Social Issue . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139

8.1.2 Technical Feasibility of AHS . . . . . . . . . . . . . . . . . . . . . . . . . . . 141

8.1.3 Reinforcement Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143

8.2 Results and Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144

8.3 Recommendations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147

Bibliography 149

A.1 Notations and Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165

A.2 Acronyms and Abbreviations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166

Inaction-Penalty Scheme LI-P 167

C.1 The Program . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170

C.2 Additional Simulation Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178

Vita 185


List of Figures

Figure 2.2 Automated FLASH Car: Test Model. . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

Figure 2.3 Driving Station. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

Figure 2.4 Test tracks and the control station in the FLASH Laboratory. . . . . . . . . . 22

Figure 2.5 Location of the Smart Road. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

Figure 2.6 Software System Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

Figure 2.7 Components of DYNAVIMTS simulation package . . . . . . . . . . . . . . . . . 27

Figure 2.8 Opening screen of DYNAVIMTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

Figure 3.1 The automaton and the environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

Figure 4.1 Automata in a multi-teacher environment connected to the physical layers 48

Figure 4.2 Memory vector/buffer in regulation layer. . . . . . . . . . . . . . . . . . . . . . . . . . 49

Figure 4.3 Sensor ranges for an autonomous vehicle. . . . . . . . . . . . . . . . . . . . . . . . . 50

Figure 4.4 The definition of the mapping F. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54

Figure 4.5 Positions of nine vehicles: gray colored vehicle is autonomous, black

colored vehicles are cruising at constant speed. . . . . . . . . . . . . . . . . . . . . 56

Figure 4.6 Automated vehicle's lane position and speed. . . . . . . . . . . . . . . . 57

Figure 4.7 An automated vehicle following a slow moving vehicle in lane 2. . . . . . . . 57

Figure 4.8 Headway distance and speed of an autonomous vehicle following another

slowly moving vehicle. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58

Figure 4.9 Headway distance and speed of an autonomous vehicle following another

slowly moving vehicle. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58

Figure 4.10 Initial and final positions, numbers, and speeds of ten automated vehicles

traveling on a 3-lane 500-meter circular highway [mpeg]. . . . . . . . . . . . . . 60

Figure 5.1 Extended definitions of front and side sensor ranges. . . . . . . . . . . . . . . . . 64

Figure 5.2 Headway distance and speed of an autonomous vehicle following another

slow-moving vehicle: sensor parameters are d1 = 10m, d2 = 20m, and

fsr = 30m. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65

Figure 5.3 Headway distance and speed of an autonomous vehicle following another

slow-moving vehicle: sensor parameters are d1 = 10m, d2 = 15m, and

fsr = 20m. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65

Figure 5.4 Headway distance and speed of an autonomous vehicle following another

slow-moving vehicle: sensor parameters are d1 = 12m, d2 = 15m, and

fsr = 18m. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66

Figure 5.5 Headway distance and speed of an autonomous vehicle following another

slow-moving vehicle: sensor parameters are d1 = 14m, d2 = 15m, and

fsr = 16m. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67


Figure 5.6 Headway distance and speed of an autonomous vehicle following another

slow-moving vehicle: sensor parameters are d1 = 12m, d2 = 15m, and

fsr = 16m. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67

Figure 5.7 Positions of five automated vehicles: gray colored vehicle attempts to

shift to the right lane by slowing down. . . . . . . . . . . . . . . . . . . . . . . . . . . 70

Figure 5.8 Positions of five automated vehicles: gray colored vehicle attempts to

shift to the right lane by slowing down. . . . . . . . . . . . . . . . . . . . . . . . . . . 71

Figure 5.9 Headway distances in the platoon; lane change occurs at t ≈ 8.5 sec. . . . . 71

Figure 5.10 Speed and space-time trajectory of five automated vehicles. . . . . . . . . . . . 72

Figure 5.11 The lane module. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73

Figure 5.12 Pinch condition: two vehicles decide to shift to the same lane

simultaneously. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74

Figure 5.13 Memory vector/buffer: If an action fills half of the buffer, it is signaled. 74

Figure 5.14 Initial positions and trajectories of two vehicles. . . . . . . . . . . . . . . . . . . . . 75

Figure 5.15 Lane and speed of two vehicles from t = 0 to t = 4sec. . . . . . . . . . . . . . . . 75

Figure 5.16 The output of the pinch module and the actions in the memory buffer

versus time for vehicle 1 and 2. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76

Figure 5.17 Initial and final situation, speed, and lane positions of three automated

vehicles traveling on a 3-lane 500-meter circular highway [mpeg]. . . . . . . 78

Figure 5.18 Snapshots, speed and lane positions of two automated vehicles traveling

on a 3-lane 500-meter circular highway. . . . . . . . . . . . . . . . . . . . . . . . . . . 79

Figure 5.19 The new definition of the mapping F. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81

Figure 5.20 Automata in a multi-teacher environment with new definitions of mapping

functions and flag structures. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82

Figure 5.21 Average platoon speeds for simulations 1-4. . . . . . . . . . . . . . . . . . . . . . . . 85

Figure 5.22 The distance between vehicles for simulations 1-4. . . . . . . . . . . . . . . . . . . 86

Figure 5.23 Initial and final positions, and speeds of automated vehicles in four

ten-vehicle platoons [mpeg]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87

Figure 5.24 Simulations 5 and 6: average speed for a platoon of ten automated

vehicles. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88

Figure 5.25 Simulations 5 and 6: distances between vehicles for a platoon of ten

automated vehicles. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88

Figure 5.26 Initial and final positions, and speeds of automated vehicles in four

ten-vehicle platoons [mpeg]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89

Figure 5.27 Simulations 7 and 8: mpeg movies of two simulations of 10-vehicle

platoons with lead vehicle decelerating to 75mph [mpeg]. . . . . . . . . . . . . 90

Figure 6.1 Number of steps needed for popt to reach 0.995 for different values of

learning parameters a and b. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95

Figure 6.2 For some parameter values, convergence is not obtained for all 500 runs. . 97

Figure 6.3 Number of steps needed for popt to reach 0.995 for different values of

learning parameters a and b. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99

Figure 6.4 Number of steps needed for popt to reach 0.995 for different values of

learning parameters a and b. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100


Figure 6.5 The region for asymptotic convergence to pure optimal strategy. . . . . . . . 106

Figure 6.6 Probabilities and cumulative average of the probabilities of the non-

optimal actions for a 3-action automaton with LR-P (a = 0.2, b = 0.1). . . . . 106

Figure 6.7 Sketch of the function H. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112

Figure 7.1 Longitudinal automaton determines the lateral automaton's environment. . 118

Figure 7.2 Multiple vehicles interacting through the physical environment. . . . . . . . . 121

Figure 7.3 Situations for two interacting vehicles and combined environment

responses. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123

Figure 7.4 Changes in the physical (left) and automata environments (right): vehicle

1 shifts to the middle lane. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124

Figure 7.5 Possible physical environment states for 2 vehicles in 3-lane highway. . . . 126

Figure 7.6 State transition diagram for two vehicles on a 3-lane highway; reflexive

transitions are not shown. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126

Figure 7.7 Scenario 1: Two vehicles with conflicting desired paths. . . . . . . . . . . . . . . 129

Figure 7.8 Possible chains for Scenario 1 (reflexive transitions are not shown; two

chains are distinguished with dashed and solid lines). . . . . . . . . . . . . . . . . 130

Figure 7.9 Possible penalty-reward structures to force physical environment to

switch to states B1 or C1 from current state A1. . . . . . . . . . . . . . . . . . . . 131

Figure 7.10 Scenario 2: Three vehicles with conflicting desired paths. . . . . . . . . . . . . . 132

Figure 7.11 A possible chain for Scenario 2: lane flag forces vehicles 1 and 2 to slow

down. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132

Figure 7.12 Three-vehicle transition diagram can be written as three separate two-

vehicle transition diagrams using the definitions in Figure 7.8. . . . . . . . . . 133

Figure 7.13 Three-vehicle transition diagram is equivalent to two separate two-vehicle

diagrams for the example in Section 5.3.1. . . . . . . . . . . . . . . . . . . . . . . . . 134

Figure 7.14 Scenario 3: Four vehicles with conflicting desired speeds. . . . . . . . . . . . . 135

Figure 7.15 Two possible solutions to a situation with 4 vehicles. . . . . . . . . . . . . . . . . 135

Figure 7.16 Speeds and positions of vehicles 1 and 2 for scenario 3 [mpeg]. . . . . . . . . 137

Figure B.1 Probabilities of five actions in the LI-P scheme; only c1 = 0. . . . . . . . . . . . . 169

Figure C.1 Structure of the simulation program. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172

Figure C.2 The simulation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173

Figure C.3 Graphic User Interface. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174

Figure C.4 GUI for data visualization. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175

Figure C.5 Trajectory command for relative position plots. . . . . . . . . . . . . . . . . . . 176

Figure C.6 Scenario window: Clickable buttons initialize several different scenarios. . 176

Figure C.7 Timed snapshots of a simulation run. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177

Figure D.1 All possible immediate neighborhood situations for two vehicles. . . . . . . . 180

Figure D.2 Combined states for two vehicles: states not shown are identical to those

given here. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180

Figure D.3 Further simplified states. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181

Figure D.4 All possible immediate neighborhood situations for three vehicles in a

three-lane highway. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182


Figure D.5 Combined states for three vehicles: states not shown are identical to those

indicated. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183

Figure D.6 Possible states for four vehicles: actually a two-vehicle situation. . . . . . 184


List of Tables

Table 4.1 Possible action pairs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

Table 4.2 Output of the left/right sensor modules. . . . . . . . . . . . . . . . . . . . . . . . . . . 50

Table 4.3 Output of the Front Sensor block. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

Table 4.4 Output of the Speed Sensor Module. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

Table 4.5 Action - Sensor Module matrix. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52

Table 4.6 Possible longitudinal action-sensor output scenarios. . . . . . . . . . . . . . . . . 53

Table 4.7 Properties of the reinforcement schemes. . . . . . . . . . . . . . . . . . . . . . . . . . 55

Table 4.8 Parameter settings for simulations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59

Table 5.1 Output of the Front Sensor block. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63

Table 5.2 Front sensor parameters for simulations. . . . . . . . . . . . . . . . . . . . . . . . . . 64

Table 5.3 Output of the left and right sensor modules. . . . . . . . . . . . . . . . . . . . . . . . 69

Table 5.4 Possible lateral action-sensor output combinations. . . . . . . . . . . . . . . . . . 80

Table 5.5 Parameter settings for simulations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83

Table 6.1 Behavior of the automata under general LR-P scheme. . . . . . . . . . . . . . 96

Table 6.2 Behavior of the automata under general LR-P scheme. . . . . . . . . . . . . . . . . 98

Table 6.3 Effect of number of actions on behavior of the automata using general

LR-P scheme. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99

Table 6.4 Convergence rates for a single optimal action of a 3-action automaton

in a stationary environment. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114

Table 7.1 Assumed probabilities of penalty for each action based on the front and

side sensors. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119

Table 7.2 Probability of sensing a vehicle in the sensor range. . . . . . . . . . . . . . . . . . 119

Table 7.3 Action-transition pairs for state A1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127

Table 7.4 Possible transitions for Scenario 1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129

Table C.1 Description of the subroutines and functions for multiple-lane, multiple-automata intelligent vehicle simulation. . . . . . . . . . . . . . . . . . . 171

Table C.2 Description of the situations and simulation files [mpeg] . . . . . . . . . . . . . . 178


Chapter 1

Introduction

1.1 Motivation

One of today's most serious social, economic, and environmental problems in the world is

traffic congestion. In the United States alone, the problem is costing over 100 billion dollars each

year. In addition, more than 40,000 persons are killed and another three million are injured (an

additional cost of $137 billion per year). This is compounded by the fact that it is becoming

increasingly difficult, for both financial and environmental reasons, to continue to build additional

highway capacity [Hayes94]. Therefore, the solution to the problem must lie in other

approaches, one of which is to make better use of the existing highway infrastructure. The

current infrastructure has begun to plateau in its ability to meet the operational requirements of

highway transportation. Consequently, congestion and safety risks are increasing. Previous

solutions of building or widening roads are not feasible anymore; the average cost of a mile of

urban freeway is $38 million. Furthermore, the tens of thousands of lives lost per year are a steep

price to pay for mobility [Hayes94].

New approaches discussed in the next section have the potential to increase safety, reduce congestion, and improve driving conditions. Studies conducted at the beginning

of the intelligent transportation era show that it is possible to cut accidents by 18%, gas

emissions by 15%, and fuel consumption by 12% by employing new approaches. It is claimed

that these improvements will result in a $10 return to the economy for each dollar invested in the

new transportation systems [Card93].

Considering the potential of the application of intelligent systems to the surface transportation problem, the US Department of Transportation deems the investigation of new technologies crucial. Most of the smart highway technologies were first suggested in the 1960s, but the idea did not gain momentum until the Intermodal Surface Transportation Efficiency Act (ISTEA) was passed by Congress in 1991. The technology needed to create an intelligent

transportation system is already available, although still expensive for full implementation.

Today's vehicles have at least four or five microprocessors monitoring and controlling such

things as ignition spark, fuel and emission controls, automatic transmission, cruise control, air

bags, and anti-lock brakes. Further development is underway on computerized controls for active

suspensions, traction control, all-wheel drive, and four-wheel steering. Few people realize, in

fact, that today's car has four times the computing power of the first Apollo moon rocket

[Card93].


The approach taken by the US Department of Transportation (DOT) is called Intelligent

Transportation Systems (ITS; previously, Intelligent Vehicle Highway System, IVHS). ITS is

actually a broad range of diverse technologies, including information processing, communications,

control, and electronics. DOT's IVHS Strategic Plan [Plan93] summarizes the approach, mainly

defining the use of intelligent vehicles and extensive communications between the vehicle and the

roadside infrastructure. This initial plan set the goals, milestones, and objectives of the ITS

program through 1997. The federal government is one of the many players in the implementation

of a nationally compatible system. Others are found among the private sector, professional

societies, consumer and industry groups, academia, and State and local governments.

Some of the direct goals and objectives of the ITS program are:

- Significant reduction in the number of annual fatalities and injuries due to accidents,
- Improvement in the safety of commercial vehicles and hazardous material movement,
- Significant reduction in the cost associated with congestion,
- Increase in the volume of people and goods that can be moved on existing highways,
- Improvement of travel time predictability, and reduction in the level of stress associated with travel,
- Reduction in harmful vehicle emissions, and in surface transport energy consumption per vehicle-mile and per passenger-mile traveled.

While congestion is the primary problem in urban areas, rural areas suffer from a higher

traffic fatality rate due to road conditions and higher rates of speed. Increased margins of safety

provided by ITS technologies might reduce the number of accidents in both urban and rural areas.

Most importantly, communication capabilities could improve emergency response times, and

improve levels of rural public transportation services.

The established process of delivering the ITS program has four major components

covering the following issues:

- Research and Development: goals of the program include (a) providing basic research tools and a knowledge base, (b) creating a favorable environment, (c) defining potential opportunities for applications, and (d) developing ITS applications.
- Operational Tests are designed to evaluate applications of new technologies and system concepts, facilitating the transition into operational use.
- The Automated Highway System (AHS) is established to pursue the more technically challenging, longer-term goal of having a fully operational vehicle-highway system that automates the driving process.
- Deployment Support: unlike major national systems in aviation, space, defense, and other sectors, the DOT will not own and operate the national ITS.


The DOT has established a set of discrete milestones to be accomplished by the end of 1997. Some of these milestones related to the work presented here are system

architecture, communication frequencies, collision avoidance, and automated highway systems.

A major long-term element of DOT's research and development is AHS. The deployment

of early phases of AHS will begin in the next ten years, but will continue beyond that period.

Current AHS program activities address the imminent user services involving safety advisory and

driver assistance products, e.g., in-vehicle warnings of impending danger, and emergency control

intervention. The ITS architecture must be structured to provide a sensible evolution to

accommodate these types of user services, as well as the eventual realization of total automation.

The 1997 demonstration requested by Congress is the AHS program's first major milestone, and will assess proof-of-concept feasibility for the AHS. Preparation for the

demonstration will include the identification and evaluation of feasible alternative conceptual

approaches. The demonstration will feature many feasible concepts. Subsystems and control,

including lateral and longitudinal control, transition between manual and automated control,

maintenance of position in the roadway traffic flow, lane changing, and various malfunction

management capabilities will be demonstrated all in a collision-free automated driving

environment.

The demonstration results, if successful, will lead to selection of the AHS conceptual

approach, documentation of the approach in performance specifications and standards, and

establishment of a partnership with industry to design, develop, and operationally test an AHS

with public participation. Underpinning the development of the first AHS demonstration is an

understanding of the human factor issues related to AHS driver operations. The DOT has begun a

research project to develop comprehensive driver performance guidelines for AHS. Beyond 1997,

deployment of a nationally compatible system will become the primary concern of DOT. It is

expected that the work will shift towards supporting deployment of ITS user services, and

continuing progress on long-term ITS goals, especially the AHS. Details of the AHS concept, and

major important factors are described in detail in Chapter 2.

Vehicle control is probably the most important part of advanced AHS applications.

Implementation of AHS requires automatically controlled vehicles for the reasons listed in

Chapter 2. The aim is to increase safety by removing the driver from control of the vehicle, and to increase throughput by automating vehicle following, lane changing, and other maneuvers.

The past and present research on vehicle control emphasizes the importance of new

methodologies in order to obtain stable longitudinal and lateral control. Combined lateral and

longitudinal control needs to be implemented for full AHS applications. As indicated in Section

2.2, there is a large group of investigators working on vehicle control issues. However, being able

to control the dynamics of a vehicle does not necessarily mean that we have an AHS. What we

mean by intelligent vehicle control is the ability of an automated vehicle to make decisions on

its steering, and velocity commands, i.e., its ability to plan its path. In an environment where


there are multiple fast moving vehicles, making the right decision to avoid collisions and optimize

vehicle path is difficult. Several different approaches are currently being considered in

AHS research. These and our position relative to current research are discussed next.

1.2 Context

1.2.1 Different Approaches to Intelligent Vehicle Control

There are two main approaches to the implementation of a fully automated highway system:

hierarchical structure and autonomous vehicle approaches. The first approach, defined in

[Varaiya93], centers around the notion of platooning (See Section 2.3). In this approach, there

are different layers of control hierarchy, each responsible for a different task required to

implement an automated highway system. The layers of the architecture, starting at the top, are:

network, link, planning, and regulation. The network layer assigns a route to each vehicle as it

enters the system. The link layer assigns each vehicle a path which balances traffic for all lanes

and assigns a target speed for each section of highway. The planning layer creates a plan which

attains the desired path. The regulation layer controls the vehicle trajectory so that it conforms to

this plan.

The tasks assigned to each layer differ from each other in three dimensions: time scale,

information span, and spatial impact of decisions. As we go up in the hierarchy, the frequency of

decisions decreases, more aggregated information is used, and the decisions affect vehicles over

larger distances. For example, the planning layer of the hierarchy makes decisions to create a plan

that evolves every minute by coordinating with neighboring vehicles. Path decisions affect only

the neighboring vehicles.
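To make this division of labor concrete, here is a minimal Python sketch of the four layers; the class, field, and route names are illustrative assumptions, not taken from [Varaiya93].

```python
from dataclasses import dataclass, field

@dataclass
class Vehicle:
    vid: int
    route: list = field(default_factory=list)   # set by the network layer
    path: str = ""                               # lane assigned by the link layer
    target_speed: float = 0.0                    # section speed from the link layer
    plan: list = field(default_factory=list)     # maneuvers from the planning layer

def network_layer(v):
    # Assign a route to the vehicle as it enters the system (labels hypothetical).
    v.route = ["entry_ramp", "section_A", "exit_12"]

def link_layer(v, lane_loads, section_speed):
    # Assign a path that balances traffic across lanes, and a target speed.
    v.path = min(lane_loads, key=lane_loads.get)
    v.target_speed = section_speed

def planning_layer(v):
    # Create a plan that attains the desired path at the desired speed.
    v.plan = [("change_to_lane", v.path), ("track_speed", v.target_speed)]

def regulation_layer(v):
    # Closed-loop trajectory control conforming to the plan (dynamics not modeled).
    return v.plan
```

Moving up this chain, each function runs less often and on more aggregated information, mirroring the three dimensions described above.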

This approach assumes an information flow from the higher layers of the hierarchy and to/from other

vehicles for making decisions about the optimal path. The information available to an automated

vehicle has global characteristics since it is relayed from a higher layer of the hierarchy.

A simpler approach for reasons of implementation is the autonomous vehicle approach,

where each vehicle is treated as an autonomous agent using local sensors (and maybe limited

communications with neighboring vehicles). It does not use the concept of closely separated

vehicles and aims to solve the problem in a shorter time interval, and to reach a fully automated

highway system by an evolutionary implementation, i.e., by slowly increasing the capabilities

of autonomous vehicles over time.

In the first case, the control commands are relayed by higher layers in the hierarchy, and

the vehicles do not need to make intelligent decisions for path planning; the hierarchical structure

controls the vehicles. In the latter approach however, the vehicles need to make decisions based

on local information. Our learning planner can be visualized as part of the second approach, or a

backup system for the first one.

The first attempt to solve the problem of real-time decision making in an automated

highway environment dates back to 1992. Mourou and Fade described a planning method

applicable to agents with perception and decision-making capabilities and the ability to

communicate with other agents [Mourou92]. The agents, in this example, are assumed to


transmit their actions to their neighbors. This work emphasized the possibility of achieving a

more flexible, fast, and reactive system by multi-agent planning methods and execution

monitoring. However, this approach requires constant and complete data transfer between agents.

Recent research on intelligent vehicles includes an adaptive intelligent vehicle module used

in a simulation and design system for tactical driving algorithms [Sukthankar96a, 96b]. The

approach divides the driving task into three levels: strategic (route planning, goal determination),

tactical (maneuver selection for short term goals) and operational (translating maneuvers into

control operations). Intelligent modules are designed to answer the need for real-time tactical level

decisions. A reasoning system called SAPIENT combines the high-level goals with low-level

sensor constraints to control vehicles. The system uses a number of modules whose outputs are

combined using a voting scheme. The Population-Based Incremental Learning (PBIL) method is

used to adjust the large number of parameters defining the behavior of the modules. This

approach is basically a method for finding the suitable parameters for the module that are used to

fire lateral and longitudinal actions. The learning algorithm for the parameters is a combination of

evolutionary optimization and hill climbing, and it is very similar to learning automata

reinforcement schemes except for the use of a mutation factor for the probabilities of actions at

the end of each iteration.
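As a rough sketch (not taken from [Sukthankar96a]; the parameter values, the binary encoding, and the fitness interface are assumptions), one PBIL iteration over a probability vector might look like this:

```python
import random

def pbil_step(p, fitness, pop_size=20, lr=0.1, mut_prob=0.02, mut_shift=0.05):
    """One PBIL iteration over a probability vector p for binary parameters."""
    # Sample a population of candidate parameter vectors from p.
    population = [[1 if random.random() < pi else 0 for pi in p]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)   # hill-climbing step: keep the best sample
    # Evolutionary step: move each probability toward the best sample's bits.
    p = [(1 - lr) * pi + lr * bi for pi, bi in zip(p, best)]
    # Mutation factor applied to the probabilities at the end of each iteration,
    # the detail contrasted above with learning automata reinforcement schemes.
    return [(1 - mut_shift) * pi + mut_shift * round(random.random())
            if random.random() < mut_prob else pi
            for pi in p]

# Example: maximize the number of 1-bits in an 8-bit parameter vector.
# p = [0.5] * 8
# for _ in range(100): p = pbil_step(p, fitness=sum)
```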

The learning automata controller we describe here can be visualized as a combination of

intelligent modules at the tactical level. However, our approach differs from the above-mentioned

research in the use of the learning algorithms. Instead of learning the parameters affecting the

firing of actions on repeated runs, learning automata learn which action to fire based on the local

sensor information. In other words, the learning is not at the design phase, but at the run phase.

The parameters defined in [Sukthankar96a] correspond to learning (Chapters 3, 4, and 6) and

sensor (Chapters 4 and 5) parameters of our controller, and can be used to adjust those. Learning

and sensor parameters define the capabilities of an automated vehicle, and can also be used to

model different driver behaviors. For example, large learning parameters decrease the decision

time, and consequently result in a more agile vehicle path.

Another approach to intelligent vehicle controller for autonomous navigation uses a

decision-theoretic architecture with probabilistic networks [Forbes95]. The problem is modeled

as a partially observable Markov decision process, and the optimal action is a function of the

current belief state described as the joint distribution over all possible actual states of the world.

However, this work currently considers only combinations of dynamic decision networks with a

decision tree, i.e., if-then rules where each predicate is actually a complex set of probability

thresholds on specific variables, and it is similar to the previous work in the sense of firing

actions. Similarly, Niehaus and Stengel defined a rule-based navigation system that uses worst-

case decision making (WCDM) approach. Again, a stochastic model of the traffic situation based

on sensor measurements is assumed [Niehaus94].


The concept of the learning automaton is defined as a result of the work on modeling observed

behavior, on the choice of actions based on past observations, on implementation of optimal

strategies in the context of the two-armed bandit problem, and on the need for rational decisions

in random environments.

In classical control theory, the control of a process is based on complete knowledge of the

system. Later developments considered the uncertainties present in the system. However, all

those assumptions on uncertainties and/or input functions may be insufficient to successfully

control the system. It is then necessary to acquire additional information online since a priori

assumptions are not sufficient. One approach is to view these as problems in learning.

Learning can be defined as a change in behavior as a result of past experience. In a purely

mathematical context, the goal of a learning system is the optimization of a functional not known

explicitly [Narendra74]. A stochastic automaton attempts to find a solution of the problem

without any information on the optimal action (the control input to the environment/system).

One action is selected at random, the response from the environment is observed, action

probabilities are updated based on that response, and the procedure is repeated. A stochastic

automaton acting as described to improve its performance is called a learning automaton.
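As a minimal sketch of this select-observe-update cycle in Python, using a general linear reward-penalty update with unequal parameters (the LR-P scheme studied in Chapter 6; the parameter values here are arbitrary):

```python
import random

def linear_rp_update(p, chosen, beta, a=0.1, b=0.05):
    """General linear reward-penalty (LR-P) update with unequal parameters.
    p: action probability vector, chosen: index of the selected action,
    beta: environment response (0 = reward, 1 = penalty)."""
    r = len(p)
    if beta == 0:    # reward: move probability mass toward the chosen action
        return [pj + a * (1 - pj) if j == chosen else (1 - a) * pj
                for j, pj in enumerate(p)]
    else:            # penalty: move probability mass away from the chosen action
        return [(1 - b) * pj if j == chosen else b / (r - 1) + (1 - b) * pj
                for j, pj in enumerate(p)]

def automaton_step(p, environment):
    """One interaction: select an action at random according to p, observe the
    stochastic response, and update the action probabilities."""
    chosen = random.choices(range(len(p)), weights=p)[0]
    beta = environment(chosen)   # any function returning 0 (reward) or 1 (penalty)
    return linear_rp_update(p, chosen, beta)
```

Here environment is any function returning a stochastic 0/1 response; in the vehicle controller that role is played by the teacher modules built on sensor data (Chapter 4).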

As seen clearly from the general description of a learning automaton, our approach differs

from previous and current efforts to intelligently control autonomous vehicles in the use of learning. Instead of learning the parameters/behaviors/firing rules for the best actions to take for

achieving a safe and optimal path, our controller learns the action in real-time. In that sense, the

decision to fire an action is never taken at exactly the same time for similar conditions.

Furthermore, there are no prescribed conditions for actions. The parameters that define the

learning process of a stochastic automaton, as well as the sensor parameters defining decision

ranges, can be learned too using methods similar to ones described in Section 1.2.1.

The idea of defining a fixed structure to be utilized to find the optimal action has its

own appeal, since the performance of the system is deterministic in the sense that the best action

for a specific situation is known. However, even a good driver does not follow rules

deterministically. In this sense, the learning automata approach is able to capture the dynamics of

the driver behavior. On the other hand, a rule-based system, although known to perform well on

many situations, has the disadvantage of requiring extensive modifications, even for a minor

adjustment in the rules. Furthermore, such systems cannot handle unanticipated situations

[Cremer94].

We visualize two learning automata employing a reinforcement algorithm as the heart of

our path planner. Using local sensor data (and maybe limited communications with neighboring

agents), longitudinal and lateral automata learn the optimal action to be taken for a given situation.

Sensor information is processed in virtual teacher modules that evaluate each action in the light of

the current sensor status. Given a large enough time and sufficiently large learning parameters, the

automata indicate the best action to take and send it to a lower layer which in turn fires

the action. The details of the concept are given in Chapters 4 and 5.
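A toy sketch of one such decision cycle follows; the action names, sensor fields, and the simple penalty logic are invented for illustration (the dissertation's teacher modules and their nonlinear combination are specified in Chapters 4 and 5):

```python
import random

LATERAL = ["shift_left", "stay_in_lane", "shift_right"]
LONGITUDINAL = ["accelerate", "keep_speed", "decelerate"]

def teachers(sensors, lat_action, lon_action):
    """Stand-in teacher modules: evaluate each proposed action against the
    current sensor status and vote penalty (1) or reward (0)."""
    lat_penalty = ((lat_action == "shift_left" and sensors["left_occupied"]) or
                   (lat_action == "shift_right" and sensors["right_occupied"]))
    lon_penalty = lon_action == "accelerate" and sensors["headway_m"] < 20.0
    return int(lat_penalty), int(lon_penalty)

def plan_step(p_lat, p_lon, sensors, update):
    """One cycle: both automata choose actions, the teachers respond, and both
    probability vectors are updated with a reinforcement scheme (e.g., the
    linear_rp_update sketched earlier)."""
    i = random.choices(range(3), weights=p_lat)[0]
    j = random.choices(range(3), weights=p_lon)[0]
    beta_lat, beta_lon = teachers(sensors, LATERAL[i], LONGITUDINAL[j])
    return update(p_lat, i, beta_lat), update(p_lon, j, beta_lon)
```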


This application of learning automata is one of the very few applications to real-time

learning problems [Narendra89, Najim91b]. The difficulty in modeling the environment (highway,

the vehicles, and the drivers) makes learning automata a likely candidate for such an application,

since a learning system is guaranteed to adapt to existing conditions provided that the teachers

are designed carefully.

Chapter 1 described the overall problem of traffic congestion and safety, as well as ITS, currently

considered as a solution by DOT and other agencies throughout the world. The direct goals

and objectives of the ITS program, its advantages and the role of Automated Highway Systems

(AHS) to reach these goals as well as the importance of the intelligent vehicle control for AHS are

also emphasized in this chapter.

Chapter 2 introduces the AHS program in detail. The phases of the program and National

AHS Consortium are introduced to give a perspective on the current developments. Background

on the previous and current research on vehicle control, proposed control structures, and related

issues such as sensors and communication difficulties, and safety analysis are also given in

Chapter 2. The chapter concludes with the description of the evaluation and experimentation

framework designed at Virginia Tech Center for Transportation Research. Current small scale and

simulation studies for AHS are briefly described.

Chapter 3 is an introduction to learning automata and reinforcement learning. All the

necessary descriptions and ideas to understand our application of learning automata to intelligent

vehicle control are given. This chapter also lists past and present research on learning automata

and previous applications of learning automata.

Chapter 4 and 5 describe the use of learning automata as an intelligent decision maker for

vehicle path planning. The application of learning automata techniques to intelligent vehicle

control is introduced in Chapter 4. The basic automata-teacher model and abstraction of the

sensing modules are given; the idea of learning necessary actions for a vehicle to survive in a

dynamically changing environment is introduced. The idea differs from the previous and current

research by its on-line learning approach. The method described in Chapter 4 is found to be

capable of safely directing the autonomous vehicles. Simulations of the learning model also indicated

that there is a need for additional sensor information as well as faster learning algorithms.

The results of Chapter 4 lead to the extension of the decision capabilities of an automated vehicle by extending its sensing and communication capabilities and by defining a

more complex decision structure, given in Chapter 5. Our investigation of the control of

autonomous vehicles using local information determines the need for a new and improved sensor

structure. Chapter 5 details the necessary additions for more intelligent vehicles in order to

optimize the overall behavior of the traffic, and provides solutions to problems generated by

intelligent vehicle interactions. The flag structure described in this work enables autonomous

vehicles to reach their immediate goals while optimizing the traffic flow in their close neighborhood.


The overall design structure, effects of the several parameters on the decision, and several other

issues related to intelligent path planner conclude Chapter 5.

In Chapter 6, we introduce reinforcement schemes (learning algorithms) used for our

application of learning automata. The schemes are compared according to their desired update

characteristics. A new nonlinear scheme and a linear scheme not considered by

previous research efforts are found to be more advantageous than others. For the linear scheme,

proof of optimality for a specific case in our application to intelligent vehicle control is given.

Instead of the general linear differential equation approach [Narendra89], stability theorems for

discrete-time systems are used for this purpose. The behavior of the new nonlinear reinforcement

scheme is also discussed in this chapter. It is found that the algorithm is absolutely expedient.

These two new schemes are the results of our attempt to find fast converging reinforcement

schemes.

Chapter 7 introduces the concept of interacting automata and the treatment of multiple

automata as game playing agents. The concept of the automata games is then extended to

multiple automata in multiple vehicles. An approach similar to automata games [Narendra89] is

taken for interacting vehicles as well. By defining the physical environment of a vehicle as a

switching environment, it is possible to treat a highway situation as state transitions in a Markov

chain. Associating automata environments to each stationary physical environment, it is possible

to analyze and design physical environment changes as state transitions based on the reward-

penalty structure of the automata environment.[1] This approach led to the design of the flag

structure explained in detail in Chapter 5.
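A hypothetical fragment of this state-transition view, reusing the state labels (A1, B1, C1) that appear in Chapter 7; the specific transitions shown are illustrative, not copied from the dissertation's diagrams:

```python
# Each physical highway situation is a discrete state; the vehicles' joint
# actions select the next state, and each state carries its own stationary
# automata environment (reward/penalty probabilities).
TRANSITIONS = {
    # (state, (vehicle 1 action, vehicle 2 action)) -> next state
    ("A1", ("stay_in_lane", "shift_left")): "B1",
    ("A1", ("shift_right", "stay_in_lane")): "C1",
    ("A1", ("stay_in_lane", "stay_in_lane")): "A1",   # reflexive transition
}

def next_state(state, joint_action):
    # Unlisted action pairs leave the physical environment unchanged.
    return TRANSITIONS.get((state, joint_action), state)
```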

A brief discussion of technical feasibility and user acceptance of AHS, the conclusions

drawn from the work presented here, and our recommendations for future research as well as the

discussion of the problems we encountered in this research are listed in Chapter 8.

Notations and definitions of the variables, list of abbreviations and acronyms, additional

proof of convergence for a simple linear reinforcement scheme, a brief description of the

simulation program and diagrams of the states for highway scenarios mentioned in Chapter 7

are all given in the Appendices.

[1] The distinction between automata and physical environments is discussed in Chapter 4.

Chapter 2

Automated Highway Systems (AHS)

The AHS program is a broad international effort to

provide the basis for, and transition to, the next major performance upgrade of the

vehicle/highway system through the use of automated vehicle control technology [NAHSC96].

The detailed definition of the Automated Highway System is as follows [Plan93]:

The term "fully automated intelligent vehicle-highway system" is interpreted to mean a system that:

• Evolves from today's roads (beginning in selected corridors);

• Provides fully automated "hands-off" operation at better levels of performance than today's roadways in terms of safety, efficiency, and operator comfort; and,

• Allows equipped vehicles to operate in both urban and rural areas on highways that are both instrumented, and not instrumented.

The consensus in the AHS community is that AHS will evolve over a series of smaller

steps in technology. The final step of full automation will not be a leap, but a logical consequence

of previous development and deployment efforts. Each step in the technology will have its own

benefits and be self-sustaining. Vehicle and infrastructure evolutions will be synchronous

[James94]. We will briefly mention the steps of this evolution here before introducing the AHS

program and discussing automatic vehicle control technologies in detail.

When cruise control was first developed, there was much concern over the safety and user acceptance of the new system; however, it has become widely accepted and used. In the near future, obstacle and headway warning and Automatic Vehicle Identification (AVI) will be added to modern cruise control and the existing communications infrastructure. The success of AHS depends on linking the power of cellular communications and the emerging range of high-performance computers to the ongoing vehicle-based developments. Ideally, the highway system

can be divided into a number of cells which contain local radio receivers or beacons that will be

linked together through a fiber-optic network. Vehicles will also be equipped with a transceiver

unit carrying several user services. The first applications of this technology are the Automatic

Vehicle Identification (AVI) and Electronic Toll Collection (ETC). Obstacle and headway

warning is the next step in AHS development in vehicles. Vehicle on-board radar (VORAD) systems have already been in use in many commercial vehicles for the last two years. An important issue in warning systems is the capability of the sensor modules. Differentiating between a large


vehicle and a small animal may not be possible using a simple system. A consequent application

of the headway warning system is the automatic headway control. Adaptive cruise control

systems are currently designed by many automobile manufacturers.

The market introduction of the first vehicle with adaptive cruise control is expected in 1997. This will enable the drivers to hold their desired speed as well as the desired headway distance. Although the drivers defined as "creepers" will be cut off by more aggressive drivers ("hunters"), the ability to set the desired headway may be desirable to many users. Also, issues such as sensor types, curve handling, merging vehicles, changing lanes, and the integration of steering and braking all have to be addressed to obtain a complete system design.

Applications in advanced traffic management, traveler information and public

transportation systems (ATMS, ATIS, APTS) will require more sophisticated vehicle location

capabilities. In addition, the number of uses for vehicle-to-roadside communications will

eventually increase. MAYDAY services, fleet tracking and automatic vehicle location (AVL)

applications will use radio-location beacons as well as more sophisticated transceivers. As a

result of AVL and AVI, processing real-time information on vehicle locations will be possible.

Although the number of vehicles equipped with AVI/AVL technologies will initially be small,

traffic management centers can effectively use a small percentage of vehicles as probes.

Roadside-to-vehicle and vehicle-to-vehicle communications are also important for the

future of AHS. Automatic braking systems may be activated by decelerating vehicles in front, or

by the infrastructure sending a deceleration request to the headway control system. The vehicle

must be very sure about the imminent danger, and knowledge of following vehicles and their

speeds is an important factor to be considered. Inter-vehicle communications and rear sensing

both would help in automatic braking.

Evolution of the AHS system will continue with lane departure warning. It will be the

first system to control lateral movement of vehicles. The lane holding feature will consequently

be added to the adaptive cruise control, shortly after the lane departure warning feature.

Automatic lane holding will provide a "hands off/feet off" driving situation where the driver is

still responsible for all command decisions in the vehicle and must be aware at all times of his

surroundings. If the infrastructure knows the location of each vehicle, possesses the information

about its current path, and is communicating with the vehicle, then the lateral control can be

coordinated from the infrastructure.

Further advances in technology will force the driver to relinquish control of the vehicle. In order to gain any additional benefit of safety and efficiency¹, the driver must be removed as the primary source of command and control. Of course, this change requires that the automated system perform better than a good driver, i.e., no more than 1 critical error in 10⁸ seconds be made. This step will be the natural consequence of the previous progress. Obviously, not all vehicles will be equipped with this technology right away. Automated and manually driven vehicles will have to coexist for some time.

¹ With the driver in control, the highway capacity can only be increased to 15% above the maximum value of 40 vehicles/lane/min. Furthermore, 90% of highway accidents are results of incorrect driver decisions.


A vehicle that can predict the actions of neighboring vehicles is an important step for

safer highway transportation. Locating the position of all the vehicles in close proximity to the

automated vehicle with high accuracy is essential. This can be accomplished through multi-sensor

systems for adjacent vehicles and possibly inter-vehicle communications to give an idea of what

to expect beyond adjacent vehicles. Alternatively, the roadside control may have knowledge of

the positions of the vehicles relative to fixed reference points. This knowledge is obtained by

either vehicle based or roadside based detection, and/or by communicating with the vehicle.

This technology requires extreme accuracy in vehicle location at all times. If the system is

infrastructure-based, the infrastructure needs to know the locations of the non-automated

vehicles for safe and efficient implementation. The minimum update rate of information must be higher than 100 times per second, with an accuracy better than 10 cm, for the desired level of safety [James94]. Automated vehicle control (AVC) systems are expected to boost the capacity by

50% even for mixed vehicle traffic. Once the system has full knowledge of the surrounding environment, it can make decisions on merging and passing in addition to the

headway control and lane keeping performed under driver control. Full system optimization and

higher efficiencies can then be obtained as the percentage of automated vehicles on the road

increases.

Highways contain many characteristics that simplify the problem of automation, such as uninterrupted traffic flow and controlled access. Therefore, automation on arterials will lag significantly behind automated highways. However, many safety measures can be taken on

arterials using the equipment already designed for the highway. For example, the problem of

intersection collision can be reduced by activating the onboard warning systems and automatic

braking systems with electronic signal lights in addition to the normal traffic signal. If the

intersection detects a potential for a collision it can notify equipped vehicles. Problems to be

encountered during AHS deployment on arterials include the integration of cyclists and motorcyclists into the AVC system, and the effects of pedestrian and animal traffic.

The final step in the AHS future is fully automatic control, wherein the driver will have

no control over the vehicle. All trip decisions will be automatically made using AVL and ATIS

information. The driver may be able to include additional criteria for route selection. Once the trip

decision is made, the infrastructure, utilizing AVC, will guide the vehicle while constantly

updating the routing strategy based on the current information obtained through Advanced

Traffic Management System (ATMS).

The National Automated Highway System Consortium

The AHS program in the United States is planned around three broad phases: Analysis

(1993-96), System Definition (1994-2001), and Operational Tests and Evaluation (starting in

2001). The National Automated Highway System Consortium (NAHSC) is responsible for

conducting the second phase.

The first phase included the Precursor Systems Analyses (PSA) [PSA94] by 15 contractor teams that addressed automated vehicle

control requirements and issues in 16 topic areas, a human factors study effort to develop an

AHS human factors design handbook, and National Highway Traffic Safety Administration

(NHTSA) analyses to investigate other ITS automated vehicle control-based services that avoid

collisions through warning and control. The PSA identified issues and risks associated with

various AHS concepts and design areas. All contract teams submitted final reports in November

1994. The NAHSC is actively using these findings in their research.

The Systems Definition Phase is currently underway. The NAHSC is working in

partnership with the federal government. The consortium includes representatives from the

vehicle industry, highway industry, State and local governments, regional and metropolitan

transportation agencies, and electronics/communications industries associated with the vehicle

and communications market.

The milestones of the consortium program are as follows: (a) establishment of

performance and design objectives, (b) a 1997 proof-of-technical-feasibility demonstration, (c)

identification and description of multiple feasible AHS system concepts, (d) selection of the

preferred AHS system configuration, (e) completion of prototype testing, and (f) completion of

system and supporting documentation.

The Operational Test and Evaluation Phase is currently not funded. It would logically

follow a successful completion of the Systems Definition Phase.

The National Automated Highway System Consortium (NAHSC) was formed to

specify, develop, and demonstrate a prototype Automated Highway System (AHS)

[NAHSC96]. The specification provides for an evolutionary deployment that can be tailored to

meet today's transportation needs. The Consortium will seek opportunities for early

introduction of vehicles and highway automation technologies to achieve benefits for all surface

transportation users, while incorporating public and private stakeholder views to ensure that an

AHS is economically, technically, and socially viable.

The consortium's proof-of-technical-feasibility demonstration is one of the major

milestones of the AHS program, and is scheduled for October 1997. This demonstration will be a

full-scale exhibition with multiple vehicles on a segment of the I-15 interstate highway near San

Diego, integrating the technological achievements of the participants of the consortium and the

transportation industry. The demonstration will focus on existing technologies and concepts that

can be integrated quickly to provide a solid proof of feasibility.

The selection of the final AHS system configuration will be made during 1999, and the

final decision on the preferred AHS configurations will be made in February 2000. Immediately

after that date, the design, development and testing of the prototype system will begin. The

design process will be finished before the end of year 2000; the development will conclude in

2001, and the prototype systems testing will be completed in August 2002.

The detailed description of the national automated highway program, information on

NAHSC and its participants, and the detailed schedule of AHS tasks and subtasks can be found

at [NAHSC96]. The information about the past and future of the AHS program can also be found

at [AHS96].

Outside of the US, similar AHS programs are being conducted in Japan [ITSJ96], and in

Europe [Braess95]. Almost all automotive companies in Japan have automated vehicles in the

design and/or testing phase [Construction96]. Similar activities are reported in the

PROMETHEUS program in Europe [Rault95].

Vehicle control is probably the most important part of the advanced AHS applications.

Implementation of AHS necessitates automatically controlled vehicles as mentioned previously.

Achieving the optimal solution to congestion and safety problems requires extensive research in

system modeling, lateral (steering) controls and longitudinal (speed and headway) controls. In a

fully automated highway system, these control systems will rely on vehicle-to-vehicle

communication, as information on velocity and acceleration of other vehicles will be utilized in

individual vehicle controllers [Varaiya93]. The same information and much more (e.g., desired

speed and lane) may also be received via vehicle-to-roadside communications. Here, we will

briefly discuss the previous research on lateral, longitudinal and combined lateral and longitudinal

control of vehicles.

Hessburg and Tomizuka [Hessburg91] designed a fuzzy rule-based controller for lateral guidance

of a vehicle. This system is based on human-type reasoning. Advantages of such a controller

include flexibility in the choices of input/outputs, and on-line/off-line training capability. Their

focus was achieving good tracking for a variety of roadway curves over a range of longitudinal

vehicle speeds. Simulations to demonstrate its performance under parameter variations and

external disturbances gave satisfactory results.

An alternative approach is presented in [Lubin92]. It concentrates the intelligence in the

vehicle, using the visual sensing approach described in [Kornhauser91]. In this model, no

infrastructure modification is needed, but considerable cost and complexity is added to each

individual vehicle. With the current rate of technology improvement, this system may become

feasible for production purposes. During the last five years, the research on lateral vehicle control

and lane-changing maneuvers has been extensive. For a (non-comprehensive) list of publications on the

subject, see [PathDb96].

Besides the theoretical modeling and simulations for lateral control of vehicles, there are a

few important experimental accomplishments: the use of magnetic markers, and the use of visual

information for lateral position handling. The first method was designed by the PATH Program

[PATH96] and employs magnetic markers embedded in the road to detect the lateral

displacement from the center of the lane [Tan96]. Current tests with a vehicle equipped with

magnetic sensors on its front bumper are reported to be successful [Lee95]. The second

application for lateral control uses visual data and on-board computing resources to obtain the


steering command, and is designed by another NAHSC participant. In order to locate the road

ahead, the rapidly adapting lateral position handler (RALPH) uses a template-based matching

technique to find parallel image features such as lane markings or tire and oil markings. During the

experiment called "No Hands Across America," the test bed vehicle equipped with the RALPH system drove 98% of the 2850-mile journey autonomously. An average speed of 63 mph in conditions that included bright sunlight, dusk, rain and nighttime, and a maximum stretch of 69 miles of autonomous driving are reported [Pomerlau96]. A third application for lateral control consists of a vision-based system with a neural network learning from a driver. Performance levels comparable to the human driver are reported in [Moon96]. Özgüner et al. reported successful results for lane following using a vision sensor [Özgüner96].

Longitudinal control is an important aspect of the future AHS. One of the major concepts in this

area is platooning, which is a formation of traveling vehicles that maintain close spacing at

highway speeds. The concept requires inter-vehicle communication links to provide velocity and

possibly acceleration information from the lead vehicle to each of the following vehicles, as well

as the velocity and acceleration of the preceding vehicle in the platoon. Sheikholeslam and Desoer

[Sheikholeslam89] showed that inter-vehicle communication increases the stability of the

platoon formation in the case of identical vehicle platoons.

In the case of a platoon of non-identical vehicles, the situation is more complex. Frank,

Liu, and Liang [Frank89] explicitly considered the case of non-identical vehicles. The control

scheme presented combines three nested control loops for speed regulation, spacing control, and

speed synchronization. They also concluded that (a) the platoon size must be limited to

approximately 15 vehicles, (b) nonlinearities significantly affect the response characteristic of the

platoon, and (c) emergency situations need further investigation before proper sensor

specifications can be set.

It has also been shown that communicating the lead vehicle's information to other vehicles is not a requirement if we can tolerate some degradation in performance; this degradation is said to be non-catastrophic [Sheikholeslam91].

Recent research on longitudinal control includes vehicle follower control design for heavy-

duty vehicles [Yanakiev96], adaptive control of a nonlinear platoon model [Spooner96],

automatic braking systems and their effects on capacity [Hedrick96], advanced control

techniques [Kachroo95b], and adaptive traction control [Lee96]. Again, a relatively

comprehensive list of publications on longitudinal vehicle control can be found at [PathDb96].

Experimental results of longitudinal vehicle control include a platoon of four vehicles traveling at 55 mph with headway distances under 50 cm [Hedrick96]. Again, the lead vehicle's information is transmitted to following vehicles in order to achieve string stability.


Although much of the research to date has focused primarily on either lateral or longitudinal

control, an overall automated driving system combining both lateral and longitudinal control is

vital for future automated highway systems.

System models which incorporate longitudinal and lateral dynamics are very rare.

Kachroo and Tomizuka [Kachroo95c] studied combined longitudinal and lateral control to

investigate the resulting behavior of the coupled system. It is shown that longitudinal controllers

that directly control the wheel slip are inherently more stable, especially during lateral maneuvers

on very slippery road conditions. Spooner and Passino [Spooner95] also developed sliding mode

controllers for longitudinal and lateral control. Their fault tolerant algorithms were found to be

stable for a variety of faults such as braking, powertrain, and steering systems. Yu and Sideris

[Yu95] considered combined control using partial state-measurements of longitudinal and lateral

deviations, longitudinal velocity and yaw rate. The research on combined control of vehicles is

moving toward more realistic systems. New control approaches for platoon operations in more complex situations such as entry and exit maneuvers are being studied [Yang96].

The PATH program investigates the use of machine vision for guiding lane change

maneuvers [Malik95]. The vision system is modularly interfaced with the existing magnetic

sensor system for lateral position measurements, and with active range sensors. Özgüner [Özgüner95] also described a vehicle-roadway system in which the control of vehicle movement

is based on instrumentation located both in the vehicle and the roadway. A radar based system is

used for both cruise control, and for providing position information in lateral maneuvers.

Combined lateral and longitudinal control experiments are yet to be designed and

implemented; the 1997 AHS Demonstration will be a good occasion for combined control tests.

Varaiya introduced a structure for designing ITS functions and their relation to driver decisions

[Varaiya93]. The focus of AHS applications is mainly on the in-trip phase of the ITS activities.

An automated vehicle has to (a) choose its route to reduce travel time, (b) plan its path to ensure

a smooth traffic flow, (c) maneuver in coordination with other vehicles, and (d) regulate the

proper spacing and steering to increase traffic flow in a safe manner.

An automated highway that leaves the driver in control of the vehicle can only achieve a capacity 15% above the maximum value of 40 vehicles/lane/min². Driver behavior is the capacity bottleneck in such a system. Furthermore, 90% of highway accidents are results of incorrect driver decisions [Varaiya93]. Therefore, an automated system might increase both the capacity and the safety.

² This value is suggested by empirical studies.

In order to increase the capacity of the existing highways, the California PATH program suggests organizing the traffic in platoons. A platoon size of 15 with intra-platoon spacing of 2 m, inter-platoon spacing of 60 m, and a steady-state speed of 72 km/h will increase the capacity to 105 vehicles/lane/min, which is much larger than the maximum empirically observed values. Decreasing the distance between vehicles to 0.20 m will change this number to 130 vehicles/lane/min, although

it may not be feasible for a heterogeneous platoon of vehicles. For intra-platoon distances

discussed here, it is impossible for a driver, who has a reaction delay of 0.25-1.20 sec, to

guarantee adequate safety.
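These capacity figures follow directly from the platoon geometry. As a rough check, assume a nominal vehicle length of about 5.5 m (a value not specified here): one platoon then occupies

15 × 5.5 m + 14 × 2 m + 60 m = 170.5 m

of lane, and at 72 km/h (1200 m/min) the lane carries

(1200 m/min ÷ 170.5 m) × 15 ≈ 105 vehicles/lane/min.

Repeating the computation with 0.20 m intra-platoon spacing gives a platoon length of 145.3 m and a capacity of roughly 124-130 vehicles/lane/min, the exact value being sensitive to the assumed vehicle length.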

According to Varaiya, the tasks for an automated vehicle-highway system to accomplish

can be achieved by a four-layer hierarchical control architecture [Varaiya93]. The layers of the

architecture, from the top, are network, link, planning, and regulation. The network layer assigns

a route to each vehicle as it enters the system. The link layer assigns each vehicle a path which

balances traffic for all lanes, and assigns a target speed for each section of highway. This layer

may also assign platoon size. The planning layer creates a plan which approximates the desired

path. The regulation layer controls the vehicle trajectory so that it conforms to this plan. Below

the regulation layer, a physical layer that provides sensor data and responds to actuator signals is assumed.
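As an illustration, the division of labor among the layers can be sketched in code; the following is only a schematic of the information flow described above, with all class and method names hypothetical.

class NetworkLayer:
    """Assigns a route to each vehicle as it enters the system."""
    def assign_route(self, vehicle_id):
        # Placeholder route through hypothetical highway sections.
        return ["section_1", "section_4", "exit_9"]

class LinkLayer:
    """Assigns a path balancing traffic over lanes, and target speeds."""
    def assign_path(self, route):
        return {"lanes": [1, 2, 2], "target_speed_mps": 20.0}

class PlanningLayer:
    """Creates a maneuver plan approximating the desired path."""
    def plan(self, path):
        return ["keep_lane", "change_left", "keep_lane"]

class RegulationLayer:
    """Controls the vehicle trajectory so that it conforms to the plan,
    using physical-layer sensor data and actuators."""
    def execute(self, maneuver):
        return "done"   # reply sent back to the planning layer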

Different choices of partition of control authority between the infrastructure and the

vehicle are compatible with this architecture. The regulation and planning layer controllers are on-board the vehicles, while the link and network layers are on the roadside. It is important to find a design

that combines the advantages of a fully centralized control system and an architecture based on

autonomous vehicles and local sensor information.

In this thesis, we introduce an intelligent controller which can be seen as the planning layer of an autonomous vehicle. The planning layer, as defined in [Varaiya93], has three tasks:

• Creating a maneuver plan which approximates the path assigned by the link layer,

• Coordination of maneuvers with the planning layers of neighboring vehicles, and

• Supervising the regulation layer in the execution of a trajectory corresponding to the chosen maneuver.

In our application, where the scenario includes non-automated vehicles and minimal

communications, the coordination of maneuvers between planning layers does not exist. The

communication between planning and regulation layers is fairly simple: the planning layer sends a

command, and the regulation layer returns a reply once it successfully completes the command.

A richer interface may be required: the planning layer could pass multiple parameters to the

regulation layer, which could then return parameters indicating the success or errors and

exceptions. The theory of control of such a system is not yet developed. There is a need for

research as to how the regulation layer should switch from one control law to another. Lasky and Ravani state that this represents an open research issue whose solution may be vital to the implementation of a full AHS [Lasky93].
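A minimal sketch of such a richer planning/regulation interface, with all field and status names hypothetical, might look as follows.

from dataclasses import dataclass
from enum import Enum

class Status(Enum):
    SUCCESS = 0
    ERROR = 1       # the command could not be executed
    EXCEPTION = 2   # the command was aborted, e.g., for safety reasons

@dataclass
class Command:                 # planning layer -> regulation layer
    maneuver: str              # e.g., "change_left"
    target_speed_mps: float
    max_accel_mps2: float

@dataclass
class Reply:                   # regulation layer -> planning layer
    status: Status
    detail: str = ""           # error/exception description, if any

def regulation_step(cmd: Command) -> Reply:
    """Placeholder regulation layer: accepts only feasible commands."""
    if cmd.max_accel_mps2 > 3.0:
        return Reply(Status.ERROR, "acceleration limit exceeded")
    return Reply(Status.SUCCESS)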

The approach above is one of the many different approaches to AHS³. Some of these concepts are based on cooperative architectures [McKendree96] and maximum adaptability [Godbole96b].

³ We emphasized this architecture since it is used in DYNAVIMTS, the Virginia Tech Center for Transportation Research's simulation program (see Section 2.6).

Besides the automatic vehicle control, there are several important issues that need to be carefully

considered for a successful implementation of an automated highway system. During the first

few years of the AHS research efforts, the problems related to the issues given here were not

investigated as much as vehicle control problems. However, as the AHS related research

progressed, it expanded to the areas of sensing and communications, fault tolerance, and human factors. In this section, we will emphasize related research efforts in these areas.

The realization of full AHS needs hardware both in the infrastructure and in the vehicle. Roadside

monitors will measure traffic flow and speed, and vehicle paths will be calculated based on this

information. Such measurements are currently made with loop detectors, ultrasonic sensors, AVI

tags or vision systems. Information may be communicated by infrared beacons, broadcast and

cellular radio, or using emerging ultra wideband technologies [James96]. The vehicles need a

longitudinal sensor to measure distance and relative speed of the preceding vehicle. Such sensors

may be based on radar, ultrasound, or vision [zgner95, Hedrick96, Pomerlau96]. Microwave

radar sensors perform very well in fog and heavy rain, but they are very expensive. Laser radar

systems are low-cost, but cannot handle low visibility conditions [Yanagisawa92].

To facilitate lane changes at a range of relative speeds, the vehicle must be equipped with

sensors that locate vehicles on the side with a longitudinal range of about 30m. Infrared and laser

range finding techniques may prove to be useful in this area.

Besides headway and side sensor information, longitudinal and lateral velocity and

acceleration, yaw rate, front steering angle, and lateral deviation data is needed to obtain a robust

combined lateral and longitudinal control. All of these except the last one can be obtained using

on-board accelerometers and encoders. For vehicle position sensing, there are two alternatives:

magnetic markers [Lee95], and vision systems [Pomerlau96]. Recent research on vision systems has shown significant promise; however, these systems are more expensive than magnetic markers which, in turn, require infrastructure deployment as well as on-board sensors.

A sequence of single magnetic markers can also form a "word" that transfers information such as curvature and number of lanes. However, unlike the vision system, this magnetic marker data contains only static information (roadway characteristics), not dynamic information (such as information on other vehicles or incidents).

Roadside-vehicle communications is also a critical aspect of AHS. Vehicles need to be

identified, speed must be communicated to vehicles, and actions need to be coordinated for a fully

automated system. In addition, there is a need for vehicle-to-vehicle communications because of

the designed longitudinal control methods and coordination issues. Precise control can be obtained

using full duplex communication. Networking represents a higher level of communications, in


which traffic and hazard information detected by one group of vehicles can be communicated to

other vehicles. There are a variety of methods for roadside-vehicle communications; most of them

are discussed in [Field92]. Recent research on communications includes interference studies for

vehicle transceivers [Gavan96], design of communication protocols [Godbole96], WaveLAN

radio [Chen96], and network architectures and protocols for vehicle-to-vehicle communications

[Bolla96, Fuji96].

Although the solutions to most of the technical problems in vehicle control, traffic management,

information systems, and communications have been found, the envisioned AHS will never be

deployed unless the safety of the overall system can be verified. One issue which is often

overlooked by researchers is the possibility of undesirable interaction of the systems. An

example in [Safety92] mentions two devices which try to maintain vehicles at constant lateral

spacing using side range sensing. If the devices (and the vehicles) have different dynamics, it is

possible that one or both vehicles may become unstable, possibly resulting in a collision.

Furakawa [Furakawa92] approaches the automobile safety question from an interesting

perspective: the instinctive ability of humans (and animals) to sense and avoid dangers, an ability

which is impaired when driving a car. Furakawa notes that building this instinctive function into

automobiles as a form of intelligent control technology is a good approach to improve safety.

Current research on safety and fault tolerance includes lane crossing studies [Lin96],

sensor validation [Agogino96, Singer95], fault tolerant design for AHS [Lygeros96], emergency

maneuvers for automated vehicles [Shiller95], and design constraints on intelligent vehicle

controllers [Puri95].

In an advanced system such as AHS, the driver will be confronted with significantly more

information, and possibly more controls, than are currently used in vehicles. In a system that

uses complete automation during some segments of a trip, the safe transition from automated

driving to manual driving is a difficult issue. Also, there exist several advanced vehicle control system (AVCS) related problems such as the acceptance of a system which takes the control away from drivers, privacy issues related to AVI and AVL systems, and "platooning claustrophobia." For users to accept the AHS, a successful AVC system must address important

issues such as dealing with false alarms and system failures to gain acceptance and public

confidence, displays and warnings with the right amount and content of information, and driver

skills and attentiveness for smooth automated-manual switching [Chira92].

Human factors assessment of the driver interfaces of existing collision avoidance systems

is currently under investigation [Mazzae96]. On the other hand, although AHS deployment has

not started yet, simulator tests are conducted to see how drivers behave when they are in the lead

vehicle in a string of vehicles and another vehicle enters the automated lane ahead of them

[Bloomfield96]. Also, the effect of different AHS automation degrees (manual, semi-automated,


fully automated) on driver behaviors while entering the automated lane are tested using a driving

simulator [Bloomfield96b]. Levitan presented a comprehensive report on human factors

considerations related to the design of AHS. The report is intended as a guideline for effective,

efficient, safe and user-acceptable AHS design [Levitan96].

According to the Center for Transportation Research's approach, the progression in AHS development from conceptualization to implementation has four steps [Kachroo95]. The intelligent controller described in this work is part of the four-block structure visualized at the Center. As seen in Figure 2.1, the first block is computer software simulation (see Section 2.6 for

a detailed description) that is preceded by mathematical modeling. The second block consists of

conducting small scale experiments in the Flexible Low-cost Automated Scaled Highway

(FLASH) Laboratory. Then, hardware tests comprising the third block are performed on a test

site with actual vehicles. The Smart Road being built near Blacksburg, Virginia will be a suitable

test bed for conducting such experiments using actual vehicles and controlled traffic conditions.

The fourth block is the deployment of AHS on conventional highways.

Figure 2.1. Four-block evaluation and experimentation framework for AHS (input and conclusion connected through the four blocks, with feedback and feedforward loops).

These four blocks can be considered as the building blocks of a comprehensive testing and

evaluation methodology for AHS. The input can be a hypothesis, a model, or technologies. The

evaluation and testing procedure defined by this methodology is not seen as a single feedthrough

four-step process, but as having some feedback and feedforward loops depending on the results

obtained at each block. These loops represent the changes made to the hypothesis, model or the

technological concept. Hardware tests are important since they provide the means to validate

computer results or to modify them in the case of discrepancies, due to unmodeled or

inadequately modeled dynamics. Without hardware testing, it would be foolhardy to jump into

actual implementation. For instance, the FLASH Laboratory could be used to improve the

computer simulation via scale model tests before starting the tests with full scale vehicles.


Although testing with real-sized vehicles will provide the most accurate results, it has

significant cost, safety and liability considerations. In order to test highway traffic and similar

situations, there is a need for a large number of automated and non-automated vehicles. This is

very expensive, and due to a testing situation involving humans, the costs can become prohibitive

and the situations dangerous, which could lead to insurance and safety problems. Also, full-size

testing of infrastructure based systems which require complex communication scenarios may be

expensive due to (roadside and in-vehicle) installation costs. The four-block structure is designed

to overcome these problems. A comparison of the design/development stages for different

characteristics is given in Table 2.1.

                  Simulation        FLASH Laboratory    Smart Road        Highway
Agents            Vehicle models    Scaled vehicles     Real vehicles     Real vehicles
Tasks             Experimentation   Experimentation     Experimentation   Implementation
                  & Demonstration   & Demonstration     & Demonstration
Cost              Low               Moderate            High              Very High
Development Time  Short             Short to Moderate   Long              Very Long

Table 2.1. Comparison of different stages.

The Flexible Low-cost Automated Scaled Highway (FLASH) Laboratory is visualized as a precursor to a full-scale highway [Kachroo95]. It will provide a platform for experimental work

as well as an arena for demonstration of AHS systems. In order to test highway situations with

complex realistic scenarios of merging, splitting, exit, and entrance, small scale vehicles (1:10 to

1:15 scale) are designed in a modular fashion. Modularity will guarantee inexpensive and fast

incorporation of different system configurations. The FLASH laboratory will be composed of

tens of small vehicles (of different sizes) with a flexible highway system and communication

network. The FLASH laboratory concept makes the construction of specifically designed scaled

highway configurations possible in a short period of time, and at much less cost than would be required for the construction of a similar full scale test bed.

The use of small low-cost vehicles with known mathematical models would be beneficial

for researchers seeking to verify their computer simulations quickly and economically. The use of

standard vehicles eliminates some of the uncertainty due to particular vehicle models that make

comparisons of different AHS approaches difficult. The laboratory will provide researchers with

the vehicle as well as its verified mathematical model. This capability will enhance the results

from computer simulations.

The FLASH laboratory has significance not only to all the phases of AHS development

programs, but also to the improvement and maintenance of the actual automated highways. The

laboratory will be extremely useful in the analysis phase of the AHS development. The


laboratory and its vehicles may serve as a benchmark through which various configurations,

architectures, and technologies can be compared. Evaluation of alternative AHS system concepts,

selection of a system approach based on analysis, test and evaluation, and demonstration of

prototype configurations on a smaller and cheaper scale are the foci of the laboratory.

Furthermore, the laboratory can be used to obtain preliminary data for the operational evaluation (third) phase of the AHS program, and be a test bed for several issues which cannot be tested at selected locations because of insurance and safety considerations.

Currently, the laboratory is equipped with multiple small scale vehicles with different types of tires and shock mechanisms. The vehicles have infrared and magnetic sensors and a wireless video camera for lateral displacement measurements, ultrasonic sensors for headway sensing, and encoders for speed measurements. Sensing and control is provided by a GCB-11 microcontroller. Wireless radio modems provide full duplex serial communications (Figure 2.2). The remote control station includes IBM PC computers equipped with a frame grabber DSP board, a micro-kernel operating system, and a data acquisition board for lateral feedback control via image processing, as well as a steering wheel, control pedal and monitor for manual control (Figure 2.3).

Single- and double-lane small scale modular tracks with inclines and multiple-radius left and right turns are designed using EPDM rubber membrane (Figure 2.4). The laboratory also has an

experimental testing platform designed for traction control and vehicle plant modeling tests

[Kachroo96, Schlegel96].

Figure 2.2. A FLASH vehicle with its ultrasonic sensor, infrared sensors, battery, wireless modem, and GCB11 microcontroller board (under the modem).

Figure 2.3. The remote control station.

Figure 2.4. Test tracks and the control station in the FLASH Laboratory.


When completed, the Smart Road [Smart94] will be a 6-mile roadway between Blacksburg and the I-81 interstate highway in southwest Virginia, linking the Roanoke and New River Valleys. The

first 2 miles is designated as a controlled test facility (Figure 2.5). It will be the first of its kind to

be built from the ground up with an ITS infrastructure incorporated into the roadway. This full-

scale test bed will provide controlled testing of a variety of ITS technologies and concepts with

varied terrain and environmental conditions. Research involving crash avoidance, driver behavior,

vehicle dynamics, sensors, and automated vehicle control will take place under a broad range of

carefully controlled testing conditions.

For the roadway, an advanced communication system consisting of a local-area wireless network interfaced with a complete fiber optic backbone is planned. Thus, the road is designed to

have the capability to service a multitude of sensors and messaging equipment concurrently. The

test bed design also incorporates the ability to simulate a diverse range of weather conditions,

from mild to severe (fog, rain, and snow). In addition, the all-weather testing area will overlap the

variable lighting area to provide simulation capabilities for night driving conditions. To facilitate

continuous testing, the road will be equipped with turnarounds at both ends. Overhead structures

are designed to accommodate variable signing as well as overhead sensors. Two safety zones with

50-foot clearance areas and additional crash protection features for prototype testing of advanced

in-vehicle systems are also planned.


During the last few years, many computer programs for microscopic and macroscopic level

simulations of vehicles in an automated system have surfaced. Paramics (Parallel Microscopic

Simulation) is currently used to find the reasons for large-scale congestion in the Controlled Motorway pilot project on the M25 highway in England, by dividing the network into a

number of regions [Duncan96, Cameron96]. DYNAVIS is another dynamic visualization package

specifically designed for evaluating automatic control of vehicles [Xu96]. This package is part of

a bigger effort by the California PATH program, SmartPath [Eskafi95]. SmartPath is a simulation

package for an automated highway system, designed to provide a framework for simulation and

evaluation of AHS alternatives. It is again a micro-simulator, i.e., the functional elements and the

behavior of each vehicle are individually modeled. SmartPath has two separate simulation and

animation modules.

An example of a macroscopic simulation is KRONOS developed at the University of

Minnesota [Kwon95]. It is a PC based simulation using a simple continuum modeling approach

to evaluate flow behaviors such as merging and diverging. Argonne National Laboratory's

prototype simulation includes the modeling of automated vehicles with in-vehicle navigation

units and a Traffic Management Center (TMC) using distributed computer systems or massively

parallel computer systems [Ewing95].

The Center for Transportation Research of Virginia Tech is developing a simulation

package that models micro and macroscopic behavior of vehicles over a network. The software is

envisioned to integrate the different research efforts at CTR into one comprehensive ITS

software package. In the development of this tool, particular emphasis is given to modeling the

AHS system based on the control architecture developed under the PATH program [Varaiya93].

The package is called DYNAVIMTS, Dynamic Visual Micro/Macroscopic Traffic Simulation

Tool.

DYNAVIMTS is a multi-purpose AHS simulation package currently developed using different software tools and platforms [Kachroo96b]. It has a distributed system architecture enabling the user to distribute several time consuming functions of DYNAVIMTS to separate computers. The main simulation currently runs on a Sun Sparc 2000 Workstation. The output of the simulation is transferred to a Pentium PC which has animation software developed using OpenGL.

DYNAVIMTS consists of three main modules: the Communicator, the Simulator, and the

Animator (Figure 2.6).


Figure 2.6. The three main modules of DYNAVIMTS: the Simulator (C++, CSim; Unix), the Animator (C, OpenGL, Matlab; Windows NT, Unix), and the Communicator (C, X-Motif, Matlab, ArcInfo; Unix, Windows NT).

Depending upon the spatial dimensions of the problem at hand, macroscopic and

microscopic simulation models will be used. For example, DYNAVIMTS is currently capable of

studying the effects of vehicle dynamics on the stability of platoons. Incorporation of different brake technologies is also underway. DYNAVIMTS has a range of vehicle dynamics and control

models that can be used for microscopic simulation studies. On the other hand, DYNAVIMTS is

also designed to study the effects of highway automation on the overall traffic flow in the traffic

network.

The simulation software is developed using C++ with CSIM libraries [CSIM96], and Matlab

[Matlab96]. As we move toward the macroscopic models, the design of the package will be

completely object-oriented, and multi-processor machines or parallel processing with multiple

machines will become necessary. A detailed description of the DYNAVIMTS package can be

found in [Nagarajan96].

As mentioned above, there are three main modules in the DYNAVIMTS software package

(Figure 2.7). The front-end graphical user interface (Communicator, Figure 2.8) is developed in C

using OSF/Motif and Xt Intrinsics libraries on a Sun Sparc. The structure of the interface and

the underlying modules again have been modeled based on the four-layer structure of the

California PATH Program. The submodule for the network layer design calls ARC/INFOs

ArcEdit [Arc96] utilities to construct the network. The GUI also consists of routines to invoke

incident management software, traffic assessment algorithms, and other related programs to form

the central part of an integrated ITS simulation package. The microscopic level simulator is a C++

program using CSIM libraries. There are three basic classes - vehicle, link and sensor. These

objects interact among themselves and with the CSIM objects, through their member functions.

The animation module is started after a run is executed by the simulator. The Animator consists

of two components: graphical plots and animation. The graphical plots generated by Matlab give detailed information about the control parameters and sensor data. The animation submodule

is developed in C using OpenGL libraries on an NT workstation. The simulation data is currently

transferred via a pre-formatted ASCII data file. This module also contains the graphical data for

the planned Smart Road project. A detailed description of the package, its file structure and

graphic user interface is given in [Nagarajan96].

The control structure we introduce in Chapter 4 is incorporated in the DYNAVIMTS

simulation package as part of the planning layer simulator. Since the link layer development in

DYNAVIMTS is not complete, the planning layer simulator on Matlab assumes that the link

layer data is available (if at all necessary), and the resulting data is transferred to the animator

submodule in the form of an ASCII file. At the end of each simulation run, the user is also able to

use the GUI to plot different parameters and values using Matlab. The connection between the

planning and regulation layers does not yet exist as is the case in similar simulation packages. A

snapshot of the planning layer GUI is given in Appendix C.


Figure 2.7. Components of the DYNAVIMTS simulation package: the GUI (parameters for the network, link, planning, regulation, and physical layers), the simulation, the 3D animation (bird's eye and driver's views, zooming, panning, fast forward, rewind, pause and resume), and the graphical plots (Matlab statistics).


Chapter 3

Stochastic Learning Automata

An automaton is a machine or control mechanism designed to automatically follow a predetermined sequence of operations or respond to encoded instructions. The term "stochastic" emphasizes the adaptive nature of the automaton we describe here: such an automaton does not follow predetermined rules, but adapts to changes in its environment. This adaptation is the result of the learning process described in this chapter.

The studies of learning automata stem from the works of psychologists in modeling observed behavior, the efforts of statisticians to model the choice of experiments based on past observations, the attempts of operations researchers to implement optimal strategies in the context of the two-armed bandit problem, and the endeavors of system theorists to make rational decisions in random environments [Narendra89].

In classical control theory, the control of a process is based on complete knowledge of the process/system. The mathematical model is assumed to be known, and the inputs to the process are deterministic functions of time. Later developments in control theory considered the uncertainties present in the system. Stochastic control theory assumes that some of the characteristics of the uncertainties are known. However, all those assumptions on uncertainties and/or input functions may be insufficient to successfully control the system if changes occur. It is then necessary to observe the process in operation and obtain further knowledge of the system, i.e., additional information must be acquired on-line since a priori assumptions are not sufficient. One approach is to view these as problems in learning.

Rule-based systems, although performing well on many control problems, have the

disadvantage of requiring modifications, even for a minor change in the problem space.

Furthermore, rule-based approaches, especially expert systems, cannot handle unanticipated

situations. The idea behind designing a learning system is to guarantee robust behavior without

the complete knowledge, if any, of the system/environment to be controlled. A crucial advantage

of reinforcement learning compared to other learning approaches is that it requires no information

about the environment except for the reinforcement signal [Narendra89, Marsh93].

A reinforcement learning system is slower than other approaches for most applications

since every action needs to be tested a number of times for a satisfactory performance. Either the

learning process must be much faster than the environment changes (as is the case in our


application), or the reinforcement learning must be combined with an adaptive forward model

that anticipates the changes in the environment [Peng93].

Learning¹ is defined as any permanent change in behavior as a result of past experience, and a learning system should therefore have the ability to improve its behavior with time, toward a final goal. In a purely mathematical context, the goal of a learning system is the optimization of a functional not known explicitly [Narendra74].

In the 1960s, Y. Z. Tsypkin [Tsypkin71] introduced a method to reduce the problem to

the determination of an optimal set of parameters and then apply stochastic hill climbing

techniques. M.L. Tsetlin and colleagues [Tsetlin73] started the work on learning automata during

the same period. An alternative approach to applying stochastic hill-climbing techniques,

introduced by Narendra and Viswanathan [Narendra72], is to regard the problem as one of finding

an optimal action out of a set of allowable actions and to achieve this using stochastic automata.

The difference between the two approaches is that the former updates the parameter space at each iteration while the latter updates the probability space.

The stochastic automaton attempts a solution of the problem without any information on

the optimal action (initially, equal probabilities are attached to all the actions). One action is

selected at random, the response from the environment is observed, action probabilities are

updated based on that response, and the procedure is repeated. A stochastic automaton acting as

described to improve its performance is called a learning automaton.
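In algorithmic form, this operation can be sketched as follows. The linear reward-inaction update used below is only an illustrative stand-in for the reinforcement schemes discussed in Chapter 6, and the penalty probabilities, which are of course unknown to the automaton, appear here only to simulate the environment.

import random

def learn(penalty_probs, steps=10000, a=0.05):
    """Sketch of a learning automaton in a stationary P-model environment.
    penalty_probs[i] is c_i, the probability that action i is penalized."""
    r = len(penalty_probs)
    p = [1.0 / r] * r                     # start with equal action probabilities
    for _ in range(steps):
        # Choose an action according to the current probability vector.
        i = random.choices(range(r), weights=p)[0]
        beta = 1 if random.random() < penalty_probs[i] else 0
        if beta == 0:                     # favorable response: reward action i
            for j in range(r):
                p[j] = p[j] + a * (1 - p[j]) if j == i else p[j] * (1 - a)
    return p

# Example: action 2 has the lowest penalty probability and should dominate.
print(learn([0.7, 0.4, 0.1]))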

The first learning automata models were developed in mathematical psychology. Early research in

this area is surveyed by Bush and Mosteller [Bush58] and Atkinson et al. [Atkinson65]. Tsetlin

[Tsetlin73] introduced a deterministic automaton operating in random environments as a model of learning. Most of the work done on learning automata theory has followed the trend set by

Tsetlin. Varshavski and Vorontsova [Varshavski63] described the use of stochastic automata

with updating of action probabilities, which results in a reduction in the number of states in

comparison with deterministic automata.

Fu and colleagues [Fu65a, Fu65b, Fu67, Fu69a, Fu69b, Fu71] were the first researchers

to introduce stochastic automata into the control literature. Applications to parameter estimation,

pattern recognition and game theory were initially considered by this school. Properties of linear

updating schemes and the concept of a growing automaton are defined by McLaren

[McLaren66]. Chandrasekaran and Shen [Chand68,Chand69] studied nonlinear updating schemes,

nonstationary environments and games of automata. Narendra and Thathachar have studied the theory and applications of learning automata and carried out simulation studies in the area. Their book Learning Automata [Narendra89] is an introduction to learning automata theory which surveys all the research done on the subject until the end of the 1980s.

¹ Webster defines learning as "to gain knowledge or understanding of a skill by study, instruction, or experience."


The most recent (and second most comprehensive) book on learning automata theory and applications since the Narendra-Thathachar collaboration was published by Najim and Poznyak in 1994 [Najim94]. This book also includes several applications and examples of learning automata.

Until recently, the applications of learning automata to control problems were rare.

Consequently, successful updating algorithms and theorems for advanced automata applications

were not readily available. Norio Baba's work [Baba85] on learning behaviors of stochastic

automata under a nonstationary multi-teacher environment, and general linear reinforcement

schemes [Bush58] are taken as a starting point for the research presented in this dissertation.

Recent applications of learning automata to real life problems include control of

absorption columns [Najim91], bioreactors [Gilbert92], control of manufacturing plants

[Sequeria91], pattern recognition [Oommen94a], graph partitioning [Oommen94b], active vehicle

suspension [Marsh93, 95], path planning for manipulators [Naruse93], distributed fuzzy logic

processor training [Ikonen97], and path planning [Tsoularis93] and action selection [Aoki95] for

autonomous mobile robots.

Recent theoretical results on learning algorithms and techniques can be found in

[Najim91b], [Najim94], [Sastry93], [Papadimitriou94], [Rajaraman96], [Najim96], and

[Poznyak96].

The learning paradigm the learning automaton presents may be stated as follows: a finite number

of actions can be performed in a random environment. When a specific action is performed the

environment provides a random response which is either favorable or unfavorable. The objective

in the design of the automaton is to determine how the choice of the action at any stage should be

guided by past actions and responses. The important point to note is that the decisions must be

made with very little knowledge concerning the nature of the environment. It may have time-

varying characteristics, or the decision maker may be a part of a hierarchical decision structure

but unaware of its role in the hierarchy. Furthermore, the uncertainty may be due to the fact that

the output of the environment is influenced by the actions of other agents unknown to the

decision maker.

The environment in which the automaton "lives" responds to the action of the automaton by producing a response, belonging to a set of allowable responses, which is probabilistically related to the automaton action. The term environment² is not easy to define in the context of learning automata. The definition encompasses a large class of unknown random media in which an automaton can operate. Mathematically, an environment is represented by a triple {α, c, β} where α represents a finite action/output set, β represents a (binary) input/response set, and c is a set of penalty probabilities, where each element ci corresponds to one action αi of the set α.

The output (action) α(n) of the automaton belongs to the set α, and is applied to the environment at time t = n. The input β(n) from the environment is an element of the set β and can

² Commonly refers to the aggregate of all the external conditions and influences affecting the life and development of an organism.


take on one of the values β1 and β2. In the simplest case, the values βi are chosen to be 0 and 1, where 1 is associated with the failure/penalty response. The elements of c are defined as:

Prob{β(n) = 1 | α(n) = αi} = ci    (i = 1, 2, ...)    (3.1)

Therefore, ci is the probability that the action αi will result in a penalty input from the environment. When the penalty probabilities ci are constant, the environment is called a stationary environment.
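For illustration, a stationary P-model environment is completely specified by its penalty probability vector; a minimal sketch, with arbitrary example values, follows.

import random

class PModelEnvironment:
    """Stationary P-model environment: responds to action alpha_i with
    beta = 1 (penalty) with probability c_i, and beta = 0 otherwise."""
    def __init__(self, penalty_probs):
        self.c = penalty_probs            # c_i for each action, per Eq. (3.1)
    def respond(self, i):
        return 1 if random.random() < self.c[i] else 0

env = PModelEnvironment([0.7, 0.4, 0.1])  # three actions, action 2 is best
print(env.respond(2))                      # 1 with probability 0.1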

There are several models defined by the response set of the environment. Models in

which the input from the environment can take only one of two values, 0 or 1, are referred to as

P-models3. In this simplest case, the response value of 1 corresponds to an unfavorable

(failure, penalty) response, while output of 0 means the action is favorable. A further

generalization of the environment allows finite response sets with more than two elements that

may take a finite number of values in an interval [a, b]. Such models are called Q-models. When the input from the environment is a continuous random variable with possible values in an interval [a, b], the model is called an S-model.

[Figure 3.1. The automaton and the environment: the automaton (state, transition function F, output function G) applies its output/action to the environment (penalty probabilities c), and the environment response forms the input to the automaton.]

An automaton can be represented by the quintuple {φ, α, β, F, G}, where:

• φ is a set of internal states. At any instant n, the state φ(n) is an element of the finite set φ = {φ1, φ2, ..., φs}.

• α is a set of actions (or outputs of the automaton). The output or action of an automaton at the instant n, denoted by α(n), is an element of the finite set α = {α1, α2, ..., αr}.

• β is a set of responses (or inputs from the environment). The input from the environment β(n) is an element of the set β, which could be either a finite set or an infinite set, such as an interval on the real line:

β = {β1, β2, ..., βm}   or   β = {(a, b)}   (3.2)

³ We will follow the notation used in [Narendra89].


• F(•,•): φ × β → φ is a function that maps the current state and input into the next state. F can be deterministic or stochastic:

φ(n + 1) = F[φ(n), β(n)]   (3.3)

• H(•,•): φ × β → α is a function that maps the current state and input into the current output. If the current output depends only on the current state, the automaton is referred to as a state-output automaton. In this case, the function H(•,•) is replaced by an output function G(•): φ → α, which can be either deterministic or stochastic:

α(n) = G[φ(n)]   (3.4)

For our applications, we choose the function G(•) as the identity function, i.e., the states of the automata are also the actions.

We now introduce the stochastic automaton in which at least one of the two mappings F and G is

stochastic. If the transition function F is stochastic, the elements fij(β) of F(β) represent the probability that the automaton moves from state φi to state φj following an input β⁴:

fij(β) = Pr{φ(n + 1) = φj | φ(n) = φi, β(n) = β}   i, j = 1, 2, ..., s;  β = β1, β2, ..., βm   (3.5)

Similarly, if the output function G is stochastic, its elements gij represent the probability that the output is αj when the state is φi:

gij = Pr{α(n) = αj | φ(n) = φi}   i = 1, 2, ..., s;  j = 1, 2, ..., r   (3.6)

Since the fij(β) are probabilities, they lie in the closed interval [a, b]⁵; and to conserve probability measure we must have:

Σj fij(β) = 1,  j = 1, ..., s,  for each β and i.   (3.7)

Example:

States: φ = {φ1, φ2}    Inputs: β = {β1, β2}

Transition matrices:

F(β1) = | 0.50  0.50 |        F(β2) = | 0.10  0.90 |
        | 0.25  0.75 |                | 0.75  0.25 |

⁴ In Chapter 4, where we define our initial algorithm, the probabilities fij will be the same for all inputs β and all states φi given a state φj. Furthermore, the output mapping G is chosen as the identity mapping.

⁵ [a, b] = [0, 1] in most cases.


[Transition graphs: for input β1, state φ1 has a self-loop with probability 0.50 and a transition to φ2 with probability 0.50, while φ2 has a self-loop with 0.75 and a transition to φ1 with 0.25; for input β2, φ1 has a self-loop with 0.10 and a transition to φ2 with 0.90, while φ2 has a self-loop with 0.25 and a transition to φ1 with 0.75.]
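As an illustration, the fixed-structure automaton of the example above can be simulated with the following sketch (ours; it simply samples the next state from the appropriate row of F(β)):

    import random

    # Transition matrices of the example: F[beta][current_state] is the
    # row of next-state probabilities (states phi_1, phi_2 are indices 0, 1).
    F = {
        1: [[0.50, 0.50],          # F(beta_1)
            [0.25, 0.75]],
        2: [[0.10, 0.90],          # F(beta_2)
            [0.75, 0.25]],
    }

    def next_state(state, beta):
        row = F[beta][state]
        return 0 if random.random() < row[0] else 1

    state = 0                      # start in phi_1
    state = next_state(state, 1)   # one transition under input beta_1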

In the previous example, the conditional probabilities fij were assumed to be constant, i.e., independent of the time step n and of the input sequence. Such a stochastic automaton is referred to as a fixed-structure automaton. As we will discuss later, it is useful to update the transition probabilities at each step n on the basis of the environment response at that instant. Such an automaton is called a variable-structure automaton.

Furthermore, in the case of a variable-structure automaton, the above definitions of the transition functions F and G are not used explicitly. Instead of transition matrices, a vector of action probabilities p(n) is defined to describe the reinforcement schemes we introduce in the next sections.

If the variable-structure automaton is a state-output automaton, and the probabilities of

transition from one state to another fij do not depend on the current state and environment input,

then the relation between the action probability vector and the transition matrices is as follows:

Since the automaton is a state-output automaton, we omit the transition matrix G, and

only consider the transition matrix F.

The transition does not depend on the environment response, therefore there is only

one state transition matrix F where the elements are fij. For example, we must have:

F(β1) = F(β2) ≡ F = | f11  f12 |   (3.8)
                    | f21  f22 |

Furthermore, the probability of being in one state (or generating one action) does not depend on the initial/previous state of the automaton. Therefore, the transition matrix reduces to:

f11 = f21 ≡ f1,   f12 = f22 ≡ f2,   so that each (identical) row of F is the vector p = [f1, f2]   (3.9)

where the vector p consists of the probabilities pi of the automaton being in the i-th state (or choosing the i-th output/action).

Before introducing variable-structure automata, we present the definitions necessary to evaluate the performance of a learning variable-structure automaton. A learning automaton

generates a sequence of actions on the basis of its interaction with the environment. If the

automaton is learning in the process, its performance must be superior to intuitive methods.


To judge the performance of the automaton, we need to set up quantitative norms of behavior.

The quantitative basis for assessing the learning behavior is quite complex, even in the simplest

P-model and stationary random environments. To introduce the definitions for norms of

behavior, we will consider this simplest case. Further definitions for other models and non-

stationary environments will be given whenever necessary.

If no prior information is available, there is no basis on which the different actions αi can be distinguished. In such a case, all action probabilities would be equal, a pure-chance situation. For an r-action automaton, the action probability vector p(n), with pi(n) = Pr{α(n) = αi}, is then given by:

pi(n) = 1/r,   i = 1, 2, ..., r   (3.10)

Such an automaton is called a pure-chance automaton, and will be used as the standard for comparison. Any learning automaton must at least do better than a pure-chance automaton.

Consider a stationary random environment with penalty probabilities {c1, c2, ..., cr} where ci = Pr{β(n) = 1 | α(n) = αi}. We define a quantity M(n) as the average penalty for a given action probability vector:

M(n) = E[β(n) | p(n)] = Pr{β(n) = 1 | p(n)} = Σi ci pi(n)   (3.11)

For the pure-chance automaton, the average penalty is a constant, denoted Mo:

Mo = (1/r) Σi ci   (3.12)

Averaging M(n) over the action probabilities gives:

E[M(n)] = E{E[β(n) | p(n)]} = E[β(n)]   (3.13)

i.e., E[M(n)] is the average input to the automaton. Using the above definitions, we have the following:

Definition 3.1: A learning automaton is said to be expedient if

lim n→∞ E[M(n)] < Mo.

Definition 3.2: A learning automaton is said to be optimal if

lim n→∞ E[M(n)] = cl   (3.14)

where cl = mini{ci}.


Optimality implies that the action αl associated with the minimum penalty probability cl is chosen asymptotically with probability one. In spite of the efforts of many researchers, a general algorithm which ensures optimality has not been found [Baba83, Kushner72, Narendra89]. It may not be possible to achieve optimality in every given situation. In this case, a suboptimal behavior, ε-optimality, is defined:

lim n→∞ E[M(n)] = cl + ε   (3.15)

for an arbitrarily small ε > 0.

Some automata satisfy the conditions stated in the definitions for specified initial

conditions and for certain sets of penalty probabilities. However, we may be interested in

automata that exhibit a desired behavior in arbitrary environments and for arbitrary initial

conditions. These requirements are partially met by an absolutely expedient automaton, which is defined by:

E[M(n + 1) | p(n)] < M(n)   (3.16)

for all n, all pi(n) ∈ (0, 1), and for all possible sets {c1, c2, ..., cr}. Taking the expectations of both sides, we can further see that:

E[M(n + 1)] < E[M(n)]   (3.17)

In this work, we are particularly interested in the case where there is one optimal action αj with cj = 0⁶. In such an environment, a learning automaton is expected to reach a pure optimal strategy [Najim94]:

p(n) → ej   (3.18)

where ej is the unit vector with the j-th component equal to 1. In other words, the automaton will asymptotically choose the optimal action αj with probability one.

A more flexible learning automaton model can be created by considering more general stochastic

systems in which the action probabilities (or the state transitions) are updated at every stage

using a reinforcement scheme. In this section, we will introduce the general description of a

reinforcement scheme, linear and nonlinear. For simplicity, we assume that each state corresponds to one action, i.e., the automaton is a state-output automaton. The simulation examples in later chapters also use the same identity state-output mapping.

⁶ We will discuss the feasibility of this case later.

In general terms, a reinforcement scheme can be represented as follows:

p(n + 1) = T1[p(n), α(n), β(n)]   (3.19a)

or

fij(n + 1) = T2[fij(n), β(n), φ(n + 1), φ(n)]   (3.19b)

where T1 and T2 are mappings. Again, α is the action, β is the input from the environment, and φ is the state of the automaton. From now on, we will use the first mathematical description (3.19a)

for reinforcement schemes. If p(n+1) is a linear function of p(n), the reinforcement scheme is said

to be linear; otherwise it is termed nonlinear.

Consider a variable-structure automaton with r actions in a stationary environment with β = {0, 1}. The general scheme for updating the action probabilities is as follows:

If α(n) = αi,

when β = 0:   pj(n + 1) = pj(n) - gj(p(n))   for all j ≠ i
              pi(n + 1) = pi(n) + Σk≠i gk(p(n))
                                                              (3.20)
when β = 1:   pj(n + 1) = pj(n) + hj(p(n))   for all j ≠ i
              pi(n + 1) = pi(n) - Σk≠i hk(p(n))

where gk and hk (k = 1, 2, ..., r) are continuous, nonnegative functions with the following assumptions:

0 < gk(p(n)) < pk(n)

0 < Σk≠i [pk(n) + hk(p(n))] < 1   (3.21)

for all i = 1, 2, ..., r and all probabilities pk in the open interval (0, 1). The two constraints on the update functions guarantee that the sum of all action probabilities is 1 at every time step.

The reinforcement scheme is the basis of the learning process for learning automata. These

schemes are categorized based on their linearity. A variety of linear, nonlinear and hybrid

schemes exists for variable structure learning automata [Narendra89]. The linearity characteristic

of a reinforcement scheme is defined by the linearity of the update functions gk and hk (see the


examples in the next section). It is also possible to divide all possible algorithms for stochastic

learning automata into the following two classes [Najim94]:

• nonprojectional algorithms, where individual probabilities are updated based on their previous values:

pi(n + 1) = pi(n) + γ ki(pi(n), α(n), β(n))   (3.22)

where γ is a small real number and the ki are the mapping functions for the update term;

• projectional algorithms, where the probabilities are updated by a function which maps the probability vector to a specific value for each element:

pi(n + 1) = ki(p(n), α(n), β(n))   (3.23)

where the functions ki are the mapping functions.

The main difference between these two subclasses is that nonprojectional algorithms can

only be used with binary environment response (i.e., in a P-model environment). Projectional

algorithms may be used in all environment types and are more complex. Furthermore, the implementation of projectional algorithms is computationally more taxing.

Early studies of reinforcement schemes were centered around linear schemes for reasons

of analytical simplicity. The need for more complex and efficient reinforcement schemes eventually led researchers to nonlinear (and hybrid) schemes. We will first introduce several

well-known linear schemes, and then close this section with absolutely expedient nonlinear

schemes.

For an r-action learning automaton, the general definition of linear reinforcement schemes can be

obtained by the substitution of the functions in Equation 3.20 as follows:

gk(p(n)) = a pk(n)

hk(p(n)) = b/(r - 1) - b pk(n)   (3.24)

0 < a, b < 1

Therefore, the scheme corresponding to general linear schemes is given as:

If α(n) = αi,

when β = 0:   pj(n + 1) = (1 - a) pj(n)   for all j ≠ i
              pi(n + 1) = pi(n) + a [1 - pi(n)]
                                                              (3.25)
when β = 1:   pj(n + 1) = b/(r - 1) + (1 - b) pj(n)   for all j ≠ i
              pi(n + 1) = (1 - b) pi(n)

As seen from the definition, the parameter a is associated with reward response, and the

parameter b with penalty response. If the learning parameters a and b are equal, the scheme is


called the linear reward-penalty scheme LR-P [Bush58]. In this case, the update rate of the

probability vector is the same at every time step, regardless of the environment response. This

scheme is the earliest scheme considered in mathematical psychology.
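For concreteness, Equation 3.25 translates directly into the following sketch (ours, not the simulator used later in this work); p is the action probability vector, i the index of the chosen action, and beta the binary environment response:

    # Linear reward-penalty update (Equation 3.25)
    def linear_update(p, i, beta, a, b):
        r = len(p)
        q = p[:]                               # updated copy of p(n)
        if beta == 0:                          # reward
            for j in range(r):
                q[j] = (1 - a) * p[j]
            q[i] = p[i] + a * (1 - p[i])
        else:                                  # penalty
            for j in range(r):
                q[j] = b / (r - 1) + (1 - b) * p[j]
            q[i] = (1 - b) * p[i]
        return q                               # components still sum to 1

    p = linear_update([1/3, 1/3, 1/3], i=0, beta=0, a=0.15, b=0.10)

Note that setting a = b recovers the scheme just described, while setting b = 0 yields the linear reward-inaction scheme LR-I discussed below.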

For the linear reward-penalty scheme LR-P , E[p(n)], the expected value of the probability

vector at time step n, can be easily evaluated, and, by analyzing eigenvalues of the resulting

difference equation, it can be shown that the asymptotic solution of the set of difference equations

enables us to conclude [Narendra89]:

lim n→∞ E[M(n)] = r / (Σk 1/ck) < (1/r) Σk ck = Mo   (3.26)

Therefore, from Definition 3.1, the multi-action automaton using the LR-P scheme is expedient for

all initial action probabilities and in all stationary random environments.

Expediency is a relatively weak condition on the learning behavior of a variable-structure

automaton. An expedient automaton will do better than a pure chance automaton, but it is not

guaranteed to reach the optimal solution. In order to obtain a better learning mechanism, the

parameters of the linear reinforcement scheme are changed as follows: if the learning parameter b

is set to 0, then the scheme is named the linear reward-inaction scheme LR-I . This means that the

action probabilities are updated in the case of a reward response from the environment, but no

penalties are assessed.

For this scheme, it is possible to show that pl(n), the probability of the action αl with minimum penalty probability cl, monotonically approaches 1. By making the parameter a arbitrarily small, we can make Pr{lim n→∞ pl(n) = 1} as close to unity as desired. This makes the linear reward-inaction scheme ε-optimal in all stationary random environments.

Although early studies of automata learning were done mostly on linear schemes, a few attempts

were made to study nonlinear schemes [Vor65, Chand68, Shapiro69, Vis72]. Evolution of early

reinforcement schemes occurred mostly in a heuristic manner. For two-action automata, design

and evaluation of schemes was almost straightforward, since actually only one action was being updated. Generalization of such schemes to the multi-action case was not straightforward

[Narendra89]. This problem led to a synthesis approach toward reinforcement schemes.

Researchers started asking the question: What are the conditions on the updating functions that

guarantee a desired behavior? The problem viewed in the light of this approach resulted in a new

concept of absolute expediency. While searching for a reinforcement scheme that guarantees

optimality and/or absolute expediency, researchers eventually started considering schemes that

can be characterized as nonlinear. The update functions gk and h k are nonlinear functions of


penalty probabilities. For example, the following update functions are defined by previous

researchers for nonlinear reinforcement schemes7:

a

g j ( p( n)) = h j ( p(n)) = p ( n)(1 pi ( n))

r1 i

This scheme is absolutely expedient for restricted initial conditions and for restricted

environments [Shapiro69].

gj(p(n)) = pj(n) θ(p(n))

hj(p(n)) = pi(n) θ(pi(n)) / (r - 1)

For this scheme to behave as desired, certain conditions on the penalty probabilities must be satisfied (e.g., cl < 1/m and cj > 1/m for j ≠ l) [Chand68].

gj(p(n)) = a θ(p1(n), 1 - p1(n)) pj(n)^(μ+1) (1 - pj(n))^μ

hj(p(n)) = b θ(p1(n), 1 - p1(n)) pj(n)^(μ+1) (1 - pj(n))^μ

where θ(p1, 1 - p1) = θ(1 - p1, p1) is a nonlinear function which can be suitably selected, and μ ≥ 1. Parameters a and b must be chosen properly to satisfy the conditions in Equation 3.21. Sufficient conditions for ensuring optimality are given in [Vor65], but they are valid only for two-action automata.

The general solution for absolutely expedient schemes was found in the early 1970s by Lakshmivarahan and Thathachar [Lakshmivarahan73]. Absolutely expedient learning schemes are

presently the only class of schemes for which necessary and sufficient conditions of design are

available. This class of schemes can be considered as the generalization of the LR-I scheme.

Consider the general reinforcement scheme in Equation 3.20. A learning automaton using this

scheme is absolutely expedient if and only if the functions gk (p) and hk (p) satisfy the following

conditions:

g1(p)/p1 = g2(p)/p2 = ... = gr(p)/pr = λ(p)
                                                              (3.27)
h1(p)/p1 = h2(p)/p2 = ... = hr(p)/pr = μ(p)

where λ(p) and μ(p) are arbitrary functions satisfying⁸:

⁷ Again, α(n) = αi is assumed. We give only the updating functions for action probabilities other than the current action; the function for the current action can be obtained by using the fact that the sum of the probabilities is equal to 1.

⁸ The reason for the conditions on the update functions will be explained in detail in Chapter 6.


0 < λ(p) < 1

0 < μ(p) < min j=1,...,r [pj / (1 - pj)]   (3.28)

Detailed proof of this theorem is given in both [Baba84] and [Narendra89]. A similar proof (for a

new nonlinear absolutely expedient scheme) is given in Chapter 6.
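The theorem translates directly into an update rule: choosing gj(p) = pj λ(p) and hj(p) = pj μ(p) in Equation 3.20 yields an absolutely expedient scheme for any λ and μ satisfying Equation 3.28. The sketch below is ours; the constant λ and the scaled μ are merely illustrative choices:

    # General absolutely expedient update: g_j(p) = p_j*lam(p),
    # h_j(p) = p_j*mu(p), plugged into Equation 3.20.
    def gae_update(p, i, beta, lam, mu):
        L, M = lam(p), mu(p)
        q = p[:]
        if beta == 0:                          # reward
            for j in range(len(p)):
                q[j] = p[j] - p[j] * L
            q[i] = p[i] + L * (1 - p[i])
        else:                                  # penalty
            for j in range(len(p)):
                q[j] = p[j] + p[j] * M
            q[i] = p[i] - M * (1 - p[i])
        return q

    lam = lambda p: 0.05                       # satisfies 0 < lam(p) < 1
    # strictly below the bound min_j p_j/(1 - p_j) of Equation 3.28:
    mu = lambda p: 0.5 * min(pj / (1 - pj) for pj in p)
    p = gae_update([0.5, 0.3, 0.2], i=1, beta=1, lam=lam, mu=mu)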

3.6.1 S-Model Environments

Up to this point we based our discussion of learning automata only on the P-model environment,

where the input to the automaton is either 0 or 1. We now consider the possibility of input

values over the unit interval [0, 1]. Narendra and Thathachar state that such a model is very

relevant in control systems with a continuous valued performance index. In previous sections, the

algorithms (for P-model) were stated in terms of the functions gk and hk which are the reward and

penalty functions respectively. Since in S- (and Q-) models the outputs lie in the interval [0,1],

and therefore are neither totally favorable or totally unfavorable, the main problem is to

determine how the probabilities of all actions are to be updated. Narendra and Thathachar

[Narendra89] show that all the principal results derived for the P-model carry over the more

realistic S- and Q-models also. Here, we will not discuss the Q-model; all definitions for the Q-

model are similar to the P-model.

In the case where the environment outputs βk are not in the interval [0, 1], but in [a, b] for some a, b ∈ ℝ, it is always possible to map the output into the unit interval using:

β̄k = (βk - a) / (b - a)   (3.29)

where a = mini{βi} and b = maxi{βi}⁹. Therefore, only the normalized S-model will be considered.

The average penalty M(n) is still defined as earlier, but instead of the ci, the penalty strengths si ∈ [0, 1] are used in the definition:

M(n) = E[β(n) | p(n)] = Σk sk pk(n)   (3.30)

where:

sk = E[β(n) | α(n) = αk]   (3.31)

⁹ Note that this normalization procedure requires the knowledge of a and b.


In the S-model, the sk play the same role as the penalty probabilities in the P-model, and are

called penalty strengths. Again, for a pure-chance automaton with all the action probabilities

equal, the average penalty Mo is given by:

Mo = (1/r) Σi si   (3.32)

The definitions of expediency, optimality, ε-optimality, and absolute expediency are the same, except that the ci are replaced by the si. Now, consider the general reinforcement scheme (Equation 3.20). The direct generalization of this scheme to the S-model gives:

pi(n + 1) = pi(n) - (1 - β(n)) gi(p(n)) + β(n) hi(p(n))   if α(n) ≠ αi
                                                                          (3.33)
pi(n + 1) = pi(n) + (1 - β(n)) Σj≠i gj(p(n)) - β(n) Σj≠i hj(p(n))   if α(n) = αi

The previous algorithm (Equation 3.20) can easily be obtained from the above definition by substituting β(n) = 0 and 1. Again, the functions gi and hi can be associated with reward and penalty respectively. The value 1 - β(n) is an indication of how far β(n) is away from the maximum possible value of unity, and is maximum when β(n) is 0. The functions gi and hi have to satisfy the conditions of Equation 3.21 for pi(n) to remain in the interval [0, 1] at each step. The non-negativity condition on gi and hi maintains the reward-penalty character of these functions, though it is not necessary.

For example, the LR-P reinforcement scheme (Equation 3.25) can be rewritten for the S-model (SLR-P) as:

pi(n + 1) = pi(n) - [1 - β(n)] a pi(n) + β(n) [a/(r - 1) - a pi(n)]   if α(n) ≠ αi
                                                                          (3.35)
pi(n + 1) = pi(n) + [1 - β(n)] a [1 - pi(n)] - β(n) a pi(n)   if α(n) = αi

For this scheme, the expected average penalty converges to:

lim n→∞ E[M(n)] = r / (Σk 1/sk)   (3.36)

Thus, the SLR-P scheme has analogous properties to the LR-P scheme. It is expedient in all

stationary random environments and the limiting expectations of the action probabilities are

inversely proportional to the corresponding si. The conditions of Equations 3.27 and 3.28 for absolute expediency also apply to updating algorithms for the S-model environment.
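As an illustration, the SLR-P update of Equation 3.35 can be sketched as follows (ours); the response beta is now a real number in [0, 1] rather than a binary value:

    # S-model linear reward-penalty update (Equation 3.35)
    def s_model_lrp(p, i, beta, a):
        r = len(p)
        q = p[:]
        for j in range(r):
            if j == i:
                q[j] = p[j] + (1 - beta) * a * (1 - p[j]) - beta * a * p[j]
            else:
                q[j] = (p[j] - (1 - beta) * a * p[j]
                        + beta * (a / (r - 1) - a * p[j]))
        return q

    p = s_model_lrp([0.25, 0.25, 0.25, 0.25], i=2, beta=0.3, a=0.1)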

Thus far we have considered only the automaton in a stationary environment. However, the need

for learning and adaptation in systems is mainly due to the fact that the environment changes


with time. The performance of a learning automaton should be judged in such a context. If a

learning automaton with a fixed strategy is used in a nonstationary environment, it may become

less expedient, and even non-expedient. The learning scheme must have sufficient flexibility to

track the better actions. The aim in these cases is not to evolve to a single action which is

optimal, but to choose the actions to minimize the expected penalty. As in adaptive control

theory, the practical justification for the study of stationary environments is based on the

assumption that if the convergence of a scheme is sufficiently fast, then acceptable performance

can be achieved in slowly changing environments.

An environment is nonstationary if the penalty probabilities ci (or si) vary with time. The

success of the learning procedure then depends on the environment changes as well as the prior

information we have about the environment. From an analytic standpoint, the simplest case is the

one in which penalty probability vector c(n) can have a finite number of values. We can visualize

this situation as an automaton operating in a finite set of stationary environments, and assign a

different automaton Ai to each environment Ei. If each automaton uses an absolutely expedient

scheme, then ε-optimality can be achieved asymptotically for the overall system. However, this

method requires that the system be aware of the environmental changes in order to assign the

correct automaton, and the number of environments must not be large. Furthermore, since the

automaton Ai is updated only when the environment is Ei, updates may happen so infrequently that the corresponding action probabilities do not converge [Narendra89]. In our

application, a similar approach is employed as explained in Chapter 4.
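A hypothetical sketch of this finite-environment approach (structure and names are ours; the actual arrangement used in this work is described in Chapter 4) keeps one action probability vector per environment and updates only the automaton of the currently active environment:

    import random

    # One automaton A_i (here, one probability vector) per environment E_i;
    # update_rule is any scheme with signature (p, i, beta, a, b), e.g.,
    # the linear update sketched earlier.
    class SwitchingAutomata:
        def __init__(self, num_envs, num_actions, update_rule):
            self.p = [[1.0 / num_actions] * num_actions
                      for _ in range(num_envs)]
            self.update = update_rule

        def step(self, env_id, respond, a, b):
            probs = self.p[env_id]
            i = random.choices(range(len(probs)), weights=probs)[0]
            beta = respond(i)              # response of environment E_env_id
            self.p[env_id] = self.update(probs, i, beta, a, b)
            return i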

There are two classes of environments that have been analytically investigated:

Markovian switching environments, and state-dependent nonstationary environments. If we

assume that the environments Ei are states of a Markov chain described by a state transition

matrix, then it is possible to view the variable-structure automaton in the Markovian switching

environment as a homogeneous Markov chain. Therefore, analytical investigations are possible.

Secondly, there are several situations wherein the states of the nonstationary environment vary with the step n, and the analytical tools developed earlier are adequate to obtain a fairly complete description of the learning process. These include:

• Variable-structure automata in state-dependent nonstationary environments
• Hierarchy of automata
• Nonstationary environments with fixed optimal actions

The fundamental concepts such as expediency, optimality, and absolute expediency have

to be re-examined for nonstationary environments, since, for example, the optimal action can

change with time. Here, we again consider the P-model for purposes of stating the definitions.

The average penalty M(n) becomes:

M(n) = Σi ci(n) pi(n)   (3.37)

and, similarly, the pure-chance average penalty becomes:

Mo(n) = (1/r) Σi ci(n)   (3.38)

Using these descriptions, we now have the following: a learning automaton in a nonstationary environment is said to be expedient if there exists an no such that for all n > no, we have:

E[M(n)] - Mo(n) < 0   (3.39)

The previous concept of optimality can be applied to some specific nonstationary

environments. For example, if there exists a fixed l such that cl(n) < ci(n) for all i = 1, 2, ..., r, i ≠ l, and for all n (or at least for all n > n1 for some n1), then αl is the optimal action; the original definitions of optimality and ε-optimality still hold. In general, an automaton can be defined as optimal if it minimizes:

lim T→∞ (1/T) E[Σn=1..T β(n)]   (3.40)

The definition of absolute expediency can be retained as before, with time-varying

penalty probabilities and the new description in Equation 3.37.

All the learning schemes and behavior norms discussed up to this point have dealt with a single

automaton in a single environment. However, it is possible to have an environment in which the

actions of an automaton evoke a vector of responses due to multiple performance criteria. We

will describe such environments as multi-teacher environments. Then, the automata have to

find an action that satisfies all the teachers.

In a multi-teacher environment, the automaton is connected to N teachers (or N single-

teacher environments). Then, the i-th single-teacher environment is described by the action set α, the output set β^i = {0, 1} (for a P-model), and a set of penalty probabilities {c1^i, c2^i, ..., cr^i} where

cj^i = Pr{β^i(n) = 1 | α(n) = αj}.

The action set of the automaton is of course the same for all

teachers/environments. Baba discussed the problem of a variable-structure automaton operating

in a multi-teacher environment [Baba85]. Conditions for absolute expediency are given in his

works (see also Chapter 6 for new nonlinear reinforcement schemes).

Some difficulty may arise while formulating a mathematical model of the learning

automaton in a multi-teacher environment. Since we have multiple responses, the task of

interpreting the output vector is important. Are the outputs from different teachers to be

summed after normalization? Can we introduce weight factors associated with specific teachers?


If the environment is of a P-model, how should we combine the outputs? In the simplest case

that all teachers agree on the ordering of the actions, i.e., one action is optimal for all

environments, the updating schemes are easily defined. However, this will not be the case in our

application.

The elements of the environment output vector must be combined in some fashion to

form the input to the automaton. One possibility is to use an AND-gate as described in

[Thathachar77]. Another method is taking the average, or weighted average, of all responses

[Narendra89, Baba85]. This is valid for Q- or S-model environments. Our application uses logic

gates (OR) and additional if-then conditions as described in Chapters 4 and 5.
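As a sketch of the combination methods mentioned above (ours; the weights are illustrative), an OR-type combination suits a P-model, while a weighted average produces an S-model response in [0, 1]:

    # Combine N teacher responses into a single automaton input.
    def combine_or(responses):
        # P-model: penalty if any teacher penalizes
        return 1 if any(r == 1 for r in responses) else 0

    def combine_weighted(responses, weights):
        # S-model: weighted average of the responses, normalized to [0, 1]
        return sum(w * r for w, r in zip(weights, responses)) / sum(weights)

    beta_p = combine_or([0, 1, 0, 0])                      # -> 1
    beta_s = combine_weighted([0, 1, 0, 0], [1, 2, 1, 1])  # -> 0.4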

Although we have based our discussion on a single automaton, it is possible that there is more than one automaton in an environment. If the interaction between different automata is provided by the environment, the multi-automata case is not different from the single automaton case.

environment reacts to the actions of multiple automata, and the environment output is a result of

the combined effect of actions chosen by all automata. If there is direct interaction between the

automata, such as the hierarchical (or sequential) automata models, the actions of some automata

directly depend on the actions of others.

It is generally recognized that the potential of learning automata can be increased if

specific rules for interconnections can be established. There are two main interconnected

automata models: synchronous and sequential. An important consequence of the synchronous

model is that the resulting configurations can be viewed as games of automata with particular

payoff structures. In the sequential model, only one automaton acts at each time step. The action

chosen determines which automaton acts at the next time step. Such a sequential model can be

viewed as a network of automata in which control passes from one automaton to another. At

each step, one decision maker controls the state transitions of the decision process, and the goal

of the whole group is to optimize some overall performance criterion.

In Chapters 4 and 5, the interconnection between automata is considered. Since each vehicle's planning layer will include two automata (one for lateral, the other for longitudinal actions), the interdependence of these two sets of actions automatically results in an interconnected automata network. Furthermore, multiple autonomous vehicles with multiple

automata as path controllers constitute a more complex automata system as described in Chapter

7.

Chapter 4

Intelligent Vehicle Control

As stated in Section 2.2, vehicle control is one of the most important issues in ITS, and especially in AHS. Designing a system that can safely control a vehicle's actions while contributing to the optimal solution of the congestion problem is difficult. Besides the control problems at the

regulation layer, there is a need for extensive research on the planning layer of the control

structure described by Varaiya [Varaiya93], or at the tactical level of the driving task

[Sukthankar96c]. When the design of a vehicle capable of carrying out tasks such as vehicle

following at high speeds, automatic lane tracking, and lane changing is complete, we must also

have a control/decision structure that can intelligently make decisions in order to operate the

vehicle in a safe way.

The task of creating intelligent systems that we can rely on consequently brings the idea of

artificial intelligence to mind. The ITS community is well aware of the fact that the implementation

of such a complex (and maybe global) system requires investigation of several different methods

and the simultaneous applications of many. Several emerging methods, such as cellular automata,

self-organization, neural networks, fuzzy logic, and hybrid systems applications are being

mentioned [TRB95, Godbole95, Ho96, Deshpande96]. Initial research on intelligent vehicle

control indicates that a planning-regulation system that can guarantee optimal operation with a

sound theoretical background has not yet been developed, and it may be vital to the

implementation of an automated highway system [Varaiya93, Lasky93].

In this chapter, we introduce a decision/control method for intelligent vehicles.

Considering the complexity of an automated highway system or an intelligent vehicle-highway

system, classical control methods are found to be insufficient to provide a fully automated,

collision-free environment [Varaiya93]. Although we may not solve the whole problem using a

single method, we attempt to find a way to make intelligent decisions here. Our approach to the

problem of vehicle control makes use of the learning automata techniques described in Chapter

3. The learning algorithms used in this application are introduced and discussed separately in

Chapter 6 for reasons of clarity. We visualize the planning layer (See Section 2.3) of an intelligent

vehicle as an automaton (or automata group) in a nonstationary environment¹. The aim here is to design an automata system that can learn the best possible action (or action pairs: one for lateral, one for longitudinal) based on the data received from on-board sensors, and possibly some form of vehicle-to-vehicle and roadside-to-vehicle communications.

¹ The interpretation of the term environment is twofold, as we explain later.

The significance of this system is that the controller we define will be useful as a backup

system or the primary system in controlling the path of a vehicle in the case of communication loss

with the higher layer in the hierarchy of a full AHS as well as during transition from fully

automated to manual control. Since the implementation of a fully automated highway system is

not imminent, we will try to concentrate on autonomous vehicles throughout this work.

For our model, we assume that an intelligent vehicle is capable of two sets of lateral and

longitudinal actions. Lateral actions are shift-to-left-lane (SL), shift-to-right-lane (SR) and stay-

in-lane (SiL). Longitudinal actions are accelerate (ACC), decelerate (DEC) and keep-same-speed

(SM). The actions stay-in-lane and keep-same-speed are idle actions, and can also be treated as

a single action cruise. There are nine possible action pairs provided that speed deviations during

lane changes are allowed (Table 4.1).

 ACC & SL     ACC & SR     ACC & SiL
 SM & SL      SM & SR      SM & SiL
 DEC & SL     DEC & SR     DEC & SiL

Table 4.1. The nine possible action pairs (SiL and SM are idle actions that can be combined into a single action).

An autonomous vehicle must be able to sense the environment around itself. In the

simplest case, it is to be equipped with at least one sensor looking at the direction of possible

vehicle moves. Furthermore, an autonomous vehicle must also have the knowledge of the rate of

its own displacement. Therefore, we assume that there are four different sensors on board the

vehicle. These are the headway sensor, two side sensors, and a speed sensor. The headway

sensor is a distance measuring device which returns the headway distance to the object in front of

the vehicle. An implementation of such a device is a laser radar. Side sensors are assumed to be

able to detect the presence of a vehicle traveling in the immediately adjacent lane. Their outputs

are binary. Infrared or sonar detectors are currently used for this type of sensor. The speed sensor

is simply an encoder returning the current wheel speed of the vehicle.

Each sensor is connected to its associated module. Sensor modules evaluate the sensor

signals in the light of the current automata actions, and send a response to the automata (Figure

4.1). We visualize each sensor module as a teacher in a nonstationary automata environment (or a

multi-environment system). The detailed descriptions of these sensor modules are given in the

next section.

Our basic model for planning and coordination of lane changing and speed control is

shown in Figure 4.1. The response of the environment is a combination of the outputs of all four


teacher blocks. The details of the mapping F are given in Section 4.4. The mapping F from sensor

module outputs to the input of the automata can be a binary function (for a P-model

environment), a linear combination of four teacher outputs, or a more complex function as is

the case for this application. An alternative and possibly more ideal model would use a linear

combination of teacher outputs with adjustable weight factors (e.g., S-model environment).

[Figure 4.1. The basic model for planning: the headway (distance), left and right detection (binary), and speed detection sensor modules act as teachers; their outputs are combined by the mapping F into the response β fed to the lateral and longitudinal automata, whose actions are passed to the vehicle regulation layer and affect the physical highway environment.]

The action of an automaton is a signal to the regulation layer which defines the

current choice. It is the regulation layer's responsibility to interpret this signal. When an action is carried out, it of course affects the physical environment. The sensors in turn sense the changes in the environment, and the feedback loop is closed with the sensor modules and the signal β.

The regulation layer is not expected to carry out the action chosen immediately. This is

not even possible for lateral actions. To smooth the system output, the regulation layer carries out

an action if it is recommended m times consecutively by the automaton, where m is a predefined

parameter less than or equal to the number of iterations per second (Figure 4.2). In other words,

the length of the memory vector is m. Whenever this vector (or buffer) is filled with the same

action, the action is fired. This may of course be changed to k times in the last m choices or a

more sophisticated decision rule.
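A minimal sketch of such a memory vector (ours; the interface to the regulation layer is not shown) is given below:

    from collections import deque

    # Buffer of the last m recommended actions; an action is fired only
    # when it fills the entire buffer (m consecutive recommendations).
    class MemoryVector:
        def __init__(self, m, idle_action):
            self.idle = idle_action
            self.buffer = deque([idle_action] * m, maxlen=m)

        def push(self, action):
            self.buffer.append(action)
            if all(a == action for a in self.buffer):
                # fire, then re-initialize with the idle action (see below)
                m = self.buffer.maxlen
                self.buffer = deque([self.idle] * m, maxlen=m)
                return action
            return None                        # keep collecting

    mem = MemoryVector(m=25, idle_action="SiL")
    fired = mem.push("SL")                     # None until SL fills the buffer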


[Figure 4.2. The memory vector: the stream of actions chosen by the planning layer (e.g., A1 A2 A2 A3 A1 A2) fills a buffer of length m; an action is sent for execution only if all entries in the buffer are the same.]

When an action is carried out, the action probabilities in the controlling automaton are

initialized to 1/r, where r is the number of actions. Although not necessary, this initialization

enables the learning automaton to adapt to a new situation faster. One of the reinforcement

schemes described in Chapter 6 is designed to speed up the adaptation process when the

probability of the best/desired action is close to zero. When this new nonlinear scheme and/or very

fast update rates are used, probability vector initialization is not necessary. When an action is

executed, the memory vectors are initialized also; all locations are filled with the idle actions SiL

or SM. A minimum processing speed of 25, and a maximum of 200 iterations per second are

assumed. This value is related only to the computations; the sensor data feeds may have a

different (and constant) rate. The upper limit of 200Hz is due to the communication requirements

considered in the current AHS research [Lasky93].

The discussion of nonstationary environments in Chapter 3 is based on the changing

penalty probabilities of actions. In this application, the penalty responses of the automata environment are functions of the status of the physical environment (e.g., a decreasing headway distance will result in a penalty response from the front sensor module if the chosen action is

ACC). The realization of a deterministic mathematical model of this physical environment may be

impossible, if not extremely difficult. Therefore, simulation is the only way of demonstrating the

expediency and/or absolute expediency for a nonstationary automata environment resulting from a

changing physical environment.

The four teacher modules listed above are simple decision blocks that calculate the response

associated with the corresponding sensor, based on the last chosen action. Table 4.2 below

describes the output of these decision blocks for side sensors.


 Actions²       Vehicle in sensor range,        No vehicle in sensor range,
                or no adjacent lane             and adjacent lane exists
 SiL            0 / 0                           0 / 0
 SL             1 / 0                           0 / 0
 SR             0 / 1                           0 / 0
 SM             #                               #
 ACC            #                               #
 DEC            #                               #

 (Entries are the left/right module responses; 0 is reward, 1 is penalty, and # means the output is not used for longitudinal actions.)

Table 4.2. Output of the left/right sensor modules.

As seen in Table 4.2, a penalty response from the left sensor module is received only when

the action is SL and there is a vehicle in the left sensor's range or the vehicle is already traveling in the leftmost lane. Similarly, the action SR is penalized if there is a vehicle in the right lane or the vehicle is already in the rightmost lane. All other situations result in a reward response from these sensor modules. The longitudinal automaton does not use the side sensors for reinforcement. Initial simulations define the range of the

side sensors using two parameters sr1 and sr2 as shown in Figure 4.3. It is assumed that the side

sensor's range covers only the adjacent lane, and the sensor is mounted right at the middle of the vehicle's side.

The outputs of the side sensors are assumed to be binary, indicating the existence of a

vehicle on the adjacent lane. However, the side sensor module may also use the distance of the

detected vehicle from the sensor source for more intelligent decisions. This type of design may

result in a more expensive implementation, as it requires distance measurement and possibly

additional sensors.

[Figure 4.3. Sensor ranges: the side sensor parameters sr1 and sr2 and the front sensor range fsr; vehicle V1 (with V1 > V2) approaches V2 in the same lane while V3 travels in the adjacent lane.]

² The actions are shift-to-left-lane (SL), shift-to-right-lane (SR), stay-in-lane (SiL), accelerate (ACC), decelerate (DEC) and keep-same-speed (SM).


Assuming that the front sensor is capable of detecting the headway distance, we define the

headway module as shown in Table 4.3. If the sensor sees a vehicle at a close distance (< fsr), a

penalty response is sent to the automaton for actions stay-in-lane, accelerate, and keep-same-

speed. All other actions (shifting lane or decelerating) may serve to avoid a collision, and

therefore, are encouraged. Then, an implemented headway module's task is simply to compare the

time-of-flight of the echo to the predefined time interval corresponding to distance fsr. Also note

that although we show the limiting value fsr as the sensor range in Figure 4.3, the actual sensor

range and this value do not have to be equal. The output of this module affects both automata as

seen in Table 4.3.

The speed sensor of the autonomous vehicle is assumed to be an encoder connected to the

wheel shaft. The output of the encoder can be used to detect the wheel speed, which is

approximately equal to vehicle speed (or equal when cruising). The speed modules task is simply

to compare the actual speed to the desired speed, and based on the action chosen, send a feedback

to the longitudinal automata. The responses depending on the speed and the longitudinal actions

are given in Table 4.4.

                Sensor Status
 Actions        Vehicle in range          No vehicle in range
                (headway < fsr)           (headway ≥ fsr)
 SiL            1                         0
 SL             0                         0
 SR             0                         0
 SM             1                         0
 ACC            1                         0
 DEC            0*                        0

Table 4.3. Output of the Front Sensor block.

                Sensor Status
 Actions        dev < -pdif        -pdif < dev < pdif        dev > pdif
 SiL            #                  #                         #
 SL             #                  #                         #
 SR             #                  #                         #
 SM             1                  0                         1
 ACC            0                  0                         1
 DEC            1                  0                         0

 dev: actual speed - desired speed
 pdif: permitted difference between actual and desired speeds

Table 4.4. Output of the Speed Sensor Module.


When the actual vehicle speed differs from the desired speed by a large amount, then only

the action decreasing the speed deviation receives a reward; others are penalized. This forces the

vehicle to slow down or speed up in order to match the desired speed. The value of the parameter

pdif (permitted difference between actual and desired speeds) is predefined.
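For concreteness, the decision logic of Tables 4.2-4.4 can be sketched as follows (ours; function and argument names are illustrative, and the string "0*" stands for the overriding reward of the headway module discussed in the next section):

    # Teacher modules returning 0 (reward) or 1 (penalty); "0*" marks the
    # overriding reward of the headway module for action DEC (Table 4.3).
    def left_module(action, vehicle_on_left, left_lane_exists):
        if action == "SL" and (vehicle_on_left or not left_lane_exists):
            return 1                   # Table 4.2 (right module is symmetric)
        return 0

    def headway_module(action, headway, fsr):
        if headway < fsr:              # vehicle in front sensor range
            if action in ("SiL", "SM", "ACC"):
                return 1
            if action == "DEC":
                return "0*"
        return 0

    def speed_module(action, actual, desired, pdif):
        dev = actual - desired         # Table 4.4
        if dev < -pdif:                # too slow: only ACC rewarded
            return 1 if action in ("SM", "DEC") else 0
        if dev > pdif:                 # too fast: only DEC rewarded
            return 1 if action in ("SM", "ACC") else 0
        return 0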

The values of the sensor limits sr1, sr2, fsr, and pdif define the capabilities of the sensors

(in the case of the front and side sensors) as well as the behavior of the vehicle. Sensitivity to the headway distance and/or to speed fluctuations is given by the predefined parameters and is key to the behavior of the autonomous vehicle. For example, a vehicle with a shorter side sensor range will tend to jump in front of a vehicle in the next lane more often than a vehicle equipped with

more capable sensors.

Now that we have defined the sensor module outputs, the problem is to intelligently

employ these signals for automata reinforcement. It is possible to treat all sensor modules as

separate teachers with conflicting feedback responses for the automata. For example, consider the

situation given in Figure 4.3. The longitudinal action will receive a penalty from the front sensor

module due to the existence of a vehicle in front. If the actual speed of this vehicle is less than the

desired speed, the speed module will try to force the vehicle to increase its speed. These two

outputs conflict, and it is obvious that one must have priority over the other: no matter what its

current speed is, a vehicle must slow down in order to avoid a collision if it senses another vehicle

occupying the immediate space in front of itself. The next section describes our approach to the

problem.

Action probabilities of the longitudinal and lateral automata are updated using the binary feedback

from the four sensor modules described in the previous section. As seen in Table 4.5, lateral

action probabilities depend on three sensor outputs (front and side sensors), and longitudinal

action probabilities are updated based on the output of the headway and speed sensor modules.

                        Modules
 Actions       Headway       Left       Right       Speed
 SiL           ?             0          0           -
 SL            0             ?          0           -
 SR            0             0          ?           -
 SM            ?             -          -           ?
 ACC           ?             -          -           ?
 DEC           0 or 0*       -          -           ?

 ? means a reward (0) or penalty (1) response is possible.
 - means that the output is not used; it can be treated as 0.
 0* has a higher priority (see discussion below).

Table 4.5. Action - Sensor Module matrix.


The probability of the lateral action SiL depends mainly on the output of the headway

sensor. When this action is chosen, the probability is decreased when the headway module

indicates a vehicle in the front sensor range. Side sensors always send a reward response to this

action no matter what the situation is. Therefore, we can obtain a combined environment response

for this action by simply OR-ing the binary module outputs.

The same method is also valid for the two other lateral actions. Probabilities of actions SL

and SR depend on the output of the left and right sensor modules, respectively. A decision to shift

lane is penalized if the associated sensor indicates the presence of a vehicle in the adjacent lane.

Therefore, by using a simple OR function, we can combine all three sensor module outputs to

obtain a meaningful teacher response for the lateral automaton.

For the longitudinal actions, although there are only two sensor modules whose feedback

is considered for reinforcement, the decision is more complicated. Longitudinal actions SM and

ACC may receive a penalty or reward from the front sensor depending on the headway distance.

The output of the speed module depends on the actual speed of the vehicle, and is used to force

the vehicle to match the desired speed. Action DEC is considered good at all times by the

headway sensor module. However, the reward response in the case where there is a vehicle in the

front sensor range (indicated by 0*) is different than the normal reward response (indicated by 0):

this reward response must override a possible penalty from other modules. For the safety of the

vehicle, the output of this sensor must have a higher priority than that of the speed module.

Therefore, a simple OR-gate is not sufficient; additional Boolean functions must be used. Possible

action-sensor output scenarios are shown in Table 4.6.

 Actions        Speed        Headway        Combined
 SM or ACC      0            0              0
                0            1              1
                1            0              1
                1            1              1
 DEC            0            0              0
                0            0*             0
                1            0              1
                1            0*             0

Table 4.6. Possible longitudinal action-sensor output scenarios.

Again, an OR-ing of the headway and speed module outputs will give the combined

output in Table 4.6 except the case where the action is DEC and the headway module indicates

the presence of a vehicle in front. Therefore, the output of an OR-gate must be double-checked

for this condition. Since we are using OR gates, we can define the don't-care conditions (-) as

reward responses. Thus, it is possible to OR the outputs of all sensors. Then, the function F

mapping individual sensor outputs to a combined feedback signal is defined as in Figure 4.4.


[Figure 4.4. The mapping F: the outputs of the headway, left, right and speed sensor modules are OR-ed to give βf; if the chosen action is DEC and the headway module returns 0*, the combined response is forced to β = 0, else β = βf; the same response β is applied to both the longitudinal (βLNG) and lateral (βLAT) automata.]

Therefore, by adding one conditional rule to the OR-ed feedback signal, we obtain a

combined environment response to be applied to the automata. This single response is used to

update the action probabilities in both automata using the reinforcement schemes described in the

next section.
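A sketch of this combined response (ours, using the module responses sketched earlier): the binary outputs are OR-ed, and the result is overridden when the chosen action is DEC and the headway module reports a vehicle in range:

    # Mapping F of Figure 4.4: OR of the module outputs plus one
    # conditional rule for the overriding reward 0* (Table 4.6).
    def combined_response(action, headway_out, left_out, right_out, speed_out):
        if action == "DEC" and headway_out == "0*":
            return 0                   # overriding reward
        outputs = (headway_out, left_out, right_out, speed_out)
        return 1 if any(o == 1 for o in outputs) else 0

    beta = combined_response("DEC", "0*", 0, 0, 1)   # -> 0 despite speed penalty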

As described in the previous chapter, the environment response is used to update the action

probabilities in an automaton. The update algorithm, called the reinforcement scheme, is the key

to automata learning. In Chapter 3, we introduced the learning paradigm and related definitions.

As we describe in more detail in Chapter 6, the reinforcement schemes are categorized according

to their nature and the behavior of the automaton. There are four different reinforcement schemes

used throughout this dissertation. These are:

• Linear reward-penalty scheme LR-P (with a = b),
• Linear reward-penalty scheme LR-P (with a ≠ b),
• Multi-teacher general absolutely expedient (MGAE) scheme,
• NLH, an absolutely expedient nonlinear scheme.

The first algorithm is a well-known linear reinforcement scheme, one of the first LA algorithms [Bush58]. The MGAE algorithm is given by Baba [Baba83, Baba85], and is also valid for S-model environments. The other two reinforcement schemes are extensions of these, and have desirable characteristics for our application. They are the direct results of our attempts

to create reinforcement schemes with desirable characteristics suitable for this study of learning

automata applications to intelligent vehicle control. The detailed description of the algorithms can


be found in Chapter 6. Here, we will briefly mention their characteristics and relative properties.

Table 4.7 gives the general description of the above mentioned schemes.

 Scheme          Type                           Behavior                                Advantages

 LR-P (a = b)    Linear, nonprojectional;       Expedient [Narendra89]                  -
                 equal learning parameters

 LR-P (a ≠ b)    Linear, nonprojectional;       Expedient, and optimal in an            Faster convergence with the same
                 unequal learning parameters    ideal environment (see Chapter 6)       learning parameter a; optimal
                                                                                        (equilibrium) solution is guaranteed
                                                                                        in N-person nonzero-sum games

 MGAE            Nonlinear, projectional;       Absolutely expedient [Baba85]           -
                 applicable to S-model

 NLH             Nonlinear, projectional;       Absolutely expedient (see Chapter 6)    Faster convergence than MGAE
                 applicable to S-model

Table 4.7. Properties of the reinforcement schemes.

The LR-P scheme with unequal parameters is slightly different from the first one, and has not been studied in detail previously. However, its convergence characteristics are better when the same learning parameter a associated with reward is used. Furthermore, LR-P with a ≠ b is proven to guarantee convergence to an equilibrium point³ in N-person games. This is a property we would like to use for analyzing the behavior of multiple autonomous vehicles. The last algorithm, NLH, is an extension of the MGAE scheme, again resulting in a faster convergence in probability updates. These two schemes

are also proven to be absolutely expedient, meaning that the decision tends to a better one at

every time step.

The first simulation example shows a single autonomous vehicle traveling faster than (nine) other

vehicles on a 3-lane highway. All other vehicles are assumed to be cruising at fixed speed of

80kmh without changing lanes. The autonomous vehicle's desired speed is set at 85kmh. Sensor limits are fsr = 15m and sr1 = sr2 = 10m; the permitted speed deviation is 1kmh. The linear reward-penalty scheme LR-P is used for reinforcement (see Table 4.8 for all other parameter settings). At

³ Or Nash equilibrium, where no player has a positive reason to change her strategy.


t = 0, the autonomous vehicle, traveling in lane 3, approaches the group of vehicles cruising at

fixed speed of 80kmh (Figure 4.5a; the vehicles are traveling from left to right). It immediately

changes lane to avoid a collision. It detects another slow moving vehicle in lane 2 at

approximately t = 10sec and slows down to keep a 15-meter headway distance (see Figure 4.6a).

At approximately t = 14sec, it shifts to lane 1. Immediately after the lane change, the speed is

adjusted to desired value of 85kmh. At t = 34sec and t = 45sec, two shift-to-left-lane actions are

carried out, again due to the presence of slow moving vehicles in front (Figures 4.5d-e). During

these maneuvers, the vehicle speed is not decreased, because the probability of the lateral action

SL reached 1 much faster than that of the longitudinal action DEC. The reason for this behavior is

the fact that the lateral action SR receives penalty at all times due to the presence of another

vehicle in the right lane, or due to the fact that the vehicle is in the rightmost lane.

After two shifts to the right lane (Figures 4.5f-h), the vehicle finds itself again in lane 1,

traveling behind another vehicle. The speed is adjusted to match the speed of the vehicle in front

(Figure 4.6). At this point, the only lateral action receiving reward is SiL, and this action is fired

repeatedly. The longitudinal automaton, on the other hand, will choose the action necessary to keep the

headway distance over 15m. Whenever the headway distance is larger than this predefined limit,

the vehicle will attempt to increase its speed to 85kmh. It cannot move out from the pocket

created by two slow moving vehicles.

[Figure 4.5. Simulation 1: snapshots of the 3-lane highway (lanes 1-3) at t = 0, 3, 14, 33, 45, 61, 74 and 90 seconds (a-h); the vehicles travel from left to right, and the black colored vehicles are cruising at constant speed.]


[Figure 4.6. Automated vehicle's (a) lane position and (b) speed over time (see also Figure 4.5).]

The second and third simulation results show the behavior of a vehicle following a slowly

moving vehicle. Both simulations have been started with an initial headway distance of 30m

(Figure 4.7). Front sensor range fsr is set to 15m. The memory vector size is the same as the

processing speed (25), and this results in speed changes every second at the least. The only

difference between two simulations is in the desired speeds; 81kmh in the second, and 85kmh in

the third simulation. The headway distance and speed of the automated vehicles are shown in

Figures 4.8 and 4.9.

[Figure 4.7. Simulations 2 and 3: initial configuration at t = 0s; the autonomous vehicle starts with a 30m headway behind a slowly moving vehicle in lane 2.]

[Figure 4.8. (a) Headway distance and (b) speed of an autonomous vehicle following another slowly moving vehicle: the desired speed of the autonomous vehicle is 81kmh while the vehicle in front cruises at 80kmh.]

[Figure 4.9. (a) Headway distance and (b) speed of an autonomous vehicle following another slowly moving vehicle: the desired speed of the autonomous vehicle is 85kmh while the vehicle in front cruises at 80kmh.]

As seen in the above figures, the automated vehicle in both simulations is able to avoid a collision⁴; the headway distance never becomes dangerously short. However, the response of the automated vehicle is oscillatory. The

longitudinal automaton sends the action ACC whenever the headway distance is larger than 15m, to reach the desired speed of 85kmh. The action DEC is the choice when the distance is less than the predefined limit. Also, the oscillation amplitude is higher when the relative speed is higher. Although the oscillations are smaller for the desired speed setting closer to the speed of the vehicle in front, the behavior is unstable, as seen in Figure 4.8.

⁴ This is, of course, possible provided that the iteration rate is fast enough or the memory vector is relatively short.

 Sim #   a      b      fsr (m)   sr1 (m)   sr2 (m)   Speed (Hz)   Memory (lateral)   Memory (long.)   Figures
 1       0.15   0.10   15        10        10        25           25                 25               4.5, 4.6
 2       0.15   0.10   15        10        10        25           25                 25               4.7, 4.8
 3       0.15   0.10   15        10        10        25           25                 25               4.7, 4.9
 4       0.15   0.10   15        10        10        25           25                 25               4.10

Table 4.8. Parameter settings for simulations.

Although not shown here, the behavior of the vehicles changes with different parameters.

The choice of sensor limits and learning coefficients as well as the iteration speed and the size of

the regulation buffer are important factors.

Another important issue is the fact that the automated vehicles in simulations 2 and 3

chose to adjust their speed (back and forth) to keep a safe distance instead of shifting to lane 1 or

lane 3 to evade the slow moving vehicle. Since the only lateral action receiving a penalty from the

front sensor module is SiL, the probabilities of the two other actions SL and SR reach 0.5. This

results in a memory buffer filled with (approximately) equal numbers of two different actions. The

vehicle cannot decide which action to take.

Figure 4.10 shows the initial and final positions, numbers, and speeds of ten autonomous vehicles. The vehicles are traveling from left to right on a 500m circular highway. The final speeds shown in Figure 4.10(f) are equal to the desired speeds set initially. The simulation parameters are again given in Table 4.8.



ch4m1.mpg

(1.230Mb, every frame represents one fifth of a second)

(g)

Figure 4.10. Initial and final positions (a, d), numbers (b, e), and speeds (c, f) of ten automated

vehicles traveling on a 3-lane 500-meter circular highway. The mpeg movie of the simulation is

accessible via icon (g).

As seen in the figure above and the mpeg movie, vehicles with higher speeds slowly move ahead of the others. All vehicles are able to avoid collisions, either by slowing down or by shifting lanes. In the end, all vehicles have reached their desired speeds, since they were all able to shift to an open lane sooner or later during the simulation. However, the paths taken to overtake slower vehicles are not the shortest and/or quickest paths.

The simulation results given in the previous section are indicative of several important issues. First of all, assuming that the regulation layer can respond to the planning layer's requests in time, and that the real dynamic behavior of this system as well as the actual sensors are not drastically different from the simplified model here, our planning layer intelligent controller is capable of safely directing the vehicle(s). Of course, the processing speed and the learning parameters need to be adjusted to guarantee timely responses to environmental changes. Considering the current


technology in computing, the iteration speed is not a big concern. The learning parameters must be chosen large enough to guarantee fast learning at the desired iteration speed, while avoiding values larger than 0.3 for linear schemes and 0.1 for nonlinear schemes, in order to avoid unnecessarily large updates that push a non-optimal action probability toward 1. The length of the memory vector is another factor in firing actions. This buffer is definitely needed for lateral actions, in order to avoid continuous lane changes, while its size can be decreased down to 1 for longitudinal actions provided that the speed changes are continuous.
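To illustrate why large learning parameters are dangerous, the sketch below applies a single reward step of the general linear reward-penalty scheme used in these simulations (the standard LR-P update rule; the numbers are illustrative):

    def reward_update(p, i, a):
        # Reward step for action i: p_i <- p_i + a*(1 - p_i),
        # and p_j <- (1 - a)*p_j for all j != i.
        return [pk + a * (1 - pk) if k == i else (1 - a) * pk
                for k, pk in enumerate(p)]

    p = [1/3, 1/3, 1/3]
    print(reward_update(p, 0, 0.3))  # ~[0.533, 0.233, 0.233]: a moderate step
    print(reward_update(p, 0, 0.7))  # ~[0.800, 0.100, 0.100]: a few lucky
                                     # rewards can lock in a non-optimal action

A non-optimal action that happens to draw a few early rewards can thus dominate the buffer before the penalty feedback catches up, which is why values above 0.3 are avoided for the linear schemes.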

Although there are no collisions when the parameters are carefully selected to obtain the necessary convergence speed, the path and/or behavior of the vehicles is far from optimal. The autonomous vehicle in simulation 1 did not possess the information to help it avoid a pocket created by two slow-moving vehicles. The decisions are based on local data, and therefore cannot guarantee global solutions for the vehicle, or for the traffic as a whole for that matter. Conflicting decisions are expected.

The observed oscillations in the headway distance are due to the discrete-time control of the speed and the defined front sensor limit. For a fast processing speed and a more realistic speed controller design, the problem may be relatively insignificant. However, with the current definition of the headway module, the vehicles will keep decelerating even though the headway distance is improving.

Furthermore, unnecessary lane changes can be observed in simulation 4. This is again due to the fact that the vehicles do not have an evaluation method for the desired lane when such a decision is needed. This fact is also the cause of the behavior seen in simulations 2 and 3. When the lateral actions SL and SR are both good, the vehicle is unable to change lanes using the current method of firing actions when the memory vector/buffer is full (see Section 4.1).

Initial simulations show promising results, but there is a need for more elaborate sensor

definitions, and for more complex decision rules. The next chapter will address these upgrades,

and other related issues.

It is interesting to note that, again, a task performed easily by human beings proves to be very complicated for a machine. Even a simple decision of shifting lanes in order to avoid being stuck in a pocket of slow-moving vehicles (simulation 1, Figure 4.5h) requires complicated sensory hardware and, possibly, a complex communication mechanism. However, a human driver can simply look farther ahead toward the adjacent lanes, and decide to slow down in order to find an opening. Research currently conducted on AHS aims to solve such problems with extensive vehicle-to-vehicle and/or vehicle-to-roadside communications [Construction96, Lygeros94], multiple sensors and sensor fusion [Agogino96], and a hierarchical control structure [Varaiya91, Varaiya93, Lygeros95, Lygeros96]. When the hierarchical system and all vehicles are connected, they will collectively reach a solution. However, such an architecture requires an expensive implementation. The problem of making decisions based on local information would exist even in a full AHS environment unless the system is fully hierarchical.

Chapter 5

More Intelligent Vehicles: Extending the Capabilities

The intelligent controller defined in the previous chapter, although capable of controlling an automated vehicle's path using local information, is far from optimal in the sense that the resulting safe path is not always the shortest or fastest path, nor a good choice for improving the overall throughput of a segment of the automated highway. Also, although not seen in the simulation examples, a pinch condition, where two vehicles change lanes to occupy the same spot on the highway (see Section 5.2.2), is possible when only the immediate neighborhoods of the vehicles are considered. In this chapter, we will extend our assumptions on the capabilities of the intelligent vehicles. These changes can be grouped into three different classes: extending the capabilities of the sensors, creating more complete (and complex) decision rules, i.e., extending the capabilities of the teachers, and adding more sensor/teacher modules. All these changes are described in the following sections.

Extensions of the sensor capabilities require more complex and expensive hardware implementations, and extending the capabilities of the teachers leads to more complex evaluation rules for the teacher modules, as we describe in the following sections. These include new definitions of the front and side sensors, more rules for the headway and side sensor modules, as well as the interaction of the lateral and longitudinal automata. Two new sensor modules are defined in this chapter, namely the pinch and lane sensor-module pairs. The assumptions and physical implementations of these sensors, as well as the flag structure used in the decision modules, are described.

The front sensor of an automated vehicle is defined as a range sensor in Chapter 4. Without changing our assumptions on the sensor hardware, we will extend the decision capability of the front sensor module by defining multiple headway ranges and by using consecutive range data to obtain the rate of change of the headway distance as well as its current value. For the side sensors, we assumed only that they were capable of detecting a vehicle in a predefined range. The extensions of the side sensor modules in this chapter require more capable, and therefore more complex, sensor structures.

Again assuming that the sensor mounted in the front of the vehicle can detect the headway

distance, reward-penalty rules in the headway module are redefined as shown in Table 5.1. The


cases where the output of the sensor has an overriding characteristic are indicated by asterisks. There are now three different range values used in the feedback decision. The value fsr defines the sensor range; this value can be visualized as the limit of the region of interest. Any measurement equal to or greater than this value results in a reward response for all longitudinal and lateral actions. As seen in Figure 5.1, two additional parameters d1 and d2, with d1 < d2 < fsr, are used to define a new buffer region where the sensor module's response also depends on the rate of change of the headway distance.

                Region A      Region B (d1 < dist. < d2)     Region C (d2 < dist. < fsr)    No vehicle
Action          (dist. < d1)  approaching  NOT approaching   approaching  NOT approaching   (dist. >= fsr)
SiL             1             1*           0                 1*           0                 0
SL              0**           0            0                 0            0                 0
SR              0**           0            0                 0            0                 0
SM              1             1            0                 0            0                 0
ACC             1             1            1                 1            0                 0
DEC             0             0*           0                 0            0                 0

Asterisks are used to distinguish the cases where additional functions need to be used (see Section 5.3.3 for details).

Table 5.1. Output of the front sensor block (also see Figure 5.1).
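A minimal sketch of the extended front sensor feedback of Table 5.1, assuming a helper that supplies the current headway distance and whether it is decreasing (all names are illustrative; the starred cases are resolved by the mapping rules of Section 5.3.3):

    def front_sensor_feedback(action, dist, approaching, d1, d2, fsr):
        # Return 1 (penalty) or 0 (reward) per Table 5.1;
        # 'approaching' is True when the headway distance is decreasing.
        if dist >= fsr:                      # no vehicle in the region of interest
            return 0
        if action == "SiL":
            return 1 if (dist < d1 or approaching) else 0
        if action in ("SL", "SR"):           # lane changes are always rewarded here
            return 0
        if action == "SM":
            return 1 if (dist < d1 or (dist < d2 and approaching)) else 0
        if action == "ACC":
            return 1 if (dist < d2 or approaching) else 0
        return 0                             # DEC is never penalized by this module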

The lateral action SiL is discouraged when the vehicle in front is dangerously close; the parameter d1 defines this safety region. The same action also receives a penalty when the headway distance is larger than d1 but less than the safe distance defined by the parameter fsr. However, for the region between these two values, the action is penalized only when the rate of change of the headway distance is negative, i.e., when the headway distance decreases. All other lateral actions receive a reward from the front sensor module.

The longitudinal action ACC is discouraged when a vehicle enters the front sensor range and is approaching. If the headway distance is under d2, this action is penalized regardless of the rate of change of the measured distance: a vehicle should not increase speed while sensing another vehicle in proximity. The differentiation of the feedback response for the range [d2 fsr] enables the automated vehicle to catch up with the vehicle in front. If the headway distance is too small, or if the vehicle in front is approaching, the action SM also receives a penalty.

Note that if we choose the parameters as d1 = d2 = fsr, the reward-penalty structure of Table 5.1 becomes the one previously given in Table 4.4. By defining these new regions, where the feedback response differs for different longitudinal actions, we hope to obtain a smoother response, and thus remove the oscillations in the headway distance while following a slowly moving vehicle (see the examples in Figures 4.8 and 4.9). The reward response of the front sensor module for the longitudinal action ACC in region C facilitates speed recovery.


Figure 5.1. Sensor regions around the automated vehicle: front sensor regions A (dist. < d1), B (d1 < dist. < d2), and C (d2 < dist. < fsr), and side sensor regions D and E defined by the ranges sr1 and sr2.

Again, consider the situation given in Figure 4.7, where the automated vehicle follows a slow-moving vehicle. The initial headway distance is still 30m, and the sensor parameters are chosen as shown in Table 5.2. The automated vehicle has a desired (and initial) speed of 85kmh, while the other cruises at 80kmh. All other parameters are set to the same values as in the second and third examples in Chapter 4 (Table 4.8). The regulation layer changes the velocity after receiving the same action 25 times consecutively, and the processing speed is 25 iterations per second.

#   Figure   d1 (m)   d2 (m)   fsr (m)
1   5.2      10       20       30
2   5.3      10       15       20
3   5.4      12       15       18
4   5.5      14       15       16
5   5.6      12       15       16

Table 5.2. Front sensor parameters for simulations.

The vehicle speeds and headway distances for the five simulations with the above parameters are given in Figures 5.2-5.6. Only the sensor range definitions are changed, to see the effect of various settings. As seen in Figures 5.2 and 5.3, it is possible to prevent the oscillations in the headway distance. Usually, the headway distance stabilizes at a value between d1 and d2 when the automated vehicle approaches a slow-moving vehicle. As seen in Figure 5.2, the regulation layer changed the speed 5 times to match the speed of the vehicle in front, in


approximately 6 to 7 seconds. Again, note that a maximum of one change per second was

possible in these simulations.


Figure 5.2. Headway distance and speed of an autonomous vehicle following another

slow-moving vehicle: sensor parameters are d1 = 10m, d2 = 20m, and fsr = 30m.


Figure 5.3. Headway distance and speed of an autonomous vehicle following another

slow-moving vehicle: sensor parameters are d1 = 10m, d2 = 15m, and fsr = 20m.

Figure 5.4 shows another vehicle with much shorter sensor ranges than the previous ones. Again starting from the same initial conditions, the vehicle is able to match the speed of the vehicle in front and keep a safe distance. Since the parameter fsr is shorter than in the previous simulation runs, the vehicle had less time to adjust its speed, starting at only 18m. As a result of very rapid changes (only the action DEC is permitted after t = 10sec, since the headway is less than d2 and the vehicle is approaching), the headway decreased below d1 = 12m and the vehicle speed dropped below 80kmh. For approximately 6 seconds, the automated vehicle cruised at 79kmh in order to permit the headway to increase to a safe distance. At t ≈ 30sec, right after the headway distance becomes equal to d2, the speed is matched. In this case, the region [d2 fsr] is where the steady-state value is obtained. After t = 40sec, the vehicle ahead changes its speed to 83kmh, then back to 80kmh. As seen in Figure 5.4, this speed change is quickly matched, and the headway distance reaches its steady state a few seconds after the vehicle in front reaches its cruising speed of 80kmh (at t ≈ 60sec).


Figure 5.4. Headway distance and speed of an autonomous vehicle following another

slow-moving vehicle: sensor parameters are d1 = 12m, d2 = 15m, and fsr = 18m.

When we attempt to decrease the sensor regions further, to [d1 d2] = [14 15] and [d2 fsr] = [15 16], oscillations result, as seen in Figure 5.5. The regions where different actions are encouraged based on the distance and its rate of change are so small that, before an intelligent decision can be made, the headway distance moves out of the buffer region. Again, it is important to note that an ACC or DEC action is fired at most once per second. For a much shorter memory buffer as well as a faster iteration speed, the response will be quicker, and therefore these sensor values may be feasible for a non-oscillatory response.



Figure 5.5. Headway distance and speed of an autonomous vehicle following another

slow-moving vehicle: sensor parameters are d1 = 14m, d2 = 15m, and fsr = 16m.

Keeping the same processing speed and memory vector, we try to solve the problem by enlarging the lower sensor region, i.e., by decreasing the value of d1 to 12m. The results are shown in Figure 5.6. The response is not as damped as in simulations 1 and 2, but a steady-state value is reached. Again, a change in the speed of the vehicle in front (at t ≈ 64sec) is quickly matched with a minimal deviation in the headway distance.


Figure 5.6. Headway distance and speed of an autonomous vehicle following another

slow-moving vehicle: sensor parameters are d1 = 12m, d2 = 15m, and fsr = 16m.

From the simulation results above, we see that there is a trade-off between the time to reach a steady-state value for the headway distance and the permitted region for the steady-state values. For faster processing speeds (which decrease the time to reach a decision in the planning layer) and shorter memory buffers (which result in a shorter decision time in the regulation layer), the regions may be smaller, and therefore the problem less significant.

The definition of the front sensor module in Section 4.2 is the limiting case for the

extended version given here. As d1 and fsr approach d2, the definition given here becomes similar

to the one given in Chapter 4, and oscillatory responses are observed.

The idea of having multiple parameters defining the regions [d1 d2] and [d2 fsr] is similar to the idea of defining a boundary layer around a switching surface for sliding mode controllers in order to stop chattering [Slotine91]. In sliding mode control, additional integral terms are added to the driving function for the predefined boundary layer. Here, we force the speed to change according to the value and rate of change of the headway distance. Since distance is the integral of velocity, the resulting effect is similar. Of course, the control methodology is much more discrete, and also less structured, than in a sliding mode controller. In the design methodology of sliding mode controllers, either the width of the boundary layer or the function parameters are predefined, and the rest are adjusted accordingly [Kachroo96c]. Similarly, in our case, the choice of the processing parameters and the sensor regions is sequential, in the sense that we either define the sensor regions first and then choose the learning parameters, or vice versa.

The region [d1 d2] is effective when the automated vehicle has to slow down; [d2 fsr] is mainly important in the case of acceleration while closely following another vehicle. This second region also improves the vehicle's braking time in the previous case.

Also note that the length of the memory vector is 25, i.e., at the very best, only one velocity adjustment per second is permitted. A more continuous regulation layer model is of course possible, but is not considered here. Since the controlled variable is the distance, not the speed, and a more realistic model would also include delays, we did not investigate continuous models. The aim here is to show the importance of the sensor range definitions, and to introduce an intelligent vehicle control method based on learning.

In the previous chapter, the side sensors were described as being capable of detecting vehicles and obstacles in adjacent lanes. This can be achieved with relatively simple hardware compared to the method we now describe. By using multiple sensor arrays and combining different technologies such as sonar, IR, and visual sensing, it may be possible to detect not only the presence, but also the relative movement of vehicles in the adjacent lanes. Current side sensor implementations include radar, vision, and IR technologies. Vehicle control systems based on inter-vehicle communications do not consider the use of side sensors, since all vehicles are assumed to be controlled by a higher level in the hierarchy, or all will be capable of relaying their position information to other vehicles. Autonomous navigation implementations have not yet reached the level of capability where inter-vehicle collisions are a consideration.

Millimeter-wave radar sensing [Zoratti95] and short-range vehicle-to-vehicle communications (a more global solution to the problem) are currently under development for the prediction of inter-vehicle collisions [Asher96]. Forward-looking sensors [Liu89] or binocular vision systems [Weber96] could also provide some of the information necessary for lateral movements.


Assuming that there is a sensing structure, possibly an array of sensors, that is able to give information about the relative speed of the neighboring vehicles, the side sensor's detection area is divided into two sections as seen in Figure 5.1. An immediate neighborhood (region D) is now defined; outside it, in region E, the output of the sensor module depends not only on the detection of vehicles, but also on the measurement of their relative longitudinal movement. The penalty-reward structure for the side sensor modules is given in Table 5.3.

          Vehicle in          Vehicle in region E
Action    region D            approaching   NOT approaching
SiL       0/0                 0/0           0/0
SL        1/0                 1/0           0/0
SR        0/1                 0/1           0/0

Entries are the outputs of the left/right sensor modules.

Table 5.3. Output of the left and right sensor modules.

Longitudinal actions are not shown in Table 5.3 since they are not affected by the side sensor modules. The regions D and E are defined as shown in Figure 5.1; the parameters sr1 and sr2 define the detection range of the sensors (or sensor arrays), and a third parameter differentiates region D from region E. If a vehicle is detected in region E and it is not (longitudinally) approaching the automated vehicle, the penalty response to the lane changing actions is suppressed. This distinction gives more flexibility to the automated vehicle, as we describe with the following simulation examples.
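A sketch of the side sensor decision of Table 5.3, shown for the left module (the right module is symmetric; the argument names are illustrative):

    def left_sensor_feedback(action, detected, in_region_d, approaching):
        # Left side sensor module (Table 5.3): penalize SL when a vehicle
        # occupies region D, or region E while approaching; SiL and SR
        # are never penalized by the left module.
        if action != "SL" or not detected:
            return 0
        if in_region_d:
            return 1                    # immediate neighborhood: never shift left
        return 1 if approaching else 0  # region E: penalize only a closing vehicle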

Figure 5.7 shows five automated vehicles with sensor ranges sr1 = sr2 = 12m, d1 = 8m, d2 = 11m, and fsr = 15m (the side sensor module of Chapter 4 is used in this simulation). The vehicle in lane 3 wants to merge into the platoon; however, the distance between the first two vehicles in the platoon is only 20m. It slows down in order to change lanes (the behavior forcing this change is given in Section 5.3.1). The right side sensor module always returns a penalty response for the action SR, and consequently the vehicle is unable to change lanes. It slowly moves to the back of the platoon.


Figure 5.7. Snapshots at t = 0, 3, 6, 9, 15, and 19sec (a-f): the vehicle attempts to shift to the right lane by slowing down.

However, if we employ the extended side sensor module for the same scenario, the vehicle is able to change lanes. As seen in Figure 5.8, the vehicle in lane 3 slows down to change lanes, and shifts lanes at t ≈ 8.5sec.

In this simulation, the side sensor ranges are sr1 = sr2 = 12m, with the region boundary set at 0.66 of the range, i.e., if the detected range is between 8m and 12m and the neighboring vehicle is not approaching, the action SR becomes acceptable. Using this decision structure, the gray-colored vehicle shifts to lane 2 and adjusts its speed accordingly. Figure 5.9 shows the headway distances for the platoon. Before the lane change, the headway distances are steady, with minor deviations due to the random speed changes (see Figure 5.10a). At t ≈ 8.5sec, the lane shift occurs, and the vehicle entering the platoon is cruising at a lower speed (see Figure 5.10). Consequently, the distance between the second and third vehicles decreases; the vehicle which is now third in the platoon detects a sudden change in headway distance. As a result, this and the other trailing vehicles slow down to adjust their headway distances. The effect can be seen in Figures 5.9 and 5.10b: a ripple moves backward through the chain of vehicles. All vehicles are able to keep their headway distances within the acceptable range while returning to their desired speeds of 80kmh at t = 30sec. The steady-state response is due to the new definition of the front sensor module given in the previous section.


Figure 5.8. Snapshots at t = 0, 6, 8.5, 8.8, 12, 16, 24, and 32sec (a-h): using the extended side sensor module, the vehicle attempts to shift to the right lane by slowing down, and succeeds.


Figure 5.9. Headway distances in the platoon; lane change occurs at t 8.5sec.



Figure 5.10. (a) Speed and (b) space-time trajectory of five automated vehicles

(Space-time trajectory for vehicle 2 is plotted after the lane change).

In the simulation example given here, the sensor ranges are chosen to illustrate the advantage of the new definition and the new front sensor's ability to stabilize the headway distance. However, with such small (side sensor) range definitions, the lane shift creates ripples in the otherwise stable platoon. The main reason for defining multiple regions for the side sensors is actually to avoid sudden jumps of one vehicle in front of another. The sensor ranges must be carefully chosen, possibly using the adaptive techniques mentioned in Chapter 1.

In addition to the changes made to the previously defined sensors, we now introduce two new modules which are more difficult to implement, and relatively more global than the previous definitions. The first module is visualized as a lane detector, the second as a pinch situation detector. While there are many possibilities for a lane detector implementation, sensing a dangerous situation that may occur during lane changes is probably beyond the local sensing abilities of an automated vehicle.

In the previous chapter, we assumed that the side sensor modules were inherently capable of sensing that the vehicle is in the leftmost or rightmost lane. Here, we separate this function into a new module, and assume not only that the barriers and/or roadside are detected, but also that the vehicle can sense which lane it is in.

A physical implementation of this module could be a vision system. Several AHS programs in the US, Europe, and Japan use camera images for lateral and longitudinal control (see [Pomerlau96], [Malik95], [Özgüner96], and [Construction96] for recent examples). Extracting the lane information from such an image is becoming increasingly straightforward. Also, the magnetic markers used for lateral control may again provide position information. These solutions are still based on local sensing; another alternative is to feed the present and desired lane information to the vehicles via broadcast communications, as envisioned by the PATH program (e.g., [Hedrick96] and [Lygeros96]).

Once an automated vehicle knows its (lateral) position, and assuming that it can obtain information on the best choice of lane for its desired speed and path, the decision for lateral movement is relatively straightforward. Although current practical implementations do not yet consider such computationally complicated1 maneuvers, research on the control of lane changing maneuvers is slowly emerging [OBrien96, Wonshik95, Hessburg95]. The difficult part in path planning is to find the optimal lane position for a specific vehicle.

For our purposes, we will assume that an automated vehicle can sense its present lane, and that it has some idea about where it should be. Based on these two values, the action that leads to the necessary lane shift is encouraged by the teacher module shown in Figure 5.11. The flag structure described in Section 5.3.1 is used to force the necessary longitudinal actions if the vehicle has trouble shifting into the desired lane.

Figure 5.11. The lane teacher module. Its inputs are the desired lane, the current lane, and the current lateral action; its output is 0 (reward) if the vehicle is already in its desired lane, if the desired lane number is greater than the current one and the action is SL, or if the desired lane number is smaller and the action is SR; otherwise the output is 1 (penalty).
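A sketch of this module, consistent with the reconstructed logic of Figure 5.11 (the handling of the case where the vehicle is already in its desired lane is our reading of the figure; lanes are numbered so that larger numbers lie further left):

    def lane_feedback(desired_lane, current_lane, action):
        # Lane teacher module (Figure 5.11): 0 = reward, 1 = penalty.
        if desired_lane == current_lane:
            return 0
        if desired_lane > current_lane and action == "SL":
            return 0
        if desired_lane < current_lane and action == "SR":
            return 0
        return 1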

Just like a driver who checks his rearview mirrors and looks to his side before changing lanes, it is imperative for an automated vehicle to make sure that the next lane is not claimed by another vehicle. The problem occurs when two vehicles one lane apart shift into the same spot in the lane between them (Figure 5.12). This situation is defined as the pinch condition. Besides the side sensors returning local data, vehicles signaling to shift lanes must be checked in order to make sure that a pinch condition does not occur.

1 Computational difficulty here describes the complexity of a good decision-making process for lane maneuvers; the low-level control of a lane maneuver is possible for most of the current test vehicles.


Figure 5.12. The pinch condition: two vehicles, one lane apart, attempt to shift to the same lane simultaneously.

In the simulations, we use the memory vector to check for other vehicles' intention to shift lanes. If the number of memory locations containing SR or SL is more than half the size of the vector for a vehicle, it is assumed that the vehicle is likely to shift lanes (Figure 5.13). Since there are three lateral actions, a 50% share in the memory vector most likely indicates that an action will soon be fired. If such an intention signal is received from a neighboring vehicle, the pinch module sends a penalty response to the lateral automaton for the action that may cause a problem. In a sense, the pinch module of an automated vehicle is driven by the memory vectors of neighboring vehicles. In a real implementation, this corresponds to a signaling vehicle which may be detected by a vision system, vehicle-to-vehicle communications indicating intended actions, or a roadside-to-vehicle communication relaying the positions of the vehicles. The last possibility is a global solution to the problem.

Figure 5.13. The memory vector/buffer of length m between the planning layer and the executed lateral action (e.g., A1 A2 A2 A3 A1 A2 ...): if an action fills half of the buffer, it is signaled; if it fills the whole buffer, it is executed.
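A minimal sketch of this buffer and its intention signal, assuming the 50%/100% thresholds above (class and method names are illustrative):

    from collections import Counter, deque

    class MemoryBuffer:
        # Sliding buffer of the last m lateral actions (Figure 5.13).
        def __init__(self, m=25):
            self.buf = deque(["SiL"] * m, maxlen=m)

        def push(self, action):
            self.buf.append(action)

        def signaled(self):
            # Intention signal read by neighboring vehicles' pinch modules:
            # an action occupying more than half of the buffer.
            counts = Counter(self.buf)
            for a in ("SL", "SR"):
                if counts[a] > len(self.buf) / 2:
                    return a
            return None

        def fired(self):
            # An action is executed only when it fills the entire buffer.
            return self.buf[0] if len(set(self.buf)) == 1 else None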

Figure 5.14 shows two vehicles cruising at the same speed. Initially, vehicle 1 is in the rightmost lane and vehicle 2 in the leftmost lane. However, vehicle 1 needs to shift to the leftmost lane, while the other wants to change to the rightmost lane. Figure 5.15 shows the lane and speed of the two vehicles versus time. Assuming that their initial positions in Figure 5.14 correspond to time t = 0, both vehicles are unable to change lanes until t ≈ 3sec, when vehicle 1 shifts to the middle lane.

The reason for the delay is the pinch module in both vehicles. Without a detection method for this condition, the vehicles would change to the middle lane at approximately the same time, and the result would be a collision. As seen in Figure 5.16, both vehicles' lateral memory vectors are initially filled with the idle action SiL. Since both vehicles are forced to choose the necessary lateral actions by the lane module, the percentages of the actions SL and SR in the memory buffers (of vehicles 1 and 2, respectively) increase. The percentage of the action SL in the memory buffer of vehicle 1 reaches 50% slightly faster than that of the action SR in vehicle 2 (Figure 5.16). As a result, the lateral action SR of vehicle 2 starts receiving penalties earlier than vehicle 1's SL, and the probability of this action decreases. The result is also a decrease in the percentage of the action SR in the memory buffer. The percentage of vehicle 1's SL action also drops (until n = 40, or t = 1.6sec), but never below 50%. Once the intention signal from vehicle 2 stops, the action SL fills the memory buffer, and is fired at n = 71 (t ≈ 2.84sec). The memory vector is then reset to all-SiL.


Figure 5.15. Lane (a) and speed (b) of two vehicles from t = 0 to t = 4sec.

This new module enforces safe lane transitions: either one of the vehicles temporarily suppresses its desire to shift lanes (i.e., a penalty response from the pinch module overrides all other reward responses from the lane and side sensor modules through the OR operation), or both vehicles are hindered from changing lanes to achieve their desired paths.

As seen in Figure 5.14, the two vehicles won't be able to exchange lanes as long as they cruise at the same speed. A bottleneck might also occur in a pinch situation where both vehicles have similar action ratios in their memory buffers. To overcome this problem, more complex decision rules need to be defined. These are explained in the next section.



Figure 5.16. The output of the pinch module and the actions in the memory

buffer versus time for vehicle 1 (top) and vehicle 2 (bottom).


With the addition of the new sensor modules and the extension of the previous ones, several changes need to be made to fuse all sensor/teacher responses. In this section, we define the flags used to temporarily alter the behavior of the vehicles in order to obtain a better path or to avoid bottleneck situations. The definition of the new mapping function and the necessary changes are also given here.

The lane flag is defined in order to enable an automated vehicle to take an action if it cannot reach its desired lane in a pre-specified time interval. A specific module keeps track of the time passed after a desired lane value is set (by on-board or roadside decision makers). If the vehicle cannot change to its desired lane in the predefined time interval, the lane flag is set. The effect of this flag is to temporarily decrease the desired speed by some fixed amount. As a result, the vehicle slows down and hopefully finds an opening to change lanes. Once the vehicle reaches its desired lane, the flag is reset and the vehicle slowly increases its speed to match the previous desired speed.
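A sketch of the lane flag logic; the 4-second timeout and the fixed 3-unit speed adjustment are the values used in the example of Figure 5.17 below, and the function names are illustrative:

    def lane_flag_set(t_now, t_lane_request, in_desired_lane, timeout=4.0):
        # The flag is set when the vehicle has not reached its desired
        # lane within 'timeout' seconds of the lane request.
        return (not in_desired_lane) and (t_now - t_lane_request) > timeout

    def effective_desired_speed(desired_speed, lane_flag, adjustment=3.0):
        # While the flag is set, the desired speed is temporarily lowered
        # so that the vehicle can drop back and find an opening.
        return desired_speed - adjustment if lane_flag else desired_speed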

Consider the situation shown in Figure 5.17(a), where the velocities of vehicles 1, 2 and 3 are 85, 80 and 80kmh, respectively. The desired speed of each vehicle is the same as its current speed. Vehicle 1 slows down to match the speed of vehicle 3 to avoid a collision; however, it is unable to move away from the pocket created by vehicles 2 and 3. Assume that vehicle 1 knows that it has to be in lane 3 because its desired speed is faster than the others', and that an automated vehicle is capable of sensing its current lane. In the simulation run, the desired lane of vehicle 1 is changed to lane 3 at time t = 8sec. Since the vehicle is unable to shift lanes, after 4 seconds (the predefined time for lane changes) the lane flag is set, forcing the vehicle's desired speed to be redefined as 80 - 3 = 77kmh. The amount of the change in the desired speed (3kmh) is also predefined, and may be a function of the current vehicle speed, the average speed of other vehicles in the neighborhood, the headway, and/or the current and desired lanes. Once the flag is set (at t = 12sec), the speed of vehicle 1 starts decreasing to match its new desired value (Figure 5.17c). As a result, the vehicle finds an opening once its left sensor(s) clear vehicle 2, and shifts lanes at t ≈ 27sec (Figure 5.17d). Upon reaching the desired lane, the flag is reset, and the speed again matches its previous desired value of 85kmh (Figure 5.17c).

This lane flag is also used to get around the bottleneck situation given in Figure 5.14. For the vehicles to exchange lanes, they have to change their speeds to find an opening on their side. For this, the lane flag is designed to affect a vehicle that needs to shift to the right lane slightly differently than a vehicle that needs to shift left: the value of the speed adjustment is chosen differently to break the symmetry. Again, the temporary adjustment value for the desired speed may be a function of the present and desired lanes, solving the symmetry problem.



ch5m1.mpg

(0.273Mb; every frame represents one fifth of a second)

(e)

Figure 5.17. Initial (a) and final (b) situation, speed (c) and lane positions (d) of three

automated vehicles traveling on a 3-lane 500-meter circular highway. The mpeg movie

of the simulation is accessible via icon (e).

Similar to the method described above, another flag is defined to temporarily change the desired lane. If the desired lane is not set, an adjoining lane is selected. For this flag, the predefined time interval must be larger than that of the lane flag, since the time to reach the desired speed is much longer than the time required to change lanes.

A specific module keeps track of the time passed after the current speed first deviates from the desired speed. If the vehicle cannot adjust its speed within the predefined time interval, the speed flag is set. The effect of this flag is to send a penalty response to the lateral action SiL. Consequently, the vehicle changes to the left or right lane, if there is an opening. Once the vehicle changes lanes, the speed flag and the time counter are reset for the next interval.
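A sketch of the speed flag, parallel to the lane flag above; the 10-second timeout is the value used in the example of Figure 5.18:

    def speed_flag_set(t_now, t_deviation_start, at_desired_speed, timeout=10.0):
        # The flag is set when the vehicle has been off its desired speed
        # for more than 'timeout' seconds.
        return (not at_desired_speed) and (t_now - t_deviation_start) > timeout

    def speed_flag_feedback(action, speed_flag):
        # While the flag is set, the idle action SiL is penalized,
        # pushing the lateral automaton toward a lane change.
        return 1 if (speed_flag and action == "SiL") else 0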

Consider the situation in Figure 5.18(a), where an automated vehicle (#1) with a desired speed of 85kmh is trailing another with an 80kmh cruising speed. Vehicle 1 is able to slow down and match the speed of vehicle 2, but it is unable to travel at its desired speed of 85kmh (Figure 5.18(e)). The predefined time interval to match the desired speed is 10sec in this case. The count starts at t = 5.32sec (the '+' mark in Figure 5.18(f)), and the time expires at t = 15.32sec ('S' in Figure 5.18(f)) while the automated vehicle is keeping a safe distance from the vehicle in front. Once the flag is set, the action SiL, which is normally encouraged to avoid unnecessary lane changes, starts receiving a penalty. The probability of one of the actions SR and SL, in this case SR2, approaches one, and the action SR is fired at t = 18.96sec. The flag is reset when the vehicle completes the lane change maneuver at t = 19.52sec ('o' in Figure 5.18(f)). Immediately after the lane shift, the vehicle increases its speed to match the desired value.

2 Using the additional rule defined in Section 5.3.3.


Figure 5.18. Snapshots (a-d), speed (e) and lane positions (f) of two automated vehicles traveling

on a 3-lane 500-meter circular highway.

The addition of the new modules and the extension of the previous sensors require changes to the function mapping multiple teacher outputs to a single automaton input. Although the structure of the front sensor module has changed, the mapping that combines the teacher outputs for the longitudinal actions does not change; the rules are exactly the same as in the previous case, given in Table 4.7 and Figure 4.4.

For the lateral actions, the changes in the front sensor and the addition of the lane and pinch modules affect the mapping. The new function generating the automaton input is more complex than a single OR gate. For the lateral action SiL, the combined output is again an OR-ed combination of the sensor outputs, except for one case. When the lane module sends a reward signal to the lateral automaton but the headway module indicates an approaching vehicle, the action SiL must be penalized, as is the case with a simple OR function. However, in order to avoid unnecessary lane changes, the penalty response of the headway module is suppressed whenever the longitudinal action is DEC and the headway distance is not dangerously close (Table 5.4).

Module outputs for SiL (the side and pinch modules do not affect SiL):

Lane   Headway   Combined
0      0         0
0      1         1
0      1*        1 or 0
1      0         1
1      1         1
1      1*        1

Module outputs for SR and SL:

Side   Lane   Headway   Pinch   Combined
0      0      0         0       0
0      0      0         1       1
0      0      0**       0       0
0      0      0**       1       1
0      1      0         0       1
0      1      0         1       1
0      1      0**       0       0
0      1      0**       1       1
1      0      0         0       1
1      0      0         1       1
1      0      0**       0       1
1      0      0**       1       1
1      1      0         0       1
1      1      0         1       1
1      1      0**       0       1
1      1      0**       1       1

The combined output marked '1 or 0' depends on the longitudinal action (see the text and Figure 5.19). If 0 and 0** are read as the same binary value (FALSE), all combined outputs for SR and SL are the OR of the side, lane and pinch module outputs, except the row with side = 0, lane = 1, headway = 0**, and pinch = 0, where the headway override suppresses the lane penalty.

Table 5.4. Mapping of the module outputs to the lateral automaton input.


For the actions SR and SL, again, an additional if-then condition needs to be added for one case. Consider a situation where the side sensor and pinch modules indicate a reward for the lane shifting actions SR and SL. The lane module, on the other hand, favors the lateral action SiL because the vehicle is currently cruising in its desired lane. However, there is another vehicle in very close proximity, and the front sensor module sends a reward signal to indicate the need for a lane change (if the headway distance is under the predefined limit d1, we assume that the action DEC has failed to avoid a collision). This reward signal must override the lane sensor's penalty signal (indicated with 0** in Table 5.4). The resulting mappings from multiple teacher outputs to a single automaton input are given in Figure 5.19.

Figure 5.19. Mapping from the teacher outputs to the automata inputs. The longitudinal automaton input is the OR (f) of the headway and speed sensor outputs. The lateral automaton input is computed as follows: IF the longitudinal action is DEC, the headway output is 1*, and the lane output is 0, THEN the input is 0; ELSEIF the headway output is 0**, the pinch output is 0, and the right or left sensor output is 0, THEN the input is 0; ELSE the input is f, the OR of the side, lane and pinch module outputs.
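A sketch of this lateral mapping, consistent with Table 5.4 and Figure 5.19 (the 'headway_flag' argument carries the starred headway cases, and 'side' is the output of the side sensor relevant to the chosen action; all names are illustrative):

    def lateral_input(action, long_action, side, lane, headway, pinch,
                      headway_flag):
        # Combined lateral feedback: 0 = reward, 1 = penalty.
        if action == "SiL":
            if long_action == "DEC" and headway_flag == "1*" and lane == 0:
                return 0           # already decelerating: suppress headway penalty
            return lane | headway  # OR of the modules affecting SiL
        # SR or SL:
        if headway_flag == "0**" and pinch == 0 and side == 0:
            return 0               # emergency lane change overrides the lane penalty
        return side | lane | pinch

The symmetry-breaking rule described below (randomly preferring SL or SR when both remain rewarded) would sit on top of this mapping.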

In addition to the rules described above, one more subroutine is defined to check the memory buffer interfacing to the regulation layer. Currently, if an automated vehicle senses a decreasing headway and nothing on its sides, with no useful information from the lane module, it cannot decide whether to shift left or right. The reason for this is the lack of global information: using only its local sensing devices, the vehicle cannot distinguish between the left and right lanes, and the probabilities of both actions reach 0.5. To overcome the problem, a symmetry-breaking rule which randomly chooses one of the actions is introduced. The overall structure of the intelligent planner is given in Section 5.4.1.

With all the additions described in this chapter, the planning layer of an automated vehicle is now

more complex and robust than what we started with. The presentation of the final structure of the

automata-teacher model, the discussion of the environment, the effect of parameters, and some

final thoughts on the design will conclude this chapter.


The final design of our automata-teacher planning layer consists of two automata (one for lateral, the other for longitudinal actions) and a total of six teacher modules reporting on the headway distance, speed deviation, left and right sensor detection, lane deviation, and possible pinch conditions. The two flags described in Section 5.3 are used to provide additional information to the speed and lane detection modules (Figure 5.20). Besides the logical OR gate, there are three additional rules that need to be implemented in order to guarantee collision-free path control.

Furthermore, the automata are interconnected, not only through the physical environment, but also by the fact that the current output of the longitudinal automaton is used as a condition for the lateral automaton (Section 5.3.3). This is not necessarily important for collision avoidance, but it illustrates the fact that lateral and longitudinal actions cannot be treated as fully decoupled.

Figure 5.20. The final structure of the planning layer: six teacher modules (headway, speed, left, right, lane, and pinch detection) observe the environment, and their binary outputs are combined, through the OR gate, the if-then rules 1 and 2, and the flag structure, into the inputs of the longitudinal and lateral automata; the chosen actions pass through the memory buffer to the regulation layer. The figure shows the interconnection of mapping functions and flag structures.

Although not important for our design purposes, we chose to disable speed changes during lane shifting. Once the lane change is completed, the probability update for longitudinal actions continues. The longitudinal vehicle dynamics are modeled as simply as possible; however, we assert that realistic vehicle dynamics will not affect the overall system behavior, since the regulation and physical layers will react to fired actions fast enough. Although not instantaneous, speed changes of 1m/s are not hard to obtain. For lateral movements, we assumed that an automated vehicle completes a single lane change so that the maximum lateral acceleration is less than or equal to 0.5m/sec2.


The automata use the algorithms given in Chapter 6 as reinforcement schemes. The learning parameters are adjusted to give fast enough learning, while keeping them relatively low to avoid near-instantaneous convergence.

There are many parameters affecting the behavior of an automated vehicle. In this section, we give simulation examples illustrating the effects of parameters such as the length of the memory buffer, the definition of the sensor ranges, and the learning parameters. The sample runs given here are simulations of platoons; only longitudinal actions are considered, for demonstration purposes. The effect of the learning parameters on lateral actions is similar. The values of the parameters for each simulation are given in Table 5.5. The general linear reward-penalty scheme is used in all simulations.

#   a     b     fsr (m)  d2 (m)  d1 (m)  Speed (Hz)  Buffer size (long.)  Figures
1   0.15  0.10  16       15      12      25          25                   5.21(a), 5.22(a), 5.23
2   0.15  0.10  16       15      12      25          9                    5.21(b), 5.22(b), 5.23
3   0.15  0.10  20       15      10      25          25                   5.21(c), 5.22(c), 5.23
4   0.05  0.05  16       15      12      25          9                    5.21(d), 5.22(d), 5.23
5   0.15  0.10  18       15      10      25          25                   5.24(a), 5.25(a), 5.26
6   0.15  0.10  18       15      10      25          9                    5.24(b), 5.25(b), 5.26
7   0.15  0.10  18       15      10      25          25                   5.27(a)
8   0.15  0.10  18       15      10      25          9                    5.27(b)

Table 5.5. Parameter settings for simulations.

Figures 5.21 and 5.22 show the average platoon speed and the distances between vehicles for four platoons of ten vehicles, each using a different set of parameters. The positions and speeds of the vehicles at t = 0 and t = 29.6sec are given in Figure 5.23. All the simulations (#1-4) are run separately, and the data are then combined for illustration. Each platoon corresponds to a simulation run. Figure 5.23 also includes the mpeg movie of the four runs. The initial positions and speeds of the vehicles are the same for all runs, as is the processing speed of 25Hz. The desired speed for all vehicles is 80kmh.

As seen in Figure 5.23, the positions of the vehicles at t = 29.6sec and their speeds are different for each simulation run. The values plotted in Figures 5.21, 5.22, and 5.23 are not averaged over a large number of runs with the same parameters, but the result for each set of parameters shows the characteristics of that particular choice.

Simulations 1 and 2 (Figures 5.21a-b and 5.22a-b) illustrate the effect of the length of the memory buffer on the vehicle-following behavior of an automated vehicle. For the same initial separation and velocity, the velocities of the vehicles reach a steady state much faster when the memory buffer is shorter. This is due to the fact that the system with the shorter buffer reaches the permitted region [12m 16m] for the headway distance faster; once the desired region is reached, the reaction to changes in the feedback response is also faster. As a result, the separation between vehicles is more uniform for a large memory buffer, although the response is not stable; the opposite is true for a shorter memory vector.

Changing the sensor range while keeping the other parameters constant also affects the behavior of the vehicles in the platoon, as expected. Simulations 1 and 3 differ only in the choice of the values for d1 and fsr. As seen in Figures 5.21(a, c) and 5.22(a, c), by increasing the region around the switching value d2, it is possible to obtain steady-state speeds 15sec after the start of the run. The average speed of the platoon is still under the desired value of 80kmh, since the headway distance is still too small for the action ACC to fire for some of the vehicles. Eventually, all vehicles will reach their desired speed, provided that the headway distance is above the safe value. An example of this behavior is seen in Figures 5.21c and 5.22c: as soon as the headway distance grows larger than d2 = 15m, at t ≈ 29sec, one of the vehicles increases its speed, and the effect on the average platoon speed is seen. Furthermore, it is important to note that the initial vehicle separations are already in the defined sensor range. As seen in Figure 5.23, the overall behavior of the platoon in simulation 3 is closer to that in simulation 1 than in simulation 2.

Simulation 4 is given to illustrate the importance of the choice of the learning parameters. In terms of the uniformity of the headway distances, the result of this simulation is the worst of the four. This is of course due to the relatively small learning parameters: although the length of the memory vector is the same as in simulation 2, the responses to the headway module are slow, and therefore the headway and speed variations are large.



Figure 5.21. Average platoon speeds for simulations 1-4.



Figure 5.22. The distance between vehicles for simulations 1-4.



ch5m2.mpg

(0.237Mb; every frame represents one fifth of a second)

(e)

Figure 5.23. Initial and final positions (a,c) and speeds (b,d) of automated vehicles in four

ten-vehicle platoons. The mpeg movie of the simulation is accessible via icon (e).

Figures 5.24-5.26 show the results of a second batch of simulations (numbers 5 and 6). In both simulations, the initial conditions are the same; the desired speed for all vehicles is still 80kmh, except for the leading vehicle, which desires to slow down to 77kmh. In this scenario, the effect of the learning parameters is obvious: a fast-learning vehicle reacts faster to speed changes (and therefore to headway changes). Also, a very short sensor region may create a problem, because it directly affects the reaction time. However, the effect of the length of the memory buffer is not as obvious. In simulations 5 and 6 (Figures 5.24 and 5.25), the only difference between the two platoons is the size of the memory vector, and there are no apparent differences in the behavior of the platoons. Again, the separations are distributed over a larger region for a smaller memory vector, but the difference is minor.



Figure 5.24. Simulations (a) 5 and (b) 6: average speed for a platoon of ten automated vehicles.


Figure 5.25. Simulations (a) 5 and (b) 6: distances between vehicles for a platoon of ten

automated vehicles.



ch5m3.mpg

(0.345Mb; every frame represents one fifth of a second)

(e)

Figure 5.26. Initial and final positions (a, c) and speeds (b, d) of the automated vehicles in the two ten-vehicle platoons of simulations 5 and 6. The mpeg movie of the simulation is accessible via icon (e).

When the speed change of the leading vehicle is much larger, the effect of the memory vector on the platoon behavior can be seen much more clearly. The two mpeg movies in Figure 5.27 show two unstable platoons. When the lead vehicle slows down to 75kmh, the following vehicles are not able to keep their safe distances. Using the same parameters as before, a smaller memory vector length, resulting in faster firing of actions, gives a better result. With a memory vector of length 25, the third (or sometimes fourth) vehicle crashes into the vehicle in front, while with the smaller length of 9, platoon stability is kept longer, and the problem occurs at the end of the platoon (the seventh, eighth, or ninth vehicle). Since the speed (and headway) deviations increase toward the back of the platoon, an increase in the capability of the automated vehicles avoids an early problem.

The simulations given here are run using the general linear reward-penalty learning scheme. Examples of the nonlinear learning scheme and its comparison to the linear schemes are given in Appendix C.


ch5m4.mpg

(0.065Mb; every frame represents one fifth of a second)

(a)

ch5m5.mpg

(0.100Mb; every frame represents one fifth of a second)

(b)

Figure 5.27. Simulations 7 and 8: mpeg movies of two simulations of 10-vehicle platoons with the lead vehicle decelerating to 75kmh. The length of the memory vector is (a) 25 and (b) 9; all other parameters are the same as in simulation 6.

5.4.3.1 Environment Model

In our application, the input to the automata is binary, and thus the environment is said to be of the P-model. Although the end result of the decision modules and functions is binary, the method we use can be visualized as a multi-valued logic: the additional conditions and the logical OR gate can be replaced by a single logic function which takes multiple values into account. In this sense, the teacher outputs are somewhat similar to those of a Q-model environment, where multiple values in the range [0, 1] are possible. However, adapting the learning schemes to a Q-model environment is tedious; in fact, there are no known applications of Q-model multi-teacher environments.

The absence of the S-model in this work is due to similar reasons. As stated in Chapter 6, the nonlinear reinforcement schemes are designed with S-model environments in mind. Our initial attempts to define teacher modules for the S-model failed because of the difficulty of combining their outputs in a way that makes sense to the automaton. Again, the lack of previous work on continuous environment feedback and on the choice of weighting factors for the model leads to this difficulty. Nonlinear reinforcement models with continuous feedback responses are discussed in the literature, but there are no applications of this model, even for the single-teacher case. The problem of choosing appropriate weighting parameters is an open research area, and needs to be investigated.

A large percentage of the previous research on intelligent control has emphasized the control methods and technological necessities for AVC rather than the making of intelligent decisions, as we discussed previously in Chapter 1. In other words, the research mainly covers the regulation layer in [Varaiya93]. None of the AHS projects currently carried out around the world has yet considered lane change maneuvers for implementation. Although the vehicles are individually capable of such actions, the coordination of such maneuvers will obviously be difficult to implement. In the case of a hierarchical control structure, there is not much need for autonomous decision making, since the control commands will be the direct result of information fed down by the hierarchy. The advantage of this approach lies mainly in the content of the information: a hierarchical AHS will have complete knowledge about each vehicle, and will be able to relay this global information to every agent on the highway. The autonomous vehicle approach, on the other hand, suffers from the lack of global information. While designing the intelligent controller, we assumed local sensing and minimal or no communications.

Solving the path planning problem locally may be relatively inexpensive and simple due to the individual design goals and the lack of communications, but the price to be paid may be the optimality of such a distributed system. Consider the situation given in Section 5.3.1 (Figure 5.17). For an autonomous vehicle to detect that the current solution is only local and does not satisfy the global goals (in the sense that the vehicle is capable of avoiding collision, but is not able to reach its desired speed), information about the relative positions (and possibly speeds) of other vehicles needs to be known. This information can be provided by the link layer in a hierarchical system [Varaiya93], or may be extracted from visual cues in the autonomous vehicle approach [Özgüner96, Pomerleau96, Weber96]; the information content will be lower in the latter case. The need for more global information, rather than only local sensing, is crucial and apparent, considering the difficulties encountered during this research.

Chapter 6

New Reinforcement Schemes for Stochastic Learning Automata

A stochastic learning automaton, using the response of the environment, updates its action probabilities according to a reinforcement scheme. The reinforcement scheme is the mechanism generating the learning behavior of the stochastic automaton. It is of course based on the type of environment, and is generally classified according to two criteria:

- the nature of the mapping from the previous probability vector to the new probability vector, e.g., (i) linear, nonlinear or hybrid, or (ii) projectional or nonprojectional¹;
- the behavior of the learning automaton using the scheme, e.g., optimal or expedient.

¹ See Section 3.5 for the definition of these terms.

Our efforts in designing an intelligent path controller for an autonomous vehicle using stochastic learning automata led us to employ four different reinforcement schemes. These are:

- the multi-teacher general absolutely expedient scheme (MGAE), a nonlinear scheme for the S-model environment [Baba83, Baba85];
- the linear reward-penalty scheme $L_{R-P}$ [Narendra89];
- the general linear reward-penalty scheme $L_{R-P}$ with unequal parameters ($a \neq b$) [Narendra89, Ünsal96];
- the nonlinear reinforcement scheme with an H function (NLH), an absolutely expedient scheme [Ünsal95].

The first two algorithms are widely known [Narendra89, Baba85]. Their convergence properties have been investigated, and several applications using these schemes were previously presented [Chand69, Baba80]. The third algorithm is less popular, since the behavior of an automaton using this scheme has not been analytically proven [Narendra89]. A special case of this algorithm, $L_{R-\epsilon P}$, is obtained by adding a small penalty term ($0 < b \ll a < 1$) to the $L_{R-I}$ scheme. However, the proof of $\epsilon$-optimality for the general r-action case is again not complete [Narendra89]. The general linear reward-penalty scheme, $L_{R-P}$, is linear and nonprojectional. In Section 6.1.1, we will prove that this scheme is optimal in the case where there is a pure optimal action.

The last scheme is nonlinear and projectional, and is an extension of the general absolutely expedient schemes [Narendra89, Baba85]. It differs from previous schemes in the definition of an additional reward function. We will also show in this chapter that this scheme is absolutely expedient in stationary environments.

The last two schemes are used in this work for their improved behavior in specific cases frequently encountered in our application to intelligent vehicle control. Both schemes converge to the optimal solution faster than the first two. They are the direct results of our attempts to create reinforcement schemes with desirable characteristics suitable for this study of learning automata applications to intelligent vehicle control.

In the following section, we compare the linear reinforcement scheme $L_{R-P}$ to previously existing linear learning algorithms. The nonlinear reinforcement scheme NLH is then compared to the general nonlinear scheme in Section 6.2. Since these two new schemes are found to be useful, we have investigated their convergence properties. Using nonlinear stability theorems, we give the proof of optimality of the linear scheme $L_{R-P}$ for a specific case. Furthermore, the conditions of absolute expediency are checked for the new nonlinear scheme NLH, and the scheme is proven to be absolutely expedient.

6.1 Unequal Parameters, $L_{R-P}$

Consider the general linear reward-penalty reinforcement scheme previously given in Section

3.5.1:

if $\alpha(n) = \alpha_i$:

for $\beta(n) = 0$ (reward):
$p_i(n+1) = p_i(n) + a\,[1 - p_i(n)]$
$p_j(n+1) = (1 - a)\,p_j(n)$ for all $j \neq i$

for $\beta(n) = 1$ (penalty):   (6.1)
$p_i(n+1) = (1 - b)\,p_i(n)$
$p_j(n+1) = \frac{b}{r-1} + (1 - b)\,p_j(n)$ for all $j \neq i$

where r is the number of actions. The parameters a and b are associated with the reward and penalty updates, respectively. For b = 0, the algorithm is called the linear reward-inaction scheme and is known to be $\epsilon$-optimal [Narendra89]. The specific case where a = b is generally known as the linear reward-penalty scheme, and is expedient in stationary environments [Bush58, Narendra89].
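To make Equation 6.1 concrete, the following short Python function (our own minimal sketch, not part of the original study; the name and signature are hypothetical) performs one update step of the general linear reward-penalty scheme.

def lrp_update(p, i, beta, a, b):
    """One step of the general L_{R-P} scheme (Equation 6.1).
    p: action probability vector (sums to 1); i: index of the chosen
    action alpha_i; beta: environment response (0 = reward, 1 = penalty);
    a, b: reward and penalty learning parameters."""
    r = len(p)
    if beta == 0:  # reward: move probability mass toward action i
        return [p[i] + a * (1.0 - p[i]) if j == i else (1.0 - a) * p[j]
                for j in range(r)]
    # penalty: move probability mass away from action i, spread evenly
    return [(1.0 - b) * p[i] if j == i else b / (r - 1) + (1.0 - b) * p[j]
            for j in range(r)]

Both branches preserve the total probability mass: the reward branch adds $a(1 - p_i)$ to $p_i$ and removes $a\,p_j$ from every other action, and the penalty branch does the reverse.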


For an automaton with a single "optimal action" in a stationary environment, using the linear inaction-penalty scheme $L_{I-P}$, the probability of the optimal action grows higher than the others but can never reach 1; in fact, it approaches 1/(r-1). Therefore, nothing can be said about the optimality of the scheme. Since this reinforcement scheme is relatively unimportant, we give the proof of this statement in Appendix B. If the parameter b = 0 while 0 < a < 1, the algorithm is called $L_{R-I}$, the linear reward-inaction scheme. Again, several researchers have studied the behavior and possible applications of this scheme [Viswanathan73, Shapiro69]. The case where the learning parameters are unequal ($a \neq b$) has not been studied rigorously by previous researchers² because the proof of optimality does not exist for this case.

The relative behavior of the $L_{R-I}$ (b = 0), $L_{R=P}$ (a = b) and $L_{R-P}$ ($a \neq b$) schemes is shown in Table 6.1 and Figure 6.1. The data show the convergence behavior for an optimal action in the best case possible (i.e., $c_{opt} = 0$ and $c_j = 1$ for all other actions) for different values of the parameters a and b. We measure the time for the optimal action's probability to reach approximately 1. The number of actions is 3, and the initial probability values are equal. For every parameter set, the reinforcement scheme is tested 500 times. The updating of the probabilities stops when the probability of the optimal action reaches 0.995. The simulation is stopped automatically at time step 1500 even if convergence is not obtained. The top graph in Figure 6.1 shows the minimum, average, and maximum number of steps over the parameter space $[0.1, 0.9] \times [0.0, 0.9]$. The bottom graph shows only the minimum and average number of steps; it is also a scaled and rotated version of the first plot, to emphasize the effects of the parameter b for smaller values of the parameter a.
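The experimental procedure just described is simple enough to reproduce in a few lines; the sketch below (our own reconstruction under the stated settings, not the original simulation code) estimates the minimum, mean and maximum number of steps for one (a, b) pair in the perfect case c = [0, 1, 1].

import random, statistics

def steps_to_converge(c, a, b, max_steps=1500, target=0.995):
    """Run one automaton in a stationary P-model environment with penalty
    probabilities c; return the step at which p_opt reaches the target,
    or None if it does not converge within max_steps."""
    r = len(c)
    p = [1.0 / r] * r
    opt = c.index(min(c))                            # the optimal action
    for n in range(1, max_steps + 1):
        i = random.choices(range(r), weights=p)[0]   # choose alpha_i
        beta = 1 if random.random() < c[i] else 0    # penalty w.p. c_i
        if beta == 0:
            p = [p[i] + a*(1-p[i]) if j == i else (1-a)*p[j] for j in range(r)]
        else:
            p = [(1-b)*p[i] if j == i else b/(r-1) + (1-b)*p[j] for j in range(r)]
        if p[opt] >= target:
            return n
    return None

runs = [steps_to_converge([0.0, 1.0, 1.0], a=0.2, b=0.1) for _ in range(500)]
done = [s for s in runs if s is not None]
print(min(done), statistics.mean(done), max(done), runs.count(None))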

Simulation data show that the time to reach the pure optimal strategy depends on the learning parameters. Although Figures 6.1 and 6.2 show the whole parameter range, values larger than 0.3 are generally not used. As seen from the data in Table 6.1 and Figures 6.1 and 6.2, convergence to the pure optimal strategy is faster with $L_{R=P}$ than with $L_{R-I}$. The difference between these two reinforcement schemes is that the first one is expedient, while the latter is $\epsilon$-optimal. Again, the simulation results show that when b is slightly less than a (e.g., $L_{R-P}$ with [a, b] = [0.2, 0.1] or [a, b] = [0.3, 0.1]), the convergence to a pure optimal strategy is faster than with $L_{R=P}$.

² Except for the case where the learning parameter b is much smaller than the parameter a, i.e., the $L_{R-\epsilon P}$ scheme.


Figure 6.1. Number of steps needed for $p_{opt}$ to reach 0.995 for different values of the learning parameters a and b: (top) maximum, average and minimum number of steps; (bottom) average and minimum number of steps (over 500 runs; 3 actions; initial probabilities equal; perfect case; note the axis shift and scaling in the second plot).


Scheme      a    b    max.   min.   mean      median  std.dev.  nonconv.
$L_{R-I}$   0.1  0.0    79    50     59.418    59       4.557    0
$L_{R=P}$   0.1  0.1   142    51     75.336    71      16.978    0
  *         0.1  0.2   327    48    108.062    95      47.066    0
  *         0.1  0.3   820    49    170.254   134     114.389    0
  *         0.1  0.4  1338    48    266.16    204     218.733    0
  *         0.1  0.5  1479    48    344.535   273.5   264.416    8
  *         0.1  0.6  1497    47    425.219   326     338.649   17
$L_{R-I}$   0.2  0.0    42    23     29.124    28.5     3.532    0
  *         0.2  0.1    76    23     32.548    32       6.328    0
$L_{R=P}$   0.2  0.2   105    23     34.940    33       8.737    0
  *         0.2  0.3    96    23     38.594    35       1.744    0
  *         0.2  0.4   164    23     43.440    38       1.901    0
  *         0.2  0.5   181    23     46.884    39       2.168    0
  *         0.2  0.6   169    23     51.136    42.5     2.941    0
$L_{R-I}$   0.3  0.0    36    15     19.068    18       3.199    0
  *         0.3  0.1    39    15     20.326    20       3.659    0
  *         0.3  0.2    43    15     21.164    20       5.006    0
$L_{R=P}$   0.3  0.3    47    15     21.966    21       5.754    0
  *         0.3  0.4    60    15     22.814    21       6.988    0
  *         0.3  0.5    97    15     25.130    22      10.211    0
  *         0.3  0.6    72    15     24.964    22       9.943    0
$L_{R-I}$   0.4  0.0    26    11     14.278    14       2.767    0
  *         0.4  0.1    27    11     14.788    14       3.223    0
  *         0.4  0.2    29    11     15.090    14       3.233    0
  *         0.4  0.3    31    11     15.308    14       3.642    0
$L_{R=P}$   0.4  0.4    51    11     15.618    14       4.469    0
  *         0.4  0.5    42    11     16.002    14       4.973    0
  *         0.4  0.6    51    11     16.396    15       5.745    0
$L_{R-I}$   0.5  0.0    24     9     11.978    11       2.533    0
  *         0.5  0.1    22     9     11.672    11       2.472    0
  *         0.5  0.2    23     9     11.574    11       2.689    0
  *         0.5  0.3    26     9     11.844    11       3.024    0
  *         0.5  0.4    31     9     11.824    11       3.428    0
$L_{R=P}$   0.5  0.5    30     9     11.772    11       3.243    0
  *         0.5  0.6    27     9     11.886    11       3.325    0
$L_{R-I}$   0.6  0.0    26     7      9.540     9       2.750    0
  *         0.6  0.1    24     7      9.536     9       2.349    0
  *         0.6  0.2    20     7      9.240     9       2.096    0
  *         0.6  0.3    19     7      9.322     9       2.132    0
  *         0.6  0.5    31     7      9.814     9       2.982    0
$L_{R=P}$   0.6  0.6    29     7      9.588     9       3.104    0

Table 6.1. Behavior of the automata under the general $L_{R-P}$ scheme³: 500 runs; maximum step = 1500; 3 actions with c = [0 1 1] (these data are plotted in Figure 6.1).

³ The mark * indicates the general linear reward-penalty scheme $L_{R-P}$ ($a \neq b$).


In some simulation runs, the probability $p_{opt}$ did not reach 0.995. In this case, the statistical values are calculated by discarding the data for those runs, and therefore do not represent the actual behavior of the algorithm. The fact that the algorithm did not converge in 1500 steps does not necessarily mean that it would not converge if the limit were increased. Similarly, the fact that we have convergence in all 500 runs does not guarantee convergence at all times.

Figure 6.2 shows the result of the simulations for the parameter range 0 ≤ b ≤ 0.9, 0.1 ≤ a ≤ 0.9. For each parameter pair, 500 simulations are completed. The numbers of simulations which did not converge in 1500 steps are indicated with the marker 'o'. The marker 'x' indicates that the probability of the pure optimal action converged to 0.995 in all 500 simulation runs. We know that for a = 0, convergence is not obtained no matter what the value of the parameter b is (Appendix B). As seen from the plot, if the value of the parameter b is much larger than the parameter a for small values of a, convergence again is not guaranteed. The exact limit of the non-convergence region is not known, since the number of simulations is 500 and the limit of time steps is 1500. Furthermore, this region is different for different numbers of actions.

Figure 6.2. For some parameter values, convergence is not obtained in all 500 runs (the number of non-converging runs is shown next to the marker 'o'; for a = 0.1, the counts are 8, 17, 47, 48 and 65 non-converging runs at b = 0.5, 0.6, 0.7, 0.8 and 0.9, respectively).

Table 6.2 gives the maximum, minimum and average number of steps for the optimal action's probability to converge to 0.995, for a smaller range of learning parameters. As seen from the table (and Figure 6.1), a slight change in the value of a parameter affects the speed of convergence to the pure optimal strategy. Especially for smaller values of the parameter a, the maximum, and consequently the average, number of steps decreases significantly with a decreasing value of the parameter b. For example, the maximum number of steps decreases from 105 to 48 when we switch from the $L_{R=P}$ scheme to the $L_{R-P}$ scheme by decreasing the value of the parameter b from 0.2 to 0.05 while a = 0.2.

Number of steps for $p_{opt}$ to reach 0.995:

Scheme      a     b     max.  min.  mean
$L_{R-I}$   0.10  0.00    79   50    59.418
$L_{R-P}$   0.10  0.05   122   49    67.080
$L_{R=P}$   0.10  0.10   142   51    75.336
$L_{R-P}$   0.10  0.20   327   48   108.062
$L_{R-P}$   0.10  0.30   820   49   170.254
$L_{R-I}$   0.20  0.00    42   23    29.124
$L_{R-P}$   0.20  0.05    48   23    31.038
$L_{R-P}$   0.20  0.10    76   23    32.548
$L_{R=P}$   0.20  0.20   105   23    34.940
$L_{R-I}$   0.30  0.00    36   15    19.068
$L_{R-P}$   0.30  0.10    39   15    20.326
$L_{R-P}$   0.30  0.20    43   15    21.164

Table 6.2. Behavior of the automata under the general $L_{R-P}$ scheme: 500 runs; maximum step = 1500; 3 actions with $c_1 = 0$, $c_2 = c_3 = 1$.

Changing the initial values of the action probabilities does not drastically change the behavior of the automata, except to increase the number of time steps for convergence to the pure optimal strategy. Figure 6.3 shows the maximum, average and minimum numbers of steps for convergence with an initial probability vector of [0.005 0.4975 0.4975]. Besides the increase in the number of steps needed for convergence, the comparison of Figures 6.1 and 6.3 shows that the effects of the learning parameters are comparable.

Similarly, increasing the number of actions also increases the number of time steps needed for convergence; the effect of the parameters is generally the same. As seen from Tables 6.2 and 6.3, the minimum number of steps for convergence is not affected significantly, but the maximum number of steps, and therefore the average, increases with an increasing number of actions. Furthermore, if the probabilities of penalty for all non-optimal actions ($c_j$) are less than 1, the convergence is again slower, as expected (Figure 6.4). The change in the convergence rate is due to the decrease in the number of correct updates of the action probability vector. The effect of the parameters is again similar to the previous cases. The effect of the parameter b on the number of non-converging runs and on the average number of steps for convergence is significant when the learning parameter a is small.


Figure 6.3. Number of steps needed for $p_{opt}$ to reach 0.995 for different values of the learning parameters a and b: maximum, average and minimum number of steps (over 500 runs; 3 actions; perfect case; p(0) = [0.005 0.4975 0.4975]).

Number of steps for $p_{opt}$ to reach 0.995:

actions    a     b     max.  min.  mean
   4      0.10  0.10   292   50    98.2420
   5                   503   55   117.2060
   7                   459   65   153.9420
  10                   774   64   198.0060
   4      0.20  0.00    50   24    31.3400
   5                    61   24    32.8320
   7                    85   25    36.1540
  10                   129   25    39.1120
   4      0.20  0.10    70   24    36.2900
   5                    74   24    40.0940
   7                   101   26    43.9480
  10                   114   25    50.1940
   4      0.20  0.20   138   24    44.5460
   5                   135   24    50.8560
   7                   256   25    64.1420
  10                   360   25    82.2700
   4      0.30  0.30    92   16    27.2540
   5                   125   16    31.2400
   7                   117   16    38.5120
  10                   259   16    49.2380

Table 6.3. Effect of the number of actions on the behavior of the automata using the general $L_{R-P}$ scheme: 500 runs; maximum step = 1500; perfect case; all action probabilities initially equal.


Figure 6.4. Number of steps needed for $p_{opt}$ to reach 0.995 for different values of the learning parameters a and b: maximum, average and minimum number of steps (500 runs; 3 actions; ideal case: c = [0 0.8 0.5]; initial probabilities equal).

Instead of increasing both learning parameters, decreasing the parameter associated with the penalty (namely b) resulted in better performance. However, the $L_{R-P}$ reinforcement scheme is not widely used, nor are there complete results on the learning behavior of an automaton using this scheme. We will now show that this reinforcement scheme is expedient and $\epsilon$-optimal for a special case.

6.1.1 Unequal Learning Parameters, $L_{R-\epsilon P}$

The linear reward-penalty reinforcement scheme with $a \neq b$ has been studied previously to some extent. Narendra and Thathachar defined a special case of $L_{R-P}$, the $L_{R-\epsilon P}$ scheme, in which the learning parameter associated with penalty (namely b) is much smaller than the learning parameter associated with reward. In [Narendra89], this scheme is treated as a special case of the linear reward-inaction scheme $L_{R-I}$ because only small values of the penalty parameter $b = \epsilon a$, where $0 < \epsilon \ll 1$, are considered. The reason for such a definition is that the behavior of the scheme can then be studied using the methods given in [Narendra89], provided that the parameters are small. However, even for small values of the parameter b (as well as a), this linear approach does not result in a complete proof; the establishment of $\epsilon$-optimality of the $L_{R-\epsilon P}$ scheme in the r-action case in general is incomplete [Narendra89].

Here, we consider all values of the parameters a and b and use the general stability theorem for nonlinear discrete-time systems [Kalman62] to prove convergence in a specific region, without limiting the parameter b to small values. However, there is a price to pay for this: the use of the stability theorem limits the proof to the specific case where there is an optimal action⁴.

To prove the convergence to the pure optimal action, we start with the conditional expectation of the action probabilities. From the definition of the $L_{R-P}$ scheme (Equation 6.1), we may write the conditional expectation of the probability of an action at the next step as:

$E[p_i(n+1) \mid p_i(n)] = \sum_{k=1}^{r} E[p_i(n+1) \mid p_i(n),\, \alpha(n) = \alpha_k]\; p_k(n)$

$= \sum_{k=1}^{r} c_k\, E[p_i(n+1) \mid p_i(n),\, \alpha(n) = \alpha_k,\, \beta(n) = 1]\; p_k(n) + \sum_{k=1}^{r} (1 - c_k)\, E[p_i(n+1) \mid p_i(n),\, \alpha(n) = \alpha_k,\, \beta(n) = 0]\; p_k(n)$

$= c_i\, p_i(n)\,(1-b)\,p_i(n) + \sum_{k \neq i} c_k\, p_k(n) \left[ \frac{b}{r-1} + (1-b)\,p_i(n) \right] + (1 - c_i)\, p_i(n) \left[ p_i(n) + a\,(1 - p_i(n)) \right] + \sum_{k \neq i} (1 - c_k)\, p_k(n)\,(1-a)\,p_i(n)$   (6.2)

We can write this equality as:

$E[p_i(n+1) \mid p_i(n)] = p_i(n)\,(1-b) \sum_{k=1}^{r} c_k\, p_k(n) + \frac{b}{r-1} \sum_{k \neq i} c_k\, p_k(n) + (1-a)\, p_i(n) \sum_{k=1}^{r} p_k(n) - (1-a)\, p_i(n) \sum_{k=1}^{r} c_k\, p_k(n) + a\, p_i(n)\,(1 - c_i)$   (6.3)

Rearranging terms and using the fact that $\sum_{k=1}^{r} p_k(n) = 1$, we get:

$E[p_i(n+1) \mid p_i(n)] = [(1-b) - (1-a)]\, p_i(n) \sum_{k=1}^{r} p_k(n)\, c_k + \frac{b}{r-1} \sum_{k \neq i} p_k(n)\, c_k + (1-a)\, p_i(n) \cdot 1 + a\, p_i(n)\,(1 - c_i)$

$= (a-b)\, p_i(n) \sum_{k=1}^{r} p_k(n)\, c_k + \frac{b}{r-1} \sum_{k \neq i} p_k(n)\, c_k + (1 - a c_i)\, p_i(n)$   (6.4)

⁴ The case where there is an optimal action is the most common case in stochastic automata applications. The convergence of the probability vector to the pure optimal strategy in this case has been proven for almost all linear and nonlinear reinforcement schemes [Najim94, Narendra89].


For a = b, this corresponds to the difference equation for the $L_{R=P}$ scheme [Narendra89]. Now, taking the expected value of both sides, we obtain:

$E[p_i(n+1)] = (a-b) \sum_{k=1}^{r} c_k\, E[p_i(n)\, p_k(n)] + \frac{b}{r-1} \sum_{k \neq i} c_k\, E[p_k(n)] + (1 - a c_i)\, E[p_i(n)]$   (6.5)
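Equation 6.4 can also be verified numerically. The sketch below (our own check, with arbitrarily chosen values) averages the one-step update of Equation 6.1 over many random trials from a fixed p(n) and compares the estimate with the closed-form conditional expectation.

import random

def lrp_update(p, i, beta, a, b):
    r = len(p)
    if beta == 0:
        return [p[i] + a*(1-p[i]) if j == i else (1-a)*p[j] for j in range(r)]
    return [(1-b)*p[i] if j == i else b/(r-1) + (1-b)*p[j] for j in range(r)]

def expected_next(p, c, a, b, i):
    """Right-hand side of Equation 6.4 for E[p_i(n+1) | p(n)]."""
    r = len(p)
    s = sum(ck * pk for ck, pk in zip(c, p))     # sum_k c_k p_k
    s_not_i = s - c[i] * p[i]                    # sum over k != i
    return (a-b)*p[i]*s + (b/(r-1))*s_not_i + (1 - a*c[i])*p[i]

p, c, a, b = [0.5, 0.3, 0.2], [0.0, 1.0, 0.6], 0.2, 0.1
trials = 200000
est = 0.0
for _ in range(trials):
    k = random.choices(range(3), weights=p)[0]   # chosen action
    beta = 1 if random.random() < c[k] else 0    # penalty w.p. c_k
    est += lrp_update(p, k, beta, a, b)[0] / trials
print(est, expected_next(p, c, a, b, 0))   # both should be close to 0.542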

At this point, we state two theorems on the stability of nonlinear discrete-time systems which will be used in our proof.

Theorem 3.1 [Kalman62]: Consider the discrete-time system

$x(n+1) = f(x(n))$   (6.6)

where x is a vector and f is a vector function with the property f(0) = 0. Suppose there exists a scalar function V(x), continuous in x, such that:

(i) V(x) > 0 for $x \neq 0$;
(ii) $\Delta V(x) < 0$ for $x \neq 0$, where $\Delta V(x(n)) = V(x(n+1)) - V(x(n)) = V(f(x(n))) - V(x(n))$;
(iii) V(0) = 0;
(iv) $V(x) \to \infty$ as $\|x\| \to \infty$.

Then the equilibrium state x = 0 is asymptotically stable in the large, and V(x) is a Lyapunov function. Condition (ii) may be replaced with:

(ii') $\Delta V(x) \leq 0$ for all x, and $\Delta V(x)$ does not vanish identically for any solution sequence {x(n)} satisfying Equation (6.6).

Theorem 3.2: If

$\|f(x)\| < \|x\|$ with f(0) = 0   (6.7)

for the set of values $x \neq 0$ and some norm $\|\cdot\|$, then the system above is asymptotically stable, and one of its Lyapunov functions is:

$V(x) = \|x\|$   (6.8)

Equation 6.5 can then be written compactly as:

$\bar{p}_i(n+1) = \sum_{j=1}^{r} \delta_{ij}\, \bar{p}_j(n) + \sum_{k=1}^{r} \gamma_{ik} \left[ \bar{p}_i(n)\, \bar{p}_k(n) + \sigma_{ik}^2(n) \right]$   (6.9)

where:

$\bar{p}_i(n) \equiv E[p_i(n)]$,  $i = 1, \ldots, r$
$\sigma_{ij}^2(n) \equiv \mathrm{Cov}(p_i(n), p_j(n))$
$\gamma_{ik} \equiv (a-b)\, c_k$,  $k = 1, \ldots, r$   (6.10)⁵
$\delta_{ii} \equiv 1 - a c_i$
$\delta_{ij} \equiv \frac{b}{r-1}\, c_j$,  $i \neq j$

We first give the proof for a three-action automaton for the sake of clarity. For a three-action automaton with an optimal action ($c_1 = 0$), Equation 6.9 (or Equation 6.5) can be written as:

$\bar{p}_1(n+1) = \delta_{11}\bar{p}_1(n) + \delta_{12}\bar{p}_2(n) + \delta_{13}\bar{p}_3(n) + \gamma_{11}[\bar{p}_1(n)^2 + \sigma_{11}^2(n)] + \gamma_{12}[\bar{p}_1(n)\bar{p}_2(n) + \sigma_{12}^2(n)] + \gamma_{13}[\bar{p}_1(n)\bar{p}_3(n) + \sigma_{13}^2(n)]$

$\bar{p}_2(n+1) = \delta_{21}\bar{p}_1(n) + \delta_{22}\bar{p}_2(n) + \delta_{23}\bar{p}_3(n) + \gamma_{21}[\bar{p}_2(n)\bar{p}_1(n) + \sigma_{21}^2(n)] + \gamma_{22}[\bar{p}_2(n)^2 + \sigma_{22}^2(n)] + \gamma_{23}[\bar{p}_2(n)\bar{p}_3(n) + \sigma_{23}^2(n)]$

$\bar{p}_3(n+1) = \delta_{31}\bar{p}_1(n) + \delta_{32}\bar{p}_2(n) + \delta_{33}\bar{p}_3(n) + \gamma_{31}[\bar{p}_3(n)\bar{p}_1(n) + \sigma_{31}^2(n)] + \gamma_{32}[\bar{p}_3(n)\bar{p}_2(n) + \sigma_{32}^2(n)] + \gamma_{33}[\bar{p}_3(n)^2 + \sigma_{33}^2(n)]$   (6.11)

where:

$\gamma_{j1} = (a-b) \cdot 0 = 0$,  $\gamma_{jk} = (a-b)\, c_k$
$\delta_{j1} = \frac{b}{3-1} \cdot 0 = 0$,  $\delta_{jj} = 1 - a c_j$,  $\delta_{jk} = \frac{b}{2}\, c_k$ ($k \neq j$),  $j = 2, 3$; $k = 2, 3$   (6.12)

Again, the $\sigma_{ii}^2(n)$ are variances and the $\sigma_{ij}^2(n)$ are covariances of the action probabilities. Since the first action has zero penalty probability, the difference equations for the last two action probabilities do not include the probability of the first action. Therefore, the difference equations for the probabilities of the last two actions can be written as:

$\bar{p}_2(n+1) = (1 - a c_2)\, \bar{p}_2(n) + \frac{b}{2}\, c_3\, \bar{p}_3(n) + (a-b)\, c_2 \left[ \bar{p}_2(n)^2 + \sigma_{22}^2(n) \right] + (a-b)\, c_3 \left[ \bar{p}_2(n)\bar{p}_3(n) + \sigma_{23}^2(n) \right]$   (6.13)

$\bar{p}_3(n+1) = \frac{b}{2}\, c_2\, \bar{p}_2(n) + (1 - a c_3)\, \bar{p}_3(n) + (a-b)\, c_2 \left[ \bar{p}_3(n)\bar{p}_2(n) + \sigma_{32}^2(n) \right] + (a-b)\, c_3 \left[ \bar{p}_3(n)^2 + \sigma_{33}^2(n) \right]$

⁵ We omit the operator E[·] for simplicity.


Let us define a function f of the expected values of the last two probabilities as⁶:

$f\!\left( \begin{bmatrix} \bar{p}_2 \\ \bar{p}_3 \end{bmatrix} \right) = \begin{bmatrix} (1 - a c_2)\bar{p}_2 + \frac{b}{2} c_3 \bar{p}_3 + (a-b)\, c_2 (\bar{p}_2^2 + \sigma_{22}^2) + (a-b)\, c_3 (\bar{p}_2 \bar{p}_3 + \sigma_{23}^2) \\ \frac{b}{2} c_2 \bar{p}_2 + (1 - a c_3)\bar{p}_3 + (a-b)\, c_2 (\bar{p}_3 \bar{p}_2 + \sigma_{32}^2) + (a-b)\, c_3 (\bar{p}_3^2 + \sigma_{33}^2) \end{bmatrix}$   (6.14)

If we can prove that this function is a contraction with respect to the 1-norm⁷, then, by Theorems 3.1 and 3.2, the expected values of the probabilities of the last two actions asymptotically converge to 0. Thus, the probability $p_1$ of the optimal action asymptotically approaches 1.

The contraction condition is:

$\left\| f\!\left( \begin{bmatrix} \bar{p}_2(n) \\ \bar{p}_3(n) \end{bmatrix} \right) \right\|_1 = \left| (1 - a c_2)\bar{p}_2 + \tfrac{b}{2} c_3 \bar{p}_3 + (a-b)\left( c_2 \bar{p}_2^2 + c_2 \sigma_{22}^2 + c_3 \bar{p}_2 \bar{p}_3 + c_3 \sigma_{23}^2 \right) \right| + \left| (1 - a c_3)\bar{p}_3 + \tfrac{b}{2} c_2 \bar{p}_2 + (a-b)\left( c_3 \bar{p}_3^2 + c_3 \sigma_{33}^2 + c_2 \bar{p}_2 \bar{p}_3 + c_2 \sigma_{32}^2 \right) \right| \overset{?}{<} \bar{p}_2(n) + \bar{p}_3(n) = \left\| \begin{bmatrix} \bar{p}_2(n) \\ \bar{p}_3(n) \end{bmatrix} \right\|_1$   (6.15)

Since a, b, $c_i$ and $p_i$ are all parameters/variables in [0, 1], and the last terms in the absolute values are always positive⁸ for all n, we can write:

$\left\| f(\cdot) \right\|_1 \leq (1 - a c_2)\bar{p}_2 + \tfrac{b}{2} c_3 \bar{p}_3 + |a-b| \left( c_2 \bar{p}_2^2 + c_2 \sigma_{22}^2 + c_3 \bar{p}_2 \bar{p}_3 + c_3 \sigma_{23}^2 \right) + (1 - a c_3)\bar{p}_3 + \tfrac{b}{2} c_2 \bar{p}_2 + |a-b| \left( c_3 \bar{p}_3^2 + c_3 \sigma_{33}^2 + c_2 \bar{p}_2 \bar{p}_3 + c_2 \sigma_{32}^2 \right)$   (6.16)

Rearranging the terms, we have:

$\left\| f(\cdot) \right\|_1 \leq (1 - a c_2)\bar{p}_2 + \tfrac{b}{2} c_2 \bar{p}_2 + |a-b|\, c_2 \left( \bar{p}_2^2 + \sigma_{22}^2 + \bar{p}_2 \bar{p}_3 + \sigma_{23}^2 \right) + (1 - a c_3)\bar{p}_3 + \tfrac{b}{2} c_3 \bar{p}_3 + |a-b|\, c_3 \left( \bar{p}_3^2 + \sigma_{33}^2 + \bar{p}_2 \bar{p}_3 + \sigma_{23}^2 \right)$   (6.17)

⁶ Omitting the terms (n) for simplification.
⁷ Denoted here as $\|\cdot\|_1$.
⁸ All the terms of the sum are positive: the first term includes the square of an expected value and the second a variance, both of which are always positive; the third and fourth terms can be combined as $c_i (\bar{p}_i \bar{p}_j + \sigma_{ij}^2) = c_i E[p_i p_j]$. The expected value of the product of any two action probabilities is always positive since $0 < p_i < 1$.

Replacing $\tfrac{b}{2}$ by b without invalidating the inequality, and using the identity $E[xy] = E[x]\,E[y] + \sigma_{xy}^2$, we obtain:

$\left\| f(\cdot) \right\|_1 < (1 - a c_2)\bar{p}_2 + b c_2 \bar{p}_2 + |a-b|\, c_2 \left( E[p_2^2] + E[p_2 p_3] \right) + (1 - a c_3)\bar{p}_3 + b c_3 \bar{p}_3 + |a-b|\, c_3 \left( E[p_3^2] + E[p_2 p_3] \right)$   (6.18)

We know that $E[p_i^2] + E[p_i p_j] = E[p_i^2 + p_i p_j] = E[p_i (p_i + p_j)] < E[p_i]$ since the sum of all probabilities is equal to 1. Therefore:

$\left\| f(\cdot) \right\|_1 < (1 - a c_2)\bar{p}_2 + b c_2 \bar{p}_2 + |a-b|\, c_2 \bar{p}_2 + (1 - a c_3)\bar{p}_3 + b c_3 \bar{p}_3 + |a-b|\, c_3 \bar{p}_3$   (6.19)

and, for a > b:

$\left\| f(\cdot) \right\|_1 < (1 - a c_2 + b c_2 + a c_2 - b c_2)\, \bar{p}_2 + (1 - a c_3 + b c_3 + a c_3 - b c_3)\, \bar{p}_3 = \bar{p}_2 + \bar{p}_3 = \left\| \begin{bmatrix} \bar{p}_2(n) \\ \bar{p}_3(n) \end{bmatrix} \right\|_1$   (6.20)

Therefore, for the region a > b (Figure 6.5), the expected values of the probabilities of the sub-optimal actions converge to zero. Since the sum of all probabilities is equal to 1, the expected value of the probability of the optimal action must converge to 1. Figure 6.6a shows the evolution of the probabilities of the suboptimal actions for a linear reward-penalty scheme with a = 0.2 and b = 0.1 and initial probabilities p(0) = [0.05 0.50 0.45]. As seen from the plot, the probabilities $p_2$ and $p_3$ do not converge to 0 monotonically. In 83 steps, the probability $p_1$ of the optimal action reaches 0.999. The cumulative averages $\hat{p}_i(n) = \frac{1}{n+1} \sum_{k=0}^{n} p_i(k)$ (n = 0, 1, ..., 83; i = 2, 3) are plotted in Figure 6.6b.


Figure 6.5. The region of asymptotic convergence to the pure optimal strategy (the stable region is the part of the unit square of the (a, b) parameter plane with a > b).

Figure 6.6. Probabilities (a) and cumulative averages (b) of the probabilities of the non-optimal actions for a 3-action automaton with $L_{R-P}$ (a = 0.2, b = 0.1): panel (a) plots $p_3$ versus $p_2$ from n = 0 to n = 83, starting at p(0) = [0.05 0.50 0.45]; panel (b) plots the corresponding cumulative averages.

For r = N actions with $\mathbf{p}(n) = [p_1(n)\;\; p_2(n)\;\; \ldots\;\; p_r(n)]^T$, the contraction condition can be written as:

$\left\| f\!\left( [\bar{p}_2(n)\;\; \bar{p}_3(n)\;\; \ldots\;\; \bar{p}_N(n)]^T \right) \right\|_1 < \sum_{i=2}^{N} \bar{p}_i(n)$   (6.21)

Here,

$\left\| f(\mathbf{p}(n)) \right\|_1 = \sum_{i=2}^{N} \left| (1 - a c_i)\bar{p}_i + \frac{b}{N-1} \sum_{\substack{j=2 \\ j \neq i}}^{N} c_j \bar{p}_j + (a-b) \sum_{j=2}^{N} c_j \left( \bar{p}_i \bar{p}_j + \sigma_{ij}^2 \right) \right|$   (6.22)

Again, $(1 - a c_i)$, $\bar{p}_i(n)$, $c_i$, b and $(a-b)$ are all positive values, and the sum of all $(\bar{p}_i \bar{p}_j + \sigma_{ij}^2)$ has positive values at all time steps. Therefore:

$\left\| f(\mathbf{p}(n)) \right\|_1 \leq \sum_{i=2}^{N} (1 - a c_i)\bar{p}_i + \frac{b}{N-1} \sum_{i=2}^{N} \sum_{\substack{j=2 \\ j \neq i}}^{N} c_j \bar{p}_j + (a-b) \sum_{i=2}^{N} \sum_{j=2}^{N} c_j \left( \bar{p}_i \bar{p}_j + \sigma_{ij}^2 \right)$   (6.23)

More explicitly:

$\left\| f(\mathbf{p}(n)) \right\|_1 \leq \big[ (1 - a c_2)\bar{p}_2 + (1 - a c_3)\bar{p}_3 + \cdots + (1 - a c_N)\bar{p}_N \big]$
$\quad + \frac{b}{N-1} \big[ (c_3 \bar{p}_3 + c_4 \bar{p}_4 + \cdots + c_N \bar{p}_N) + (c_2 \bar{p}_2 + c_4 \bar{p}_4 + \cdots + c_N \bar{p}_N) + \cdots + (c_2 \bar{p}_2 + c_3 \bar{p}_3 + \cdots + c_{N-1} \bar{p}_{N-1}) \big]$
$\quad + (a-b) \big[ c_2 ( \bar{p}_2^2 + \sigma_{22}^2 + \bar{p}_2 \bar{p}_3 + \sigma_{23}^2 + \cdots + \bar{p}_2 \bar{p}_N + \sigma_{2N}^2 ) + c_3 ( \bar{p}_2 \bar{p}_3 + \sigma_{32}^2 + \bar{p}_3^2 + \sigma_{33}^2 + \cdots + \bar{p}_3 \bar{p}_N + \sigma_{3N}^2 ) + \cdots + c_N ( \bar{p}_2 \bar{p}_N + \sigma_{N2}^2 + \bar{p}_3 \bar{p}_N + \sigma_{N3}^2 + \cdots + \bar{p}_N^2 + \sigma_{NN}^2 ) \big]$   (6.24)

Rearranging, and using $E[xy] = E[x]\,E[y] + \sigma_{xy}^2$, we have⁹:

$\left\| f(\mathbf{p}(n)) \right\|_1 \leq \big[ (1 - a c_2)\bar{p}_2 + \cdots + (1 - a c_N)\bar{p}_N \big] + \frac{b}{N-1} \big[ (N-2)\, c_2 \bar{p}_2 + (N-2)\, c_3 \bar{p}_3 + \cdots + (N-2)\, c_N \bar{p}_N \big]$
$\quad + (a-b) \big[ c_2 \big( E[p_2^2(n)] + E[p_2(n) p_3(n)] + \cdots + E[p_2(n) p_N(n)] \big) + c_3 \big( E[p_2(n) p_3(n)] + E[p_3^2(n)] + \cdots + E[p_3(n) p_N(n)] \big) + \cdots + c_N \big( E[p_2(n) p_N(n)] + E[p_3(n) p_N(n)] + \cdots + E[p_N^2(n)] \big) \big]$   (6.25)

Since $\sum_{j=2}^{N} E[p_i(n)\, p_j(n)] = E\big[ p_i(n) \sum_{j=2}^{N} p_j(n) \big] < E[p_i(n)] \leq 1$, we may write:

$\left\| f(\mathbf{p}(n)) \right\|_1 < \sum_{i=2}^{N} (1 - a c_i)\bar{p}_i + \frac{(N-2)\, b}{N-1} \sum_{i=2}^{N} c_i \bar{p}_i + (a-b) \sum_{i=2}^{N} c_i \bar{p}_i$   (6.26)

Combining the terms above and bounding the coefficient $\frac{N-2}{N-1} < 1$ by 1, we have:

$\left\| f(\mathbf{p}(n)) \right\|_1 < \sum_{i=2}^{N} \big[ (1 - a c_i) + b c_i + (a-b)\, c_i \big]\, \bar{p}_i = \sum_{i=2}^{N} \bar{p}_i$   (6.27)

We have therefore proven that for the region a > b (Figure 6.5), the expected values of the probabilities of the sub-optimal actions all converge to zero. This, in turn, implies that the probability of the optimal action goes to 1, i.e., the pure optimal strategy is obtained. The conditions for convergence are that the learning parameter associated with reward be greater than the learning parameter associated with penalty, and that there be one optimal action with $c_{opt} = 0$. In short, we have the following:

Theorem 3.3: An automaton using the general linear reward-penalty scheme with unequal learning parameters in a stationary environment reaches the pure optimal strategy if:

i) there is an optimal action with penalty probability $c_{opt} = 0$, and
ii) the learning parameter associated with reward is greater than the one associated with penalty (i.e., a > b).

⁹ Note that $\bar{p}_i$ stands for the expected value of $p_i$, i.e., the operators E[·] are omitted. The notation p(n) denotes the actual value of the probabilities at time step n.


And, since the probability vector converges to the unit vector corresponding to the optimal strategy, the expected value of the average penalty converges to zero. Therefore, such an automaton is expedient and optimal (see the definitions in Chapter 3).

Although $L_{R-P}$ is not absolutely expedient like the nonlinear schemes given in Chapter 3 and Section 6.2, it is optimal for a specific but widely encountered environment condition. It still remains to extend the proof to the region $\{(a, b) \in [0,1] \times [0,1] : a < b\}$ (Figure 6.5), if at all possible. Also, the proof could be extended to all possible environments by considering values $c_{opt} > 0$; however, in this case, the previous linearization approaches [Narendra89] and the contraction theorems 3.1 and 3.2 cannot be used.

The proof given in the previous section for the general linear reward-penalty reinforcement scheme is valid for an automaton in an "ideal" environment. The word ideal here refers to the fact that there is an optimal action for which the probability of receiving a penalty is zero; in such a case, the automaton is expected to reach the pure optimal strategy. In the literature, both the condition min{$c_i$} = 0 and the condition min{$c_i$} > 0 are studied. When there is no optimal action, the action probabilities converge to values inversely proportional to the probabilities of penalty [Narendra89, Najim94]. We have studied the ideal environment in depth because this case is often encountered in applications: almost all learning automata applications are defined in ideal P-model environments [Baba85, Najim94]. In all the applications discussed in this thesis, the environment is defined so that there is always one action with zero probability of penalty. This action is the optimal action for the current situation, or the solution to the problem at hand.

Throughout this dissertation, we defined the probability of penalty for each action as a constant in the range [0, 1], where zero corresponds to no penalty at all and 1 corresponds to guaranteed penalty. Although these values may be time-varying, most of the literature, except [Najim94, Baba85], considers them to be deterministic. On the other hand, it is interesting to note that all the proofs in this chapter and in the previous literature carry over directly to a noisy environment obtained by defining the penalty probabilities as random variables. Assuming that the probability of penalty for any action is a random variable with mean $m_i$ and variance $v_i$, all $c_i$ in the difference equations can simply be replaced by $m_i$ to accommodate this new definition¹⁰.

For all the linear reinforcement schemes we have mentioned so far, it has been shown here or elsewhere that convergence to the optimal or pure optimal strategy is guaranteed, sometimes subject to several conditions on the parameters and the environment. However, there is not much to say about the expediency of these schemes, especially absolute expediency. For a learning algorithm, it is important to be able to show that the learning characteristic is kept at every step, i.e., that the automaton does better at every time step than at the previous one. This need has led to a synthesis approach toward reinforcement schemes, and the first result was the absolutely expedient nonlinear scheme of Lakshmivarahan and Thathachar [Lakshmivarahan73].

¹⁰ This is because the probabilities of penalty $c_i$ are independent of the action probabilities $p_i$.

Definitions of several nonlinear reinforcement schemes were given in Section 3.5.2. In the

next section, we will introduce a new nonlinear scheme that works better than the previously

defined schemes [Baba83, Baba85] for several applications where fast convergence under certain

conditions is necessary.

Consider the general nonlinear reinforcement scheme given in Section 3.5.2 (see Equations 3.20 and 3.27). For a multi-teacher P-model environment with N teachers, the general absolutely expedient scheme can be written as follows. If $\alpha(n) = \alpha_i$:

$p_i(n+1) = p_i(n) + \left( \frac{\beta_i^1 + \cdots + \beta_i^N}{N} \right) \sum_{\substack{j=1 \\ j \neq i}}^{r} \phi_j(\mathbf{p}(n)) - \left( 1 - \frac{\beta_i^1 + \cdots + \beta_i^N}{N} \right) \sum_{\substack{j=1 \\ j \neq i}}^{r} \psi_j(\mathbf{p}(n))$

$p_j(n+1) = p_j(n) - \left( \frac{\beta_i^1 + \cdots + \beta_i^N}{N} \right) \phi_j(\mathbf{p}(n)) + \left( 1 - \frac{\beta_i^1 + \cdots + \beta_i^N}{N} \right) \psi_j(\mathbf{p}(n))$ for all $j \neq i$   (6.28)

where the functions $\phi_j$ and $\psi_j$ satisfy the following conditions:

$\frac{\phi_1(\mathbf{p}(n))}{p_1(n)} = \cdots = \frac{\phi_r(\mathbf{p}(n))}{p_r(n)} = \mu(\mathbf{p}(n))$

$\frac{\psi_1(\mathbf{p}(n))}{p_1(n)} = \cdots = \frac{\psi_r(\mathbf{p}(n))}{p_r(n)} = \lambda(\mathbf{p}(n))$

$p_j(n) + \psi_j(\mathbf{p}(n)) > 0$

$p_i(n) - \sum_{\substack{j=1 \\ j \neq i}}^{r} \psi_j(\mathbf{p}(n)) < 1$   (6.29)

$p_i(n) + \sum_{\substack{j=1 \\ j \neq i}}^{r} \phi_j(\mathbf{p}(n)) > 0$

for all i, j = 1,..., r.
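In code, the general scheme of Equation 6.28 simply redistributes probability mass according to the averaged teacher feedback $f = (\beta_i^1 + \cdots + \beta_i^N)/N$. The following generic sketch is our own rendering (names are hypothetical), with phi and psi passed in as functions so that particular schemes can be plugged in.

def general_update(p, i, f, phi, psi):
    """General absolutely expedient update (Equation 6.28).
    p: probability vector; i: index of the chosen action;
    f: averaged teacher feedback in [0, 1] (1 = unanimous penalty);
    phi, psi: functions (p, j) -> phi_j(p), psi_j(p)."""
    r = len(p)
    q = list(p)
    sum_phi = sum(phi(p, j) for j in range(r) if j != i)
    sum_psi = sum(psi(p, j) for j in range(r) if j != i)
    q[i] = p[i] + f * sum_phi - (1.0 - f) * sum_psi
    for j in range(r):
        if j != i:
            q[j] = p[j] - f * phi(p, j) + (1.0 - f) * psi(p, j)
    return q

For example, with phi(p, j) = 0 and psi(p, j) = -a * p[j], this reduces to the linear reward-inaction scheme discussed below.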

The functions $\mu(\cdot)$ and $\lambda(\cdot)$ can be chosen arbitrarily as long as they satisfy the conditions given above. Previous attempts to find absolutely expedient schemes led to nonlinear functions; the only absolutely expedient linear scheme is the $L_{R-I}$ (b = 0) scheme. In this case, $\lambda(\cdot) = -a$ and $\mu(\cdot) = 0$, and they satisfy the above conditions. The following theorem, given in [Narendra89] and [Najim94], states the necessary and sufficient conditions for absolute expediency.

Theorem: If the functions $\lambda(\mathbf{p})$ and $\mu(\mathbf{p})$ satisfy:

$\lambda(\mathbf{p}) \leq 0$
$\mu(\mathbf{p}) \leq 0$   (6.30)
$\lambda(\mathbf{p}) + \mu(\mathbf{p}) < 0$

then an automaton using the reinforcement scheme of Equations 6.28 and 6.29 is absolutely expedient in a stationary environment.

In our application, each automaton receives multiple (sensor) responses; the theorem, however, is valid for any number of teachers in an environment. A well-known choice of the functions $\lambda(\mathbf{p})$ and $\mu(\mathbf{p})$ is given in Section 3.5.2; the resulting reinforcement scheme is called the multi-teacher general absolutely expedient (MGAE) scheme in [Baba85]. Our reinforcement scheme differs from the one given in Section 3.5.2 (Equations 3.27 and 3.28) in the definition of these two functions. In the form of Equation 6.28,

we can write our update algorithm as follows. If $\alpha(n) = \alpha_i$:

$p_i(n+1) = p_i(n) - f\,[k\, H(n)]\,[1 - p_i(n)] + [1 - f]\,\delta\,[1 - p_i(n)]$
$p_j(n+1) = p_j(n) + f\,[k\, H(n)]\, p_j(n) - [1 - f]\,\delta\, p_j(n)$ for all $j \neq i$   (6.31)

i.e.:

$\psi_k(\mathbf{p}(n)) \equiv -\delta\, p_k(n)$
$\phi_k(\mathbf{p}(n)) \equiv -k\, H(n)\, p_k(n)$   (6.32)

where the learning parameters k and $\delta$ are real-valued and satisfy:

$0 < \delta < 1$,  $0 < k < 1$   (6.33)

The next section introduces the function H used in our reinforcement scheme and the reasoning behind this choice. We also prove that the definitions above satisfy all necessary and sufficient conditions for absolute expediency. A comparison of the new nonlinear scheme with the general absolutely expedient scheme concludes this chapter.

In Equation 6.31, the feedback f is some combination of the teacher outputs¹¹, and the function H is defined as (see also Figure 6.7):

$H(n) \equiv \min\!\left\{ 1;\; \max\!\left[ \frac{p_i(n)}{k\,(1 - p_i(n))} - \epsilon;\; 0 \right] \right\}$   (6.34)

¹¹ It could be the average of all teacher outputs as in Equation X.28, or some nonlinear combination such as the one we define in Section 4.x.x.


The parameter $\epsilon$ is an arbitrarily small positive real number. Also note that the function H includes $p_i$, the action probability corresponding to the current action.

Figure 6.7. The function H versus the probability $p_i$ of the current action: H is 0 for small $p_i$, increases as $p_i / (k (1 - p_i)) - \epsilon$, and saturates at 1 as $p_i \to 1$.
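A minimal Python sketch of the NLH update (Equations 6.31-6.34) follows; it is our own illustration (the names are hypothetical), assuming 0 < p_i < 1 and treating f = 1 as a full penalty response.

def H(p_i, k, eps):
    """Probability-dependent penalty gain of Equation 6.34."""
    return min(1.0, max(p_i / (k * (1.0 - p_i)) - eps, 0.0))

def nlh_update(p, i, f, k, delta, eps=1e-3):
    """One step of the NLH scheme (Equation 6.31); f in [0, 1]."""
    g = k * H(p[i], k, eps)          # penalty gain k*H(n) for this step
    q = []
    for j, pj in enumerate(p):
        if j == i:   # chosen action: pushed down on penalty, up on reward
            q.append(pj - f * g * (1.0 - pj) + (1.0 - f) * delta * (1.0 - pj))
        else:        # other actions absorb or release the probability mass
            q.append(pj + f * g * pj - (1.0 - f) * delta * pj)
    return q

Because H grows with $p_i$, a penalty received while the chosen action's probability is close to 1 produces a large correction, which is exactly the behavior that motivates the definition.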

We now show that the algorithm defined by Equations 6.31-6.34 satisfies all the conditions in Equation 6.29. From Equation 6.32, we have:

$\frac{\phi_k(\mathbf{p}(n))}{p_k(n)} = \frac{-k\, H(n)\, p_k(n)}{p_k(n)} = -k\, H(n) \equiv \mu(\mathbf{p}(n))$   (6.35)

and:

$\frac{\psi_k(\mathbf{p}(n))}{p_k(n)} = \frac{-\delta\, p_k(n)}{p_k(n)} = -\delta \equiv \lambda(\mathbf{p}(n))$   (6.36)

That is, our definition is consistent with the first two conditions of Equation 6.29. There remain three necessary and sufficient conditions for absolute expediency. Using Equations 6.31 and 6.32, the remaining conditions on $\phi_j$ and $\psi_j$ translate to the following:

(a) $p_i(n) + \delta\,[1 - p_i(n)] < 1$
(b) $p_j(n) - \delta\, p_j(n) > 0$   (6.37)
(c) $p_i(n) - k\, H(n)\,[1 - p_i(n)] > 0$

Conditions (a) and (b) are associated with the reward updates, while condition (c) is associated with the penalty updates. These conditions guarantee that the probabilities stay in the range (0, 1) at all times (with the assumption that none of the probabilities is initially 0 or 1). Conditions (a) and (b) can be shown to be satisfied using the fact that the sum of all probabilities is 1:

(a) $p_i + \delta\,(1 - p_i) = p_i + \delta \sum_{j \neq i} p_j < p_i + \sum_{j \neq i} p_j = 1$ since $0 < \delta < 1$   (6.38)

(b) $p_j - \delta\, p_j = (1 - \delta)\, p_j > 0$ since $0 < \delta < 1$   (6.39)

For condition (c):

$p_i(n) - k\, H(n)\,[1 - p_i(n)] > 0 \;\Leftrightarrow\; k\, H(n)\,[1 - p_i(n)] < p_i(n) \;\Leftrightarrow\; H(n) < \frac{p_i(n)}{k\,[1 - p_i(n)]}$   (6.40)

This condition is already satisfied by the definition of the function H(n). For the limiting values H = 0 and H = 1 (Equation 6.34), we have the following:

- For H = 0, condition (c) reduces to $p_i(n) > 0$, which is true for all values of the action probabilities $p_i$.
- For H = 1, from the definition of the function, we conclude that H = 1 implies:

$\frac{p_i(n)}{k\,(1 - p_i(n))} - \epsilon \geq 1$

This inequality can be rewritten as:

$p_i(n) \geq (1 + \epsilon)\, k\,(1 - p_i(n))$

or, omitting the time step variable and rearranging:

$p_i - k\,(1 - p_i) - \epsilon\, k\,(1 - p_i) \geq 0$

Since all the factors of the third term are strictly positive reals, we may omit this term without affecting the inequality, to obtain:

$p_i - k\,(1 - p_i) > 0$   (6.41)

which is exactly what we must have for H = 1.

Thus, condition (c) is satisfied for the limiting values of the function H too. With this, we conclude that all five conditions of Equation 6.29 are satisfied; therefore, the reinforcement scheme NLH is a candidate for absolute expediency.

Furthermore, the functions $\lambda$ and $\mu$ for the nonlinear scheme NLH satisfy:

$\lambda(\mathbf{p}(n)) = -\delta < 0$
$\mu(\mathbf{p}(n)) = -k\, H \leq 0$   (6.42)
$\lambda(\mathbf{p}(n)) + \mu(\mathbf{p}(n)) = -\delta - k\, H < 0$

because $0 < \delta < 1$, $0 < k < 1$, and $0 \leq H \leq 1$. Since the conditions above are sufficient for absolute expediency, and the nonlinear reinforcement scheme NLH satisfies all the conditions listed in the definition of a candidate nonlinear reinforcement scheme, we conclude that the algorithm given in Equations 6.31-6.34 is absolutely expedient in a stationary environment.


The choice of the function H is also due to another factor besides the conditions given in Equation 6.29: our definition is found to work better than the general MGAE algorithm given in [Baba85] with respect to convergence to a solution. We compared the two reinforcement schemes using three actions and two different initial conditions (Table 6.4). Our definition results in a faster update in general, as shown in Table 6.4, which gives the average number of steps for the pure optimal action's probability to reach 0.995 for different values of the learning parameters k and $\delta$, for the two cases considered.

The definition of the function H (Equation 6.34) is based on the fact that the update rate

for a penalty response from the environment is higher when the probability of the current action

is close to 1. (Note that H is a function of the probability of the current action pi; it is not related

to the index j in Equation 6.28.) This provides a much faster convergence rate when the actions

receiving a penalty from the environment have high probability values associated with them.

Parameters      (a) 3 actions with p(0) = [1/3 1/3 1/3]    (b) 3 actions with $p_{opt}(0) = 0.005$, $p_{j \neq opt}(0) = 0.4975$
 k    $\delta$   [Baba85]      new (with H)                 [Baba85]      new (with H)
 1    0.010       544.99        304.48                       952.94*       534.42
      0.035       155.67        155.59                       358.46**      159.11
      0.050       108.79         66.22                       263.84**      113.98
      0.100        54.11         35.13                       184.85         60.13
      0.200        26.85         19.21                       141.64         31.16
 2    0.050       102.80         48.36                       259.77         77.60
      0.100        51.42         27.57                       185.83         41.70
      0.200        25.58         15.73                       117.84         23.53

* 80 runs did not converge in 1200 steps.  ** 1 run did not converge in 1200 steps.
The H function is defined in Equation 6.34; 500 runs for each parameter set.

Table 6.4. Convergence rates for a single optimal action of a 3-action automaton in a stationary environment.

The data shown in Table 6.4 are the results of two different initial conditions: (a) all probabilities are initially the same and only one action receives reward, and (b) the only action receiving a reward¹² from the environment initially has a very small probability value. The difference in convergence rate is more distinct in the situation where the probability of the optimal action is initially very close to 0¹³. As seen in Table 6.4, when $p_{opt}(0) = 0.005$, the number of iteration steps to reach $p_{opt} = 0.995$ is reduced drastically for relatively large values of the learning parameter $\delta$. Especially when both learning parameters are large (see the shaded area in Table 6.4), the difference between the average numbers of steps for the two schemes is threefold or more. In order to have a fast update of the probability vector, the function H is set to the highest possible value (see Equation 6.34) satisfying the conditions on $\phi_i$ in Equation 6.29.

¹² I.e., the optimal action.
¹³ This situation occurs frequently in our application to automated highway systems; for example, while the probability of the lateral action "shift left" is converging to 1, a vehicle may enter the left sensor range. In this case, we need a strong penalty update to decrease the probability of this action, while encouraging the action "stay in lane".

Chapter 7

Interacting Vehicles: Rules of the Game

In the previous chapters, we introduced an intelligent control method for autonomous navigation and path planning. The decision system mainly uses local information; as a result, the actions learned by the intelligent controller are not globally optimal: the vehicles can survive, but may not be able to reach some of their goals. To overcome this problem, we visualize the interactions between vehicles as sequences of games played between pairs of automata. Every game corresponds to a state of the physical environment (the highway and the vehicles), as described in Section 7.3. By evaluating these games, it is possible to design new decision rules and to analyze the interactions by predicting the behavior and position of each vehicle. Some of the additional modules in Chapter 5 are actually the result of the approach described in detail here.

Automata are, by design, simple gadgets for doing simple things. As a control tool, a learning automaton has limited applications, since increasing the number of actions results in a decrease in the speed of response [Norman68]. The full potential of the stochastic automaton is realized when multiple automata interact. For example, the separation of lateral and longitudinal actions between two automata leads to faster convergence, while bringing the issues of interaction and coordination into the design problem. As stated in [Narendra89], the first question that needs to be answered is whether automata can be interconnected in useful ways to achieve desired group behavior for modeling or controlling complex systems. Narendra and Thathachar also ask whether individual automata can be designed to function as a distributed yet coordinated intelligent control system. The application of learning automata described in the previous chapters partially answers that question. This chapter introduces the idea and methodology behind the design of automata as interacting agents.

Although the interaction between automata in our application is slightly different from the definitions given in the literature, we briefly introduce the basic ideas and previous results on interacting automata. A detailed treatment of the subject can be found in [Narendra89, Lakshmivarahan81].

Similar to other definitions in game theory, two or more automata are assumed to play a game formulated in terms of automata actions and reward-penalty responses. At every step, the automata send actions to the environment, which in turn evaluates the actions and sends out feedback. This feedback is called the outcome or payoff, as in general game theory. The probability vector of actions defines the mixed strategy of a player/automaton; each element of the probability vector corresponds to a specific action, or pure strategy.

Standard game definitions, such as equilibrium point(s), dominant strategy, and Pareto optimality, are defined similarly. In our application, we are interested in equilibrium points, where no player has a positive reason to change its strategy, assuming that none of the other players is going to change strategies. Such equilibrium points are also called Nash equilibria. Every N-player game with finite pure strategy sets has at least one (mixed strategy) equilibrium. In a two-person zero-sum game, there may be more than one equilibrium point with the same payoff; this is not necessarily true for two-person non-zero-sum games.

All the descriptions above are defined on a game matrix, which shows the payoff structure of a game. For example, for a two-player game where each player has two actions (pure strategies), the game matrix D can be written as:

$D = \begin{bmatrix} (d_{11}^1,\, d_{11}^2) & (d_{12}^1,\, d_{12}^2) \\ (d_{21}^1,\, d_{21}^2) & (d_{22}^1,\, d_{22}^2) \end{bmatrix}$

where $d_{ij}^k$ is the payoff¹ to player k when the players play pure strategies i and j. For a two-player zero-sum game, the equality $d_{ij}^1 = -d_{ij}^2$ holds. Several learning schemes have been designed for zero-sum games; the solution to the problem (of obtaining the Pareto optimal² or equilibrium point outcome) is given in [Lakshmivarahan82]. It is known that the linear reward-inaction $L_{R-I}$ and the $\epsilon$-optimal linear reward-penalty $L_{R-\epsilon P}$ (with b << a) schemes guarantee that the automata reach the best possible solution, i.e., that the expected penalty reaches its minimum value. The results are also valid for identical payoff games [Lakshmivarahan81], and may be extended to N-player games [Narendra89].
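As a small illustration of this result (our own sketch, with a hypothetical penalty matrix that has a pure saddle point at the strategy pair (2, 2)), two $L_{R-I}$ automata playing a 2x2 zero-sum game converge toward the saddle point.

import random

def lri_update(p, i, beta, a):
    """Linear reward-inaction: update only on reward (beta = 0)."""
    if beta == 1:
        return list(p)
    return [p[i] + a*(1-p[i]) if j == i else (1-a)*p[j] for j in range(len(p))]

# Player 1's penalty probabilities; player 2 receives the complement.
d1 = [[0.3, 0.7],
      [0.2, 0.5]]          # pure saddle point at (row 2, column 2)
p1, p2, a = [0.5, 0.5], [0.5, 0.5], 0.05
for _ in range(20000):
    i = random.choices([0, 1], weights=p1)[0]
    j = random.choices([0, 1], weights=p2)[0]
    beta1 = 1 if random.random() < d1[i][j] else 0
    beta2 = 1 - beta1      # zero-sum: one side's penalty is the other's reward
    p1 = lri_update(p1, i, beta1, a)
    p2 = lri_update(p2, j, beta2, a)
print(p1, p2)              # both mixed strategies typically approach [0, 1]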

For non-zero-sum games, however, unique rational solutions are difficult to define. When considering non-zero-sum automata games, it is not possible to evaluate the performance of the automata using game-theoretic arguments alone; the role of the learning schemes must be reevaluated in such situations [Narendra89]. Unfortunately, relatively little work exists in this area. Here, we attempt to provide some insight into the automata interactions of several vehicles by defining games of automata and vehicles. The results obtained for two-player non-zero-sum games can be extended to N-player non-zero-sum games to show that the mixed strategies will converge to the equilibrium point(s); in the case of multiple equilibrium points, the convergence depends on the initial conditions of the probability vectors.

It is also important to note that, in an automata game, the players are aware neither of the mixed strategies used by the other player(s) nor of their previous actions. In fact, the players do not even have knowledge of the distribution of the random payoff structure as a function of the strategy combinations. The interaction medium between the automata is the payoff function.

¹ This is still the probability of receiving a penalty; a player therefore wants to minimize its payoff.
² In an N-person game, an outcome is said to be Pareto optimal if there is no other outcome in which all players simultaneously do better, i.e., receive less penalty from the environment.


As described in Chapters 4 and 5, the planning layer of an autonomous vehicle includes two automata for lateral and longitudinal actions. Furthermore, the actions of each automaton sooner or later affect the physical environment, thus creating another, albeit indirect, level of interaction. In this section, we describe the interaction between the automata in the same vehicle and between the automata in different vehicles, and define the nature of the games played by the learning automata.

Each vehicle's intelligent path controller consists of two automata. The interaction between these two automata can be visualized as in Figure 7.1. As described in Section 5.3.3 (Figure 5.19), the lateral and longitudinal automata synchronously update their action probabilities based on the response(s) of the environment. Furthermore, the value of the current longitudinal action changes the environment response to the lateral automaton; in other words, the lateral environment³ is determined by the current longitudinal action. This differs from an automata game in that the interaction is not via the environment: the automata interact directly. The idea of interacting automata was first introduced in [Wheeler85]. The resulting configurations can be viewed as games of automata with a particular payoff structure, and some of the results mentioned in Section 7.1 can then be applied to such interconnected automata. The model given in Figure 7.1 is an example of interconnected automata with a changing environment.

[Figure 7.1: block diagram showing the lateral automaton $A_{LAT}$ with its two environments $E_{LAT}^1$, $E_{LAT}^2$ and the longitudinal automaton $A_{LNG}$ with its environment $E_{LNG}$; the current longitudinal action $\alpha_{LNG}$ selects which lateral environment responds to the lateral action $\alpha_{LAT}$.]

Figure 7.1. Interconnected automata with a changing environment (adapted from [Narendra89]).

We know that the lateral automaton $A_{LAT}$ can operate in two stationary environments $E_{LAT}^i$ (i = 1, 2). The difference between these two automaton environments is the response of the headway module (see Section 5.3.3): in some situations, the choice of the longitudinal action changes the response of the lateral environment. All other environment changes are due to changes in the physical environment, and we visualize those changes as the switching environments described in Section 7.3. Due to the interconnection, one would expect the longitudinal automaton $A_{LNG}$ to converge to its best action using absolutely expedient schemes; the lateral automaton $A_{LAT}$ would in turn converge to the best action in the environment determined by $A_{LNG}$.

³ The word environment here describes the automata environment, not the physical one.

Let us assume that an automated vehicle finds itself in the rightmost lane of a two-lane highway, say by merging from an exit ramp. Consider the situation in the first few seconds, during which the automata environment stays stationary; with a relatively fast update rate, this assumption is plausible. The lateral action SR will receive a penalty until the vehicle shifts lane. Assuming the vehicle is in its desired lane and speed range, the environment responses for the longitudinal and lateral actions depend on the outputs of the headway and left-side sensor modules, as given in Table 7.1. Let us also assume that the probability that a sensed vehicle is moving away from the autonomous vehicle is 1/3, and that we know the probability of a vehicle being in a particular sensor range⁴, as given in Table 7.2.

Action    A    B    C    D    E    none
ACC       1    1    1/3  0    0    0
DEC       0    0    0    0    0    0
SM        1    1/3  0    0    0    0
SL        0    0    0    1    1/3  0
SR        1    1    1    1    1    1
SiL⁵      1    1/3  1/3  0    0    0
          0    0    1/3  0    0    0

Table 7.1. Assumed probabilities of penalty for each action based on the front and side sensors (see Figure 5.1 for range definitions).

Range         A     B     C     D     E     none
Probability   1/10  1/10  1/10  1/10  1/10  1/2

Table 7.2. Probability of sensing a vehicle in a particular sensor range.

Based on the numbers given in Tables 7.1 and 7.2, the probabilities of receiving a penalty for both automata can be calculated. For example, for the longitudinal action ACC, the payoff is

$1 \cdot \tfrac{1}{10} + 1 \cdot \tfrac{1}{10} + \tfrac{1}{3} \cdot \tfrac{1}{10} + 0 \cdot \tfrac{1}{10} + 0 \cdot \tfrac{1}{10} + 0 \cdot \tfrac{1}{2} = \tfrac{7}{30}$.

Therefore, the game matrix for the longitudinal and lateral automata is:

⁴ Note that this is not required for the actual application; it is just an assumption made to illustrate the effect of a changing environment on interconnected automata.
⁵ The second row indicates the probabilities of penalty when the longitudinal action is DEC.


          SL              SR           SiL
ACC   (7/30, 4/30)    (7/30, 1)    (7/30, 5/30)
DEC   (0, 4/30)       (0, 1)       (0, 1/30)
SM    (4/30, 4/30)    (4/30, 1)    (4/30, 5/30)

(In each pair, the first entry is the longitudinal payoff and the second the lateral payoff.)

The entries in the first and third rows correspond to the first environment for the lateral automaton, while the entries of the second row correspond to the second environment. The difference between the two environments is caused by the longitudinal action DEC: when the longitudinal automaton chooses DEC, the lateral automaton $A_{LAT}$ is switched to the second environment, which changes the response to the lateral action SiL; if the vehicle is slowing down, lane shifts are discouraged (Figure 5.19). If the automata were not connected, absolutely expedient algorithms would be expected to converge to the actions DEC and SL (without the connection, the lateral action SL is optimal, since the penalty response from the front sensor is not suppressed). When the automata are connected as described, the optimal actions are DEC and SiL. Based on the probabilities given in Tables 7.1 and 7.2, this solution is Pareto optimal and an equilibrium point of this game. In a situation where the probability of sensing a vehicle in region C is much larger than the others, the payoff of the lateral action SL is always less than that of SiL, and the action pair (DEC, SL) becomes the Pareto optimal solution; the two-automata game will then converge to this new optimal solution.
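The payoff entries above can be reproduced mechanically from Tables 7.1 and 7.2. The following sketch (our own, using exact rational arithmetic; the dictionary keys and helper name are ours) recomputes the game matrix, including the switch of the SiL column when the longitudinal action is DEC.

from fractions import Fraction as F

# Penalty probabilities per sensor range (Table 7.1): A, B, C, D, E, none
penalty = {
    'ACC': [F(1), F(1), F(1, 3), F(0), F(0), F(0)],
    'DEC': [F(0)] * 6,
    'SM':  [F(1), F(1, 3), F(0), F(0), F(0), F(0)],
    'SL':  [F(0), F(0), F(0), F(1), F(1, 3), F(0)],
    'SR':  [F(1)] * 6,
    'SiL':     [F(1), F(1, 3), F(1, 3), F(0), F(0), F(0)],  # LNG != DEC
    'SiL|DEC': [F(0), F(0), F(1, 3), F(0), F(0), F(0)],     # LNG == DEC
}
prob = [F(1, 10)] * 5 + [F(1, 2)]        # Table 7.2, with P(none) = 1/2

def payoff(action):
    """Expected probability of penalty for an action (lower is better)."""
    return sum(c * q for c, q in zip(penalty[action], prob))

for lng in ('ACC', 'DEC', 'SM'):
    row = []
    for lat in ('SL', 'SR', 'SiL'):
        key = 'SiL|DEC' if (lat == 'SiL' and lng == 'DEC') else lat
        row.append((payoff(lng), payoff(key)))
    print(lng, row)      # reproduces the matrix, e.g. ACC/SL -> (7/30, 2/15)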

Note that the situation above is very specific once we consider all sensors: the vehicles must be cruising in their desired speed range and lane, the pinch sensor must not send a penalty response, etc. In all other situations, the two automata may be considered unconnected. Using the algorithms given in Chapter 6, the automata will then converge to their best (optimal) actions separately. The interaction between the automata is via the physical environment, and for the duration of a specific game we consider it to be stationary, resulting in a stationary automata environment. Of course, the solution of such a disjoint game will be an equilibrium point (and a Pareto optimal solution) due to the convergence characteristics of the reinforcement schemes.

While the two automata in each autonomous vehicle form an interconnected automata pair that is guaranteed to reach the optimal solution in a stationary (physical) environment, the interaction between autonomous vehicles creates another level of interaction via the physical environment. As seen in Figure 7.2, automata actions from other autonomous vehicles change the physical environment, which in turn affects the feedback responses sent to an automaton. This type of interaction is indirect, and therefore cannot be formulated using a game matrix. Furthermore, the fact that the game matrix would need to be time-variant when considering multiple interacting vehicles complicates the matter.


We therefore treat the automata environment resulting from the changing physical environment as a switching environment. Based on a certain change in the physical

world, the automata environment is changing from one state to another. Every automata

environment state includes a different set of feedback responses to lateral and longitudinal

automata. All these different states of the environment are assumed to be stationary since the

automata converge to the optimal actions before another change takes place. As discussed

previously, the automata pair in every vehicle is absolutely expedient (or optimal when using

linear schemes) in each stationary environment, and therefore, the autonomous vehicles are

guaranteed to reach an intelligent decision. Once a decision is made and sent to the regulation

layer, corresponding actions are fired and the physical environment changes, forcing the automata

environment to switch to another state.

Figure 7.2. Interaction of two autonomous vehicles via the physical environment: in each vehicle, the longitudinal and lateral automata (ALNG, ALAT) act through the regulation layer on the shared physical environment, which in turn shapes their automata environments (ELNG, ELAT).

It is important to realize that lateral or longitudinal actions need not be fired for the

automata environment to switch from one state to another. For example, if two vehicles cruising

at their desired speed are in adjacent lanes, and if their idle actions SiL and SM are optimal, the

physical environment may change due to the speed difference between the vehicles. The moment

that one of the vehicles clears the other's side-sensor detection area, the lateral automata environment for both vehicles changes. Similarly, when idle actions are fired, the physical and automata environments may not change. The interactions between the actions and the physical environment, and between the physical and automata environments, are fairly complicated. In the next


section, we will introduce a representation scheme that will facilitate analysis of the changes in

the physical environment in relation to the automata environment.

Although illustrating vehicle interactions as automata games for every (stationary) instance of the

automata environment is not feasible, it may be possible to define a similar matrix for all the

actions of autonomous vehicles. Consider a situation where two autonomous vehicles interact via

their sensors, and communication (or signaling) devices. The physical presence of one vehicle is

affecting the automata environment of the other via range sensors. Note that the vehicles are not

actually aware of the presence of others: the automata in each vehicle are simply trying to find

the best action to take given a set of feedback responses from the modules. For example, consider

two vehicles cruising at their desired speeds and occupying the spot next to each other without

any other vehicles in the sensor ranges (Figure 7.3a). Let us also assume that the vehicles have no

lane preferences, resulting in a constant reward from the lane detection module. In this case, the

penalty-reward structure for both vehicles is given in Figure 7.3c. The matrix in this figure gives

the conditions in a particular automata environment resulting from physical location, current

conditions, and predefined vehicle parameters. For example, lateral action SR for vehicle 1

receives penalty from the right sensor, and the combined response is a penalty. This penalty

response is shown as a shaded lower triangle in the matrix; reward responses are shown by white

triangles. Combined feedback responses are evaluated using the decision structure given in Section

5.4.1. Again, note that the entries in this matrix are the environment responses, not the

probabilities of receiving a penalty.

As seen from Figure 7.3c, the only intelligent choice for vehicle 2 is the idle actions (SiL and SM), while vehicle 1 may shift to the left lane or cruise in its current lane while keeping its speed. Let us assume that vehicle 1 stays in its current lane, and it is moving faster than vehicle 2.

After a short period of time, the physical situation shown in Figure 7.3b is reached; vehicle 1

moves out of the left sensor range of vehicle 2. In the meantime, the idle actions may be fired repeatedly in both vehicles. This new situation affects the automata environment as shown in

Figure 7.3d. The vehicles now have more choices: vehicle 2 may also shift left while it is possible

for vehicle 1 to shift right, left or stay in the middle lane. There are six pairs of possible (vehicle)

actions; five of these force the automata environment to switch to another state. The idle action

pairs (SiL and SM) do not affect the physical environment instantaneously; in fact as far as the

two vehicles are concerned, these actions do not change the automata environment at all. For our

analysis of vehicle interactions, all the physical situations where a vehicle is outside of another

vehicle's sensor range lead to the same automata environment, since the reward-penalty structure

does not change.



Figure 7.3. Situations for two interacting vehicles (a, b), and combined environment responses

(c, d) for situations (a) and (b) respectively (shaded triangles indicate a penalty; upper triangles

are associated with vehicle 2; optimal action pairs are indicated with black borders).

Consider a situation with three vehicles as shown in Figure 7.4a. Vehicles 1 and 2 are autonomous; vehicle 3 is not automated and can neither sense nor communicate. It is just an obstacle as far as the intelligent vehicles are concerned. Vehicle velocities are given as V1 = V3 > V2.

Vehicle 2 has no lane preference while vehicle 1 wants to shift to the middle lane. However,

vehicle 1 cannot shift immediately to the middle lane since vehicle 2 is in the side sensor range

(Figure 7.4a); the automata environment for this situation is given in Figure 7.4d. (The actions SM and SiL are combined as a single action IDLE. If a lateral action other than SiL is chosen, the row/column for the combined action IDLE refers to the lateral idle action, and vice versa. If both SiL and SM are chosen, the table shows the OR-ed response.)

Due to velocity differences, vehicle 2 drifts away from vehicle 1's sensor range (Figure 7.4b), and the automata environment switches (Figure 7.4e). In the meantime, the idle actions are

fired repeatedly. With the new environment, the number of possible actions for vehicles 1 and 2

increases, and lateral action SL becomes the optimal solution for vehicle 1. As a result, vehicle 1

changes lane (Figure 7.4c) which in turn causes another automata environment change (Figure

7.4f).

Using the same reasoning, we can establish which automata environment corresponds to

which physical situation-vehicle condition pairs. Since the automata in each vehicle use optimal

or absolutely expedient algorithms, convergence to the optimal solution is guaranteed for all these


situations, provided that the automata have enough time to learn. Using this method, it is then

possible to predict how the vehicle will react to a specific physical situation. This will enable us

to define highway scenarios as described in the next section, and find solutions for intelligent

path planning.


Figure 7.4. Changes in the physical (left) and automata environments (right): vehicle 1 shifts to

the middle lane.

Autonomous vehicles described in previous chapters are able to avoid collisions by making

decisions based on local information available to them. However, the resulting vehicle path may

not be the best solution toward the problem of congestion. Furthermore, some of the vehicle

paths may conflict, and prevent the vehicles from reaching their desired goals. The problem exists


for both hierarchical control and autonomous vehicle approaches. However, the second approach

is by nature a decentralized control method, and finding a solution may be much more difficult

than finding a globally optimal path strategy with a hierarchical architecture that has all the

information about the highway situation.

This problem with the autonomous vehicle approach has not yet been addressed by previous research efforts, while the hierarchical control structure inherently possesses the

methodology to solve the problem. We visualize a possible situation with multiple interacting

vehicles as a sequence of environment states. For every state of the physical environment (the positions of the vehicles and the current parameters defining their behavior), a corresponding automata environment can be defined. The automata environment is analyzed to

predict possible physical environment changes. These changes will be illustrated as state

transitions. State diagrams formed using possible environment state transitions can then be used

for analysis as well as design purposes.

Consider two vehicles sharing a 3-lane highway. The possible physical situations are

given in Figure 7.5. Besides the relative lateral positions, we assume that only three possibilities

exist for relative longitudinal positions. The distinguishing factor between these positions is

whether a vehicle is in the side sensor range or not. Also note that for each state given in Figure

7.5, there is a reciprocal state with switched vehicle positions, denoted by an asterisk (e.g., B1*); the total number of states is therefore 24. Figure 7.5 is basically a list of situations that are

interesting for the analysis of vehicle interactions; there may be more situations. Appendix D.1

shows the relative positions of the two vehicles in a larger physical environment. Situations

where the vehicles fall outside of the sensor range are combined into a single state, and further

simplification gives the twelve situations given in this section.

Note that for any physical environment state in Figure 7.5, there may be multiple

corresponding automata environments due to several factors, such as desired speed and desired lane, as indicated in Figures 7.3 and 7.4.

To analyze the behavior of autonomous vehicles and the conflicts resulting from their

interactions, we define highway scenarios. A scenario is a specific situation with physical

locations of vehicles, their sensor outputs, and internal parameters such as desired lanes and

desired speeds. Once we know the automata environment at the beginning of a scenario, we can

predict the (state) changes in the physical environment. Then, all possible changes are combined

to form a state diagram showing the progress of the physical environment. The transitions from

one state are the direct results of the automata environment given by the matrices such as those in

Table 7.4.

To illustrate the idea, let us consider the situation A1 in Figure 7.5. Assume the

probabilities of possible vehicle actions are equal. For example, vehicle 1 may shift right, move

ahead of vehicle 2, slow down or do nothing, while vehicle 2 keeps its speed and lane the

same. These vehicle actions will cause transitions from A1 to A3, B1, C1 and A1 respectively.

All possible transitions between defined states are shown in Figure 7.6. The links indicated with

* show a transition to a reciprocal state, e.g., D1 → B1*, B2 → D2* or B3* → D1*.


Figure 7.5. Possible physical environment states for 2 vehicles in a 3-lane highway.

Figure 7.6. Possible transitions between the physical environment states for two vehicles in a 3-lane highway; reflexive transitions are not shown (see also Figure 7.5).

Assuming equal probability for all vehicle actions in state A1, we can define a transition

matrix as follows. At state A1, vehicle 1 can shift right, move ahead, stay back or keep its speed

and lane. Similarly, vehicle 2 can shift left, move ahead, stay back or keep its speed and lane. We

assume that vehicles do not take actions simultaneously, i.e., only one vehicle can fire an action at

a given time. Thus, there are eight possible transitions with equal probability (Table 7.3).


Actions                                     Transition to
Vehicle 2 idle, vehicle 1 moves ahead       B1
Vehicle 2 idle, vehicle 1 slips back        C1
Vehicle 2 idle, vehicle 1 idle              A1
Vehicle 2 idle, vehicle 1 shifts right      A3
Vehicle 1 idle, vehicle 2 moves ahead       C1
Vehicle 1 idle, vehicle 2 slips back        B1
Vehicle 1 idle, vehicle 2 idle              A1
Vehicle 1 idle, vehicle 2 shifts left       A2

Table 7.3. Action-transition pairs for state A1.

Therefore, the probabilities of transition from state A1 to states B1, C1, A2 and A3 are

0.250, 0.250, 0.125, and 0.125 respectively. The environment stays at state A1 with probability

of 0.250. All other transitions for state A1 are not possible. Similarly, all probabilities of

transition for other states can be written to form the 25 x 25 transition matrix T, whose rows and columns are ordered as

A1, A2, A3, B1, B2, B3, C1, C2, C3, D1, D2, D3, A1*, A2*, A3*, B1*, B2*, B3*, C1*, C2*, C3*, D1*, D2*, D3*, K.

For example, the row of T corresponding to state A1 is

T(A1, .) = [ 1/4  1/8  1/8  1/4  0  0  1/4  0  ...  0 ],

and the row corresponding to the collision state K is [ 0  ...  0  1 ].


Each row of the matrix T shows the probabilities of transitions from a state, and sums up

to 1. Such a matrix is called a stochastic matrix. As seen above, it also includes the reciprocal

states obtained by changing the locations of the vehicles in states shown in Figure 7.5. The last

row and column of the matrix correspond to a collision state, denoted by K.

It is also possible to consider simultaneous vehicle actions. For example, if vehicle 1 shifts to the middle lane while vehicle 2 moves ahead in state A1, a transition to state C3 will occur. The probability of simultaneous actions is relatively small; moreover, even if this is the case, the transition may be thought of as a sequence of two separate transitions, from state A1 to C1 or A3, and then to state C3. Thus, we assume that only one vehicle may take an action at one time.
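For instance, the A1 row of T can be generated directly from the action-transition pairs of Table 7.3. Below is a small counting sketch under the equiprobable-action assumption (the variable names are ours):

    from collections import Counter
    from fractions import Fraction as F

    # Target states of the eight equiprobable action pairs of Table 7.3.
    targets = ["B1", "C1", "A1", "A3",   # vehicle 2 idle, vehicle 1 acts
               "C1", "B1", "A1", "A2"]   # vehicle 1 idle, vehicle 2 acts

    row_A1 = {s: F(n, len(targets)) for s, n in Counter(targets).items()}
    for state, prob in sorted(row_A1.items()):
        print(state, prob)   # A1 1/4, A2 1/8, A3 1/8, B1 1/4, C1 1/4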

For the scenario above, we may assume that the probabilities of transition are constant for all steps. If the probabilities of finding the environment at a given state are known, the row vector consisting of these probabilities can then be multiplied by the transition matrix to evaluate the probability distribution over the states at the next time step. Note that the use of the transition matrix implies that the probability of the environment being in one state at time n depends only on the probability distribution at the previous step n-1. Therefore, the state transitions of the physical environment given above constitute a Markov chain.

Let us assume that the environment is at state A1 initially (n = 0). The row vector

defining the probability distribution for the states is then:

s(0) = [ 1  0  0  ...  0 ]   (1 x 25)

The probabilities of being at a specific state are then calculated by

                      A1     A2     A3     B1    B2  B3    C1    C2  C3        K
s(0) T = s(1) = [  0.250  0.125  0.125  0.250   0   0   0.250   0   0   ...   0 ]

For consecutive steps, the probability of being at state K increases. At n = 20, we have:

s(20) = [ 23 15 15 23 18 18 23 18 18 11 7 7 7 6 6 7 8 8 7 8 8 10 7 7 716 ] x 10^-3

The state K for this Markov chain is an absorbing state [6]. We therefore conclude that two vehicles randomly choosing one of the possible actions on a three-lane highway will collide sooner or later, if, of course, our probabilistic model is correct.

[6] A subset C of the state space for a Markov process is said to be closed if, for every state si ∈ C and sj ∉ C, the transition probability tij is zero. If the subset C consists of a single state, that state is called an absorbing state. Only state K has this property in our example.
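The mechanics of this absorption argument can also be sketched numerically. The toy three-state chain below is illustrative only (it is not the 25 x 25 matrix above), but the iteration s(n+1) = s(n)T is exactly the one used to obtain s(20):

    import numpy as np

    # Toy chain with states {A, B, K}, where K (collision) is absorbing.
    T = np.array([[0.6, 0.3, 0.1],
                  [0.3, 0.6, 0.1],
                  [0.0, 0.0, 1.0]])
    assert np.allclose(T.sum(axis=1), 1.0)   # rows of a stochastic matrix sum to 1

    s = np.array([1.0, 0.0, 0.0])            # start in state A, cf. s(0) above
    for n in range(20):
        s = s @ T                            # s(n+1) = s(n) T
    print(s.round(3))                        # most of the mass is now in K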

Now, let us consider the situation A1 in Figure 7.5, this time with two intelligent vehicles equipped with the sensor and teacher modules, but without the flag structures discussed in Chapter 5. The speeds of the vehicles and their longitudinal positions are the same. Vehicle 1 wants to shift to lane 3, and vehicle 2 to lane 1 (Figure 7.7). Since they are initially traveling at the same speed, with the



basic sensors and modules, there are no decisions/actions that would lead to a goal state (or

states). Starting at state A1, possible transitions are to states A1, A2, and A3. For transitions to

states A2 or A3, one of the vehicles must fill its memory vector with a lane shifting action much

faster than the other. If one of the pinch modules detects the other vehicle signaling for lane shift,

the state A1 will be permanent, and idle action SM will fire repeatedly.

Figure 7.7. Initial situation for Scenario 1: vehicle 1 (desired lane 3) travels in lane 1, and vehicle 2 (desired lane 1) travels in lane 3.

If the vehicles were to change their speed, the transitions shown in Table 7.4 are possible

with the current internal parameters. Figure 7.8 shows the corresponding state diagram where

arrows indicate the direction of transition. Two possible chains with the solution states as

absorbing states B1* and C1* are distinguished with solid and dashed lines.

From        To
Chain 1
A1          A1, A2, A3
A2          A2
A3          A3
Chain 2
B1          B1, B2, B3
B2          B2, D1*, D2*
B3          B3, D1*, D3*
D1*         D1*, C2*, C3*
D2*         D2*, C2*
D3*         D3*, C3*
C2*         C1*, C2*
C3*         C1*, C3*
C1*         C1*
Chain 3
C1          C1, C2, C3
C2          C2, D1, D2
C3          C3, D1, D3
D1          D1, B2*, B3*
D2          D2, B2*
D3          D3, B3*
B2*         B1*, B2*
B3*         B1*, B3*
B1*         B1*

Table 7.4. Possible transitions for Scenario 1.


Figure 7.8. Possible chains for Scenario 1 (reflexive transitions are not shown; the two chains are distinguished with dashed and solid lines).

From Table 7.4 and Figure 7.8, it is obvious that if the initial state is A1, the vehicles will

not be able to reach their desired lanes under the current circumstances. The final situation is then

A1, A2 or A3. The states C1* and B1* represent the goal states for this situation and are reachable from all other states in their respective chains. Furthermore, these two states are absorbing states for chains 2 and 3 respectively. Given the vehicles' desire to shift lanes, the physical environment is guaranteed to switch to state C1* or B1* starting at any state in chain 2

or 3. Therefore, all we need to do is to force the physical environment to switch from A1, A2 or

A3 to any of the states in chain 2 or 3. Possible transitions are from Ai to Bi or Ci where i =

1,2,3. All these transitions require a speed change for at least one of the vehicles.
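The claim that B1* (and, symmetrically, C1*) is reachable from every state of its chain can be verified mechanically. Below is a minimal breadth-first search over the chain 3 transitions of Table 7.4, with reflexive transitions omitted (the helper names are ours):

    from collections import deque

    # Transitions of chain 3 from Table 7.4 (reflexive transitions omitted).
    chain3 = {"C1": ["C2", "C3"], "C2": ["D1", "D2"], "C3": ["D1", "D3"],
              "D1": ["B2*", "B3*"], "D2": ["B2*"], "D3": ["B3*"],
              "B2*": ["B1*"], "B3*": ["B1*"], "B1*": []}

    def reaches(graph, start, goal):
        """Breadth-first search: is `goal` reachable from `start`?"""
        seen, queue = {start}, deque([start])
        while queue:
            state = queue.popleft()
            if state == goal:
                return True
            for nxt in graph[state]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return False

    # B1* is reachable from every state of the chain and has no outgoing
    # transitions, i.e., it is the absorbing (goal) state of chain 3.
    assert all(reaches(chain3, s, "B1*") for s in chain3)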

Consider the situation A1 with the automata environment given in Figure 7.9a. If the

penalty-reward structure can be changed to one of the matrices shown in Figures 7.9b-7.9e, or

any suitable combination of longitudinal actions, the physical environment will then switch to

state B1 or C1 depending on the chosen action(s). If at least one of the vehicles can be forced to

change its speed by changing its automata environment, the physical environment will switch

after some time, leading to a goal state. The same reasoning is true for states A2 and A3.


Figure 7.9. Possible penalty-reward structures (b-e) to force the physical environment to switch to states B1 or C1 from the current state A1.

Therefore, in order to introduce a change to the automata environment, the flag structure

given in Section 5.3.1 was designed. If a vehicle cannot shift to its desired lane in a predefined time interval, a flag is set. This flag changes the vehicle's desired speed, usually to a value smaller than the current speed. The learning automata environment then changes as shown in Figure 7.9b or 7.9c; the transition in the physical environment state comes later, attaching the states Ai to chain 2 or 3 (Figure 7.8). Consecutive state transitions are automatic for this situation.
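A minimal sketch of this lane-flag mechanism is given below. The class name, the time limit and the size of the speed drop are illustrative assumptions; Section 5.3.1 defines the actual structure and parameter values.

    class LaneFlag:
        """Lane flag sketch: if the desired lane is not reached within a time
        limit, lower the desired speed so that the relative vehicle speeds
        (and hence the physical environment state) change."""

        def __init__(self, time_limit=10.0, speed_drop=5.0):
            self.time_limit = time_limit   # seconds allowed for the lane shift
            self.speed_drop = speed_drop   # km/h subtracted while the flag is set
            self.t_start = None
            self.is_set = False

        def desired_speed(self, t_now, current_lane, desired_lane, nominal_speed):
            if desired_lane is None or current_lane == desired_lane:
                self.t_start, self.is_set = None, False   # goal reached: clear flag
                return nominal_speed
            if self.t_start is None:
                self.t_start = t_now                      # start timing the attempt
            if t_now - self.t_start > self.time_limit:
                self.is_set = True                        # lane shift is overdue
            return nominal_speed - self.speed_drop if self.is_set else nominal_speed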

In this section, we consider three intelligent vehicles in a situation similar to the one given in the previous section. As seen in Figure 7.10, the speeds and longitudinal positions of the vehicles are the same, and their desired lane parameters create a conflict. Similar to Scenario 1, the solution to the problem lies in changing the relative speeds of the vehicles. Again, the lane flag is used to decrease the speeds of vehicles 1 and 2 to a value smaller than that of vehicle 3.


Figure 7.10. Initial situation for Scenario 2: vehicle 1 (desired lane 2) in lane 1, vehicle 2 (desired lane 3) in lane 2, and vehicle 3 (desired lane 1) in lane 3.

All possible environment states for three vehicles on a three-lane highway are given in

Appendix D.2. Again, assuming that only one vehicle can take an action at a time, the state transitions leading to a solution would be similar to those given in Figure 7.11. This transition diagram only shows a solution in which vehicles 1 and 2 slow down. A few other solutions are also possible if different speed adjustments are considered.

Figure 7.11. A possible chain for Scenario 2 (states C2, C3, D5, K2, J3, F2, with C6 and A5): the lane flag forces vehicles 1 and 2 to slow down.

All the transitions except the first one are automatic under current circumstances. For the

first transition, on the other hand, the lane flag needs to be set in at least one vehicle (even if it is vehicle 3), breaking the symmetry. The problem and the solution for this case are similar to the

two-vehicle situation given in the previous section. This is not a coincidence; it is due to the

superposition of two two-vehicle situations.

The term superposition indicates that a three-vehicle situation given here can be treated

as three separate two-vehicle interactions. In terms of the two-vehicle states, the state transition

diagram above can be written as separate transition diagrams using the two-vehicle situations

previously described, as shown in Figure 7.12.


Figure 7.12. The three-vehicle chain of Scenario 2 (vehicles 1, 2 and 3) decomposed into two-vehicle transition diagrams (Scenario 1a: vehicles 1 and 2; Scenario 1b: vehicles 1 and 3; Scenario 1c: vehicles 2 and 3), using the definitions in Figure 7.8.

As seen from Figure 7.12, the three-vehicle situation is nothing more than three

asynchronous Markov chains representing two-vehicle situations. In Figure 7.12, the

corresponding two-vehicle states are aligned with the three-vehicle environment states.

Transitions that need to be forced by the lane flag are shown in gray, and they are (and must be)

between corresponding states in both three-vehicle and two-vehicle transition diagrams. The two-

vehicle scenario including vehicles 1 and 2 is automatic, i.e., there are no conflicts. The other two

scenarios both have a synchronous forced transition.

Similarly, for the three-vehicle scenario of Section 5.3.1, it is possible to view the

situation as three two-vehicle transition diagrams. Figure 7.13 shows the corresponding two- and

three-vehicle chains. Again, the transition breaking the symmetry occurs between the corresponding states of the three- and two-vehicle scenarios. When the three-vehicle situation is forced from state K2 to state L2, the two-vehicle scenario of vehicles 1 and 2 is forced from state A3 (corresponding to state K2) to state B3* (corresponding to state L2). The two-vehicle scenario including vehicles 1 and 3 is automatic; the one between vehicles 2 and 3 is not significant, i.e., there are no conflicts or transitions. The situation between vehicles 1 and 2 is the defining scenario.


Figure 7.13. The three-vehicle chain and the corresponding two-vehicle diagrams for the example in Section 5.3.1 (see Figure 5.17).

Analysis of the example situations given in this section indicates that it is possible to

define complex situations of multiple interacting vehicles as a group of many (conflicting and

non-conflicting) two-vehicle situations. A complex scenario is nothing more than a superposition

of multiple two-vehicle scenarios. When the state transitions of the significant two-vehicle scenarios corresponding to the transition diagram of the N-vehicle (N > 2) scenario are studied, it is seen that the key transition that breaks the symmetry in the N-vehicle situation is synchronous with at least one of the two-vehicle symmetry-breaking transitions [7].

[7] The term synchronous here means that corresponding transitions occur between the corresponding states in the N- and two-vehicle scenarios.
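A sketch of this pairwise decomposition is shown below. It assumes only that a pair's state is determined by the lane difference and by whether the two vehicles are within side-sensor range of one another; the function names and the range value are ours.

    from itertools import combinations

    SENSOR_RANGE = 30.0   # side-sensor range in meters (illustrative value)

    def pair_state(v1, v2):
        """Classify a two-vehicle situation; v = (longitudinal position, lane)."""
        dx, dlane = v2[0] - v1[0], v2[1] - v1[1]
        if abs(dx) <= SENSOR_RANGE:
            return (dlane, "side by side")
        return (dlane, "ahead" if dx > 0 else "behind")

    def decompose(vehicles):
        """An N-vehicle scenario as a superposition of two-vehicle scenarios."""
        return {(i, j): pair_state(vehicles[i], vehicles[j])
                for i, j in combinations(sorted(vehicles), 2)}

    # Three vehicles side by side in lanes 1, 2 and 3 (cf. Scenario 2):
    print(decompose({1: (0.0, 1), 2: (0.0, 2), 3: (0.0, 3)}))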

As we discussed in the previous section, a scenario of multiple vehicles can be treated as multiple

asynchronous two-vehicle transition diagrams. In the previous example, the interaction between vehicles 2 and 3 was not significant since their relative positions and speeds, as well as other parameters, did not change during the time interval of interest. Similarly, the example with four vehicles given in this section is equivalent to a single two-vehicle situation due to the relative positions and speeds of the vehicles. This example also emphasizes the need for a speed flag as

described in Section 5.3.2.

Consider the situation given in Figure 7.14. Vehicles 3 and 4 are traveling at their desired

speed and lane, unaware of the fact that vehicles 1 and 2 are approaching. Assume that speeds of

vehicles 1 and 2 are equal to each other and greater than those of vehicles 3 and 4, while vehicles 3 and 4 are traveling at the same longitudinal position with equal speeds. This situation is the most

complicated among similar situations. For vehicles 1 and 2 to keep their desired speed, it is

necessary that they shift to the middle lane and pass vehicles 3 and 4. Although they will be able



to slow down to avoid collisions, the fact that they do not have a lane preference will prevent

them from shifting to the middle lane.

Figure 7.14. Initial situation for Scenario 3: vehicles 1 and 2 (no desired lane) approach vehicles 3 and 4, which travel at their desired speeds and lanes in lanes 1 and 3.

For our analysis of the situation, we only consider vehicles 1 and 2 while defining the

environment states, since the interactions of vehicles 3 and 4 with the others are unaffected by any possible actions vehicles 1 and 2 may take. Therefore, by analyzing the interactions between the first two vehicles, we should be able to find a solution to this conflict. The states in which we are

interested are listed in Appendix D.3. Initially the environment is in state B2. Since vehicles 1

and 2 are traveling at the same speed and faster than vehicles 3 and 4, the transition to state D1 is

automatic (Figure 7.15).

Figure 7.15. Possible transitions for Scenario 3 (states B2, D1, C1, F2, G2): the transition from B2 to D1 is automatic, while those from D1 to C1 or F2 must be forced.

At state D1, the two transitions that will solve the conflict, but need to be forced, are the transitions from state D1 to states C1 and F2. From these states, the chain will move to the goal

states C3, E4, or G3 (see Appendix D.3). Then, vehicles 1 and 2 will increase their speed and

pass the other vehicles. Since the desired lanes for vehicles 1 and 2 are not set, another method needs

to be found to force the environment state to switch to one of the chains leading to a goal state.


The solution to the problem is what we defined as the speed flag in Section 5.3.2. Under the current circumstances, vehicles 1 and 2 both fire their idle actions repetitively at state D1. With the addition of the speed flag, both vehicles decide to shift to another lane after a predefined time interval. Since shifting to the middle lane becomes the only possible lateral action, one of the states C1 or F2 is reached.

If both vehicles start filling their memory vectors with conflicting actions, the pinch modules will start returning a penalty for these actions. At this point, one of the vehicles will decide against shifting to the middle lane, and the other will be the first to shift as described in Section 5.2.2, thereby changing the environment state to either C1 or F2. Figure 7.16 includes the MPEG movies of two separate runs for this highway scenario in which vehicles 1 and 2 mutually find a solution through their pinch modules. The time interval for the speed flag is chosen as 6 sec. In the first simulation (Figures 7.16a, c, e), vehicle 1 shifts to the middle lane immediately after the speed flag is set, returning a penalty response to lateral action SiL. In the second example (Figures 7.16b, d, f), although the flag is again set after 6 seconds, vehicles 1 and 2 need some time to "negotiate" the shift via their pinch modules. Since the last deviation from the desired speed of 85 km/h is approximately the same for both vehicles (Figure 7.16b), they attempt to shift to the middle lane at approximately the same time, leading to a pinch condition. Therefore, reaching the solution takes slightly more time in the second simulation.


Figure 7.16. Speeds (a, b) and lane positions (c, d) of vehicles 1 and 2 for Scenario 3: (left) vehicle 1 shifts first; (right) vehicle 2 shifts first. MPEG movies of the simulations (ch7m1.mpg, 0.065 Mb; ch7m2.mpg, 0.177 Mb; every frame represents one fifth of a second) are accessible via the icons (e, f).

Figure 7.15 shows only two of the possible transitions from state D1; there are two other possibilities: switching to state B3 or to state C4. These two possibilities become a solution to the problem at hand when both vehicles cannot shift lanes because of their pinch modules' respective penalty outputs. The probability of occurrence for this situation is very small; however, it is not zero. If this is the case, the vehicles will not be able to change to the middle lane, now defined as the desired lane by the speed flag's interference. Then, the lane flag will be set (see Section 5.3.1) after a time period, forcing the vehicles to slow down. Since the desired speeds set by the lane flag are different, the symmetry will be broken, and one of the goal states will be reached via state B3 or C4.

7.5 Discussion

This chapter introduced the treatment of path decisions for autonomous vehicles as interacting

automata, and then at a higher level, as interacting vehicles. In each intelligent vehicle, lateral and

longitudinal automata create an interacting system which can be visualized as an automata game.

In our application, this game is usually disjoint, i.e., the mixed strategy of one automaton does not depend on the other's strategy. Therefore, the optimal action pair is reached. For the case where the longitudinal automaton's actions affect the lateral automaton's environment, the interacting system again reaches the equilibrium point, which happens to be the optimal solution for a given

environment.

The visualization of multiple automata interactions as games is based on the assumption

that the physical (and therefore, the automata) environment is stationary. Assuming that the

physical environment does not change for a relatively long period of time, and provided that the

learning rates for all automata are fast enough, it is possible to assign a stationary automata

environment to any physical environment consisting of the locations and internal parameters of the vehicles of interest in a given situation. Changes of the physical environment can

then be treated as a Markov chain whose transitions are direct results of the automata

environment evaluated for each state of the physical environment. Situations involving multiple

vehicles can be analyzed by investigating the transition diagrams resulting from a specific

situation, as described in Section 7.4.

Despite the fact that we used automata environments to evaluate a physical environment's state transitions, the method can also be used for other decision mechanisms. As

long as we can define a formal way of describing the decision/control procedure resulting from a

specific environment condition, the transition diagrams for physical environment states can be

created.

The treatment of the highway scenarios is based on 3x3 or 3x4 location matrices. Larger matrices can also be used; however, they may be redundant for our analysis of the situations given in Section 7.4. For a larger number of vehicles or different sensor definitions, larger location matrices may be needed, but again, it is possible to represent multi-vehicle scenarios as simultaneous two-vehicle interactions, as described in Section 7.4.2.

Chapter 8

Conclusions

This chapter opens with a brief discussion of the automated highway system as a possible

solution to current traffic problems and a short evaluation of the reinforcement schemes as an

automata learning process. The results and contributions of this dissertation are listed in the

second section. The chapter concludes with recommendations for future research.

8.1 Discussion

The automated highway system envisioned by researchers and institutions around the world

suggests a complex structure that requires extensive research efforts. Among many issues, the

realization of AHS depends mainly upon the realization of intelligent vehicle control methods.

No matter what design approach is taken for vehicle control, there are two important issues that

will haunt the researchers. The first problem in front of the AHS is a social issue: user

acceptance. Without user acceptance, the aim of automating the highway system is bound to

fail. We will briefly discuss the possible future of the AHS deployment in the light of the current

influence of the technology on society. The second problem is the technical feasibility of the

designed systems. While designing the intelligent path planner described in this work, the

transition from abstraction to implementation was one of our concerns. We will therefore briefly

discuss the technical issues that hinder the realization of an automated highway system.

We designed our intelligent vehicle controller using an AI technique called learning

automata. The adaptation of the automata to existing conditions is provided by the reinforcement

schemes mentioned in previous chapters. We will thus conclude our discussion with a few ideas

on reinforcement schemes.

In the last five years, we have witnessed the birth of the automated highway system. Highway

automation efforts showed themselves in various forms of technological gadgets (e.g., automatic

vehicle identification, electronic toll collection, emergency signaling devices, video enforcement

systems, and visual vehicle detection systems [ITSW96, TrafT96]) as far as the user is

concerned [1]. The first question in the mind of those familiar with the computation and

[1] The author of this dissertation has traveled more than 40,000 miles on US interstate highways while working on the design of intelligent vehicle and highway systems, and on evaluation tools for intelligent transportation systems. The reflections in this section are the result of his AHS experience as a researcher and his driving experience as a user of the present highway system.


communication aspects of AHS (and ITS) is the information content of such a system. Most of

the projects for full automation of the highway system envision total knowledge about the

highway and the vehicles.

Present AHS and ITS applications such as traffic monitoring, electronic toll collection and

license plate recognition are already employing identification/classification techniques that can be

used to generate information about the drivers/owners of the vehicles. Human factors research on

AHS systems shows that drivers would not want the infrastructure to know everything about

their actions, locations, and daily routines. This problem is called the "big brother" syndrome.

The objections to usage of this type of data include fear that the security of the data will be

compromised, and overall discomfort with the modern trend toward routine collection of personal

data. Initial system analyses indicated that check-in procedures envisioned for AHS

implementation may include processing of financial, medical, or driving records to verify

potential users' qualifications [Delco94].

The information content of the data collected for AHS purposes is very important for the

user acceptance of the automated highway system. While the collection of driver data may be

perceived as an invasion of privacy, some of the problems may be solved by choosing the least

invasive AHS implementation with minimal data storage. Such issues are important because they

may influence the design of the communication and processing functions of the (longitudinal)

control system for a hierarchical infrastructure [Delco94]. For the autonomous vehicle approach

that will most probably include some form of visual control system, similar problems exist.

With the current technological progress and its applications to daily life, the invasion of

privacy is becoming an important issue. A recent example is the implementation of a global

computer network, a.k.a. Internet or World Wide Web. More and more individuals are starting to

use the network in their daily routine, and in return, they are providing the WWW server, the

Internet service provider or the routing system with some form of information. This could be

login data, an e-mail message, or a server log of the visited pages with the address of the client

machine. In most cases, it is possible for a computer user to monitor the activities of another

from his console. A similar and more common example for today is the use of credit cards. Every

time a charge card is used, the corresponding computer system records the time, date, location

and type of the transaction which can later be sold to third parties. Yet, many are comfortable

with the accessibility that a charge card or an online computer system provides (to all interested

parties).

The use of AHS or ITS systems such as automatic toll collection or automatic vehicle

identification (or ATIS, ATMS, and AVL systems in the near future) is no different from the technologies mentioned above. A trade-off between comfort and privacy exists, and will exist in the future. The evolution of the AHS will stop at a level where the majority of the users are no longer willing to trade their privacy for the additional comfort the system may bring. As AHS

becomes more visible, its complexity may lead to misunderstandings and misrepresentations that



may jeopardize public acceptance [Battelle94]. Considering the recent mistakes made with the

latest technological revolution, the Internet, it is obvious that a balanced and accurate picture of

AHS must be presented to the public.

Recent investigations addressing the user acceptance of AHS, and legal and technical

privacy issues in AHS technologies can be found in [Stevenson95], [Wright95], [Garfinkel96],

[McGurrin96], and [Wigan96].

The technology needed to create an intelligent vehicle control system is already available,

although still expensive for full implementation. The term "technology" here refers to our ability

to implement the necessary components for vehicle control. On the other hand, vehicle control

problems related to AHS have been vigorously investigated by many researchers, as discussed in

Chapter 2. However, the research on sensing and communication methodologies for AHS is

relatively new, and many problems remain. These problems are solvable, yet they require

significant research effort for safe and successful implementation of AHS. Several observations

and ideas related to technical feasibility of the AHS implementation are discussed in this section.

Since the vehicles in an automated highway will be autonomous, they will require on-

board sensing equipment to gather information about their surroundings. Sensing is vital for AHS

approaches that do not consider hierarchical control structures where most of the information is

relayed from the higher levels of the hierarchy. Non-hierarchical control approaches, on the other

hand, do require simple communication systems. No matter what the sensing and communication

requirements are for a specific realization of AHS, several issues such as headway measurements,

lateral sensing, local communications, and sensor fusion must be addressed.

The most important issue for autonomous vehicle navigation is sensor capability. The question "can we trust our observations?" needs to be answered. For example, the headway measurement is the heart of longitudinal vehicle control. There are several different technologies such as sonar, microwave radar and optic sensors for headway sensing, but no single method is reliable for all environmental conditions. Many researchers were able to implement longitudinal and lateral control methods [Lui89, Özgüner95, Özgüner96, Pomerleau96, Weber96]; however, operability under various environmental conditions is still an important design problem. The solution seems to be sensor fusion. Although more reliable observations are possible by combining distinct sensor information, finding an intelligent way to fuse sensors is very difficult. Furthermore, an autonomous vehicle must have redundant sensors and an additional back-up mechanism to guarantee safe operation [Singer95, Agogino96]. Thus, complexity and cost are the two main factors in autonomous/intelligent vehicle implementation.

Current commercially available sensing technologies include radar-based headway sensors

for adaptive cruise control, global positioning sensors for intelligent navigation, and vision

systems for active driver support for lane keeping [ITSW96, TrafT96]. Magnetic sensors for lane

keeping, adaptive vision systems for lateral and longitudinal vehicle control, laser radar for


headway control, and proximity sensors for lane departure warning are currently used in research

vehicles [Özgüner96, Pomerleau96, Tan96, Weber96, Yanagisawa92, Construction96].

The research on sensing technologies is gaining momentum due to the ongoing research on

AHS, and due to the fact that most of these technologies find quick commercial applications. The

research on sensing technologies for AHS was initially driven by the robotics field; during the last

few years more research has been carried out specifically for intelligent vehicle applications. The

sensor capabilities for autonomous navigation will increase with the investigation of new

alternatives and the progress of the previous technologies.

Communication with the roadside and other vehicles is another important factor for AHS

implementation. Again, a reliable communication system is vital for a hierarchical control

approach. A hierarchical control structure requires three types of communications:

communications for control, maneuvers, and navigation information. For the first type, high

bandwidth is required. Furthermore, the loss of data must be minimized to maintain stability of

platoons, and to assure collision-free lane changes [Foreman95]. Communicated data must be

trustworthy. A real-life analogy to the problem of reliability of the received information is to

wait at the intersection until another car approaching from left makes a right turn although its

right turn signal flashes while approaching the intersection. This type of behavior is a clever one

on the part of a human driver and it is useful in avoiding collisions, but it is against everything

AHS tries to achieve. An automated system repeatedly checking every bit of information against

errors cannot be successful in enhancing the traffic flow. On a greater scale, a hierarchical

automated highway system must optimize traffic flow by coordinating vehicle actions in an

intelligent way. The need to increase traffic flow theoretically increases the risk of accidents. The

reason for this is a decreasing possibility of recovering from a wrong decision due to a smaller

reaction window.

Commercial communication systems for AHS and ITS include new radar detectors with

road hazard and emergency vehicle signals, traffic information systems, fleet-management

systems, in-vehicle mobile offices and on-line services, MAYDAY devices and emergency

services, and electronic toll collection [ITSW96, TrafT96]. On the other hand, vehicles currently

used for research purposes are equipped with 2.5 GHz-band radio transceivers and vehicle-to-vehicle radio modem communications [Construction96, PATH96]. Ultra-wideband communication is also considered an alternative [James96]. The network architecture for vehicle-to-vehicle

communications is currently being defined [Bolla96, Fuji96], and related problems such as

interference and scattering are being investigated [Gavan96, Zoratti95].

Progress in the electronics field, especially in wireless communications, will positively affect the deployment of AHS [2]. AHS is actually a combination of many engineering fields. For

example, the design of ITS/AHS technologies is carried out by government organizations in conjunction with the Institute of Electrical and Electronics Engineers (IEEE), the Society of

[2] AHS will use emerging technologies as soon as they become available during its evolution to fully automated highways. For example, drivers with cellular phones are already acting as probes in traffic via emergency phone lines. Such an information source was not available ten years ago.


Automotive Engineers (SAE), the Institute of Transportation Engineers (ITE), and the American

Society for Testing and Materials (ASTM) [ITSst96]. The National AHS Consortium consists of many aerospace and automotive companies as well as civil, mechanical, electrical engineering, and computer science departments of several educational institutions. AHS is a multi-disciplinary research field. In addition to the technologies mentioned above, its realization depends on many engineering applications such as systems and control, fiber optics, automotive design, data mining, data scheduling, and human factors. It is slowly creating its own niche in many related fields of engineering, boosting research efforts and resources.

Automotive companies are hoping to make AHS technologies such as cruise control and

lane departure warnings commercially available in the next five years. Lane tracking and vision

enhancement applications are expected to be available in ten years [Cogan96]. It is interesting to

note that the current AHS research has already found answers to the problems related to these

applications. The transition from research to commercial availability is only a matter of cost

reduction. Considering the rate of technological progress in our century, the future of the AHS

technologies looks bright.

Returning to our assumptions on vehicle capabilities, it is fair to say that the sensing and communication capabilities assumed for the autonomous vehicles are feasible, but not yet widely available.

The intelligent vehicle path controller discussed in this work is based on learning automata

techniques. Stochastic learning automata are capable of adapting to changing environments by using the reinforcement schemes described in Chapters 3, 4 and 6.

Vehicle control in the context of AHS is a very complex problem. Automated vehicles are

part of a dynamic environment where the control decisions must be made very quickly and by

coordinating with other vehicles. In an environment where there are multiple fast-moving

vehicles, making the right decision to avoid collisions and optimize the traffic flow is difficult.

The inherent complexity of the AHS has forced researchers to look for new methods

capable of creating flexible and powerful high-level control systems. The artificial intelligence methods mentioned in Section 1.2.1 are the result of this need. Precursor System Analyses for AHS concluded that these techniques, especially knowledge-based systems and learning methods, can make a strong contribution to AHS, especially in the areas of malfunction detection and fault tolerance, high-level control of vehicle actions, overall traffic management, and sensor fusion [Raytheon94]. The work presented here attempts to make contributions to three of these areas, the exception being traffic management.

"In any eventual AHS, there will be widely varying capabilities among the possibly millions of AHS vehicles because there will be a mixture of new and old models, plus there will be widely varying levels of maintenance. Moreover, the world is complex and unexpected things can happen, such as a moose entering the highway, or a refrigerator falling off of a truck" [Raytheon94]. It is clear from this argument that the AHS environment will be more chaotic than


structured. It will not be easily predicted, nor analyzed. In the case of mixed traffic (human

drivers and automated vehicles coexisting on the same highway), the problem grows enormously.

As a result, an automated vehicle must be able to deal effectively with a large number of possible

traffic situations.

The last statement above emphasizes the importance of learning and adaptation. What is

needed is a system that can handle situations unforeseen by the designers. For this type of

capability, the choice of artificial intelligence technologies is obvious. While expert systems and

knowledge-based planning [Forbes95] are effective, learning methods are capable of discovering

new situations and optimal responses using simulated environments [Sukthankar96].

One of the most important characteristics of reinforcement learning is the choice between actions that are known to return positive responses and actions with unknown performance. For example, a learning automaton starting with a priori action probability values needs time to evaluate the actions: it learns from its mistakes.

This process can be examined from two different points of view [Ashby60]. It can be

thought of simply as an attempt to find the best solution. In case of failure, the attempt is not

rewarded (or is penalized). This view of the learning process is not very attractive. On the other

hand, the trial-and-error method is invaluable in gathering the information necessary to achieve the optimal action. Choosing non-optimal actions to gain information improves the long-term performance while resulting in temporary penalty responses from the environment. Learning automata are one of the successful examples of such methods.
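As a concrete illustration of this trial-and-error process, the sketch below implements a two-action linear reward-inaction update of the kind discussed in Chapters 3 and 6; the toy penalty probabilities and the learning rate are stand-ins, not the vehicle teacher modules.

    import random

    def learn(penalty_probs, rate=0.05, steps=2000):
        """Two-action learning automaton, linear reward-inaction scheme: on a
        reward the chosen action's probability moves toward 1, on a penalty
        the probabilities are left unchanged."""
        p = [0.5, 0.5]                                # a priori probabilities
        for _ in range(steps):
            a = 0 if random.random() < p[0] else 1    # choose an action
            if random.random() >= penalty_probs[a]:   # environment rewards
                p[a] += rate * (1.0 - p[a])           # reinforce chosen action
                p[1 - a] = 1.0 - p[a]                 # keep a valid distribution
        return p

    random.seed(1)
    print(learn([0.7, 0.2]))   # converges toward action 1, the less penalized one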

Another important aspect of reinforcement learning is that the actions performed by

an automaton influence the environment response that the automaton will receive in the future.

Rather than receiving an independently chosen set of action-response pairs, the agent has some

form of control over what response it will receive and complete control over what actions will be

generated in response. In addition to making it difficult to make distributional statements about

the inputs to the agent, this degree of control makes it possible for what seem like small

experiments to cause the agent (automaton) to discover an entirely new part of its environment

[Kaelbling94].

The results of the work described in this dissertation can be divided into two groups: (a) learning

automata and reinforcement schemes, and (b) intelligent vehicle control and path planning. This

section lists the significant results of the completed work. The contributions are indicated

together with the associated results.

The application of learning automata techniques to intelligent vehicle control for

automated highway systems provided the following conclusions about learning automata,

reinforcement schemes, and games of automata:

The reinforcement scheme is the heart of the learning process, and its characteristics define the behavior of the automata. However, besides the reinforcement scheme, the definition of the reward-penalty structure (i.e., the teacher) is vital to the success of the learning automata application. The automata cannot be expected to learn the best actions with a defective teacher structure. The significance of the reward-penalty structure is emphasized in environments with multiple teachers. We have spent a significant amount of time designing a function that intelligently combines multiple environment responses.

The lack of applications using a continuous environment response is well understood. The problem of combining such responses, as well as the difficulty of finding a meaningful representation of several factors with the same measure, hinders the use of S-model environments. There are no known applications that use this model.

The use of learning algorithms in our application is based on the assumption that the physical

environment around an autonomous vehicle can be associated with a stationary automata

environment (or multiple switching automata environments). We assume that the automata

adaptation to the environment is faster than the changes in the physical environment.

Although the current computational technology is sufficient for such an assumption, we

defined two new reinforcement schemes for our purposes. These new algorithms have

convergence rates up to four times faster than previously designed counterparts, especially in

situations frequently occurring in AHS applications.

One of these new reinforcement schemes was previously known. However, its convergence characteristics had not been proven due to the difficulty of analytically modeling the learning process. By using nonlinear methods and a special condition resulting from our design, we were able to prove the optimality of the algorithm.

The second algorithm also resulted from our attempts to improve the convergence characteristics of the learning automaton. It is an extension of a previously designed nonlinear reinforcement scheme, but performs better; this algorithm is proven to be absolutely expedient.

Our work on the design and analysis of an intelligent vehicle controller using learning automata enables us to conclude the following:

The intelligent vehicle path controller consisting of two stochastic learning automata is

capable of adapting to its dynamic environment by choosing the best (lateral and longitudinal)

actions to be sent to the lower level of control.

This non-model-based approach would be especially useful in situations wherein complete knowledge about the overall traffic flow is not provided by a higher level of the control hierarchy (if such a level exists at all). Instead of trying to foresee all possible traffic situations, we take the approach of defining a mechanism that can make intelligent decisions based on local sensor information, keeping in mind that the initial phases of the AHS will include non-automated vehicles as well as intelligent vehicles capable of communicating with others.

The method is capable of capturing the overall dynamics of the system, which include the vehicle, the driver and the roadway. The definitions of the learning and sensor parameters determine the behavior of each vehicle.

Initial results indicated that the capabilities of the vehicle sensors must be extended. As presented in Chapter 5, these extensions are not too taxing, although some may be relatively expensive to realize. A simple rate-feedback method for the headway distance measurement can overcome the oscillatory behavior in longitudinal control.

Simulations of intelligent vehicles also indicated the need for additional information sources besides the local sensors. No matter what control structure is used for AHS deployment, some form of communication must exist between vehicles. Although visual cues can be used to coordinate lateral actions, the lane changing capabilities of the intelligent vehicles, as well as the safety of their actions, increase with local vehicle-to-vehicle communication. We found that, in order to avoid pinch situations, vehicles may coordinate their lane changing actions by simply sending an intention signal to neighboring vehicles.

Our attempt to design an intelligent path controller inadvertently extended, to some degree, to other levels of vehicle control. For example, we found that if a higher level of the control/decision hierarchy provides desired lane information, many local conflicts (whose local solutions are not globally optimal) can be resolved so as to optimize the overall traffic flow. Furthermore, the capabilities of the lower layers of control are significantly important when designing the intelligent path controller. Although we did not consider feedback from the control layer that actually carries out the vehicle actions, the design parameters of the lower layer affect the design parameters of the path planning layer. All levels of control are interconnected; no layer can be treated as a free-running entity.

There is a trade-off between what the automated vehicle can accomplish and how simple the sensing/information system is: the more global the information available to the decision mechanism, the more the vehicle can accomplish autonomously. It is our belief that neither a fully decentralized control method nor a completely hierarchical system can solve the problem completely. While a decentralized system will require some form of local communication, a hierarchical system will suffer from being unable to foresee all conflicts between vehicle paths.

Our approach to the vehicle path control problem can be viewed as an autonomous vehicle approach: automated vehicles make their own decisions about which action to take, using information from their own sensors and limited communication systems. Although each vehicle is capable of avoiding collisions, the combined actions of multiple vehicles are not always optimal. Previous research efforts on autonomous vehicles, discussed in Section 1.2.1, have not considered multiple vehicle interactions; in this work, we therefore also attempted to find a structured methodology for the interactions of multiple autonomous vehicles.

The method of evaluating possible environment state transitions based on the associated automata environments enabled us to define additional decision mechanisms we called flags. Speed and lane flags are used to resolve the conflict situations arising from multiple teacher responses and vehicle interactions.

Although our method of evaluating the physical environment's state changes is based on the learning automata environment, similar methods can also be used with other decision mechanisms. Given formal descriptions of the decision/control procedure, transition diagrams similar to those in Chapter 7 can be created to analyze highway situations.

8.3 Recommendations

AHS research, especially high level vehicle path control, is a relatively new research area, and there is an extensive number of questions to be answered. There are multiple possibilities for extending the work discussed in this dissertation. Here, we emphasize those possibilities that seem most important for the realization of intelligent vehicle control. Our recommendations for future research efforts on the subject are the following:

Simulation:

Simulation is an indispensable way of testing the effectiveness of vehicle control methods, since the system becomes highly complex due to the presence of a large number of vehicles. Furthermore, the ideas investigated in this work cannot be tested with real vehicles, partly for the reasons listed in Section 2.5, but mainly because of the futuristic character of the problem. The overall system is neither continuous- nor discrete-time; the combination of vehicle dynamics and computational modules forms a hybrid system whose design and analysis are inherently difficult. For the work described here, we simplified the vehicle dynamics drastically, since the main purpose of this research was the study of learning automata techniques for vehicle control. More realistic vehicle dynamics must be incorporated into the computational system, and the interaction between the discrete and continuous parts of the overall system needs to be studied for a precise control system simulation. A minimal sketch of such a mixed-rate structure is given below.

Sensor modeling:

For the purpose of studying learning automata as an intelligent vehicle controller, we assumed almost perfect sensor models: our sensor module subroutines in the simulation incorporate only a small percentage of measurement error. Beyond the addition of measurement noise, we assumed that the sensors defined in this work can be implemented using current technology. More realistic sensor models, based on existing radar, laser radar, infrared and sonar techniques, may be created to further the simulation's ability to imitate real-life situations; a simple sketch of such a model is given below. The issues of sensor degradation and back-up methods also have to be addressed for fault-tolerant autonomous navigation applications.
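As an illustration only, a slightly more realistic headway sensor might be modeled as below. The function name, the noise levels, and the dropout probability are assumptions made for this sketch, not the values used by the sb_fr.m module of Appendix C; feeding the measured headway rate back to the longitudinal teacher is also one way to realize the rate-feedback idea mentioned in the conclusions above.

```matlab
function [d_meas, ddot_meas, valid] = headway_sensor(d_true, ddot_true)
% Illustrative headway sensor model: additive Gaussian noise on the
% range and range rate, plus occasional dropouts. All parameters are
% assumptions made for this sketch.
sigma_d    = 0.1;         % range noise standard deviation (m)
sigma_ddot = 0.2;         % range-rate noise standard deviation (m/s)
p_drop     = 0.01;        % probability of a lost return per cycle
valid = rand > p_drop;    % simple dropout model
if valid
    d_meas    = d_true    + sigma_d*randn;
    ddot_meas = ddot_true + sigma_ddot*randn;
else
    d_meas = NaN; ddot_meas = NaN;    % no return this cycle
end
end
```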

High level vehicle path control:

Our study of the vehicle path planner indicated that this level of vehicle control can be neither designed nor analyzed without considering the lower and higher levels of the control methodology. Especially for the hierarchical control structure, feedback from the regulation layer and the information sent from the link layer of the hierarchy must be taken into account. This study of the planning layer did not consider a possible "unable to comply" signal from the low-level control modules. Furthermore, we assumed that global information such as the desired lane and speed is available. Previous and current research on AHS has not yet answered these issues; lane and speed assignments by a higher layer in the control hierarchy may prove to be crucial for obtaining optimal traffic flow.

Vehicle interactions:

The problem of coordination between autonomous vehicles is inherently addressed by a centralized control approach. However, decentralized AHS methods such as the autonomous vehicle approach [Bayouth96] or the cooperative approach [McKendree96] must still find an answer to this problem; in the case of mixed traffic, which may be a possibility in the future of AHS, the problem is far more serious. The analysis and design methods described in this work address vehicle interaction by reducing multiple-vehicle scenarios (situations) to multiple two-vehicle interactions, using additional simplifications on the relative positions of the vehicles (the matrices in Section 7.4 and Appendix D). This is possible because of our definitions of the sensors and of the reward-penalty structure used for the learning automata. The idea of analyzing multiple vehicle interactions can be applied to other vehicle control methodologies, provided that the states of the physical environment, and of the decision environment resulting from them, are carefully defined. More detailed and complex definitions of highway scenarios may be necessary.

Learning automata:

The idea behind the adaptive vehicle controller presented here is very simple: two automata synchronously adapt to their (teacher) environment. During our study, we observed that automata interactions and learning processes in multi-teacher environments are relatively untouched areas of research. The intelligent controller may be extended to multiple automata associated with a single teacher (i.e., sensor module). In this case, the analysis of the interactions between the automata would be more difficult; however, translating the problem of combining environment responses into the problem of combining multiple actions may have its advantages. The application of the S-model environment may then prove to be relatively easier, bringing the advantage of continuously mapping the sensed physical environment to the automata environment. Yet the idea of combining automata actions has not been investigated previously. A minimal sketch of a single teacher evaluating the joint action of two automata is given below.
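In the sketch, two two-action automata are updated with the standard LR-I scheme by one teacher that rewards their joint action according to an identical-payoff matrix; the matrix D is an arbitrary example chosen for the sketch, not a model from this work.

```matlab
% Two automata, one teacher: an identical-payoff game under LR-I.
% D(i,j) is the probability of reward for the joint action (i,j);
% the matrix itself is an arbitrary example.
D = [0.9 0.2; 0.3 0.7];
a = 0.05;                        % LR-I learning parameter
p = [0.5 0.5]; q = [0.5 0.5];    % action probabilities of A and B
for n = 1:5000
    i = 1 + (rand > p(1));       % automaton A samples its action
    j = 1 + (rand > q(1));       % automaton B samples its action
    if rand < D(i,j)             % the single teacher rewards jointly
        p = (1-a)*p; p(i) = p(i) + a;   % reward-inaction update for A
        q = (1-a)*q; q(j) = q(j) + a;   % reward-inaction update for B
    end                          % on penalty, LR-I leaves p, q unchanged
end
% p and q typically converge to one of the pure equilibrium pairs of D,
% here the joint actions (1,1) or (2,2).
```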


Bibliography

[Agogino96] Agogino, A., K. Goebel, and S. Alag, Intelligent Sensor Validation and Fusion,

Intellimotion, vol. 5, no. 2, pp. 6-7, 1996.

http://www.volpe.dot.gov/ahs/ (Dec 6, 1996).

http://www.esri.com/base/products/arcinfo/arcinfo.html (Jan 2, 1996).

[Aoki] Aoki, T., T. Suzuki, and S. Okuma, Acquisition of Optimal Action Selection in

Autonomous Mobile Robot Using Learning Automata (Experimental Evaluation), Proceedings

of the IEEE Conference on Fuzzy Logic and Neural Networks/Evolutionary Computation, pp.56-

63, Nagoya, Japan, Nov. 1995.

[Ashby60] Ashby, W. R., Design for a Brain: The Origin of Adaptive Behaviour, New York,

NY: John Wiley and Sons, 1960.

[Asher96] Asher, H. J., and B. A. Galler, Collision Warning Using Neighboring Vehicle

Information, Proceedings of the 6th Annual Meeting of ITS America, vol. 2, pp. 674-684, 1996.

[Atkinson65] Atkinson, R.C., G.H. Bower and E.J. Crothers, An Introduction to Mathematical

Learning Theory, New York: Wiley, 1965.

Automata to the Investment Game, International Journal of System Sciences, vol. 11, pp.1447-

45, 1980.

[Baba83] N. Baba, The Absolutely Expedient Nonlinear Reinforcement Schemes under the

Unknown Multi-Teacher Environment, IEEE Transactions on Systems, Man and Cybernetics,

SMC-13, pp. 100-108, 1983.

[Baba85] N. Baba, New Topics in Learning Automata Theory and Applications, Number 71 in

Lecture Notes in Control and Information Sciences, Berlin: Springer-Verlag, 1985.

by Batelle, AHS Precursor Systems Analyses, Compendium of Research Summaries, compiled by

Information Dynamics, Inc. & Viggen Corp., McLean, VA, February 1994.


Architecture, Proceedings of the 3rd World Congress in ITS, Orlando, FL, October 1996.

During Transitions into the Automated Lane, Technical Report FHWA-RD-95-107, Federal

Highway Administration, Turner-Fairbank Highway Research Center, McLean VA, 1996.

[Bloomfield96b] Bloomfield, J. R., Human Factors Aspects of Transferring Control from the

Driver to the Automated Highway System with Varying Degrees of Automation, Technical

Report FHWA-RD-95-108, Federal Highway Administration, Turner-Fairbank Highway

Research Center, McLean VA, 1996.

[Bolla96] Bolla, R., F. Davoli, and C. Nobile, A Mobile Network Architecture for Vehicle-to-

Vehicle Communications, Proceedings of the 3rd World Congress in ITS, Orlando, FL, October

1996.

Future Road Traffic, Smart vehicles, Lisse, Netherlands: Swets & Zeitlinger, pp. 3-36. 1995.

[Bush58] Bush, R. R., and F. Mosteller, Stochastic Models for Learning, New York: Wiley,

1958.

Simulation of Road Traffic, Journal of Supercomputing, vol. 10, no. 1, pp. 25-53. 1996.

[Card93] Statement of Andrew H. Card, President and Chief Executive Officer of American

Automobile Manufacturers Association, November 10th, 1993 Hearing before the Subcommittee

on Investigations and Oversight of the Committee on Science, Space and Technology, US. House

of Representatives, 103 Congress, First Session, pp. 108-109. US Gov. Printing Office,

Washington 1994. (Doc. Y4.sci2 no.103/75)

[Chand68] Chandrasekharan, B., and D.W.C. Shen, On Expediency and Convergence in Variable

Structure Stochastic Automata, IEEE Transactions on System Science and Cybernetics, vol.

SSC-5, pp. 145-149, 1968.

[Chand69] Chandrasekharan, B., and D.W.C. Shen, Stochastic Automata Games, IEEE

Transactions on System Science and Cybernetics, vol. SSC-5, pp. 145-149, 1969.

[Chee95] Chee, W. Lane Change Maneuvers for Automated Highway Systems, Research

Report, University of California, Berkeley, 1995.


[Chen96] Chen, C., B. Foreman, A Discussion of the WaveLAN Radio as Relevant to AVCS,

Technical Note 96-1, Institute of Transportation Studies, University of California, Berkeley,

1996.

Study of Advanced Technology HOV Systems, vol. 4: Implementation of Lateral Control

Systems in Transitways, Technical Report UCB-ITS-PRR-92-6, Program on Advanced

Technology for the Highway, Institute of Transportation Studies, University of California,

Berkeley, 1992.

[Cogan96] Cogan, R., Wheels in Motion, ITS World, Technology and Applications for

Intelligent Transportation Systems, pp. 18-22, July/August 1996, ISSN 1086-2145.

Institute, Intelligent Transport Systems Division, Ministry of Construction, Ibaraki, Japan,

1996.

[Cremer94] Cremer, J., J. Kearney, Y. Papelis, and R. Romano, The Software Architecture for

Scenario Control in the Iowa Driving Simulator, Proceedings of the 4th Computer Generated

Forces and Behavioral Representation, May 1994.

[CSIM96] Mesquite CSIM 18- A Development Toolkit for Simulation and Modeling,

Mesquite Software, Inc., http://www.mesquite.com/csim18.htm (Jan 2, 1997).

Delco , AHS Precursor Systems Analyses, Compendium of Research Summaries, compiled by

Information Dynamics, Inc. & Viggen Corp., McLean, VA, February 1994.

[Deshpande96] Deshpande, A., Design and Evaluation Tools for Automated Highway

Systems, in Hybrid Systems III : Verification and Control. New Brunswick: Springer-Verlag, pp.

138-148, 1996.

International, pp. 62-66, 1996.

Simulator, Transportation research. Part C, Emerging technologies, vol. 3C, no. 1, pp. 1-17,

1995.


Report ANL/RA/CP-85716, Argonne National Laboratory, Argonne, IL, 1995.

[Field92] Intelligent Vehicle Highway systems: Review of Field Trials, Organization for Economic

Cooperation and Development, 1992.

[Forbes95] Forbes, J., T. Huang, K. Kanazawa, and S. Russell, The BATmobile: Towards a

Bayesian Automated Taxi, Proceedings of the 14th International Joint Conference on Artificial

Intelligence, Montreal, Canada, 1995.

Vehicle Control, Systems and Issues in ITS, pp. 73-79, 1995.

[Frank89] Frank A. A., S .J. Liu, and S. C. Liang, Longitudinal Control Concepts for Automated

Automobiles and Trucks Operating on a Cooperative Highway, in Vehicle/Highway Automation:

Technology and Policy Issues, Society of Automotive Engineers, pp. 61-68, 1989.

[Fu65a] Fu, K.S., and G.J. McMurtry, A Study of Stochastic Automata as Models of

Adaptive and Learning Controllers, Technical Report TR-EE 65-8, Purdue University,

Lafayette, IN, 1965.

[Fu65b] Fu, K.S., and G.J. McMurtry, An Application of Stochastic Automata to the

Synthesis Learning Systems, Technical Report TR-EE 65-17, Purdue University, Lafayette,

IN., 1965.

[Fu67] Fu, K.S., Stochastic Automata as Models of Learning Systems, in Computer and

Information Sciences II, J.T. Lou, Editor, New York: Academic, 1967.

[Fu69a] Fu, K.S., and T.J. Li, Formulation of Learning Automata and Automata Games,

Information Science, vol. 1, no. 3, pp. 237-256, 1969.

[Fu69b] Fu, K.S., and T.J. Li, On Stochastic Automata and Languages, Information Science,

vol. 1, no. 4, pp. 403-419, 1969.

[Fu71] Fu, K.S., Stochastic Automata, Stochastic Languages and Pattern Recognition, Journal

of Cybernetics, vol. 1, no. 3, pp. 31-48, 1971.

[Fuji96] Fuji, H., O. Hayashi, and Y. Hirao, Experimental Research on Protocol of Inter-Vehicle

Communications for Multiple Vehicles, Proceedings of the 3rd World Congress in ITS, Orlando,

FL, October 1996.


Vehicle Electronics Meeting Society's Needs: Energy, Environment, Safety, Society of Automotive

Engineers, pp. 95-102, 1992.

[Garfinkel96] Garfinkel, S. L., Why Driver Privacy Must Be a Part of ITS, Converging

Infrastructures: Intelligent Transportation and the National Information Infrastructure,

Cambridge, MA: MIT Press, pp. 324-340, 1996.

[Gavan96] Gavan, J., and F. Handler, Analysis, Computation and Mitigation of the Interference

to a Remote Receiver from Two Collocated Vehicular Transceivers, IEEE Transactions on

Vehicular Technology, vol. 45, no. 3, pp. 431-442, 1996.

[Gilbert92] Gilbert, V., J. Thibault, and K. Najim, Learning Automata for the Control and

Optimization of a Continuous Stirred Tank Fermenter, IFAC Symposium on Adaptive systems in

Control and Signal Processing, Grenoble, France, July 1992.

[Godbole95] Godbole, D., J. Lygeros, and S. Sastry, Hierarchical Hybrid Control : a Case

Study, in Hybrid systems II, Berlin: Springer-Verlag, pp. 166-190, 1995.

[Godbole96] Godbole, D., Towards a Fault Tolerant AHS Design. Part II, Design and

Verification of Communication Protocols, Technical Report UCB-ITS-PRR-96-15, PATH

Program, Institute of Transportation Studies, University of California, Berkeley, 1996.

Assisted Concept for AHS, Proceedings of the 3rd World Congress in ITS, Orlando, FL,

October 1996.

and Oversight Committee, in November 10th, 1993 Hearing before the Subcommittee on

Investigations and Oversight of the Committee on Science, Space and Technology, US. House of

Representatives, 103 Congress, First Session, p.3. US Gov. Printing Office, Washington, 1994.

(Doc. Y4.sci2 no.103/75)

[Hedrick96] Hedrick, J. K., J.C. Gerdes, D. B. Maciuca, D. Swaroop, and V. Garg, Longitudinal

Control Development for IVHS Fully Automated and Semi-Automated Systems: Phase II,

Technical Report UCB-ITS-PRR-96-01, PATH Program, Institute of Transportation Studies,

University of California, Berkeley, 1996.


Vehicle Guidance, Technical Report UCB-ITS-PRR-91-18, PATH Program, Institute of

Transportation Studies, University of California, Berkeley, 1991.

[Hessburg95] Hessburg, T., and M. Tomizuka, Fuzzy Logic Control for Lane Change

Maneuvers in Lateral Vehicle Guidance, Technical Report UCB-ITS-PWP-95-13, PATH

Program, Institute of Transportation Studies, University of California, Berkeley, 1995.

[Ho96] Ho, F.-S., and P. Ioannou, Traffic Flow Modeling and Control using Artificial Neural

Networks, IEEE Control Systems, vol. 16, no. 5, pp. 16-26, October 1996.

[Ikonen97] Ikonen, E., and K. Najim, Use of Learning Automata in Distributed Fuzzy Logic

Processor Training, IEE Proceedings of Control Theory and Applications, in press, 1997.

[ITSJ96] ITS Handbook in Japan, Highway Industry Development Organization, Tokyo, Japan.

Systems (ITS) Standards Program: 1996 Status Report, http://www.its.dot.gov/standards/docs/

REPORT96.htm (January 9, 1996).

[ITSW96] ITS World, Technology and Applications for Intelligent Transportation Systems, issues

July-August and September-October 1996, Advanstar Communications, Eugene, OR, ISSN 1086-2145.

Concept: The Road and Vehicle United, Technical Report, Center for Transportation Research,

Virginia Tech, August 1994.

[James96] James, R.D., Widening the Net, COMTrans (Communication Networks for ITS),

supplement to Traffic Technology International, pp. 60-64, August 1996.

Automated Scaled Highway (FLASH) Laboratory for Studies on Automated Highway Systems,

Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, vol. 1, pp.

771-776, 1995.

[Kachroo95b] Kachroo, P., Sliding Mode Full-state Feedback Longitudinal Control of Vehicles

in an Automated Highway System (AHS), Mobile Robots X, SPIE, Bellingham WA, pp. 260-

268, 1995.


[Kachroo95c] Kachroo, P., and M. Tomizuka, Vehicle Control for Automated Highway

Systems for Improved Lateral Maneuverability, Proceedings of the IEEE International

Conference on Systems, Man and Cybernetics, Vancouver, B.C., vol. 1, pp. 777-782, 1995.

[Kachroo96] Kachroo, P., A. Walimbe, and B. Benham, Longitudinal and Lateral Vehicle

Control Setup in the Flexible Low-Cost Automated Scaled Highway (FLASH) Laboratory, in

the Proceedings of ITS America, Sixth Annual Meeting, 1996.

Traffic Software (DYNAVIMTS) Package for Studies on Automated Highway Systems (AHS),

3rd ITS World Congress, Orlando, Florida, 1996.

the Sliding Mode Control of a Class of Nonlinear Systems, IEEE Transactions on Automatic

Control, vol. 41, no.7, July 1996.

(http://www.tnt.uni-hannover.de/data/info/www/tnt/subj/sci/ai/machine-learning/reinforcement-

learning.html)

[Kalman62] Kalman, R.E, and J. E. Bertram, Control System Analysis and Design via the

Second Method of Lyapunov: II. Discrete-Time Systems, ASME Journal of Basic Engineering,

ser. D, 82, pp. 394-400, 1962.

Autonomous Highway Vehicles, in Vehicle Navigation and Information Systems Conference

Proceedings, Society of Automotive Engineers, pp. 1143-1151, 1991.

Automata - a Counterexample, Dec. 1979; Appendix to Viswanathan, R., and K.S. Narendra,

A Note on the Linear Reinforcement Scheme for Variable Structure Stochastic Automata, IEEE

Transactions on Systems, Man and Cybernetics, SMC-2, pp.292-294, Apr 1972.

Complex Freeway Segments on a Personal Computer, Proceedings of the 6th Vehicle Navigation

and Information Systems Conference, Seattle, Wash., pp. 342-345, 1995.

Learning Algorithms for Stochastic Automata, IEEE Transactions on Systems, Man and

Cybernetics, vol. SMC-6, pp. 281-286, 1973.


York: Springer-Verlag, 1981.

person Zero-sum Stochastic Games with Incomplete Information: A Unified Approach, SIAM

Journal of Control and Optimization, vol. 20, pp. 541-552, 1982.

[Lasky93] Lasky, T. L., and B. Ravani, A review of Research Related to Automated Highway

System (AHS), Interim Report for Federal Highway Administration, Contract no. DTFH61-93-

C-00189, University of California, Davis, October 25, 1993.

[Lee95] Lee, H., D.W. Love, and M. Tomizuka, Longitudinal Maneuvering Control for

Automated Highway Systems, Proceedings of the American Control Conference, Seattle, Wash.,

vol. 1, pp. 150-154, 1995.

[Levitan95] Levitan, L., Preliminary Human Factors Guidelines for Automated Highway

System Designers Volume I, Guidelines for AHS Designers, Technical Report FHWA-RD-94-

116, Federal Highway Administration, McLean VA, 1995.

[Lin96] Lin, C.-F., and A.G. Ulsoy Time to Lane Crossing Calculation and Characterization of

its Associated Uncertainty, ITS journal, vol. 3, no. 2, pp. 85-98, 1996.

[Liu89] Liu, S. M., and A. A. Frank, On Lateral Control of Highway Vehicles Guided by a

Forward Looking Sensor, Proceedings of the 1st International Conference on Applications of

Advanced Technologies in Transportation Engineering, San Diego, pp. 119-124, 1989.

[Lubin92] Lubin, J.M., E. C. Huber, S.A. Gilbert, and A. L. Kornhauser, Analysis of a Neural

Network Lateral Controller for an Autonomous Road Vehicle, in IVHS Issues and Technology,

Society of Automotive Engineers, pp. 23-44, 1992.

[Lygeros94] Lygeros, J., and D. N. Godbole, An Interface Between Continuous and Discrete-

event Controllers for Vehicle Automation, Proceedings of American Control Conference,

Baltimore, MD, vol. 1, pp. 801-805, 1994.

for Degraded Modes of Operation of IVHS, Proceedings of the American Control Conference,

Seattle, Wash., vol. 5, pp. 3592-3596, 1995.


[Lygeros96] Lygeros, J., D. N. Godbole, and M. Broucke, Towards a Fault Tolerant AHS

Design. Part I, Extended Architecture, Technical Report UCB-ITS-PRR-96-14, PATH Program,

Institute of Transportation Studies, University of California, Berkeley, 1996.

[McGurrin96] McGurrin, M. F., The Privacy Enhancement Protocol (PEP), 3rd World

Congress in ITS, Orlando, FL, October 1996.

[Malik95] Malik, J., D. Koller, and T. Luong, A Machine Vision Based System for Guiding

Lane-change Maneuvers, Technical Report UCB-ITS-PRR-95-34, PATH Program, Institute of

Transportation Studies, University of California, Berkeley, 1995.

20, 1996).

[Marsh93] Marsh, C., Gordon, T. J., and Q. H. Wu, Stochastic Optimal Control of Active

Vehicle Suspensions using Learning Automata, Proceedings I. Mech. Eng. Part I, Journal of

Systems and Control Engineering, vol. 207, pp.143-152, 1993.

[Marsh95] Marsh, C., and Gordon T. J., The Application of Learning Automata to Controller

Design in Slow-Active Automobile Suspensions, International Journal for Vehicle Mechanics

and Mobility, vol. 24, no. 8, pp. 597-616, 1995.

Collision Avoidance Systems for Lane Change, Merging and Backing: Human Factors

Assessment of the Driver Interfaces of Existing Collision Avoidance Systems, Technical Report

DOT HS 808 433, US DOT, National Highway Traffic Safety Administration, Springfield, VA,

1996.

Proceedings of the 3rd World Congress in ITS, Orlando, FL, October 1996.

Systems, IEEE Transactions on System Science and Cybernetics, SSC-2, pp. 109-114, 1966.

[Moon96] Moon, S.-H., C.-W. Kim, and M.-H. Han, Navigation Control for an Autonomous

Road Vehicle Using Neural Network, Proceedings of the 3rd World Congress in ITS, Orlando,

FL, October 1996.


[Mourou92] Mourou, P., and B. Fade, Multi-agent Planning and Execution Monitoring:

Application to Highway Traffic, Proceedings of the AAAI Spring 92 Symposium, pp. 107-

112, 1992.

Highway System, M. S. Thesis, Virginia Tech, September 1996.

[Najim77] Najim, K., and Y.M. El Fattah, Use of Learning Automaton in Static Control of a

Phosphate Drying Furnace, Proceedings of the 5th IFAC/IFIP International Conference on

Digital Computer Applications to process Control, The Hague, Netherlands, June 1977.

International Journal of Adaptive Control and Signal Processing, vol. 5, pp. 335-345, 1991.

[Najim 91b] Najim, K., and G. Oppenheim, Learning Systems: Theory and Applications, IEE

Proceedings of Computer and Digital Techniques, vol. 138, no. 4, pp. 183-192, 1991.

[Najim94] K. Najim and A. S. Poznyak, editors, Learning Automata: Theory and Application,

Tarrytown, NY: Elsevier Science Ltd., 1994.

[Najim96] Najim, K., and A. S. Poznyak, Multimodal Searching Technique Based on Learning

Automata with Continuous Input and Changing Number of Actions, IEEE Transactions on

Systems, Man and Cybernetics, Part B, vol. 26, no. 4, pp.666-673, 1996.

Automata, Proceedings of 1972 International Conference of Cybernetics and Society,

Washington, DC, Oct. 9-12, 1972.

[Narendra74] Narendra, K.S., and M.A.L. Thathachar, Learning Automata - A Survey, IEEE

Transactions in Systems, Man and Cybernetics, vol. SMC-4, no. 4, July 1974.

Englewood Cliffs, NJ: Prentice Hall, 1989.

[Naruse93] Naruse, K., and Y. Kakazu, Strategy Acquisition of Path Planning of Redundant

Manipulator using Learning Automata, IEEE International Workshop on Neuro-Fuzzy Control,

pp.154-159, 1993.

nahsc/ (Dec. 6, 1996).


Highway Systems, IEEE Transactions on Vehicular Technology, vol. 43, no. 3, pp. 626-634,

1994.

Sciences, Part 2, A. Dantzig, and A. Veinott, Editors, Providence, RI: American Mathematical

Society, 1968, pp. 283-313.

[O'Brien96] O'Brien, R. T., P. A. Iglesias, and T. J. Urban, Vehicle Lateral Control for

Automated Highway Systems, IEEE Transactions on Control Systems Technology, vol. 4, no. 3,

pp. 266-273, 1996.

[Oommen94a] Oommen, B. J., and E. V. de St. Croix, String Taxonomy using Learning

Automata, Technical Report TR-234, School of Computer Science, Carleton University, Ottawa,

Canada, March 1994.

[Oommen94b] Oommen, B. J., and E. V. de St. Croix, Graph Partitioning using Learning

Automata, Technical Report TR-250, School of Computer Science, Carleton University, Ottawa,

Canada, July 1994.

[Özgüner95] Özgüner, Ü., Combined Longitudinal and Lateral Controller Design for a Vehicle

with Radar Sensors, IVHS and Advanced Transportation Systems, Society of Automotive

Engineers, Warrendale PA, pp. 51-57, 1995.

Craig, Experimental Results of a Lane Following Controller Based on a Vision Sensor,

Proceedings of the 3rd World Congress in ITS, Orlando, FL, October 1996.

Schemes for Learning Automata: Stochastic Estimator Learning Algorithms, IEEE Transactions

on Knowledge and Data Engineering, vol. 6, no. 1, pp. 649-654, Feb. 1994.

[PATH] Joh, J., et al. (PATH Design Team), California Partners for Advanced Transit and

Highways, http://www-path.eecs.berkeley.edu/ (Dec. 6, 1996).

[PathDb96] Morris, S., and S. Petrites, PATH Database, California Partners for Advanced

Transit and Highways, http://sunsite.berkeley.edu/~path/search.html (Dec. 6, 1996).

[Peng93] Peng, J., and R. J. Williams, Efficient Learning and Planning within the Dyna

Framework, Adaptive Behavior, vol. 1, no. 4, pp. 437-454, 1993.


10th, 1993 Hearing before the Subcommittee on Investigations and Oversight of the Committee on

Science, Space and Technology, US. House of Representatives, 103 Congress, First Session, pp.

35-36. US Gov. Printing Office, Washington 1994. (Doc. Y4.sci2 no.103/75)

[Pomerlau96] Pomerlau, D., and T. Jochem, Rapidly Adapting Machine Vision for Automated

Vehicle Steering, IEEE Expert, vol. 11, no. 2, pp. 19-27, 1996.

[Poznyak96] Poznyak, A., K. Najim, and E. Ikonen, Adaptive Selection of the Optimal Model

Order of Linear Regression Models using Learning Automata, International Journal of Systems

Science, vol. 27, no. 2, pp. 151-159, 1996.

Information Dynamics, Inc. & Viggen Corp., McLean, VA, February 1994.

[Puri95] Puri, A., and P. Varaiya, Driving Safely in Smart Cars, Proceedings of the American

Control Conference, Seattle, WA, vol. 5, pp. 3597-3599, 1995.

[Rajamaran96] Rajaraman, K., and P. S. Sastry, Finite Time Analysis of the Pursuit Algorithm

for Learning Automata, IEEE Transactions on Systems, Man and Cybernetics, Part B, vol. 26,

no. 4, pp.590-598, 1996.

[Rault95] Rault, A., and M. Muffat, Short-term and Mid-term Applications in the Automotive

Sector : Prometheus Activities of Peugeot SA and Renault, Presentations of the Joint California

PATH and France Workshop, Richmond, CA, 1995.

[Raytheon94] Knowledge-Based Systems and Learning Methods for AHS, Technical Report

FHWA-RD-95-045 by Raytheon, AHS Precursor Systems Analyses, Compendium of Research

Summaries, compiled by Information Dynamics, Inc., and Viggen Corporation, McLean, VA,

February 1994.

Considerations, Technical Report IVHS-AMER-ATMS-92-1, IVHS AMERICA, 1992.

Concepts through a Team of Stochastic Automata, IEEE Transactions on Systems, Man, and

Cybernetics, July-Aug. 1993.

[Schlegel96] Schlegel, N., and P. Kachroo, Telerobotic Operation Combined with Automated

Vehicle Control, Proceedings of the SPIE-Photonics East 96, Boston, MA, November 1996.


[Schuster96] Schuster, S., and C. Jacoby, An AHS Concept Based on Maximum Adaptability,

Proceedings of the 3rd World Congress in ITS, Orlando, FL, October 1996.

Manufacturing Plant Using Learning Automata, Proceedings of the IMACS/IFAC International

Symposium, Parallel and Distributed Computing in Engineering Systems, Corfu, Greece, pp. 327-

332, June 1992.

[Shapiro69] Shapiro I.J., and K.S. Narendra, Use of Stochastic Automata for Parameter Self-

Optimization with Multi-Modal Performance Criteria, IEEE Transactions on System Science

and Cybernetics, vol. SSC-5, pp. 352-360, 1969.

Vehicles I: Linear Model, Technical Report UCB-ITS-PRR-89-3, PATH Program, Institute of

Transportation Studies, University of California, Berkeley, 1989.

Vehicles with No Communication of Lead Vehicle Information: A System-Level Study, PATH

Technical Memorandum 91-2, Institute of Transportation Studies, University of California,

Berkeley, 1991.

[Shiller95] Shiller, Z., and S. Sundar, Emergency Maneuvers for AHS Vehicles, Systems and

Issues in ITS, Society of Automotive Engineers, Warrendale PA, pp. 1-11, 1995.

[Singer95] Singer, R. M., K. C. Gross, and S. Wegerich, A Fault Tolerant Sensory Diagnostic

System for Intelligent Vehicle Application, Intelligent Vehicles '95 Symposium, Detroit, MI, pp.

176-182, 1995.

[Slotine91] Slotine, J.-J. E., and Li, W., Applied Nonlinear Control, Englewood Cliffs, NJ:

Prentice-Hall, 1991.

[Smart94] Smart Road Development Effort Report, Center for Transportation Research, Virginia

Tech, 1994.

[Spooner95] Spooner, J.T., and K. M. Passino, Fault Tolerant Longitudinal and Lateral Control

for Automated Highway Systems, Proceedings of the American Control Conference, Seattle,

Wash., vol. 1, pp. 663-667, 1995.

Nonlinear Systems, IEEE Transactions on Automatic Control, vol. 41, no. 2, pp. 280-284, 1996.


ETTM Agreements, Proceedings of the International Electronic Toll & Traffic Management

Symposium, New York, N. Y., pp. 91-102, 1995.

Intelligent Vehicle Modules for Tactical Driving, Proceedings of the 13th National Conference on

Artificial Intelligence, Portland, OR, 1996.

[Sukthankar96b] Sukthankar, R., D. Pomerlau, and C. Thorpe, SHIVA: Simulated Highways for

Intelligent Vehicle Algorithms, Proceedings of the IEEE Intelligent Vehicles Symposium, Detroit,

MI, pp. 332-337, 1995.

Design System for Tactical Driving Algorithms, Proceedings of the Artificial Intelligence,

Simulation and Planning in High Autonomy Systems Symposium, pp. 3-10, 1996.

[Tan96] Tan, H.-S., and S. Patwardhan, Magnetic sensors for automatic steering control,

Intellimotion, vol. 5, no. 2, pp. 10-11, 1996.

in Parallel Environment, Journal of Cybernetics and Information Science, vol. 1, pp. 121-127,

1977.

[TrafT96] Traffic Technology International, issues April, May 1996, UK & International Press,

Surrey, UK, ISSN 1356-9252.

[TRB95] January 25, 1995 meeting of the Committee on AI, during the 74th Annual Meeting of

Transportation Research Board, Washington, D. C., 1995.

[Tsetlin73] Tsetlin, M.L., Automaton Theory and Modeling of Biological Systems, vol. 102 in

Mathematics in Science and Engineering, New York: Academic Press, 1973.

Noisy Workspaces Using Learning Automata, Proceedings of the IEEE International Symposium

of Intelligent Control, Chicago, 1993.

[Tsypkin71] Tsypkin, Ya. Z., Adaptation and Learning in Automatic Systems, New York:

Academic, 1971.


[Ünsal95] Ünsal, C., J. S. Bay, and P. Kachroo, Intelligent Control of Vehicles: Preliminary

Results on the Application of Learning Automata Techniques to Automated Highway System,

Proceedings of Tools with Artificial Intelligence Symposium, Washington DC, November 1995.

[Ünsal96] Ünsal, C., P. Kachroo, and J. S. Bay, Multiple Stochastic Learning Automata for

Vehicle Path Control in an Automated Highway Systems, in revision for IEEE Transactions on

Systems, Man, and Cybernetics, first submission May 1996.

[Varaiya91] Varaiya, P., and S.E. Shladover, Sketch of an IVHS System Architecture, Vehicle

Navigation and Information Systems Conference Proceedings, Society of Automotive Engineers,

pp. 909-922, 1991.

[Varaiya93] Varaiya, P., Smart Cars on Smart Roads: Problems of Control, IEEE Transactions

on Automatic Control, vol. 38., no. 2, pp. 195-207, Feb. 1993.

[Varshavski63] Varshavski, V.I., and I.P. Vorontsova, On the Behavior of Stochastic Automata

with Variable Structure, Automat. Telemekh., vol. 24, pp. 253-360, 1963.

[Viswanathan72] Viswanathan R., and K.S. Narendra, Comparison of Expedient and Optimal

Reinforcement Schemes for Learning Systems, Journal of Cybernetics, vol. 2, pp. 21-37, 1972.

Variable-Structure Stochastic Automata, Journal of Cybernetics, vol. 3, pp. 1-23, 1973.

Problemy Peredachi Informatsii, vol. 1, pp. 122-126, 1965.

[Weber96] Weber J, and M. Atkin, Further Results on the Use of Binocular Vision for Highway

Driving, Proceedings of the 1996 SPIE Conference on Intelligent Systems and Controls, vol.

2902, Nov. 1996.

Decision Making, Automatica, vol. 21, pp. 479-484, 1985.

[Wigan96] Wigan, M., Problem of Success: Privacy, Property, and Transactions, Converging

Infrastructures: Intelligent Transportation and the National Information Infrastructure,

Cambridge, MA: MIT Press, pp. 341-354, 1996.

[Wright95] Wright, T., Eyes on the Road : Privacy and ITS, Traffic Technology International,

pp. 88-93, 1995.


[Xu96] Xu, T., DYNAVIS, a Dynamic Visualization Environment for Automatic Vehicle

Control Systems, Intellimotion, vol. 5, no. 1, pp. 2, 10, 1996.

System for Automobiles, in Electronic Display Technology and Information Systems, Society of

Automotive Engineers, pp. 73-85, 1992.

[Yanakiev96] Yanakiev, D., and I. Kanellakopoulos, Speed Tracking and Vehicle Follower

Control Design for Heavy-duty Vehicles, Vehicle system dynamics, vol. 25, no. 4, pp. 251-276,

1996.

[Yang96] Yang, Y.-T., and B.H. Tongue, A New Control Approach for Platoon Operations

during Vehicle Exit/Entry, Vehicle System Dynamics, vol. 25, no. 4, pp. 305-319, 1996.

[Yu95] Yu, J., and A. Sideris, Control of Combined Vehicle Motion : a Gain-scheduling

Approach,. Proceedings of the 5th Annual Meeting of ITS America, vol. 1, pp. 109-115, 1995.

[Zoratti95] Zoratti, P., Millimeter Wave Scattering Characteristics and Radar Cross Section

Measurements of Common Roadway Objects, Collision Avoidance and Automated Traffic

Management Systems, SPIE, Bellingham, WA, pp. 169-179, 1995.


Appendix A. Glossary

a, b Learning parameters for the linear reinforcement schemes
α Set of outputs (actions) of an automaton
αi ith action of the automaton
α(n) Automaton action at time step n
β(n) Environment response at time step n
ci Probability of (receiving) penalty for the ith action (P-model environment)
e_j^r Unit vector of dimension r with the jth element equal to 1
si Penalty strength for the ith action
E[·] Mathematical expectation
E[· | pi(n)] Mathematical expectation given the probability pi(n)
F(·) State transition function for an automaton (Chapter 3); mapping for teacher responses (Chapter 4)
fij Probability of transition from state i to state j given the environment response
g, h Continuous nonnegative functions of the probability vector p
G(·) Output function for a state-output automaton
H(·,·) Output function for an automaton
H(p) Update function used in the nonlinear reinforcement scheme NLH
k Learning parameter for the nonlinear reinforcement schemes
LI-P Linear inaction-penalty reinforcement scheme (a = 0)
LR-I Linear reward-inaction reinforcement scheme (b = 0)
LR-P or L=R-P Linear reward-penalty reinforcement scheme with a = b
LR-P Linear reward-penalty reinforcement scheme with a ≠ b
M(n) Average penalty to the automaton at time step n
Mo Average penalty for a pure chance automaton
ij, ij Coefficients of the difference equation for action probabilities
NLH Nonlinear reinforcement scheme with update function H
‖x‖k kth norm of the vector x
p(n) Action probability vector
pi(n) Probability of choosing the ith action at time step n
Φ = {φ1, φ2, ..., φs} Set of internal states of an automaton
r Number of automaton actions
σij² Covariance of pi(n) and pj(n)
Learning parameter for the nonlinear reinforcement schemes
V(x) Lyapunov function

AHS Automated Highway System

ASTM American Society for Testing and Materials

ATIS Advanced Traveler Information System

ATMS Advanced Traffic Management System

AVC Automatic Vehicle Control

AVCS Advanced Vehicle Control System

AVI Automatic Vehicle Identification

AVL Automatic Vehicle Location

DOT Department of Transportation

ELP Electronic License Plate

ETT Electronic Toll Tags

FHWA Federal Highway Administration

FLASH Flexible Low-cost Automated Scaled Highway (Laboratory)

GPS Global Positioning System

ITE Institute of Transportation Engineers

ITS Intelligent Transportation Systems

IVHS Intelligent Vehicle Highway System

LA Learning Automata

NAHSC National Automated Highway System Consortium

PATH Program on Advanced Technology for the Highway, California

PSA Precursor Systems Analyses

RIC Remote Intelligent Communications

RVC Roadside-Vehicle Communications

SAE Society of Automotive Engineers

TRB Transportation Research Board

VRC Vehicle-to-roadside Communications

VORAD Vehicle On-Board Radar


Appendix B. Convergence of the Probability of the Optimal Action with the Linear Inaction-Penalty Scheme LI-P

The linear inaction-penalty scheme LI-P can be derived by a modification of the general linear reward-penalty scheme LR-P (see Section 6.1). This reinforcement scheme, as its name suggests, does not update the action probabilities when the environment response is affirmative. For α(n) = αi:

$$
\beta(n) = 0:\quad p_i(n+1) = p_i(n), \qquad p_j(n+1) = p_j(n) \quad \forall j \neq i
$$
$$
\beta(n) = 1:\quad p_i(n+1) = (1-b)\,p_i(n), \qquad p_j(n+1) = \frac{b}{r-1} + (1-b)\,p_j(n) \quad \forall j \neq i
\tag{B.1}
$$

From the definition above, the expected value of the probability of an action at the next step can be written as:

$$
E[p_i(n+1) \mid p_i(n)] = \sum_{k=1}^{r} E\big[p_i(n+1) \mid p_i(n), \alpha(n)=\alpha_k\big]\, p_k(n)
$$
$$
= \sum_{k=1}^{r} c_k\, E\big[p_i(n+1) \mid p_i(n), \alpha(n)=\alpha_k, \beta(n)=1\big]\, p_k(n)
+ \sum_{k=1}^{r} (1-c_k)\, E\big[p_i(n+1) \mid p_i(n), \alpha(n)=\alpha_k, \beta(n)=0\big]\, p_k(n)
$$
$$
= c_i\,(1-b)\,p_i^2(n)
+ \sum_{k \neq i} c_k \left[\frac{b}{r-1} + (1-b)\,p_i(n)\right] p_k(n)
+ (1-c_i)\,p_i^2(n)
+ \sum_{k \neq i} (1-c_k)\,p_k(n)\,p_i(n)
\tag{B.2}
$$

We will again consider the ideal case where the probability of penalty for the optimal action $\alpha_\ell$ is equal to zero, i.e., $c_\ell = 0$ and $0 < c_j < 1$ for all $j \neq \ell$. Then, for $i = \ell$, we have:

$$
E[p_\ell(n+1) \mid p_\ell(n)] = 0
+ \sum_{k \neq \ell} c_k \left[\frac{b}{r-1} + (1-b)\,p_\ell(n)\right] p_k(n)
+ p_\ell^2(n)
+ \sum_{k \neq \ell} (1-c_k)\,p_k(n)\,p_\ell(n)
\tag{B.3}
$$


More explicitly:

$$
E[p_\ell(n+1) \mid p_\ell(n)] = \frac{b}{r-1} \sum_{k \neq \ell} c_k p_k(n)
+ p_\ell(n) \sum_{k \neq \ell} c_k p_k(n)
- b\,p_\ell(n) \sum_{k \neq \ell} c_k p_k(n)
+ p_\ell^2(n)
+ \sum_{k \neq \ell} p_k(n)\,p_\ell(n)
- \sum_{k \neq \ell} c_k p_k(n)\,p_\ell(n)
$$
$$
= \frac{b}{r-1} \sum_{k \neq \ell} c_k p_k(n)
- b\,p_\ell(n) \sum_{k \neq \ell} c_k p_k(n)
+ p_\ell^2(n)
+ \sum_{k \neq \ell} p_k(n)\,p_\ell(n)
\tag{B.4}
$$

Now, taking the expectation of both sides, and using the fact that $\sum_{i=1}^{r} p_i(n) = 1$, we obtain:

$$
E[p_\ell(n+1)] = \frac{b}{r-1} \sum_{k \neq \ell} c_k E[p_k(n)]
- b\, E\!\left[ p_\ell(n) \sum_{k \neq \ell} c_k p_k(n) \right]
+ E[p_\ell^2(n)]
+ E\big[(1 - p_\ell(n))\,p_\ell(n)\big]
\tag{B.5}
$$

The simulations of the linear inaction-penalty scheme showed that the probability of the optimal action converges to a steady-state value, as seen in Figure B.1. For such a steady-state result, we can assume that, as $n \to \infty$:

$$
E[p_\ell(n)] = \text{constant} \equiv p, \qquad
\sigma^2_{p_\ell(n)} = 0, \qquad
E[p_\ell(n)\,p_j(n)] = E[p_\ell(n)]\,E[p_j(n)]
\tag{B.6}
$$

The conditions above state that, as the probability of the optimal action approaches its steady-state value, its expected value approaches the same value, the variance of the probability becomes negligibly small, and the probabilities of all other actions are independent of the optimal action's probability, although the sum of the probabilities must equal 1. (Since the probability of the optimal action is a constant value, this value and the other probabilities can be treated as independent variables.) Using these facts, the identity $E[x^2] = E[x]^2 + \sigma_x^2$, and renaming $E\!\left[\sum_{k \neq \ell} c_k p_k(n)\right] \equiv A$, we can simplify the equality in Equation B.5 as follows:

$$
E[p_\ell(n+1)] = \frac{b}{r-1} \sum_{k \neq \ell} c_k E[p_k(n)]
- b\, E[p_\ell(n)]\, E\!\left[\sum_{k \neq \ell} c_k p_k(n)\right]
+ E[p_\ell(n)]^2 + E[p_\ell(n)] - E[p_\ell(n)]^2
\tag{B.7}
$$

Or:

$$
p = \frac{b}{r-1}\,A - b\,p\,A + p^2 + p - p^2
\quad\Longrightarrow\quad
0 = \frac{b}{r-1}\,A - b\,p\,A
$$
$$
p = \frac{bA}{bA\,(r-1)} = \frac{1}{r-1}
\tag{B.8}
$$

where r is the number of actions. This proves that a learning automaton using the linear inaction-penalty scheme in a stationary environment with one (and only one) optimal action is not ε-optimal (except when r = 2); the probability of the optimal action instead converges to 1/(r-1). Figure B.1 shows the probabilities of five actions, where the optimal action's probability converges to 1/(5-1) = 0.25.

[Plot: the five action probabilities, between 0.12 and 0.26, versus iterations 0-150.]

Figure B.1. Probabilities of five actions in the LI-P scheme; only c1 = 0.
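The behavior shown in Figure B.1 is easy to reproduce. The following sketch (written for this appendix, not part of the simulation package of Appendix C) implements the LI-P update of Equation B.1 for r = 5 actions with c1 = 0; the penalty probabilities of the remaining actions are chosen arbitrarily. As derived in Equation B.8, p1(n) settles near 1/(r-1) = 0.25.

```matlab
% LI-P scheme of Eq. B.1 with r = 5 actions; action 1 is optimal.
r = 5; b = 0.05; N = 2000;
c = [0 0.4 0.5 0.6 0.7];          % penalty probabilities, c1 = 0
p = ones(1,r)/r;                  % uniform initial probabilities
P1 = zeros(1,N);                  % history of the optimal action
for n = 1:N
    i = find(rand < cumsum(p), 1);     % sample an action from p
    if rand < c(i)                     % environment response: penalty
        pi_new = (1-b)*p(i);           % Eq. B.1, chosen action
        p = b/(r-1) + (1-b)*p;         % Eq. B.1, all other actions
        p(i) = pi_new;
    end                                % reward: inaction, p unchanged
    P1(n) = p(1);
end
plot(1:N, P1), xlabel('iterations'), ylabel('p_1(n)')
```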


Appendix C. Simulation

The simulation for the work described in this dissertation is written in Matlab [Matlab96]. It consists of a group of subroutines and functions coded as m-files, and can be run in Matlab version 4.2 or higher. The graphic user interfaces were designed on a Sun workstation running Solaris 2.4 and OpenWindows, but can be displayed on any platform running Matlab 4.2 or higher after minor changes in the GUI subroutines.

Figure C.1 shows the overall structure of the simulation; the descriptions of the files are given in Table C.1. Subroutine m_run.m is the main simulation program, which calls several subroutines at each time step. At each step, sensor outputs are evaluated and action decisions are made; those actions are then carried out for each automated vehicle on the highway. Sensor modules and flags are implemented as subroutines that take a vehicle index and return the corresponding output. The learning, decision (planning), and regulation subroutines work similarly. When the evaluations are complete, the highway is updated. The main program repeats the loop until the final time is reached or the program is interrupted by the user.

The learning algorithm, learning parameters, processing speed and memory vector sizes cannot be changed during a simulation run. Other parameters, such as the current vehicle speed, desired speed, current vehicle lane, desired lane, and permitted speed variations, can be changed during the simulation, as can other display parameters. For detailed descriptions of some of the graphic user interfaces, see Figures C.3-C.7. The command line interface can be used to change parameters before the simulation run; it also displays several parameters and/or actions during the run (Figure C.2).

Files described in Table C.1 and Figure C.1 are provided here as zipped archives in two

different formats:

Unix: mfiles.tar.Z

Windows: mfiles.zip
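The listing below is a toy version of the main loop structure described above, reduced to a single lane and a threshold rule so that it is self-contained; the thresholds and speed increments are placeholders and do not correspond to the actual planning and regulation subroutines of Table C.1.

```matlab
% Toy version of the m_run.m loop structure: at each step, evaluate
% sensors, decide, act, then update the highway. All numbers are
% placeholders; the real subroutines are listed in Table C.1.
num_veh = 3; dt = 0.04; t_final = 10;
x = [0; 30; 60];  v = [22; 20; 18];    % positions (m), speeds (m/s)
for t = 0:dt:t_final
    for k = 1:num_veh
        % "sensor module": headway to the nearest vehicle ahead
        ahead = x(x > x(k));
        if isempty(ahead), d = Inf; else, d = min(ahead) - x(k); end
        % "planning/regulation": slow down when the headway is short
        if d < 15, v(k) = max(v(k) - 0.5, 0);
        else,      v(k) = v(k) + 0.1;
        end
    end
    x = x + dt*v;                      % "move2": update positions
end
```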


laneb.m (input: vehicle index; output: integer indicating the desired lane shift): evaluates the current vehicle lane based on link layer information.
mlma.m: main program.
move2.m (input: vehicle index): updates the vehicle position.
m_init.m: initializes all parameters, arrays, etc.
m_mvec.m: plots simulation data.
m_movie.m: shows the movie of the simulation.
m_plap.m: plots simulation data.
m_plot.m: subroutine for the Plot GUI.
m_redo.m: subroutine for the Scenario GUI.
m_reinit.m: reinitializes all parameters, arrays, etc.
m_run.m: main subroutine for simulation runs.
m_sandl.m: plots simulation data.
m_setgui.m: displays the main graphic user interface.
m_traj.m: plots simulation data.
pinch.m (input: vehicle index; output: binary value indicating a pinch condition): function for the pinch module.
phmin.m (inputs: action probabilities, action, environment response; outputs: new action, new action probabilities): function for the nonlinear scheme of Baba [Baba85].
phfun.m (inputs: action probabilities, action, environment response; outputs: new action, new action probabilities): function for the nonlinear reinforcement scheme of Section 6.2.
plan2.m (input: vehicle index): subroutine for the planning layer.
plot_h2.m (input: vehicle positions): subroutine for display updates.
plrp.m (inputs: action probabilities, action, environment response; outputs: new action, new action probabilities): function for the general linear reward-penalty learning scheme.
reg2.m (input: vehicle index): subroutine for the regulation layer.
sb_fr.m (input: vehicle index; output: binary value indicating the module response): function for the headway module.
sb_lft.m (input: vehicle index; output: binary value indicating the module response): function for the left sensor module.
sb_lftn.m (input: vehicle index; output: binary value indicating the module response): function for the extended left sensor module.
sb_sp.m (input: vehicle index; output: binary value indicating the module response): function for the speed module.
sb_rgt.m (input: vehicle index; output: binary value indicating the module response): function for the right sensor module.
sb_rgtn.m (input: vehicle index; output: binary value indicating the module response): function for the extended right sensor module.
speedb.m (input: vehicle index; output: speed flag): evaluates the vehicle speed condition.

Table C.1. Description of the subroutines and functions for the multiple-lane, multiple-automata intelligent vehicle simulation.


[Figure C.1. Overall structure of the simulation: mlma.m calls m_setgui.m, m_init.m and m_reinit.m; the run loop (m_run.m) calls plot_h2.m, move2.m, reg2.m and plan2.m together with the sensor and flag functions (sb_*.m, pinch.m) and the learning functions (phmin.m, phfun.m, plrp.m); plotting utilities include m_sandl.m, m_plap.m, m_traj.m, m_mvec.m and m_movie.m.]

[Figures C.2-C.7: the command line interface and the graphic user interfaces of the simulation. The main GUI (Figure C.3) contains the learning algorithm menu and learning parameters, the extended side sensor switch, front sensor ranges, the processing speed (1/sec), the lengths of the memory vectors, the index of the current vehicle with its on/off switches, speed and lane parameters for the current vehicle (including the speed increment and permitted speed variation), data file I/O, the display update rate, a switch for saving flow data, highway window size adjustment, the final time (sec), and buttons opening the scenario GUI (Figure C.6) and the plot GUI (Figure C.4). The plot GUI selects plots of the memory vector and pinch module outputs, speed and lane positions, vehicle trajectories for platoons, timed snapshots (Figure C.5 shows the interval t = 1 s to t = 16 s with vehicle 4 as the reference position, as it shifts to the middle lane and continues; Figure C.7 shows snapshots at t = [0 10 20 28 30 40 45]), speed and headway plots for two vehicles, and a Matlab movie of the simulation run. The scenario GUI initializes several different scenarios.]

Additional simulation examples of different highway situations are given in this section. The following table describes the situations and the MPEG movie files of the test runs.

appCm1.mpg (0.263 MB, 40 s, 5 frames/s): 20 vehicles with various initial speed definitions changing to the same lane.
- Desired speed and lane for all vehicles are 85 km/h and lane 2, respectively.
- Algorithm: NLH with k = 3 and learning parameter 0.05.
- Memory vector sizes: 9 (lateral and longitudinal).
- Sensor ranges: 10 m side; 11 m, 15 m, and 20 m front.
- All vehicles shift to lane 2, but due to speed variations some of the vehicles shift left and right to avoid collision. At t = 40 s, the average platoon speed is approximately 85 km/h.

appCm2.mpg (0.264 MB, 40 s, 5 frames/s): three platoons with various memory vector sizes.
- Three ten-vehicle platoons with the same initial speed of 80 mph and inter-platoon distances of 15 m; the leaders have a desired speed of 70 mph.
- Algorithm: LR-P with a = 0.15 and b = 0.10.
- Memory vector sizes: platoon 1 (yellow), lateral 25, longitudinal 13; platoon 2 (green), lateral 13, longitudinal 13; platoon 3 (red), lateral 13, longitudinal 9.
- Followers in each platoon try to match the leader's speed change. In platoons 1 and 2, some of the vehicles (4, 5, and 10) shift lanes to avoid collisions. The size of the lateral memory vector slightly affects the time to shift lanes; the effect of the longitudinal memory vector is more significant. No vehicle shifts lanes in platoon 3, because of its faster reactions to headway changes.

appCm3.mpg (0.209 MB, 40 s, 5 frames/s): three separate groups of four vehicles in the situation of Section 7.4.3.
- V1 = V2 = 85 mph and V3 = V4 = 80.5 mph; initial vehicle separations are 16 m; no lane preferences are set for vehicles 1 and 2.
- Algorithms and memory vector sizes: group 1 (yellow), LR-P (a = 0.15, b = 0.10), lateral 25, longitudinal 13; group 2 (green), NLH (k = 10, learning parameter 0.05), lateral 25, longitudinal 13; group 3 (red), NLH (k = 10, learning parameter 0.05), lateral 13, longitudinal 9.
- Speed flags in groups 1 and 2 are set at approximately the same time (10.04 and 8.8 seconds for group 1; 9.64 and 9.36 for group 2). The lane changes are also made at approximately the same time. For group 1, lane changes occur before the flags are set, due to the short lateral memory vector.

Sensor ranges for the last two simulations are 10 m side and 10 m, 15 m, and 20 m front. The processing speed for all simulations is 25 Hz.

Table C.2. Description of the situations and simulation files.

Appendix D. Vehicle Interactions

The figures in this section show the possible positions of vehicles on a highway for the scenarios described in Chapter 7. They indicate the relative positions of multiple vehicles with respect to each other's sensor ranges. Each row in a matrix corresponds to a lane; each square illustrates a road section that falls into the side sensor range of an automated vehicle, and a dark square indicates the presence of a vehicle. Not all possibilities are considered; instead, only the situations that are of interest for a specific scenario are listed. Similar situations are then combined into a single state and simplified if necessary; two situations are said to be similar if the sensor module outputs and/or possible actions are the same for both. The three scenarios of Chapter 7 are listed below; a small sketch of this enumerate-and-group procedure follows the list.

Considering the immediate neighborhood of the vehicles, there are C(9,2) = 36 possible situations, shown in Figure D.1. Most of these situations are similar and can therefore be combined, as shown in Figure D.2. Further simplifications are illustrated in Figure D.3. These final 12 situations are the states given in Section 7.4.

Considering the immediate neighborhood of the vehicles, there are C(9,3) = 84 possible situations, shown in Figure D.4. Some of these situations are similar and are therefore combined, as shown in Figure D.5. These final 64 situations are the states given for the three-vehicle scenario in Section 7.4.

The four-vehicle situation of Figure 7.13 can be treated as a two-vehicle situation, with the other vehicles treated as obstacles. The two vehicles in front are traveling at their desired speed and lane, unaware of the approaching vehicles (see Section 7.4.3). Therefore, there are C(7,2) = 28 possible situations, shown in Figure D.6. Since the conflict is resolved when the two trailing vehicles are both in the middle lane, the states E3, E4, and/or F3 are goal states. Again, we considered only immediate neighborhoods; the physical locations of the vehicles are therefore shown using a 3x4 matrix.
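The enumeration and grouping can be illustrated in a few lines of Matlab. In the sketch below, the two-vehicle placements of the first scenario are generated with nchoosek, and placements are grouped by a crude lane-occupancy signature. The real grouping of Section 7.4 compares sensor module outputs and possible actions, so the signature here is only an illustrative stand-in and does not reproduce the twelve states.

```matlab
% Enumerate the C(9,2) = 36 two-vehicle placements in the 3x3
% immediate neighborhood and group "similar" placements. The
% similarity signature below (occupancy count per lane) is only an
% illustrative stand-in for the sensor-based grouping of Section 7.4.
pairs = nchoosek(1:9, 2);             % cell indices of the two vehicles
sigs  = zeros(size(pairs,1), 3);
for s = 1:size(pairs,1)
    occ = zeros(3);                   % rows = lanes, columns = sections
    occ(pairs(s,:)) = 1;              % place the two vehicles
    sigs(s,:) = sum(occ, 2)';         % vehicles per lane as a signature
end
[~, ~, state] = unique(sigs, 'rows'); % merge identical signatures
fprintf('%d placements grouped into %d states\n', ...
        size(pairs,1), max(state));
```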


Figure D.1. All possible immediate neighborhood situations for two vehicles.

Figure D.2. Combined states for two vehicles: states not shown are identical to those given here.

Figure D.4. All possible immediate neighborhood situations for three vehicles in a three-lane highway.


[Figure D.5 grid: rows E-L and columns 1-7 of combined states, where each labeled cell (e.g., A2, B7, C3, E7, G5, J3, K2) names the earlier state to which that situation is identical.]
Figure D.5. Combined states for three vehicles: states not shown are identical to those indicated.

Figure D.6. Possible situations for the four-vehicle scenario of Figure 7.13, treated as a two-vehicle situation.


Vita

Cem Ünsal was born on November 4, 1967 in Ankara, Turkey. He is an alumnus (118) of Galatasaray Lisesi (Lycée de Galatasaray), Istanbul. He received a B.Sc. degree with honors in electrical and electronics engineering from Boğaziçi University, Istanbul, in 1991, and an M.S.

degree in electrical engineering from Virginia Polytechnic Institute and State University (Virginia

Tech), in 1993. His M.S. thesis "Self-Organization in Large Populations of Mobile Robots"

received the Paul E. Torgersen Graduate Student Research Excellence Award (3rd place) in 1994.

From 1993 to 1996, he was a Ph.D. candidate in the Bradley Department of Electrical

Engineering, and research assistant at the Center for Transportation Research at Virginia Tech.

His research interests include automated highway systems, automated vehicle control systems,

artificial intelligence, learning automata, sliding mode control and nonlinear observers.

He will continue his research career at the Robotics Institute at Carnegie Mellon

University, Pittsburgh, PA. Mr. nsal has been a student member of AMS, IEEE, ITS America

and TRB during his graduate studies. He is currently a member of IEEE and ITSA.

29 January 1997
