Smart Traffic Lights That Learn!
MARLIN: Multi-Agent Reinforcement Learning Integrated Network of Adaptive Traffic Signal Controllers

Samah El-Tantawy, Ph.D., Postdoctoral Fellow, Dept. of Civil Engineering
Baher Abdulhai, Ph.D., P.Eng., Director, ITS Centre and Testbed, Dept. of Civil Engineering
Hossam Abdelgawad, Ph.D., P.Eng., Manager, ITS Centre and Testbed
ACGM 2013: Intelligent Transport for Smart Cities

Outline
1. In a Nutshell
2. Theory in Brief
   • Reinforcement Learning and Game Theory
3. Applications
   • City of Toronto Testbed
4. Hardware-in-the-Loop Testing
   • Approach
   • Integration with PEEK ATC-1000
• Next Steps
• Q&A

In a Nutshell
• Grand objective
   - Intersections "talk to each other"
   - Each is affected by what is happening upstream
   - Each affects what is happening downstream
   - Controlling the whole network in one shot from a single "grand brain" is the dream
• Issue
   - Theoretically intractable
   - Too complex in practice
   - Requires massive and very expensive communication
• Solution
   - Decentralized
   - Self-learning: agents learn to control their local intersection
   - Game-theory-based: agents learn to collaborate

What is MARLIN?
• Artificial-intelligence-based control software
• Enables traffic lights to self-learn and to collaborate with neighbouring traffic lights
• Cuts motorists' delay, fuel consumption, and the negative environmental effects of congestion
• Easier to operate (self-learning)
• Less costly: needs less communication, if any is necessary at all

Evolution of "Adaptive" Signal Control
MARLIN-ATSC: Level 4
• Level 0: Fixed-Time and Actuated Control
   - TRANSYT
   - 1969, UK
• Level 1: Centralized Control, Off-line Optimization
   - SCATS
   - 1979, Australia
   - >50 installations worldwide
• Level 2: Centralized Control, On-line Optimization
   - SCOOT
   - 1981, UK
   - >170 installations worldwide
• Level 3: Distributed Control, Model-Based
   - OPAC, RHODES
   - 1992, USA
   - 5 installations in USA
• Level 4: Distributed Self-Learning Control
   - MARLIN-ATSC
   - 2011, Canada

Issues with Leading ATSC Technologies
• Centralized
   - Expensive
   - Not scalable
   - Not robust
• Model-based framework
   - Relies on an accurate traffic model, the accuracy of which is questionable
• Curse of dimensionality
   - System complexity increases exponentially with the number of intersections/controllers
• Human intervention requirements
   - Require highly skilled labour to operate because of their complexity

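To give a feel for the curse-of-dimensionality point above, here is a minimal sketch that counts table entries under stated assumptions. It assumes a tabular controller, a hypothetical 10 local states and 2 actions per intersection, and, for the decentralized case, one two-player table per neighbouring intersection. None of these numbers or function names come from the slides.

```python
def centralized_table_size(n_intersections, n_states, n_actions):
    """Entries in a single joint table covering every intersection at once.

    Assumption: one tabular controller over the joint state-action space.
    """
    return (n_states * n_actions) ** n_intersections


def pairwise_table_size(n_neighbours, n_states, n_actions):
    """Entries held by one decentralized agent.

    Assumption: the agent keeps one two-player table per neighbour,
    each over the pair's joint state-action space.
    """
    return n_neighbours * (n_states * n_actions) ** 2


if __name__ == "__main__":
    # Hypothetical sizes: 10 local states, 2 actions, up to 4 neighbours.
    for n in (1, 2, 4, 9):
        print(n, centralized_table_size(n, 10, 2), pairwise_table_size(4, 10, 2))
```

Under these assumptions the centralized count grows exponentially with the number of intersections, while the per-agent count stays fixed once the number of neighbours is bounded.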
Why is MARLIN Different?
[Figure: MARLIN contrasted with existing ATSC systems; labels paired as limitation / MARLIN property]
• Human intervention requirements / Self-learning
• Specific design / Generic
• Centralized / Decentralized
• Model-based / Model-free
• Prediction requirement / Pattern-sensitive
• Curse of dimensionality / Scalable
• Inefficient coordination / Coordinated

Learning the Control Law: Reinforcement Learning Architecture
[Figure: RL architecture. The agent and the environment exchange action, state, and reward.]
Goal: optimal control law = mapping between states and actions

Q-learning update:
$$Q_{k+1}(s_k, a_k) = Q_k(s_k, a_k) + \alpha \left[ r_{k+1} + \gamma \max_{a} Q_k(s_{k+1}, a) - Q_k(s_k, a_k) \right]$$

Greedy action:
$$a_{k+1} = \arg\max_{a} Q_k(s_{k+1}, a)$$

Action selection must balance exploration and exploitation.

Q table:
Q     a1     a2
s1   -10     -5
s2    -3    -15
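The Q-learning update and the small Q table on this slide can be sketched in a few lines of Python. This is an illustrative tabular learner, not the MARLIN software. The class name, the learning rate, the discount factor, and the epsilon-greedy rule used to balance exploration and exploitation are assumptions made for the example.

```python
import random
from collections import defaultdict


class QLearner:
    """Minimal tabular Q-learning agent (illustrative sketch only)."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.alpha = alpha      # learning rate (assumed value)
        self.gamma = gamma      # discount factor (assumed value)
        self.epsilon = epsilon  # exploration probability (assumed value)
        self.q = defaultdict(float)  # (state, action) -> Q value, default 0.0
        self.rng = random.Random(seed)

    def greedy_action(self, state):
        """Exploitation: the action with the highest Q value in this state."""
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def choose_action(self, state):
        """Epsilon-greedy: explore with probability epsilon, else exploit."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return self.greedy_action(state)

    def update(self, state, action, reward, next_state):
        """Apply Q <- Q + alpha * [r + gamma * max_a Q(s', a) - Q]."""
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        td_error = reward + self.gamma * best_next - self.q[(state, action)]
        self.q[(state, action)] += self.alpha * td_error


if __name__ == "__main__":
    agent = QLearner(actions=["a1", "a2"])
    # Load the example Q table from the slide.
    agent.q.update({("s1", "a1"): -10, ("s1", "a2"): -5,
                    ("s2", "a1"): -3, ("s2", "a2"): -15})
    print(agent.greedy_action("s1"), agent.greedy_action("s2"))
    agent.update("s1", "a2", reward=-2, next_state="s2")  # hypothetical reward
    print(agent.q[("s1", "a2")])
```

With the slide's table, the greedy choice is a2 in state s1 and a1 in state s2, since those entries are the largest in their rows.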
RL-based ATSC Architecture
[Figure: the RL software agent interacting with the traffic simulation environment.]
• Environment: traffic simulation
• Agent: RL software
• State: queue lengths
• Action: extend/switch
• Reward: delay savings
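As a rough illustration of how the state, action, and reward named on this slide might be encoded for a learning agent, consider the sketch below. The bin size, the cap on bins, the two-phase cycle, and all function names are hypothetical choices for the example. The slides do not specify the encoding.

```python
from enum import Enum


class Action(Enum):
    EXTEND = 0  # keep the current phase running
    SWITCH = 1  # move to the next phase


def encode_state(queue_lengths, bin_size=5, max_bin=4):
    """Discretize per-approach queue lengths (vehicles) into a hashable state.

    Assumption: queues are binned in steps of `bin_size` and capped at `max_bin`.
    """
    return tuple(min(q // bin_size, max_bin) for q in queue_lengths)


def delay_saving(previous_delay, current_delay):
    """Reward as the reduction in delay between two decision points."""
    return previous_delay - current_delay


def apply_action(current_phase, action, n_phases=2):
    """Return the phase index after the action (assumed cyclic phase order)."""
    if action is Action.SWITCH:
        return (current_phase + 1) % n_phases
    return current_phase


if __name__ == "__main__":
    print(encode_state([3, 12, 40]))
    print(delay_saving(120.0, 95.0))
    print(apply_action(1, Action.SWITCH))
```

A positive reward here means the last action reduced delay. A tabular learner can use the tuple returned by `encode_state` directly as a dictionary key.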
MARLIN-ATSC: Coordination Principle
• Each agent plays a game with each adjacent intersection in its neighbourhood
[Figure: three 3x3 grids of intersections I1 to I9, one example per intersection type.]
• Edge intersection: 3 games
• Corner intersection: 2 games
• Intermediate intersection: 4 games
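The game counts on this slide follow from the grid layout and can be reproduced with a short sketch. It assumes the intersections I1 to I9 are numbered row by row in a 3x3 grid and that "adjacent" means directly north, south, east, or west. The function name is hypothetical.

```python
def neighbours(index, rows=3, cols=3):
    """Return the 1-based indices of intersections adjacent to `index`.

    Assumptions: row-major numbering (I1..I9) and 4-connected adjacency.
    """
    r, c = divmod(index - 1, cols)
    result = []
    for dr, dc in ((-1, 0), (0, -1), (0, 1), (1, 0)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < rows and 0 <= nc < cols:
            result.append(nr * cols + nc + 1)
    return result


if __name__ == "__main__":
    for i in (1, 2, 5):
        # One game is played with each adjacent intersection.
        print(f"I{i}: {len(neighbours(i))} games with {neighbours(i)}")
```

For the corner I1 this gives 2 games, for the edge I2 it gives 3, and for the intermediate I5 it gives 4, matching the three examples.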
MARLIN-ATSC Available Modes
• (a) Independent Mode
• (b) Integrated Mode
[Figure: panels (a) and (b), each showing MARLIN-ATSC with the labels Queue Length 1, Queue Length 2, Extend 1, Extend 2, Delay 1, Delay 2.]
Large-Scale Application
Network-Wide MOE in the Normal Scenario

System MOE | BC | % Improvement, MARL-TI vs. BC | % Improvement, MARLIN-IC vs. BC | % Improvement, MARLIN-IC vs. MARL-TI
Average Intersection Delay (sec/veh) | 35.27 | 27% | 38% | 14%
Throughput (veh) | 23084 | 3% | 6% | 3%
Avg. Queue Length (veh) | 8.66 | 24% | 32% | 11%
Std. Avg. Queue Length (veh) | 2.12 | 23% | 31% | 10%
Avg. Link Delay (sec) | 9.45 | 10% | 47% | 41%
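The three percentage columns in this table are related by simple arithmetic, which the sketch below works through. It assumes each percentage is a relative change against the stated reference and that the published figures are rounded to whole percent, so small mismatches are expected. The function names are hypothetical.

```python
def implied_value(base_value, improvement):
    """Value of a lower-is-better measure after a fractional improvement."""
    return base_value * (1.0 - improvement)


def relative_improvement(a_vs_base, b_vs_base, higher_is_better=False):
    """Improvement of system B over system A implied by their gains over BC.

    Both arguments are fractions (0.27 means 27%).
    """
    if higher_is_better:
        return (1.0 + b_vs_base) / (1.0 + a_vs_base) - 1.0
    return 1.0 - (1.0 - b_vs_base) / (1.0 - a_vs_base)


if __name__ == "__main__":
    # Average intersection delay: BC 35.27 sec/veh, MARLIN-IC 38% better.
    print(round(implied_value(35.27, 0.38), 2))
    # MARLIN-IC vs. MARL-TI, derived from the two "vs. BC" columns.
    print(round(relative_improvement(0.27, 0.38), 3))   # intersection delay
    print(round(relative_improvement(0.10, 0.47), 3))   # link delay
    print(round(relative_improvement(0.03, 0.06, higher_is_better=True), 3))
```

Under these assumptions the derived MARLIN-IC vs. MARL-TI improvement is about 15% for intersection delay and about 41% for link delay, against the reported 14% and 41%. The gap on the first figure is consistent with rounding in the source columns.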

% Improvement in Average Delay
[Figure: % improvement in average delay, MARLIN-IC vs. BC, with Area 1, Area 2, and Area 3 highlighted.]
Average Route Travel Time for Selected Routes
[Figure: Gardiner EB (freeway). Average travel time (min) vs. time interval (5 min, intervals 1 to 12) for BC, MARL-TI, and MARLIN-IC.]
[Figure: LakeShore EB to Spadina NB (major arterial). Average travel time (min) vs. time interval (5 min, intervals 1 to 12) for BC, MARL-TI, and MARLIN-IC.]
MARLIN-HILS Architecture
[Figure: hardware-in-the-loop simulation setup.]
• Components: traffic signal controller, controller interface device (CID), RS485-to-USB converter, industrial computer, Paramics Modeller
• Links: RS485 (SDLC protocol), USB (SDLC protocol), Ethernet (NTCIP protocol)
HILS Setup: Demo
Conclusion
• MARLIN: state of the art, Gen 4+
   - Thoroughly developed and tested
   - Patent pending
• Ongoing:
   - HILS and PEEK ATC-1000 integration
   - Potential field operational test
   - Productization
   - From TSP to People Priority (PSP)

Samah El-Tantawy samah.el.tantawy@utoronto.ca
Baher Abdulhai baher.abdulhai@utoronto.ca
Hossam Abdelgawad h.abdel.gawad@utoronto.ca
