
Smart Traffic Lights that Learn!

MARLIN: Multi-Agent Reinforcement Learning Integrated Network of Adaptive Traffic Signal Controllers

Samah El-Tantawy, Ph.D., Postdoctoral Fellow, Dept. of Civil Engineering
Baher Abdulhai, Ph.D., P.Eng., Director, ITS Centre and Testbed, Dept. of Civil Engineering
Hossam Abdelgawad, Ph.D., P.Eng., Manager, ITS Centre and Testbed

ACGM 2013 - Intelligent Transport for Smart Cities


Outline

1. In a Nutshell
2. Theory in Brief
   • Reinforcement Learning and Game Theory
3. Applications
   • City of Toronto Testbed
4. Hardware-in-the-Loop Testing
   • Approach
   • Integration with PEEK ATC-1000
• Next Steps
• Q&A
In a Nutshell

• Grand objective
   • Intersections "talk to each other"
   • Each is affected by what is happening upstream
   • Each affects what is happening downstream
   • Whole-network control in one shot from a "grand brain" is the dream
• Issue
   • Intractable theoretically
   • Too complex practically
   • Requires massive and very expensive communication
• Solution
   • Decentralized
   • Self-learning: agents learn to control their local intersection
   • Game-theory based: agents learn to collaborate


What is MARLIN?

• Artificial-intelligence-based control software
• Enables traffic lights to self-learn and self-collaborate with neighbouring traffic lights
• Cuts down motorists' delay, fuel consumption and the negative environmental effects of congestion
• Easier to operate (self-learning)
• Less expensive communication, if communication is needed at all
Evolution of “Adaptive” Signal Control

MARLIN-ATSC: Level 4

• Level 0: Fixed-time and actuated control (TRANSYT; 1969, UK)
• Level 1: Centralized control, off-line optimization (SCATS; 1979, Australia; >50 installations worldwide)
• Level 2: Centralized control, on-line optimization (SCOOT; 1981, UK; >170 installations worldwide)
• Level 3: Distributed control, model-based (OPAC, RHODES; 1992, USA; 5 installations in USA)
• Level 4: Distributed self-learning control (MARLIN-ATSC; 2011, Canada)
Issues with Leading ATSC Technologies

• Centralized
   • Expensive
   • Not scalable
   • Not robust
• Model-based framework
   • Relies on accurate traffic modelling, the accuracy of which is questionable
• Curse of dimensionality
   • System complexity increases exponentially with the number of intersections/controllers
• Human intervention requirements
   • Highly skilled labour is required to operate these systems due to their complexity
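To make the curse-of-dimensionality point concrete, the sketch below counts the joint actions a single centralized controller would have to choose among, versus the joint-action entries summed over pairwise games between adjacent intersections. It assumes, for illustration only, a square grid network, 4-neighbour adjacency and two actions per intersection (extend or switch); the function names are hypothetical.

```python
# Illustrative assumptions (not from the slides): square grid network,
# 4-neighbour adjacency, two actions (extend/switch) per intersection.


def centralized_joint_actions(n_intersections, actions_per_intersection=2):
    """A single 'grand brain' must pick one combination of all local actions."""
    return actions_per_intersection ** n_intersections


def pairwise_games(rows, cols):
    """Number of adjacent intersection pairs: horizontal plus vertical links."""
    return rows * (cols - 1) + cols * (rows - 1)


def decentralized_joint_actions(rows, cols, actions_per_intersection=2):
    """Joint-action entries summed over all pairwise games (a^2 per game)."""
    return pairwise_games(rows, cols) * actions_per_intersection ** 2


for side in (2, 3, 5, 10):
    n = side * side
    print(f"{n:>3} intersections: centralized {centralized_joint_actions(n):.3g} "
          f"vs. pairwise {decentralized_joint_actions(side, side)}")
```

The centralized count doubles with every added intersection, while the pairwise count grows roughly in proportion to the number of links. This counts actions only; under a centralized design the joint state space grows combinatorially in the same way.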
Why is MARLIN Different?

MARLIN compared with existing ATSC approaches:
• Human intervention requirements: self-learning and generic (vs. specific design)
• Architecture: decentralized (vs. centralized)
• Modelling: model-free (vs. model-based, with prediction requirements and pattern sensitivity)
• Curse of dimensionality: scalable
• Coordination: coordinated (vs. inefficient coordination)


Learning the Control Law: Reinforcement Learning Architecture

RL architecture: the agent observes the state of the environment and receives a reward, and in turn applies an action to the environment.

Goal: optimal control law = mapping between states and actions


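Q-table (example):

Q      a1     a2
s1    -10     -5
s2     -3    -15

Q-learning update, where α is the learning rate and γ the discount factor:

$$Q_{k+1}(s_k, a_k) = Q_k(s_k, a_k) + \alpha \left[ r_{k+1} + \gamma \max_{a} Q_k(s_{k+1}, a) - Q_k(s_k, a_k) \right]$$

Action selection (greedy choice, balanced against exploration):

$$a_{k+1} = \arg\max_{a} Q_k(s_{k+1}, a)$$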
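The update rule and greedy action selection above can be sketched in a few lines of Python. This is a minimal tabular Q-learning illustration that reuses the example Q-table; the learning rate, discount factor, reward and transition are assumed values chosen only to make the arithmetic easy to follow.

```python
# Minimal tabular Q-learning sketch. ALPHA, GAMMA, the reward and the
# transition used below are illustrative assumptions, not MARLIN settings.
ALPHA = 0.5  # learning rate (assumed)
GAMMA = 0.9  # discount factor (assumed)

# Example Q-table from the slide: Q[state][action]
Q = {
    "s1": {"a1": -10.0, "a2": -5.0},
    "s2": {"a1": -3.0, "a2": -15.0},
}


def greedy_action(q, state):
    """a_{k+1} = argmax_a Q_k(s_{k+1}, a): pick the action with the highest Q-value."""
    return max(q[state], key=q[state].get)


def q_update(q, s, a, r, s_next, alpha=ALPHA, gamma=GAMMA):
    """Apply Q(s,a) <- Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]."""
    td_target = r + gamma * max(q[s_next].values())
    q[s][a] += alpha * (td_target - q[s][a])
    return q[s][a]


print(greedy_action(Q, "s1"))  # a2, because -5 > -10
print(round(q_update(Q, "s1", "a2", r=-4.0, s_next="s2"), 2))  # -5.85
```

With α = 0.5, γ = 0.9, r = -4 and next state s2, the target is -4 + 0.9 × (-3) = -6.7, so Q(s1, a2) moves halfway from -5 toward -6.7, i.e., to -5.85.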
RL-based ATSC Architecture

• Environment: traffic simulation
• Agent: RL software
• State: queue lengths
• Action: extend or switch
• Reward: delay savings
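A toy version of this loop, assuming a hypothetical two-phase intersection in place of the traffic simulator: the state is the (capped) queue length on each approach plus the current green phase, the actions are extend and switch, and the reward is the delay saved relative to the previous step. The arrival probabilities, discharge rate, ε-greedy exploration and all parameter values are illustrative assumptions, not MARLIN-ATSC settings.

```python
import random
from collections import defaultdict

EXTEND, SWITCH = 0, 1
ACTIONS = (EXTEND, SWITCH)


class ToyIntersection:
    """Hypothetical two-phase intersection standing in for the traffic simulator."""

    def __init__(self, arrival_prob=(0.4, 0.3), discharge=2, seed=0):
        self.rng = random.Random(seed)
        self.arrival_prob = arrival_prob  # assumed per-trial arrival probability per approach
        self.discharge = discharge        # assumed vehicles served per step on green
        self.queues = [0, 0]
        self.green = 0                    # approach that currently has green
        self.prev_delay = 0

    def state(self):
        # State: queue lengths (capped at 9 to keep the table small) and current green phase
        return (min(self.queues[0], 9), min(self.queues[1], 9), self.green)

    def step(self, action):
        if action == SWITCH:
            self.green = 1 - self.green
        # Serve the green approach, then add up to two random arrivals per approach
        self.queues[self.green] = max(0, self.queues[self.green] - self.discharge)
        for i, p in enumerate(self.arrival_prob):
            self.queues[i] += sum(self.rng.random() < p for _ in range(2))
        delay = sum(self.queues)          # vehicle-steps of delay incurred this step
        reward = self.prev_delay - delay  # delay savings vs. the previous step
        self.prev_delay = delay
        return self.state(), reward


def train(episodes=200, steps=100, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration (assumed parameter values)."""
    rng = random.Random(seed)
    q = defaultdict(lambda: [0.0, 0.0])  # q[state][action]
    for episode in range(episodes):
        env = ToyIntersection(seed=seed + episode)
        s = env.state()
        for _ in range(steps):
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)                      # explore
            else:
                a = max(ACTIONS, key=lambda act: q[s][act])  # exploit
            s_next, r = env.step(a)
            q[s][a] += alpha * (r + gamma * max(q[s_next]) - q[s][a])
            s = s_next
    return q


q_table = train()
print(len(q_table), "states visited")
```

ε-greedy is one common way to balance exploration and exploitation; the slides do not specify which scheme MARLIN-ATSC uses.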
MARLIN-ATSC: Coordination Principle

• Each agent plays a game with each adjacent intersection in its neighborhood

Example network:

I1 I2 I3
I4 I5 I6
I7 I8 I9

• Corner intersection (e.g., I1): 2 games
• Edge intersection (e.g., I2): 3 games
• Intermediate intersection (e.g., I5): 4 games
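The game counts in the example follow from counting adjacent neighbours on the grid. A minimal sketch, assuming adjacency means sharing a north, south, east or west link; the function name is hypothetical.

```python
def games_per_intersection(rows, cols):
    """One pairwise game per adjacent (N/S/E/W) neighbour; names follow I1..In row by row."""
    counts = {}
    for r in range(rows):
        for c in range(cols):
            neighbours = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            in_grid = sum(0 <= nr < rows and 0 <= nc < cols for nr, nc in neighbours)
            counts[f"I{r * cols + c + 1}"] = in_grid
    return counts


counts = games_per_intersection(3, 3)
print(counts["I1"], counts["I2"], counts["I5"])  # 2 3 4
```

Each game is shared by two intersections, so the 3 × 3 grid has 24 / 2 = 12 distinct games; the number of games grows with the number of links rather than exponentially with network size.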
MARLIN-ATSC Available Modes

• MARLIN-ATSC: (a) Independent mode, (b) Integrated mode

[Figure: MARLIN-ATSC in (a) independent mode and (b) integrated mode; each panel is labelled with Queue Length 1, Queue Length 2, Extend 1, Extend 2, Delay 1 and Delay 2]
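One way to picture the difference between the two modes, under explicit assumptions: in the independent mode sketched here, an agent's Q-values depend only on its own state and action; in the integrated mode sketched here, they are indexed by the joint action of the agent and one neighbour, and the agent best-responds to an empirical (frequency-count) estimate of that neighbour's policy. The class names, the frequency model and the parameter values are hypothetical and are not taken from MARLIN-ATSC's implementation.

```python
from collections import defaultdict

ACTIONS = ("extend", "switch")


class IndependentAgent:
    """Independent mode (as sketched here): Q depends only on own state and own action."""

    def __init__(self):
        self.q = defaultdict(float)  # key: (state, own_action)

    def act(self, state):
        return max(ACTIONS, key=lambda a: self.q[(state, a)])


class IntegratedAgent:
    """Integrated mode (as sketched here): Q over joint actions with one neighbour,
    plus a frequency-count model of that neighbour's actions (hypothetical design)."""

    def __init__(self, alpha=0.1, gamma=0.9):
        self.alpha, self.gamma = alpha, gamma
        self.q = defaultdict(float)  # key: (joint_state, own_action, neighbour_action)
        # Counts start at 1 so the estimated neighbour policy is uniform before any observation
        self.neighbour_counts = defaultdict(lambda: {b: 1 for b in ACTIONS})

    def neighbour_policy(self, joint_state):
        counts = self.neighbour_counts[joint_state]
        total = sum(counts.values())
        return {b: c / total for b, c in counts.items()}

    def expected_value(self, joint_state, own_action):
        # Average Q over the neighbour's actions, weighted by the estimated neighbour policy
        pi = self.neighbour_policy(joint_state)
        return sum(pi[b] * self.q[(joint_state, own_action, b)] for b in ACTIONS)

    def act(self, joint_state):
        # Best response to the estimated neighbour policy
        return max(ACTIONS, key=lambda a: self.expected_value(joint_state, a))

    def observe_neighbour(self, joint_state, neighbour_action):
        self.neighbour_counts[joint_state][neighbour_action] += 1

    def update(self, s, a, b, r, s_next):
        # Q-learning on the joint action (a, b), bootstrapping from the best response in s_next
        best_next = max(self.expected_value(s_next, a2) for a2 in ACTIONS)
        key = (s, a, b)
        self.q[key] += self.alpha * (r + self.gamma * best_next - self.q[key])
```

In this sketch, if the neighbour has mostly extended in a given joint state, the agent weights the Q-values of joint actions involving the neighbour's extend more heavily when it chooses its own action.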
Large-Scale Application
Network-Wide MOE in the Normal Scenario

BC values are absolute; the three comparison columns give % improvement.

MOE                                   BC      MARL-TI vs. BC   MARLIN-IC vs. BC   MARLIN-IC vs. MARL-TI
Average intersection delay (sec/veh)  35.27   27%              38%                14%
Throughput (veh)                      23084   3%               6%                 3%
Avg. queue length (veh)               8.66    24%              32%                11%
Std. avg. queue length (veh)          2.12    23%              31%                10%
Avg. link delay (sec)                 9.45    10%              47%                41%
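The three percentage columns are linked: given the BC value and the two improvements over BC, the MARLIN-IC vs. MARL-TI improvement can be re-derived. The sketch below does this as a consistency check, assuming "improvement" means a reduction for the delay and queue measures and an increase for throughput; the implied MARL-TI and MARLIN-IC values are approximate because the published percentages are rounded.

```python
# (MOE, BC value, MARL-TI vs BC, MARLIN-IC vs BC, reported MARLIN-IC vs MARL-TI, higher_is_better)
ROWS = [
    ("Average intersection delay (sec/veh)", 35.27, 0.27, 0.38, 0.14, False),
    ("Throughput (veh)", 23084, 0.03, 0.06, 0.03, True),
    ("Avg. queue length (veh)", 8.66, 0.24, 0.32, 0.11, False),
    ("Std. avg. queue length (veh)", 2.12, 0.23, 0.31, 0.10, False),
    ("Avg. link delay (sec)", 9.45, 0.10, 0.47, 0.41, False),
]


def implied_value(bc, improvement_over_bc, higher_is_better):
    """Value implied by a % improvement over BC (increase if higher is better, else reduction)."""
    if higher_is_better:
        return bc * (1 + improvement_over_bc)
    return bc * (1 - improvement_over_bc)


def improvement(reference, value, higher_is_better):
    """Fractional improvement of value relative to reference."""
    if higher_is_better:
        return (value - reference) / reference
    return (reference - value) / reference


for name, bc, p_ti, p_ic, reported, hib in ROWS:
    ti = implied_value(bc, p_ti, hib)
    ic = implied_value(bc, p_ic, hib)
    derived = improvement(ti, ic, hib)
    print(f"{name}: MARL-TI ~ {ti:.2f}, MARLIN-IC ~ {ic:.2f}, "
          f"derived IC vs. TI = {derived:.0%} (reported {reported:.0%})")
```

For the delay row the derived figure is about 15% against the reported 14%; the other rows agree to the nearest percent. The gap on the delay row is within what rounding of the published 27% and 38% can produce.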


Large-Scale Application
% Improvement in Average Delay

[Figure: map of % improvement in average delay, MARLIN-IC vs. BC, over Areas 1, 2 and 3]
Large-Scale Application
Average Route Travel Time for Selected Routes

[Figure: average travel time (min) on Gardiner EB (freeway) over twelve 5-min time intervals, for BC, MARL-TI and MARLIN-IC]
Large-Scale Application
Average Route Travel Time for Selected Routes

[Figure: average travel time (min) on LakeShore EB to Spadina NB (major arterial) over twelve 5-min time intervals, for BC, MARL-TI and MARLIN-IC]
MARLIN-HILS Architecture

Components:
• Traffic signal controller
• Controller Interface Device (CID), with RS485-to-USB conversion
• Industrial computer running Paramics Modeller

Communication links:
• RS485: SDLC protocol
• USB: SDLC protocol
• Ethernet: NTCIP protocol
HILS Setup: Demo
Conclusion

• MARLIN: state of the art, gen4+
• Thoroughly developed and tested
• Patent pending
• Ongoing:
   • HILS and PEEK ATC-1000 integration
   • Potential field operational test
   • Productization
   • From TSP to People Priority (PSP)


Samah El-Tantawy samah.el.tantawy@utoronto.ca
Baher Abdulhai baher.abdulhai@utoronto.ca
Hossam Abdelgawad h.abdel.gawad@utoronto.ca
