Академический Документы
Профессиональный Документы
Культура Документы
$$
Dependability
Copyright 2008-13 Andrew Snow 14
All Rights Reserved
Focus is often on More Reliable
and Maintainable Components
• How to make things more reliable
– Avoid single points of failure (e.g. over concentration to achieve
economies of scale?)
– Diversity
• Redundant in-line equipment spares
• Redundant transmission paths
• Redundant power sources
• How to make things more maintainable
– Minimize fault detection, isolation, repair/replacement, and test
time
– Spares, test equipment, alarms, staffing levels, training, best
practices, transportation, minimize travel time
• What it takes --- $$$$$$$$$$$$$$$$$$$$$$$$$
Copyright 2008-13 Andrew Snow 15
All Rights Reserved
Paradox
• We are fickle
• When ICT works, no one wants to spend
$$ for unlikely events
• When an unlikely event occurs
– We wish we had spent more
• Our perceptions of risk before and after
catastrophes are key to societal behavior
when it comes to ICT dependability
Back-up
Facility
Back-up
Facility
Back-up
Facility
Site 3
1
N-1
Capacity
..
.
Site N
1
N-1
Capacit
y
• Outages
• Severity
• Likelihood
• Fault Prevention, Tolerance, Removal and
Forecasting
• Outages
• Severity
RISK
• Likelihood
• Fault Prevention, Tolerance, Removal and
Forecasting
III I
SEVERITY OF SERVICE OUTAGE
IV II
Commercial AC DC
DC
Rectifiers Distribution
Panel
Backup
Generator
Battery Backup
Alarms
BSC BSC
HLR VLR
BS
BS MSC
BSC
BS
BSC
MSC PSTN
BS SWITCH
SS7
STP
BSC BSC
BSC
BS
Copyright 2008-13 Andrew Snow BS 50
All Rights Reserved
Some Conclusions about Vulnerability
• Vulnerability highly situational, facility by
facility
• But a qualitative judgment can select a
quantitative score [1, 10]
10 144
Severity (Consequence)
500 1000
Likelihood
(Intention x Capability x Vulnerability)
Copyright 2008-13 Andrew Snow 56
All Rights Reserved
Conclusion Regarding Danger Index
• Highly situational, facility by facility
– Engineering, installation, operations, and
maintenance
– Security (physical, logical layers, etc)
– Degree of adherence to best practices, such as NRIC
• Need rules and consistency for assigning [1, 10]
scores in the four dimensions
• A normalized danger index looks feasible,
practical and useful for TCOM risk assessments
• Avoids guesses at probabilities
• Allows prioritization to ameliorate risk
Cut
Failure
Improper Deployment:
“Collapsed” or “Folded” Ring
sharing same path or conduit
Cut
STP STP
SCP
SCP
STP
STP
A, B, or C, or F Transmission Link
SSP: Signaling Service Point (Local or Tandem Switch)
STP: Signal Transfer Point (packet Switch Router)
SCP: Copyright
Service 2008-13
Control Andrew Snow
Point 72
All Rights Reserved
SS7 Vulnerabilities
• Lack of A-link path diversity: Links share a portion or a
complete path
• Lack of A-link transmission facility diversity: A-links share
the same high speed digital circuit, such as a DS3
• Lack of A-link power diversity: A-links are separate
transmission facilities, but share the same DC power
circuit
• Lack of timing redundancy: A-links are digital circuits that
require external timing. This should be accomplished by
redundant timing sources.
• Commingling SS7 link transmission with voice trunks
and/or alarm circuits: It is not always possible to allocate
trunks, alarms and A-links to separate transmission
facilities.
Copyright 2008-13 Andrew Snow 73
All Rights Reserved
SS7 A-Links
Switch 1 STP
Proper SS7
Deployment Switch 2 Network
Cut
STP
Switch 1
STP
STP
‘A’ Link
Improper DS3 F.O.
Fiber Cable
SW ‘A’ Link Cut
Deployment Mux Transceiver
DC
SW Fuse Power
Source
‘A’ Link
DS3 F.O.
Mux Transceiver Fib
er C
able
2
Commercial AC DC
DC
Rectifiers Distribution
Panel
Backup
Generator
Battery Backup
Alarms Inoperable
Copyright 2008-13 Andrew Snow 78
All Rights Reserved
Economy of Scale Over-Concentration
Vulnerabilities
Distributed Topology Switches Concentrated
To To
Tandem Tandem
SW1
SW1
SW2
SW2
SW3
SW3
Copyright Trunks
Local Loop2008-13 Andrew Snow 79
All Rights Reserved
Fiber Pair Gain Building
Proper Public Safety
Access Point (PSAP) Deployment
SRD
Local Local Local Local
Switch Switch Switch Switch
SRD
SRD
Selective Route Database
Copyright 2008-13 Andrew Snow 80
All Rights Reserved
Wireless Personal
Communication Systems
• Architecture
• Mobile Switching Center
• Base Station Controllers
• Base Stations
• Inter-Component Transmission
• Vulnerabilities
BSC BSC
HLR VLR
BS
BS MSC
BSC
BS
BSC
MSC PSTN
BS SWITCH
SS7
STP
BSC BSC
BSC
BS
Copyright 2008-13 Andrew Snow BS 82
All Rights Reserved
PCS Component Failure Impact
Components Users Potentially
Affected
Database 100,000
Mobile Switching Center 100,000
Base Station Controller 20,000
Links between MSC and BSC 20,000
Base Station 2,000
Links between BSC and BS 2,000
Copyright 2008-13 Andrew Snow 83
All Rights Reserved
Outages at Different Times of Day
Impact Different Numbers of People
PSTN MSC
Gateway MSC
MSC
Anchor
SW MSC
MSC
MSC MSC
• Architecture
• Head End
• Transmission (Fiber and Coaxial Cable)
• Cable Internet Access
• Cable Telephony
• Vulnerabilities
Call
800 LNP Connection
DB DB Agent
Core Packet
SS7 Signaling Network
Network Gateways
Billing
Agent
Circuit Circuit Trunk Access
Switch Switch Gateway Gateway
Source: DHS
Copyright 2008-13 Andrew Snow 93
All Rights Reserved
Vulnerabilities
• Adoption of TCP/IP as a de facto standard for all
data communications, including real-time control
systems in many industries including energy.
• Bias toward Commercial Off-the-Shelf (COTS)
software including both applications and
operating systems in order to accelerate
development and reduce costs.
• Note that “Open Source” software such as Linux
appears not to have a foothold in production
systems among utilities.
1.00
0.60
MTTF = 2 Yrs
MTTF = 3 Yrs
0.40
MTTF = 4 Yrs
0.20 MTTF = 5 Yrs
0.00
0 1 2 3 4 5
Years
MTTF 620
A 0.99919
MTTF MTTR 620.5
U 1 A 0.00081
Down _ Time 0.00081 24hrs 30day 3months 1.74 Hours
0.9998
Availability
MTTR = 10 Min
0.9996
MTTR = 1 Hr
MTTR = 2 Hr
0.9994
MTTR = 4 Hr
0.9992
0.999
0 1 2 3 4 5
MTTF in Years
D1 D2
100%
SV1
SV2 Outage 1: Failure and
Percent Users Served
75% Outage 1
complete recovery. E.g.
Switch failure
50% Outage 2
Outage 2: Failure and graceful
Recovery. E.g. Fiber cut with
25% rerouting
SV = SEVERITY OF OUTAGE
D = DURATION OF OUTAGE
0%
NON-SURVIVAL
30 K
Reportable2
Non-reportable
Non-reportable
30 Min Duration
Copyright 2008-13 Andrew Snow 111
All Rights Reserved
Severity
• The measure of severity can be expressed a number of
ways, some of which are:
– Percentage or fraction of users potentially or actually affected
– Number of users potentially or actually affected
– Percentage or fraction of offered or actual demand served
– Offered or actual demand served
• The distinction between “potentially” and “actually”
affected is important.
• If a 100,000 switch were to fail and be out from 3:30 to
4:00 am, there are 100,000 users potentially affected.
However, if only 5% of the lines are in use at that time of
the morning, 5,000 users are actually affected.
www.fcc.gov/nors/outage/bestpractice/BestPractice.cfm
Maintainability Growth,
Constancy, Deterioration
MG MC MD
RG
Reliability
Growth,
Constancy,
Deterioration RC
RD
w1
w2
Inputs
f(net)
wn
X1
X2
Input Output
X3
Layer Layer
X4
X5
Copyright 2008-13 Andrew Snow Hidden Layer 132
All Rights Reserved
Database
MTTF
BS
MTTF ANN Model
BSC
MTTF
MSC
MTTF
BS-BSC
MTTF
BSC-MSC
MTTF
Reportable
Outages
Database
MTR
BS
MTR
BSC
MTR
MSC
MTR
BS-BSC
MTR
BSC-MSC
MTR
Simulation Model
(Discrete-time Event Simulation in VC++)
Survivability
Outputs Availability
FCC-Reportable Outages
Failure Frequency
Customers Impacted
Component Failure
Simulation Timeline
Results as
Test Data
Train a NN
(NeuroSolution)
Test a NN
Survivability Graphs (9)
Reliability/
Learning Curve Maintainability
MSE Scenarios FCC-Reportable Outages (9)
R^2
Sensitivity
Analysis
40
35
30
25 REPORTABLE OUTAGES
Output
20
REPORTABLE OUTAGES
15 Output
10
0
1 9 17 25 33 41 49 57 65 73 81 89
Exemplar
REPORTABLE OUTAGES Output
40
Neural Network Output
35
30
25
REPORTABLE OUTAGES
20
Output
15
10
5
0
0 10 20 30 40
Simulation Output
25
MTTFBS
20 MTTFBSC
Sensitivity
MTTFBSCBS
15
MTTFMSC
10 MTTFMSCBSC
5 MTTFDB
MTRBS
0
MTRBSC
M FD CB
M RB
M
M T FB
M T FB
M FB C
M T FM CB
M T FM C
M B
M BS
M M BS
M M
Survivability
TR B S
TR SC
T
T S
TT S
T S
T S S
TT S
TR S
T C
TR SC
TR SC
MTRBSCBS
D BS
B
MTRMSC
MTRMSCBSC
C
C
Sensitivity About the Mean
Input Nam e MTRDB
180000
MT T FBS
160000
140000 MT T FBSC
120000 MT T FBSCBS
Sensitivity
100000 MT T FMSC
80000
60000 MT T FMSCBSC
40000 MT T FDB
20000 MT RBS
0
MT RBSC
M FB C
M BS
M
M TFB
M FB
M FM CBS
M FM C
M FD CB
M RB
M BS
M M BS
M M
TT S
TR C
T
TT S
TT S
TT S
TT S
T B SC
TR S
TR C
TR SC
TR SC
MT RBSCBS
D BS
B C
MT RMSC
MT RMSCBSC
RD/MG 0.999850
25.00 RD/MD RG/MD
Survivability
RG/MC 0.999800 RD/MG
20.00
RG/MG 0.999750 RG/MG
15.00 RD/MC
0.999700 RD/MD
10.00 RC/MD
RC/MG 0.999650 RD/MC
5.00
RC/MC RG/MC
0.00 0.999600
RC/MG
1 2 3 4 5 1 2 3 4 5
Years
RC/MD
Years
0.99999 8 42.05
100.00%
90.00%
80.00%
70.00%
60.00%
Long Tail
50.00% Short Tail
Exponential
40.00%
30.00%
20.00%
10.00%
0.00%
0.999925
0.999930
0.999935
0.999940
0.999945
0.999950
0.999955
0.999960
0.999965
0.999970
0.999975
0.999980
0.999985
0.999990
0.999995
1.000000
Copyright 2008-13 Andrew Snow 150
All Rights Reserved
Conditional Cumulative Distribution Function
MTTF = 4 Yr; MTTR = 21.02 min; TI = 1 Yr
Cond CDF
100.00%
90.00%
80.00%
70.00%
60.00%
Long Tail
50.00% Short Tail
Exponential
40.00%
30.00%
20.00%
10.00%
0.00%
0.999925
0.999930
0.999935
0.999940
0.999945
0.999950
0.999955
0.999960
0.999965
0.999970
0.999975
0.999980
0.999985
0.999990
0.999995
1.000000
Copyright 2008-13 Andrew Snow 151
All Rights Reserved
Cumulative Distribution Function MTTF = 4 Yr
MTTR = 20.6 min; TI = 1 YR; A= 0.999999
CDF
100.00%
90.00%
80.00%
70.00%
60.00%
Long Tail
50.00%
Short Tail
40.00%
30.00%
20.00%
10.00%
0.00%
0.999925
0.999930
0.999935
0.999940
0.999945
0.999950
0.999955
0.999960
0.999965
0.999970
0.999975
0.999980
0.999985
0.999990
0.999995
1.000000
Copyright 2008-13 Andrew Snow 152
All Rights Reserved
Conditional Cumulative Distribution Function
MTTF = 4 Yr; MTTR = 20.6 min;
TI = 1 YR; A= 0.999999
Cond CDF
100.00%
90.00%
80.00%
70.00%
60.00%
Long Tail
50.00%
Short Tail
40.00%
30.00%
20.00%
10.00%
0.00%
0.999925
0.999930
0.999935
0.999940
0.999945
0.999950
0.999955
0.999960
0.999965
0.999970
0.999975
0.999980
0.999985
0.999990
0.999995
1.000000
Copyright 2008-13 Andrew Snow 153
All Rights Reserved
Long Tail Lognormal Distribution
TI = 1 YR; A = 0.99999
Long Tail
90.00%
20.24%
80.00% 37.91%
70.00%
63.44%
60.00%
86.43% A < 0.99999
50.00% 98.19% 0.99999 >= A < 1
A=1
40.00% 77.59%
30.00% 60.65%
20.00%
36.18%
10.00%
13.49%
0.00% 1.81%
MTTF = 0.25 MTTF = 0.5 MTTF = 1 MTTF = 2 MTTF = 4
80.00%
70.00%
60.00%
A < 0.99999
50.00% 100.00% 99.91% 99.40% 96.34% 0.99999 >= A < 1
90.32%
40.00%
30.00%
20.00%
10.00%
0.00%
MTTF = 0.25 MTTF = 0.5 MTTF = 1 MTTF = 2 MTTF = 4
ACTS
AC Circuit CB
Rec
DC Circuit
B
CB/F
160
140
Cumulative Quarterly Count
120
100
80
60
40
20
0
0 5 10 15 20 25 30 35
Quarter
(t ) (t )
Outage Intensity
Breakpoint Jump Point
160
Count of power outages
140
Power Law Model
120
Cumulative Count
100
80
60
40
20
0
0 1 2 3 4 5 6 7 8 9
Years
160
140
Cumulative Quarterly Count
120
100
80
60
40
20
0
0 5 10 15 20 25 30 35
Quarter
12
10
Outages per Quarter
0
0 5 10 15 20 25 30 35
Quarter
ACTS
Com
Generator 1 Batteries 3
AC
Alarms
CB
Rectifiers 2
DC Distr. Panel
CB/F
Power Ckts
In Telecom Eqpt
High 1,000 31
Trigger Cause Total Low Medium High Root Cause Total Low Medium High
Outages Impact Impact Impac Outages Impact Impact Impact
t
175
150
125
Before 911
Model
50
25
0
1993 1995 1997 1999 2001 2003 2005
Year
STP STP
SCP
SCP
STP
STP
250
60
200 50
SS7 outage
Total Outage 30
100
20
50 10
0
0
1999 2000 2003 2004 1999 2000 2003 2004
Year year
Outage Event
60
1999
50
2000
40
30
20
10
0
0 0.5 1 1.5 2
Time(i)
60
50
40
Outage Event
2003
30
2004
20
10
0
0 0.5 1 1.5 2
Time
8
Data
7
Outage per Month
6 Poisson Model
5
4
3
2
1
0
9
4
99
00
01
02
03
04
-9
-0
-0
-0
-0
-0
n-
n-
n-
n-
n-
n-
l
l
Ju
Ju
Ju
Ju
Ju
Ju
Ja
Ja
Ja
Ja
Ja
Ja
Time
30
25
Percent of Outage
20
1999-2000
15
2003-2004
10
0
AM
PM
AM
AM
PM
PM
2
-4
2
-8
-4
-8
-1
-1
AM
AM
PM
PM
AM
PM
4
12
4
12
8
8
Time Slot
30
25
Percent Of Outage
20
1999-2000
15
2003-2004
10
0
ay
y
y
y
ay
ay
da
a
da
sd
rd
nd
sd
id
es
on
Fr
tu
ur
Su
ne
Tu
M
Sa
Th
ed
W
Day Of Week