Most analysis and control assumes stationary stochastic processes: no change in
- Mean
- Variance
- Covariances
Bad things can happen to good processes:
- A router can fail in a network
- A conveyor belt can stop on an assembly line
- A bank can fail in an economy
Need to determine when process parameters have changed in order to:
- Correct the process
- Change control parameters
Mainframe Data
[Figure: mainframe and network measurements vs. time (hr) over a 20-hour period: %usr, mdb, input packets per second (Ipkt/s), and collision percentage (Coll%)]
Outline
- Hypothesis testing and statistical background
- Off-line tests
- Theory for on-line tests
- On-line tests
- Practical considerations
- References
Hypothesis Testing
Test assertions about parameters of a process (e.g., mean, variance, covariance).
- H0 (null hypothesis): normal situation (e.g., mean response time is 1 second)
- H1 (alternate hypothesis): abnormal situation (e.g., mean response time is 3 seconds)
Test statistic: a transformation of the data,

    T(y) = f(y1, ..., yN)

Decision function d: determines if the test statistic is within an acceptable range:
- 0: okay
- 1: raise an alarm
Observation: d classifies values of y.
Test statistics:

    T(y) = ȳ
    T(y) = Σᵢ (yᵢ − ȳ)²

Decision functions use a critical value (an upper or lower limit):

    d(T) = 1 if T < T_L, 0 otherwise
    d(T) = 1 if T > T_C, 0 otherwise
Outcomes of Tests
Raise an alarm if d(T) = 1.

                    No Alarm          Alarm
    H0 is true      OK                false positive
    H1 is true      false negative    OK
Critical Regions
Set of y values for which H0 is rejected, denoted by C.
- P(false positive) = α = P(y ∈ C | H0)
- P(false negative) = β = P(y ∉ C | H1)
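For a Gaussian test statistic these probabilities have closed forms. The sketch below assumes the running example (H0 mean 1 s, H1 mean 3 s, unit variance) and an arbitrary critical value T_C = 2.0; the helper name is hypothetical:

```python
from math import erf, sqrt

def norm_cdf(x, mu, sigma):
    """Normal CDF via the error function."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

# Hypothetical setup: alarm when a single observation exceeds T_C
mu0, mu1, sigma, t_c = 1.0, 3.0, 1.0, 2.0
alpha = 1.0 - norm_cdf(t_c, mu0, sigma)  # P(y in C | H0): false positive
beta = norm_cdf(t_c, mu1, sigma)         # P(y not in C | H1): false negative
print(round(alpha, 4), round(beta, 4))
```

With the threshold midway between the two means and equal variances, the two error rates come out equal (about 0.159 each); moving T_C trades one error off against the other.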
Test Design
Objective: select the test that minimizes β, subject to the constraint that α is not too large. The power of a test provides a succinct way of expressing this objective:

    β*(θ) = P(test rejects H0 | θ)

Note that

    β*(θ) = α        if θ ∈ H0
    β*(θ) = 1 − β    if θ ∈ H1

Ideal test:

    β*(θ) = 0  if θ ∈ H0
    β*(θ) = 1  if θ ∈ H1
Likelihood Function
A transformation of the data, used in test statistics. It indicates the probability (or density) of the data if the distribution is known:

    L_θ(y) = P(y | θ)   if y is discrete
    L_θ(y) = f(y | θ)   if y is continuous

Example: normal distribution with θ = (μ, σ²):

    L_θ(y) = (1 / √(2πσ²)) exp( −(y − μ)² / (2σ²) )

where H0 is specified in terms of μ and σ. If observations are i.i.d.,

    L_θ(y1, ..., yN) = L_θ(y1) · · · L_θ(yN)

For the normal, this means

    L_θ(y1, ..., yN) = (1 / √(2πσ²))^N exp( −Σ_{i=1}^{N} (yi − μ)² / (2σ²) )
Likelihood Ratio
Indicates the relative probability (or density) of obtaining the data:

    L_θ1(yi) / L_θ0(yi)

Often use the log of the likelihood ratio:

    si = ln( L_θ1(yi) / L_θ0(yi) )

Example: N(μ0, σ²) and N(μ1, σ²):

    si = ((μ1 − μ0) / σ²) (yi − (μ0 + μ1)/2)
       = (v/σ²)(yi − μ0) − v²/(2σ²)
       = (b/σ)(yi − μ0 − v/2)

where v = μ1 − μ0 is the change magnitude and b = (μ1 − μ0)/σ is the signal-to-noise ratio.
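The simplified form of si can be checked with a few lines of Python (hypothetical helper; parameters from the running example):

```python
def log_lr(y_i, mu0, mu1, sigma2):
    """s_i = ln(L_mu1(y_i)/L_mu0(y_i)) for two normals with common
    variance, in its simplified form (v/sigma^2)*(y_i - mu0 - v/2)."""
    v = mu1 - mu0                       # change magnitude
    return (v / sigma2) * (y_i - mu0 - v / 2.0)

# mu0 = 1, mu1 = 3, sigma^2 = 1: v = 2
print(log_lr(1.0, 1.0, 3.0, 1.0))  # observation near mu0 -> -2.0
print(log_lr(3.0, 1.0, 3.0, 1.0))  # observation near mu1 -> 2.0
```

Negative values of si favor H0, positive values favor H1, exactly the sign behavior used by the on-line tests later in the deck.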
Consider Gaussian yi. si is Gaussian, since a linear combination of Gaussians is Gaussian.
- If θ ∈ H0, then E(si) < 0
- If θ ∈ H1, then E(si) > 0
- Between H0 and H1, E(si) ≈ 0
Notes
E(si) < 0 follows from E_θ0( (v/σ²)(yi − μ0) − v²/(2σ²) ) = −v²/(2σ²).
E(si) > 0 follows from E_θ1( (v/σ²)(yi − μ0) − v²/(2σ²) ) = v²/(2σ²).
Given θ0 and θ1 corresponding to H0 and H1.

Definition: φ is a most powerful test iff for all tests φ′ such that β*_φ′(θ0) ≤ β*_φ(θ0), we have β*_φ′(θ1) ≤ β*_φ(θ1).

Intuition for constructing φ: place first into the critical region those y that have:
- the lowest probability under H0
- the highest probability under H1

Neyman-Pearson Lemma: φ is a most powerful test if it is constructed as follows:

    y ∈ C iff L_θ1(y) / L_θ0(y) > h
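A sketch of the Neyman-Pearson construction for two normals with common variance; the function name and the threshold h = 1.0 are hypothetical:

```python
from math import exp

def np_test(y, mu0, mu1, sigma2, h):
    """Neyman-Pearson likelihood ratio test: reject H0 (return 1)
    iff L_mu1(y) / L_mu0(y) > h. The normalizing constants cancel,
    so only the exponentials are needed."""
    ratio = exp(-(y - mu1) ** 2 / (2 * sigma2)) / exp(-(y - mu0) ** 2 / (2 * sigma2))
    return 1 if ratio > h else 0

print(np_test(0.8, 1.0, 3.0, 1.0, 1.0))  # y near mu0 -> 0
print(np_test(2.9, 1.0, 3.0, 1.0, 1.0))  # y near mu1 -> 1
```

With h = 1 this reduces to classifying y to whichever mean is closer; raising h shrinks the critical region C and lowers α at the cost of β.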
Off-Line Tests
View as constrained clustering:
- Want homogeneous clusters
- Choose change points such that variance within a cluster is smaller than variance between clusters
- Assumes that only the mean changes
Example of Partitioning
A 3-Partitioning
[Figure: a 3-partitioning of 30 observations (values roughly −1 to 6); the change points divide the series into three homogeneous segments]
Definitions
Adjusted sum of squares (degree of homogeneity) within a partition:

    ASQ[m..n] = Σ_{j=m}^{n} (yj − ȳ[m..n])²

Figure of merit for the change points identified:

    D_P = ASQ[Pk..N] + Σ_{j=1}^{k−1} ASQ[Pj..P_{j+1} − 1]
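These definitions translate directly into code. The sketch below (hypothetical names, 0-based inclusive indices) computes ASQ and D_P for a toy series with one obvious change point:

```python
def asq(y, m, n):
    """Adjusted sum of squares of y[m..n] (0-based, inclusive)."""
    seg = y[m:n + 1]
    ybar = sum(seg) / len(seg)
    return sum((v - ybar) ** 2 for v in seg)

def d_p(y, points):
    """Figure of merit D_P: total within-segment ASQ for the change
    points in `points` (0-based indices where new segments begin)."""
    bounds = [0] + sorted(points) + [len(y)]
    return sum(asq(y, bounds[i], bounds[i + 1] - 1)
               for i in range(len(bounds) - 1))

y = [1, 1, 1, 5, 5, 5]
print(d_p(y, [3]))  # correct change point: both segments homogeneous -> 0.0
print(d_p(y, []))   # no change points: one heterogeneous segment -> 24.0
```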
Observations
The computational complexity of finding an optimal k-partitioning is C(N, k) (N choose k).
If P is an N-partitioning, then D_P = 0.
Want a k-partitioning with D_P ≈ 0:
- k large enough to find the change points
- k small enough so that non-change points are avoided
Fisher Algorithm for Change-Point Detection

    ChangePoints(first, last, CPList):
        Compute Q = (1, Q2), the optimal 2-partitioning of [first..last]
        Compute T, where

            T = ASQ[first..last] / ( ASQ[first..Q2 − 1] + ASQ[Q2..last] )

        If T exceeds a critical value:
            Add Q2 to CPList
            ChangePoints(first, Q2 − 1, CPList)
            ChangePoints(Q2, last, CPList)
        Return
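A possible Python rendering of this recursion (a sketch, not the original implementation; the critical value is an uncalibrated placeholder):

```python
def asq(y, m, n):
    """Adjusted sum of squares of y[m..n] (0-based, inclusive)."""
    seg = y[m:n + 1]
    ybar = sum(seg) / len(seg)
    return sum((v - ybar) ** 2 for v in seg)

def change_points(y, first, last, cp_list, crit=10.0):
    """Recursive segmentation in the spirit of the Fisher algorithm:
    find the best single split, test the homogeneity ratio T against
    a critical value, and recurse on both halves.
    `crit` is an illustrative threshold, not a calibrated value."""
    if last - first < 1:
        return
    total = asq(y, first, last)
    if total == 0:
        return  # segment already homogeneous
    # Optimal 2-partitioning: q is the index starting the second segment
    q = min(range(first + 1, last + 1),
            key=lambda s: asq(y, first, s - 1) + asq(y, s, last))
    within = asq(y, first, q - 1) + asq(y, q, last)
    t = float("inf") if within == 0 else total / within
    if t > crit:
        cp_list.append(q)
        change_points(y, first, q - 1, cp_list, crit)
        change_points(y, q, last, cp_list, crit)

cps = []
change_points([1, 1, 1, 5, 5, 5], 0, 5, cps)
print(sorted(cps))  # [3]
```

Finding the best single split at each level costs O(N²) naively; running means make it O(N) per level.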
Data are presented serially.
- Raise an alarm if a change is detected: ta is the time of the alarm.
- Identify when the change occurred: t0 is the time of the change-point, with t0 ≤ ta.
[Figure: time series with the change-point time t0 and the later alarm time ta marked, t0 ≤ ta]
Let p_θ0 be the distribution (density) under H0 and p_θ1 be the distribution (density) under H1. Consider y1, ..., yk.
- H0: p(yk | yk−1, ..., y1) = p_θ0(yk | yk−1, ..., y1)
- H1: there is a time t0 such that
  - for 1 ≤ i ≤ t0 − 1: p(yi | yi−1, ..., y1) = p_θ0(yi | yi−1, ..., y1)
  - for t0 ≤ i ≤ k: p(yi | yi−1, ..., yt0) = p_θ1(yi | yi−1, ..., yt0)

The alarm time ta is the smallest k such that H1 is chosen over H0.
Sum of log likelihood ratios:

    S_i^k = Σ_{j=i}^{k} sj

Consider yi ~ N(μ, σ²), with μ = μ0 under H0 and μ = μ1 under H1, and let m = k − i + 1 be the number of terms. Then

    S_i^k = (b/σ) Σ_{j=i}^{k} (yj − μ0 − v/2)

- Under H0: S_i^k is N( −m v² / (2σ²), m b² )
- Under H1: S_i^k is N( m v² / (2σ²), m b² )
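The CUSUM decision rule alarms when this sum rises too far above its running minimum. A minimal sketch (hypothetical function; the threshold h = 4 is arbitrary):

```python
def cusum_alarm(ys, mu0, mu1, sigma2, h):
    """CUSUM sketch: accumulate the log likelihood ratios s_k and
    alarm when the sum exceeds its running minimum by more than h.
    Returns the alarm time t_a (1-based) or None if no alarm."""
    v = mu1 - mu0
    total, lo = 0.0, 0.0
    for k, y in enumerate(ys, start=1):
        total += (v / sigma2) * (y - mu0 - v / 2.0)  # s_k
        if total - lo > h:
            return k                                 # alarm time t_a
        lo = min(lo, total)
    return None

data = [1.0, 0.9, 1.1, 1.0, 3.1, 2.9, 3.0, 3.2]  # change at t0 = 5
print(cusum_alarm(data, 1.0, 3.0, 1.0, 4.0))      # alarms at t_a = 7
```

Tracking the running minimum is equivalent to restarting the sum whenever it goes negative, which is the usual one-sided CUSUM recursion.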
How is h Determined?
Ideally want:
- a short alarm delay if there is a change-point
- a long time until an alarm if there is no change-point
Criterion: Average Run Length (ARL), the average number of observations until there is an alarm. ARL is related to, but different from, the power of an off-line test.
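ARL can be estimated by simulation. The sketch below (hypothetical function, arbitrary parameters) runs repeated trials of a CUSUM-style rule and averages the time to alarm, once with in-control data and once with out-of-control data:

```python
import random

def average_run_length(mu_true, mu0, mu1, sigma, h, trials=200, max_n=10_000):
    """Monte Carlo ARL estimate for a CUSUM-style rule applied to
    N(mu_true, sigma^2) data (an illustrative sketch)."""
    random.seed(0)  # reproducible estimate
    v = mu1 - mu0
    runs = []
    for _ in range(trials):
        total, lo = 0.0, 0.0
        for k in range(1, max_n + 1):
            y = random.gauss(mu_true, sigma)
            total += (v / sigma ** 2) * (y - mu0 - v / 2.0)  # s_k
            if total - lo > h:   # alarm
                runs.append(k)
                break
            lo = min(lo, total)
        else:
            runs.append(max_n)   # censored run
    return sum(runs) / len(runs)

# The design goal: long ARL under H0, short ARL under H1
arl_h0 = average_run_length(1.0, 1.0, 3.0, 1.0, 4.0)
arl_h1 = average_run_length(3.0, 1.0, 3.0, 1.0, 4.0)
print(arl_h0 > arl_h1)  # True
```

Raising h lengthens both run lengths; choosing h is the trade-off between false-alarm rate and detection delay that this slide describes.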
Notes
Relate to Shewhart:
- Geometric weighting ensures that the mean of gk is the same as that for sk.
- However, the variance is different.
Suppose that the data vary with time of day or day of month.
- Question: How do we separate normal variability from abnormal variability?
- Answer: Model the normal variability.
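One simple way to model periodic normal variability is to subtract a per-slot baseline and run the change-point test on the residuals. A sketch (hypothetical function; a period of 4 "hours" keeps the example small):

```python
def deseasonalize(values, period):
    """Remove a periodic baseline, a simple model of normal
    variability: subtract the mean for each position in the cycle
    (e.g. period=24 for hourly data with a daily pattern)."""
    baseline = [0.0] * period
    counts = [0] * period
    for i, v in enumerate(values):
        baseline[i % period] += v
        counts[i % period] += 1
    baseline = [s / c for s, c in zip(baseline, counts)]
    return [v - baseline[i % period] for i, v in enumerate(values)]

# Two identical "days" of 4 hourly readings: residuals are all zero,
# so the daily pattern alone never looks like a change.
print(deseasonalize([10, 20, 15, 5, 10, 20, 15, 5], 4))
```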
[Figure: httpop/s vs. hour for several days, including Saturday (day 14) and Sunday (day 15), showing recurring daily patterns in HTTP operation rates]
Summary
Basics:
- Off-line vs. on-line change-point detection
- Likelihood functions, ratios, log likelihood functions
- Neyman-Pearson lemma
Off-line change-point detection:
- Fisher algorithm
On-line change-point detection:
- Shewhart
- Geometric moving average
- CUSUM
References