
An Introduction to Change-Point Detection


Joseph L. Hellerstein, IBM Research, T.J. Watson Research Center, Hawthorne, New York
Fan Zhang, Department of Industrial Engineering and Operations Research, Columbia University, New York, New York
June 1998

Hellerstein and Zhang

Background and Motivations

Most analysis and control assumes stationary stochastic processes, i.e., no change in:
- Mean
- Variance
- Covariances

Bad things can happen to good processes:
- A router can fail in a network
- A conveyor belt can stop on an assembly line
- A bank can fail in an economy

Need to determine when process parameters have changed in order to:
- Correct the process
- Change control parameters


Mainframe Data


Web Server Data


(Figure: web server measurements plotted against time in hours: panels for usr, sys, pdb, and mdb utilization; Ipkts/s and Opkts/s; Coll%; tcpIn/s and tcpOut/s.)


Types of Change-Point Detection


Off-line:
- Data are presented en masse
- Identify stationary intervals

On-line:
- Data are presented serially
- Detect when the parameters of the process change


Outline

- Hypothesis testing and statistical background
- Off-line tests
- Theory for on-line tests
- On-line tests
- Practical considerations
- References


Hypothesis Testing

- Test assertions about parameters of a process (e.g., mean, variance, covariance)
- H0 (null hypothesis): the normal situation (e.g., mean response time is 1 second)
- H1 (alternate hypothesis): an abnormal situation (e.g., mean response time is 3 seconds)


Components of a Statistical Test

- T: a test statistic computed from the data, T(y) = f(y_1, ..., y_N)
- d(T) ∈ {0, 1}: a decision function that determines whether the test statistic is within an acceptable range
  - 0: okay
  - 1: raise an alarm
- Observation: d classifies values of y


Examples of Test Components

Test statistics:
- T(y) = ȳ
- T(y) = Σ_i (y_i − ȳ)²

Decision functions use critical values (a lower limit T_C^L and/or an upper limit T_C^H):
- d(T) = 1 if T < T_C^L; 0 otherwise
- d(T) = 1 if T > T_C^H; 0 otherwise
- d(T) = 1 if T < T_C^L or T > T_C^H; 0 otherwise

Mixed test: d(T) ∈ [0, 1]


Outcomes of Tests

Raise an alarm if d(T) = 1.

             | No Alarm       | Alarm
H0 is true   | OK             | false positive
H1 is true   | false negative | OK


Critical Regions

- The set of y values for which H0 is rejected, denoted by C
- P(false positive) = α = P(y ∈ C | H0)
- P(false negative) = β = P(y ∉ C | H1)


Critical Regions


Test Design

Objective: select the test that minimizes β subject to the constraint that α is not too large.

The power of a test provides a succinct way of expressing this objective:

π(θ) = P(test rejects H0 | θ)

Note that

π(θ) = α if θ ∈ H0
π(θ) = 1 − β if θ ∈ H1

The ideal test has

π(θ) = 0 if θ ∈ H0
π(θ) = 1 if θ ∈ H1


Notes

One can always minimize α or β separately by using a test with a deterministic outcome: a test that never alarms has α = 0, and a test that always alarms has β = 0.


Likelihood Function

- A transformation of the data, used in test statistics
- Indicates the probability (or density) of the data if the distribution is known:

  L_θ(y) = P(y | θ) if y is discrete
  L_θ(y) = f(y | θ) if y is continuous

- Example: the normal distribution with θ = (μ, σ²):

  L_θ(y) = (1 / √(2πσ²)) exp(−(y − μ)² / (2σ²))

  where H0 is specified in terms of μ and σ.

- If observations are i.i.d.:

  L_θ(y_1, ..., y_N) = L_θ(y_1) ⋯ L_θ(y_N)

  For the normal, this means

  L_θ(y_1, ..., y_N) = (1 / √(2πσ²))^N exp(−Σ_{i=1}^{N} (y_i − μ)² / (2σ²))


Likelihood Function For Correct Model

(Figure: N(0,1) likelihood values for an N(0,1) RV.)

L(y_1, ..., y_N) is approximately 10^−21.



Likelihood Function For Incorrect Model

(Figure: N(0,1) likelihood values for an N(3,1) RV.)

L(y_1, ..., y_N) is approximately 10^−72.


Likelihood Ratio

Indicates the relative probability (or density) of obtaining the data:

L_θ1(y_i) / L_θ0(y_i)

Often we use the log of the likelihood ratio:

s_i = ln( L_θ1(y_i) / L_θ0(y_i) )

Example: N(μ_0, σ²) and N(μ_1, σ²):

s_i = ((μ_1 − μ_0) / σ²) (y_i − (μ_0 + μ_1)/2)
    = (ν/σ²)(y_i − μ_0) − ν²/(2σ²)
    = b (y_i − μ_0 − ν/2) / σ

where ν = μ_1 − μ_0 is the change in magnitude and b = (μ_1 − μ_0)/σ = ν/σ is the signal-to-noise ratio.
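A quick numerical check of these formulas (a Python sketch, not from the slides): simulate s_i under each hypothesis and confirm the expected values derived on the later Notes slide, −ν²/(2σ²) under H0 and +ν²/(2σ²) under H1.

```python
import random

def log_lr(y_i, mu0, mu1, sigma):
    """s_i = ((mu1 - mu0) / sigma^2) * (y_i - (mu0 + mu1) / 2)."""
    nu = mu1 - mu0
    return (nu / sigma ** 2) * (y_i - (mu0 + mu1) / 2)

random.seed(1)
under_h0 = [log_lr(random.gauss(0, 1), 0.0, 3.0, 1.0) for _ in range(10000)]
under_h1 = [log_lr(random.gauss(3, 1), 0.0, 3.0, 1.0) for _ in range(10000)]
# Theory: E(s_i) = -nu^2 / (2 sigma^2) = -4.5 under H0, +4.5 under H1
print(sum(under_h0) / len(under_h0))
print(sum(under_h1) / len(under_h1))
```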



Observations About Likelihood Ratios

- Consider Gaussian y_i
- s_i is Gaussian, since a linear combination of Gaussians is Gaussian
- If θ ∈ H0, then E(s_i) < 0
- If θ ∈ H1, then E(s_i) > 0
- Thus E(s_i) changes sign as θ moves from H0 to H1


Notes

E(s_i) < 0 under H0 follows from E_{θ0}[ (ν/σ²)(y_i − μ_0) − ν²/(2σ²) ] = −ν²/(2σ²).
E(s_i) > 0 under H1 follows from E_{θ1}[ (ν/σ²)(y_i − μ_0) − ν²/(2σ²) ] = +ν²/(2σ²).


Most Powerful Test

Given θ_0 and θ_1 corresponding to H0 and H1:

Definition: δ* is a most powerful test iff for every test δ such that π_δ(θ_0) ≤ π_δ*(θ_0), we have π_δ(θ_1) ≤ π_δ*(θ_1).

Intuition for constructing δ*: place first into the critical region those y that have
- the lowest probability under H0
- the highest probability under H1

Neyman-Pearson Lemma: δ* is a most powerful test if it is constructed as follows:

y ∈ C iff L_θ1(y) / L_θ0(y) > h


Notes

Illustrate the intuition using the critical-region figure.


Off-Line Tests

- View as constrained clustering
- Want homogeneous clusters
- Choose change points such that the variance within a cluster is smaller than the variance between clusters
- Assumes that only the mean changes


Example of Partitioning
(Figure: a 3-partitioning of 30 observations. ȳ[1..5] = 0.48, ASQ[1..5] = 0.95; ȳ(6..14] = 2.05, ASQ(6..14] = 1.72; ȳ(15..19] = 1.09, ASQ(15..19] = 1.31; ȳ(20..30] = 3.83, ASQ(20..30] = 5.20.)



An Approach to Off-Line Change-Point Detection

- Perspective: locating change-points is equivalent to finding the optimal way to partition time-serial data
  - Homogeneous within a partition
  - Heterogeneous between partitions
- A range of indices is indicated by [m..n], for 1 ≤ m < n ≤ N
- Detecting k change-points results in a k-partitioning P = (P_1, ..., P_k), with 1 ≤ P_1 < P_2 < ⋯ < P_k ≤ N
- The approach is due to W.D. Fisher


Definitions

Mean of a range of observations:

ȳ[m..n] = (y_m + ⋯ + y_n) / (n − m + 1)

Adjusted sum of squares (degree of homogeneity) within a partition:

ASQ[m..n] = Σ_{j=m}^{n} (y_j − ȳ[m..n])²

Figure of merit for the identified change points:

D_P = Σ_{j=1}^{k−1} ASQ[P_j .. P_{j+1} − 1] + ASQ[P_k .. N]

P is an optimal k-partitioning if there is no k-partitioning P′ such that D_{P′} < D_P.


Observations

- The computational complexity of finding an optimal k-partitioning by exhaustive search is on the order of the binomial coefficient C(N, k)
- If P is an N-partitioning, then D_P = 0
- Want a k-partitioning with
  - k large enough to find the change points
  - k small enough that non-change points are avoided


Fisher Algorithm for Change-Point Detection

ChangePoints(first, last, CPList):
- Compute Q = (first, Q_2), the optimal 2-partitioning of [first..last]
- Compute T, where

  T = ASQ[first..last] / ( ASQ[first..Q_2 − 1] + ASQ[Q_2..last] )

- If T exceeds a critical value:
  - Add Q_2 to CPList
  - ChangePoints(first, Q_2 − 1, CPList)
  - ChangePoints(Q_2, last, CPList)
- Return
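A minimal Python sketch of the recursion above (not the slides' code; the critical value 4.0 and the early return for constant segments are illustrative choices, not values from the slides):

```python
def asq(y, m, n):
    """Adjusted sum of squares of y[m..n], inclusive, 0-based."""
    seg = y[m:n + 1]
    mean = sum(seg) / len(seg)
    return sum((v - mean) ** 2 for v in seg)

def change_points(y, first, last, cp_list, critical=4.0):
    """Recursively split [first..last] at the optimal single change point."""
    if last - first < 2 or asq(y, first, last) == 0.0:
        return
    # Q2 minimizes the within-segment sums of squares
    q2 = min(range(first + 1, last + 1),
             key=lambda q: asq(y, first, q - 1) + asq(y, q, last))
    within = asq(y, first, q2 - 1) + asq(y, q2, last)
    t = asq(y, first, last) / within if within > 0 else float("inf")
    if t > critical:
        cp_list.append(q2)
        change_points(y, first, q2 - 1, cp_list, critical)
        change_points(y, q2, last, cp_list, critical)

y = [0.0] * 10 + [5.0] * 10  # mean shift after the 10th observation
cps = []
change_points(y, 0, len(y) - 1, cps)
print(sorted(cps))
```

On this toy series the single change point at index 10 is recovered; the recursion then stops because each side is homogeneous.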


Results of Applying Fisher Algorithm to Mainframe Data


On-Line Change-Point Detection


- Introduction
- Shewhart test
  - Average run length
- Geometric moving average test
- CUSUM test


Introduction to On-Line Change-Point Detection

- Data are presented serially
- Raise an alarm if a change is detected; t_a is the time of the alarm
- Identify when the change occurred; t_0 is the time of the change-point, with t_0 ≤ t_a


Illustration of Concepts in On-Line Change-Point Detection


(Figure: illustration of on-line detection on 30 observations, marking the change point at time t_0, the alarm time t_a, and the alarm delay between them.)


Formalization for On-Line Tests

Let
- p be the actual distribution (density) function
- p_θ0 be the distribution (density) under H0
- p_θ1 be the distribution (density) under H1

Consider y_1, ..., y_k:
- H0: p(y_k | y_{k−1}, ..., y_1) = p_θ0(y_k | y_{k−1}, ..., y_1)
- H1: there is a time t_0 such that
  - for 1 ≤ i ≤ t_0 − 1: p(y_i | y_{i−1}, ..., y_1) = p_θ0(y_i | y_{i−1}, ..., y_1)
  - for t_0 ≤ i ≤ k: p(y_i | y_{i−1}, ..., y_{t_0}) = p_θ1(y_i | y_{i−1}, ..., y_{t_0})

The alarm time t_a is the smallest k such that H1 is chosen over H0.


Test Statistic for On-Line Change-Points

Sum of log likelihood ratios:

S_i^k = Σ_{j=i}^{k} s_j

Consider y_i that are N(μ, σ²), with μ = μ_0 under H0 and μ = μ_1 under H1, and let m = k − i + 1. Then

S_i^k = b Σ_{j=i}^{k} ( (y_j − μ_0)/σ − ν/(2σ) )

Under H0: S_i^k is N( −m ν²/(2σ²), m b² )
Under H1: S_i^k is N( +m ν²/(2σ²), m b² )



Shewhart Algorithm for On-Line Change-Point Detection

Operation:
- Take samples in batches of fixed size N
- Make a decision independently for each batch

For k = 1 to ∞:
- Obtain the next N samples
- If S_{(k−1)N+1}^{kN} > h, raise an alarm and exit

Note: the granularity of detection is determined by N.
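A Python sketch of the batch loop (illustrative only; the batch size n = 20, threshold h = 10, and the simulated means are my choices, not values from the slides):

```python
import random

def batch_stat(batch, mu0, mu1, sigma):
    """S over one batch: the sum of log likelihood ratios s_j."""
    nu = mu1 - mu0
    return sum((nu / sigma ** 2) * (y - (mu0 + mu1) / 2) for y in batch)

def shewhart(stream, n, h, mu0, mu1, sigma):
    """Return the sample index at which an alarm is raised, or None."""
    for k in range(len(stream) // n):
        if batch_stat(stream[k * n:(k + 1) * n], mu0, mu1, sigma) > h:
            return (k + 1) * n
    return None

random.seed(2)
data = ([random.gauss(0, 1) for _ in range(100)]
        + [random.gauss(3, 1) for _ in range(100)])  # change point at 100
alarm = shewhart(data, n=20, h=10.0, mu0=0.0, mu1=3.0, sigma=1.0)
print(alarm)  # the alarm lands on a batch boundary after the change
```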


How is h Determined?

Ideally we want:
- A short alarm delay if there is a change-point
- A long time until an alarm if there is no change-point

Criterion: Average Run Length (ARL), the average number of observations until there is an alarm.

ARL is related to, but different from, the power of an off-line test.


Computing Average Run Length for the Shewhart Test

α_0 = P(S_1^N > h | H0)

P(t_a = kN | H0) = (1 − α_0)^{k−1} α_0

ARL0, the time until an alarm if there is no change point:

E(ARL0) = N / α_0

ARL1, the alarm delay (time from the change-point until the alarm is raised):

E(ARL1) = N / P(S_1^N > h | H1)

Choose h based on a desired ARL0.
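These expectations can be evaluated with a normal tail probability, using the distribution of S_1^N from the test-statistic slide (a Python sketch; the specific N, h, and means are illustrative assumptions):

```python
import math

def norm_tail(x, mean, var):
    """P(Z > x) for Z ~ N(mean, var)."""
    return 0.5 * math.erfc((x - mean) / math.sqrt(2 * var))

def shewhart_arls(n, h, mu0, mu1, sigma):
    nu = mu1 - mu0
    b = nu / sigma
    mean = n * nu ** 2 / (2 * sigma ** 2)  # |E(S_1^N)| under either hypothesis
    var = n * b ** 2
    alpha0 = norm_tail(h, -mean, var)      # P(S_1^N > h | H0)
    return n / alpha0, n / norm_tail(h, mean, var)  # E(ARL0), E(ARL1)

arl0, arl1 = shewhart_arls(n=20, h=10.0, mu0=0.0, mu1=3.0, sigma=1.0)
print(arl0, arl1)  # a very long run to a false alarm, a short delay to a true one
```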


Geometric Moving Average Control Charts

Motivation: give recent observations more weight.

Consider 0 < α ≤ 1:

g_k = (1 − α) g_{k−1} + α s_k = Σ_{n=0}^{∞} α (1 − α)^n s_{k−n}

Decision for an alarm:

t_a = min{ k : g_k ≥ h }

Obtaining h: observe that under H0,

g_k is N( −ν²/(2σ²), α b²/(2 − α) )
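A Python sketch of the chart (the smoothing weight α = 0.2 and threshold h = 3 are illustrative assumptions, not values from the slides):

```python
import random

def gma_alarm(stream, mu0, mu1, sigma, alpha, h):
    """Alarm at the first k with g_k >= h, where g_k is the geometric
    moving average of the log likelihood ratios s_k."""
    nu = mu1 - mu0
    g = 0.0
    for k, y in enumerate(stream, start=1):
        s = (nu / sigma ** 2) * (y - (mu0 + mu1) / 2)
        g = (1 - alpha) * g + alpha * s
        if g >= h:
            return k
    return None

random.seed(3)
data = ([random.gauss(0, 1) for _ in range(200)]
        + [random.gauss(3, 1) for _ in range(50)])  # change point at 200
alarm = gma_alarm(data, mu0=0.0, mu1=3.0, sigma=1.0, alpha=0.2, h=3.0)
print(alarm)
```

Under H0 the average hovers near −ν²/(2σ²) = −4.5, so the threshold is rarely crossed; after the change it drifts up toward +4.5 and the alarm fires within a handful of samples.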


Notes

Relate to Shewhart:
- Geometric weighting ensures that the mean of g_k is the same as that of s_k.
- However, the variance is different.


CUSUM: Cumulative Sum Control Charts

Motivation:
- S_1^k has a negative drift under H0
- As a result, ARL1 may be longer than necessary

Strategy: adjust S_1^k so that it does not become too small:

g_k = S_1^k − m_k, where m_k = min{ S_1^j : 1 ≤ j ≤ k }

Approach:

t_a = min{ k : S_1^k ≥ m_k + h }
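A Python sketch of the chart (illustrative threshold h; the running minimum also includes S_1^0 = 0, a common convention I adopt here):

```python
import random

def cusum_alarm(stream, mu0, mu1, sigma, h):
    """Alarm at the first k with S_1^k - m_k >= h."""
    nu = mu1 - mu0
    s = 0.0  # running sum S_1^k
    m = 0.0  # running minimum m_k (starts at S_1^0 = 0)
    for k, y in enumerate(stream, start=1):
        s += (nu / sigma ** 2) * (y - (mu0 + mu1) / 2)
        m = min(m, s)
        if s - m >= h:
            return k
    return None

random.seed(4)
data = ([random.gauss(0, 1) for _ in range(200)]
        + [random.gauss(3, 1) for _ in range(50)])  # change point at 200
alarm = cusum_alarm(data, mu0=0.0, mu1=3.0, sigma=1.0, h=10.0)
print(alarm)
```

Because the excursion S_1^k − m_k restarts from zero at each new minimum, the pre-change drift does not delay detection the way it would for the raw sum.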


Illustration of the Three On-Line Algorithms


Unknown Probability Distributions

Situation:
- θ_0 can be estimated from historical data
- θ_1 is unknown

Generalized likelihood ratio:

U_i^k = sup_{θ_1} { S_i^k(θ_1) }

For Gaussian distributions and changes in the mean:

U_i^k = sup_{μ_1} ( ((μ_1 − μ_0)/σ²) Σ_{j=i}^{k} ( y_j − (μ_1 + μ_0)/2 ) )

This can be solved explicitly. Use U_i^k instead of S_i^k.
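The explicit solution: S_i^k(μ_1) is a concave quadratic in μ_1, so the sup is attained at μ_1 = the sample mean of y_i, ..., y_k, giving U_i^k = m(ȳ − μ_0)²/(2σ²) with m = k − i + 1. This derivation and the helper names are mine, not the slides'; a Python sketch cross-checks the closed form against a grid search:

```python
def glr_stat(window, mu0, sigma):
    """Closed form of U_i^k: the sup is attained at mu1 = sample mean."""
    m = len(window)
    ybar = sum(window) / m
    return m * (ybar - mu0) ** 2 / (2 * sigma ** 2)

def s_ik(window, mu0, mu1, sigma):
    """S_i^k as a function of the candidate mean mu1."""
    return ((mu1 - mu0) / sigma ** 2) * sum(y - (mu0 + mu1) / 2 for y in window)

w = [0.8, 1.3, 0.9, 1.1]
closed = glr_stat(w, 0.0, 1.0)
grid = max(s_ik(w, 0.0, mu1 / 100, 1.0) for mu1 in range(-500, 501))
print(closed, grid)  # the closed form matches the grid sup
```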


Handling Non-Stationary Data

- Suppose that the data vary with the time of day or the day of the month
- Question: how do we separate normal variability from abnormal variability?
- Answer: model the normal variability


Normal Variability in Web Server Data


(Figure: httpop/s versus hour of day for seven days, Mon day 9 through Sun day 15, showing a recurring daily pattern.)


Summary

Basics:
- Off-line vs. on-line change-point detection
- Likelihood functions, likelihood ratios, log likelihood ratios
- Neyman-Pearson lemma

Off-line change-point detection:
- Fisher algorithm

On-line change-point detection:
- Shewhart
- Geometric moving average
- CUSUM


References

Michele Basseville and Igor V. Nikiforov, Detection of Abrupt Changes: Theory and Application, Prentice Hall, 1992.

W.D. Fisher, "On Grouping for Maximum Homogeneity," Journal of the American Statistical Association, 53:789-798, December 1958.

Richard A. Johnson and Dean W. Wichern, Applied Multivariate Statistical Analysis, Prentice Hall, 1987.
