
An Introduction to Change-Point Detection


Joseph L. Hellerstein, IBM Research, T.J. Watson Research Center, Hawthorne, New York
Fan Zhang, Department of Industrial Engineering and Operations Research, Columbia University, New York, New York
June 1998

Hellerstein and Zhang

Background and Motivations

Most analysis and control assumes stationary stochastic processes, i.e., no change in:
- Mean
- Variance
- Covariances

Bad things can happen to good processes:
- A router can fail in a network
- A conveyor belt can stop on an assembly line
- A bank can fail in an economy

Need to determine when process parameters have changed in order to:
- Correct the process
- Change control parameters


Mainframe Data


Web Server Data


(Figure: web server measurements plotted against time in hours: panels for usr, sys, pdb, and mdb utilization; Ipkts/s and Opkts/s; Coll%; tcpIn/s and tcpOut/s.)


Types of Change-Point Detection


Off-line:
- Data are presented en masse
- Identify stationary intervals

On-line:
- Data are presented serially
- Detect when the parameters of the process change


Outline

- Hypothesis testing and statistical background
- Off-line tests
- Theory for on-line tests
- On-line tests
- Practical considerations
- References


Hypothesis Testing

- Test assertions about parameters of a process (e.g., mean, variance, covariance)
- H0 (null hypothesis): the normal situation (e.g., mean response time is 1 second)
- H1 (alternate hypothesis): an abnormal situation (e.g., mean response time is 3 seconds)


Components of a Statistical Test

- T: a test statistic computed from the data, T(y) = f(y_1, ..., y_N)
- d(T) ∈ {0, 1}: a decision function that determines whether the test statistic is within an acceptable range
  - 0: okay
  - 1: raise an alarm
- Observation: d classifies values of y


Examples of Test Components

Test statistics:
- T(y) = ȳ
- T(y) = Σ_i (y_i − ȳ)²

Decision functions use critical values (a lower limit T_C^L and/or an upper limit T_C^H):
- d(T) = 1 if T < T_C^L; 0 otherwise
- d(T) = 1 if T > T_C^H; 0 otherwise
- d(T) = 1 if T < T_C^L or T > T_C^H; 0 otherwise

Mixed test: d(T) ∈ [0, 1]


Outcomes of Tests

Raise an alarm if d(T) = 1.

             | No Alarm       | Alarm
H0 is true   | OK             | false positive
H1 is true   | false negative | OK


Critical Regions

- The set of y values for which H0 is rejected, denoted by C
- P(false positive) = α = P(y ∈ C | H0)
- P(false negative) = β = P(y ∉ C | H1)


Critical Regions


Test Design

Objective: select the test that minimizes β subject to the constraint that α is not too large.

The power of a test provides a succinct way of expressing this objective:

π(θ) = P(test rejects H0 | θ)

Note that

π(θ) = α if θ ∈ H0
π(θ) = 1 − β if θ ∈ H1

The ideal test has

π(θ) = 0 if θ ∈ H0
π(θ) = 1 if θ ∈ H1


Notes

One can always minimize α or β separately by using a test with a deterministic outcome: a test that never alarms has α = 0, and a test that always alarms has β = 0.


Likelihood Function

- A transformation of the data, used in test statistics
- Indicates the probability (or density) of the data if the distribution is known:

  L_θ(y) = P(y | θ) if y is discrete
  L_θ(y) = f(y | θ) if y is continuous

- Example: the normal distribution with θ = (μ, σ²):

  L_θ(y) = (1 / √(2πσ²)) exp(−(y − μ)² / (2σ²))

  where H0 is specified in terms of μ and σ.

- If observations are i.i.d.:

  L_θ(y_1, ..., y_N) = L_θ(y_1) ⋯ L_θ(y_N)

  For the normal, this means

  L_θ(y_1, ..., y_N) = (1 / √(2πσ²))^N exp(−Σ_{i=1}^{N} (y_i − μ)² / (2σ²))


Likelihood Function For Correct Model

(Figure: N(0,1) likelihood values for an N(0,1) RV.)

L(y_1, ..., y_N) is approximately 10^−21.



Likelihood Function For Incorrect Model

(Figure: N(0,1) likelihood values for an N(3,1) RV.)

L(y_1, ..., y_N) is approximately 10^−72.


Likelihood Ratio

Indicates the relative probability (or density) of obtaining the data:

L_θ1(y_i) / L_θ0(y_i)

Often we use the log of the likelihood ratio:

s_i = ln( L_θ1(y_i) / L_θ0(y_i) )

Example: N(μ_0, σ²) and N(μ_1, σ²):

s_i = ((μ_1 − μ_0) / σ²) (y_i − (μ_0 + μ_1)/2)
    = (ν/σ²)(y_i − μ_0) − ν²/(2σ²)
    = b (y_i − μ_0 − ν/2) / σ

where ν = μ_1 − μ_0 is the change in magnitude and b = (μ_1 − μ_0)/σ = ν/σ is the signal-to-noise ratio.
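A quick numerical check of these formulas (a Python sketch, not from the slides): simulate s_i under each hypothesis and confirm the expected values derived on the later Notes slide, −ν²/(2σ²) under H0 and +ν²/(2σ²) under H1.

```python
import random

def log_lr(y_i, mu0, mu1, sigma):
    """s_i = ((mu1 - mu0) / sigma^2) * (y_i - (mu0 + mu1) / 2)."""
    nu = mu1 - mu0
    return (nu / sigma ** 2) * (y_i - (mu0 + mu1) / 2)

random.seed(1)
under_h0 = [log_lr(random.gauss(0, 1), 0.0, 3.0, 1.0) for _ in range(10000)]
under_h1 = [log_lr(random.gauss(3, 1), 0.0, 3.0, 1.0) for _ in range(10000)]
# Theory: E(s_i) = -nu^2 / (2 sigma^2) = -4.5 under H0, +4.5 under H1
print(sum(under_h0) / len(under_h0))
print(sum(under_h1) / len(under_h1))
```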



Observations About Likelihood Ratios

- Consider Gaussian y_i
- s_i is Gaussian, since a linear combination of Gaussians is Gaussian
- If θ ∈ H0, then E(s_i) < 0
- If θ ∈ H1, then E(s_i) > 0
- Thus E(s_i) changes sign as θ moves from H0 to H1


Notes

E(s_i) < 0 under H0 follows from E_{θ0}[ (ν/σ²)(y_i − μ_0) − ν²/(2σ²) ] = −ν²/(2σ²).
E(s_i) > 0 under H1 follows from E_{θ1}[ (ν/σ²)(y_i − μ_0) − ν²/(2σ²) ] = +ν²/(2σ²).


Most Powerful Test

Given θ_0 and θ_1 corresponding to H0 and H1:

Definition: δ* is a most powerful test iff for every test δ such that π_δ(θ_0) ≤ π_δ*(θ_0), we have π_δ(θ_1) ≤ π_δ*(θ_1).

Intuition for constructing δ*: place first into the critical region those y that have
- the lowest probability under H0
- the highest probability under H1

Neyman-Pearson Lemma: δ* is a most powerful test if it is constructed as follows:

y ∈ C iff L_θ1(y) / L_θ0(y) > h


Notes

Illustrate the intuition using the critical-region figure.


Off-Line Tests

- View as constrained clustering
- Want homogeneous clusters
- Choose change points such that the variance within a cluster is smaller than the variance between clusters
- Assumes that only the mean changes


Example of Partitioning
(Figure: a 3-partitioning of 30 observations. ȳ[1..5] = 0.48, ASQ[1..5] = 0.95; ȳ(6..14] = 2.05, ASQ(6..14] = 1.72; ȳ(15..19] = 1.09, ASQ(15..19] = 1.31; ȳ(20..30] = 3.83, ASQ(20..30] = 5.20.)



An Approach to Off-Line Change-Point Detection

- Perspective: locating change-points is equivalent to finding the optimal way to partition time-serial data
  - Homogeneous within a partition
  - Heterogeneous between partitions
- A range of indices is indicated by [m..n], for 1 ≤ m < n ≤ N
- Detecting k change-points results in a k-partitioning P = (P_1, ..., P_k), with 1 ≤ P_1 < P_2 < ⋯ < P_k ≤ N
- The approach is due to W.D. Fisher


Definitions

Mean of a range of observations:

ȳ[m..n] = (y_m + ⋯ + y_n) / (n − m + 1)

Adjusted sum of squares (degree of homogeneity) within a partition:

ASQ[m..n] = Σ_{j=m}^{n} (y_j − ȳ[m..n])²

Figure of merit for the identified change points:

D_P = Σ_{j=1}^{k−1} ASQ[P_j .. P_{j+1} − 1] + ASQ[P_k .. N]

P is an optimal k-partitioning if there is no k-partitioning P′ such that D_{P′} < D_P.


Observations

- The computational complexity of finding an optimal k-partitioning by exhaustive search is on the order of the binomial coefficient C(N, k)
- If P is an N-partitioning, then D_P = 0
- Want a k-partitioning with
  - k large enough to find the change points
  - k small enough that non-change points are avoided


Fisher Algorithm for Change-Point Detection

ChangePoints(first, last, CPList):
- Compute Q = (first, Q_2), the optimal 2-partitioning of [first..last]
- Compute T, where

  T = ASQ[first..last] / ( ASQ[first..Q_2 − 1] + ASQ[Q_2..last] )

- If T exceeds a critical value:
  - Add Q_2 to CPList
  - ChangePoints(first, Q_2 − 1, CPList)
  - ChangePoints(Q_2, last, CPList)
- Return
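A minimal Python sketch of the recursion above (not the slides' code; the critical value 4.0 and the early return for constant segments are illustrative choices, not values from the slides):

```python
def asq(y, m, n):
    """Adjusted sum of squares of y[m..n], inclusive, 0-based."""
    seg = y[m:n + 1]
    mean = sum(seg) / len(seg)
    return sum((v - mean) ** 2 for v in seg)

def change_points(y, first, last, cp_list, critical=4.0):
    """Recursively split [first..last] at the optimal single change point."""
    if last - first < 2 or asq(y, first, last) == 0.0:
        return
    # Q2 minimizes the within-segment sums of squares
    q2 = min(range(first + 1, last + 1),
             key=lambda q: asq(y, first, q - 1) + asq(y, q, last))
    within = asq(y, first, q2 - 1) + asq(y, q2, last)
    t = asq(y, first, last) / within if within > 0 else float("inf")
    if t > critical:
        cp_list.append(q2)
        change_points(y, first, q2 - 1, cp_list, critical)
        change_points(y, q2, last, cp_list, critical)

y = [0.0] * 10 + [5.0] * 10  # mean shift after the 10th observation
cps = []
change_points(y, 0, len(y) - 1, cps)
print(sorted(cps))
```

On this toy series the single change point at index 10 is recovered; the recursion then stops because each side is homogeneous.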


Results of Applying Fisher Algorithm to Mainframe Data


On-Line Change-Point Detection


- Introduction
- Shewhart test
  - Average run length
- Geometric moving average test
- CUSUM test


Introduction to On-Line Change-Point Detection

- Data are presented serially
- Raise an alarm if a change is detected; t_a is the time of the alarm
- Identify when the change occurred; t_0 is the time of the change-point, with t_0 ≤ t_a


Illustration of Concepts in On-Line Change-Point Detection


(Figure: illustration of on-line detection on 30 observations, marking the change point at time t_0, the alarm time t_a, and the alarm delay between them.)


Formalization for On-Line Tests

Let
- p be the actual distribution (density) function
- p_θ0 be the distribution (density) under H0
- p_θ1 be the distribution (density) under H1

Consider y_1, ..., y_k:
- H0: p(y_k | y_{k−1}, ..., y_1) = p_θ0(y_k | y_{k−1}, ..., y_1)
- H1: there is a time t_0 such that
  - for 1 ≤ i ≤ t_0 − 1: p(y_i | y_{i−1}, ..., y_1) = p_θ0(y_i | y_{i−1}, ..., y_1)
  - for t_0 ≤ i ≤ k: p(y_i | y_{i−1}, ..., y_{t_0}) = p_θ1(y_i | y_{i−1}, ..., y_{t_0})

The alarm time t_a is the smallest k such that H1 is chosen over H0.


Test Statistic for On-Line Change-Points

Sum of log likelihood ratios:

S_i^k = Σ_{j=i}^{k} s_j

Consider y_i that are N(μ, σ²), with μ = μ_0 under H0 and μ = μ_1 under H1, and let m = k − i + 1. Then

S_i^k = b Σ_{j=i}^{k} ( (y_j − μ_0)/σ − ν/(2σ) )

Under H0: S_i^k is N( −m ν²/(2σ²), m b² )
Under H1: S_i^k is N( +m ν²/(2σ²), m b² )



Shewhart Algorithm for On-Line Change-Point Detection

Operation:
- Take samples in batches of fixed size N
- Make a decision independently for each batch

For k = 1 to ∞:
- Obtain the next N samples
- If S_{(k−1)N+1}^{kN} > h, raise an alarm and exit

Note: the granularity of detection is determined by N.
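A Python sketch of the batch loop (illustrative only; the batch size n = 20, threshold h = 10, and the simulated means are my choices, not values from the slides):

```python
import random

def batch_stat(batch, mu0, mu1, sigma):
    """S over one batch: the sum of log likelihood ratios s_j."""
    nu = mu1 - mu0
    return sum((nu / sigma ** 2) * (y - (mu0 + mu1) / 2) for y in batch)

def shewhart(stream, n, h, mu0, mu1, sigma):
    """Return the sample index at which an alarm is raised, or None."""
    for k in range(len(stream) // n):
        if batch_stat(stream[k * n:(k + 1) * n], mu0, mu1, sigma) > h:
            return (k + 1) * n
    return None

random.seed(2)
data = ([random.gauss(0, 1) for _ in range(100)]
        + [random.gauss(3, 1) for _ in range(100)])  # change point at 100
alarm = shewhart(data, n=20, h=10.0, mu0=0.0, mu1=3.0, sigma=1.0)
print(alarm)  # the alarm lands on a batch boundary after the change
```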


How is h Determined?

Ideally we want:
- A short alarm delay if there is a change-point
- A long time until an alarm if there is no change-point

Criterion: Average Run Length (ARL), the average number of observations until there is an alarm.

ARL is related to, but different from, the power of an off-line test.


Computing Average Run Length for the Shewhart Test

α_0 = P(S_1^N > h | H0)

P(t_a = kN | H0) = (1 − α_0)^{k−1} α_0

ARL0, the time until an alarm if there is no change point:

E(ARL0) = N / α_0

ARL1, the alarm delay (time from the change-point until the alarm is raised):

E(ARL1) = N / P(S_1^N > h | H1)

Choose h based on a desired ARL0.
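These expectations can be evaluated with a normal tail probability, using the distribution of S_1^N from the test-statistic slide (a Python sketch; the specific N, h, and means are illustrative assumptions):

```python
import math

def norm_tail(x, mean, var):
    """P(Z > x) for Z ~ N(mean, var)."""
    return 0.5 * math.erfc((x - mean) / math.sqrt(2 * var))

def shewhart_arls(n, h, mu0, mu1, sigma):
    nu = mu1 - mu0
    b = nu / sigma
    mean = n * nu ** 2 / (2 * sigma ** 2)  # |E(S_1^N)| under either hypothesis
    var = n * b ** 2
    alpha0 = norm_tail(h, -mean, var)      # P(S_1^N > h | H0)
    return n / alpha0, n / norm_tail(h, mean, var)  # E(ARL0), E(ARL1)

arl0, arl1 = shewhart_arls(n=20, h=10.0, mu0=0.0, mu1=3.0, sigma=1.0)
print(arl0, arl1)  # a very long run to a false alarm, a short delay to a true one
```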


Geometric Moving Average Control Charts

Motivation: give recent observations more weight.

Consider 0 < α ≤ 1:

g_k = (1 − α) g_{k−1} + α s_k = Σ_{n=0}^{∞} α (1 − α)^n s_{k−n}

Decision for an alarm:

t_a = min{ k : g_k ≥ h }

Obtaining h: observe that under H0,

g_k is N( −ν²/(2σ²), α b²/(2 − α) )
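A Python sketch of the chart (the smoothing weight α = 0.2 and threshold h = 3 are illustrative assumptions, not values from the slides):

```python
import random

def gma_alarm(stream, mu0, mu1, sigma, alpha, h):
    """Alarm at the first k with g_k >= h, where g_k is the geometric
    moving average of the log likelihood ratios s_k."""
    nu = mu1 - mu0
    g = 0.0
    for k, y in enumerate(stream, start=1):
        s = (nu / sigma ** 2) * (y - (mu0 + mu1) / 2)
        g = (1 - alpha) * g + alpha * s
        if g >= h:
            return k
    return None

random.seed(3)
data = ([random.gauss(0, 1) for _ in range(200)]
        + [random.gauss(3, 1) for _ in range(50)])  # change point at 200
alarm = gma_alarm(data, mu0=0.0, mu1=3.0, sigma=1.0, alpha=0.2, h=3.0)
print(alarm)
```

Under H0 the average hovers near −ν²/(2σ²) = −4.5, so the threshold is rarely crossed; after the change it drifts up toward +4.5 and the alarm fires within a handful of samples.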


Notes

Relate to Shewhart:
- Geometric weighting ensures that the mean of g_k is the same as that of s_k.
- However, the variance is different.


CUSUM: Cumulative Sum Control Charts

Motivation:
- S_1^k has a negative drift under H0
- As a result, ARL1 may be longer than necessary

Strategy: adjust S_1^k so that it does not become too small:

g_k = S_1^k − m_k, where m_k = min{ S_1^j : 1 ≤ j ≤ k }

Approach:

t_a = min{ k : S_1^k ≥ m_k + h }
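A Python sketch of the chart (illustrative threshold h; the running minimum also includes S_1^0 = 0, a common convention I adopt here):

```python
import random

def cusum_alarm(stream, mu0, mu1, sigma, h):
    """Alarm at the first k with S_1^k - m_k >= h."""
    nu = mu1 - mu0
    s = 0.0  # running sum S_1^k
    m = 0.0  # running minimum m_k (starts at S_1^0 = 0)
    for k, y in enumerate(stream, start=1):
        s += (nu / sigma ** 2) * (y - (mu0 + mu1) / 2)
        m = min(m, s)
        if s - m >= h:
            return k
    return None

random.seed(4)
data = ([random.gauss(0, 1) for _ in range(200)]
        + [random.gauss(3, 1) for _ in range(50)])  # change point at 200
alarm = cusum_alarm(data, mu0=0.0, mu1=3.0, sigma=1.0, h=10.0)
print(alarm)
```

Because the excursion S_1^k − m_k restarts from zero at each new minimum, the pre-change drift does not delay detection the way it would for the raw sum.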


Illustration of the Three On-Line Algorithms


Unknown Probability Distributions

Situation:
- θ_0 can be estimated from historical data
- θ_1 is unknown

Generalized likelihood ratio:

U_i^k = sup_{θ_1} { S_i^k(θ_1) }

For Gaussian distributions and changes in the mean:

U_i^k = sup_{μ_1} ( ((μ_1 − μ_0)/σ²) Σ_{j=i}^{k} ( y_j − (μ_1 + μ_0)/2 ) )

This can be solved explicitly. Use U_i^k instead of S_i^k.
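The explicit solution: S_i^k(μ_1) is a concave quadratic in μ_1, so the sup is attained at μ_1 = the sample mean of y_i, ..., y_k, giving U_i^k = m(ȳ − μ_0)²/(2σ²) with m = k − i + 1. This derivation and the helper names are mine, not the slides'; a Python sketch cross-checks the closed form against a grid search:

```python
def glr_stat(window, mu0, sigma):
    """Closed form of U_i^k: the sup is attained at mu1 = sample mean."""
    m = len(window)
    ybar = sum(window) / m
    return m * (ybar - mu0) ** 2 / (2 * sigma ** 2)

def s_ik(window, mu0, mu1, sigma):
    """S_i^k as a function of the candidate mean mu1."""
    return ((mu1 - mu0) / sigma ** 2) * sum(y - (mu0 + mu1) / 2 for y in window)

w = [0.8, 1.3, 0.9, 1.1]
closed = glr_stat(w, 0.0, 1.0)
grid = max(s_ik(w, 0.0, mu1 / 100, 1.0) for mu1 in range(-500, 501))
print(closed, grid)  # the closed form matches the grid sup
```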


Handling Non-Stationary Data

- Suppose that the data vary with the time of day or the day of the month
- Question: how do we separate normal variability from abnormal variability?
- Answer: model the normal variability


Normal Variability in Web Server Data


(Figure: httpop/s versus hour of day for seven days, Mon day 9 through Sun day 15, showing a recurring daily pattern.)


Summary

Basics:
- Off-line vs. on-line change-point detection
- Likelihood functions, likelihood ratios, log likelihood ratios
- Neyman-Pearson lemma

Off-line change-point detection:
- Fisher algorithm

On-line change-point detection:
- Shewhart
- Geometric moving average
- CUSUM


References

Michele Basseville and Igor V. Nikiforov, Detection of Abrupt Changes: Theory and Application, Prentice Hall, 1992.

W.D. Fisher, "On Grouping for Maximum Homogeneity," Journal of the American Statistical Association, 53:789-798, December 1958.

Richard A. Johnson and Dean W. Wichern, Applied Multivariate Statistical Analysis, Prentice Hall, 1987.
