
Checking data for outliers

2. 3σ edit rule and Hampel’s method

7th Seminar on statistics in seed testing
Gregoire, Laffont, Remund
Overview
• Consider this set of real time PCR results (%):
xi 0.1193 0.1038 0.0923 0.1173 0.1494 0.1229 0.1125 0.1061 0.0940 0.1213 0.1314 0.1151 0.1159 0.1298 0.5977
i 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

[Figure: boxplot of the PCR results; axis from 0.1 to 0.6]

Alarm from the boxplot

• Can we detect this exotic value with an automatic method?

ISTA Statistics Committee 2


3σ edit rule

If x ~ N(μ, σ²)

[Figure: normal density P(x) against x, with a distance of 3σ marked on either side of μ]

then P(|x − μ| > 3σ) ≈ 0.0027

→ The probability that an observed value is outside the range [μ − 3σ ; μ + 3σ] is very small

→ 3σ edit rule
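
The value 0.0027 can be checked numerically. The sketch below is only an illustration: it assumes x is exactly normal and uses the complementary error function from the Python standard library. The function name two_sided_normal_tail is hypothetical, not part of the seminar material.

```python
import math

def two_sided_normal_tail(k: float) -> float:
    """P(|x - mu| > k * sigma) for a normally distributed x."""
    # For a standard normal Z, P(|Z| > k) = erfc(k / sqrt(2)).
    return math.erfc(k / math.sqrt(2.0))

p_3sigma = two_sided_normal_tail(3.0)
print(f"P(|x - mu| > 3 sigma) = {p_3sigma:.4f}")  # 0.0027
```

In other words, under normality fewer than 3 observations in 1000 are expected to fall outside the 3σ range.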
3σ edit rule
xi 0.1193 0.1038 0.0923 0.1173 0.1494 0.1229 0.1125 0.1061 0.0940 0.1213 0.1314 0.1151 0.1159 0.1298 0.5977
i 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

1. Estimate the mean x̄ and the standard deviation s:

x̄ = 0.1486    s = 0.1251

2. For each value xi in the dataset, compute: zi = (xi − x̄) / s

zi -0.2341 -0.3581 -0.4500 -0.2501 0.0065 -0.2054 -0.2885 -0.3397 -0.4364 -0.2181 -0.1374 -0.2677 -0.2613 -0.1502 3.5905

3. Identify xi as an outlier if |zi| > 3
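
The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name three_sigma_outliers is hypothetical, and it assumes s is the sample standard deviation (divisor n − 1), which reproduces s = 0.1251 for this dataset.

```python
import statistics

# Real-time PCR results (%) from the slides, i = 1..15.
x = [0.1193, 0.1038, 0.0923, 0.1173, 0.1494, 0.1229, 0.1125, 0.1061,
     0.0940, 0.1213, 0.1314, 0.1151, 0.1159, 0.1298, 0.5977]

def three_sigma_outliers(values, threshold=3.0):
    """Return (1-based indices flagged by the 3-sigma edit rule, z-scores)."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1), an assumption
    z = [(v - mean) / s for v in values]
    flagged = [i for i, zi in enumerate(z, start=1) if abs(zi) > threshold]
    return flagged, z

flagged, z = three_sigma_outliers(x)
print("flagged indices:", flagged)  # [15]
```

Only the 15th value (0.5977) has |zi| > 3, so it is the only value the rule identifies as an outlier here.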



3σ edit rule

Problem: the mean and the standard deviation are sensitive to outliers
• The mean moves towards outliers
• The standard deviation is inflated

→ Hampel’s method preferred
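
Both effects can be seen on the PCR dataset by computing the two estimates with and without the 15th value. The sketch below is an illustration only; treating the 15th value as the outlier and using the sample standard deviation (n − 1) are assumptions made for the comparison.

```python
import statistics

# Real-time PCR results (%) from the slides, i = 1..15.
x = [0.1193, 0.1038, 0.0923, 0.1173, 0.1494, 0.1229, 0.1125, 0.1061,
     0.0940, 0.1213, 0.1314, 0.1151, 0.1159, 0.1298, 0.5977]
x_without = x[:-1]  # same data without the suspect 15th value

mean_all = statistics.mean(x)
mean_without = statistics.mean(x_without)
sd_all = statistics.stdev(x)
sd_without = statistics.stdev(x_without)

print(f"mean: {mean_without:.4f} without, {mean_all:.4f} with the 15th value")
print(f"sd:   {sd_without:.4f} without, {sd_all:.4f} with the 15th value")
```

A single extreme value pulls the mean upwards and makes the standard deviation several times larger, which shrinks every |zi| and makes the outlier harder to flag.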



Hampel’s method

• An outlier-resistant alternative to the mean: the median x̃

• An outlier-resistant alternative to the standard deviation: the MAD (median absolute deviation from the median)

MAD = median{ |xi − x̃| }

• Hampel’s method: identify xi as an outlier if |xi − x̃| > 5.2 MAD
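
The slides do not say where the factor 5.2 comes from. One way to read it, offered here as background rather than as the authors' derivation, is through a standard property of the normal distribution: for normal data the MAD is about 0.6745 σ, so a threshold of 5.2 MAD corresponds to roughly 3.5 σ. The sketch below checks this arithmetic with the Python standard library.

```python
from statistics import NormalDist

# For a normal distribution, MAD / sigma equals the 75th percentile
# of the standard normal distribution (about 0.6745).
mad_over_sigma = NormalDist().inv_cdf(0.75)

# Threshold of Hampel's method expressed in units of sigma,
# assuming normally distributed data.
equivalent_sigma_multiple = 5.2 * mad_over_sigma

print(f"MAD / sigma   = {mad_over_sigma:.4f}")
print(f"5.2 MAD       = {equivalent_sigma_multiple:.2f} sigma")
```

Under that assumption the cut-off is comparable to the 3σ edit rule, but it is built from estimates that a single extreme value barely moves.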



Hampel’s method
xi 0.1193 0.1038 0.0923 0.1173 0.1494 0.1229 0.1125 0.1061 0.0940 0.1213 0.1314 0.1151 0.1159 0.1298 0.5977
i 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

1. Compute the median x̃ and the MAD:

x̃ = 0.1173    MAD = 0.0112

2. For each value xi in the dataset, compute: di = |xi − x̃|

di 0.0020 0.0135 0.0250 0.0000 0.0321 0.0056 0.0048 0.0112 0.0233 0.0040 0.0141 0.0022 0.0014 0.0125 0.4804

3. Identify xi as an outlier if di is greater than 5.2 MAD = 0.05824.
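
The same steps can be sketched in Python. This is a minimal illustration under the definitions given on the slides; the function name hampel_outliers is hypothetical.

```python
import statistics

# Real-time PCR results (%) from the slides, i = 1..15.
x = [0.1193, 0.1038, 0.0923, 0.1173, 0.1494, 0.1229, 0.1125, 0.1061,
     0.0940, 0.1213, 0.1314, 0.1151, 0.1159, 0.1298, 0.5977]

def hampel_outliers(values, k=5.2):
    """Return (1-based flagged indices, median, MAD) for Hampel's method."""
    med = statistics.median(values)
    d = [abs(v - med) for v in values]   # absolute deviations from the median
    mad = statistics.median(d)           # median absolute deviation
    flagged = [i for i, di in enumerate(d, start=1) if di > k * mad]
    return flagged, med, mad

flagged, med, mad = hampel_outliers(x)
print(f"median = {med:.4f}, MAD = {mad:.4f}, threshold = {5.2 * mad:.5f}")
print("flagged indices:", flagged)  # [15]
```

Only d15 = 0.4804 exceeds the threshold of 0.05824, so the 15th value is identified as an outlier, and the margin is much wider than with the 3σ edit rule.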



Exercise

Using the 3σ edit rule and Hampel’s method, do you identify any outliers in the following dataset?

xi
0.2954
0.1436
0.1075
0.1731
0.1771
0.1259

