Bootstrapping
Systems Reliability
Introduction
FIGURE 14.4 (a) The idea of the sampling distribution of the sample mean x̄: take very
many samples, collect the x̄-values from each, and look at the distribution of these values.
(b) The theory shortcut: if we know that the population values follow a normal distribution,
theory tells us that the sampling distribution of x̄ is also normal. (c) The bootstrap idea: when
theory fails and we can afford only one sample, that sample stands in for the population, and
the distribution of x̄ in many resamples stands in for the sampling distribution.
Procedure
Given:
• the population: x;
• n iid samples: X_i, i = 1, ..., n;
• the estimator of interest: Θ = T(x);
• a statistic defined on the sample: Θ̂ = T(X).
• For each k from 1 to N:
  – draw a sample of size n from X, with replacement, obtaining X*_k;
  – compute Θ̂*_k = T(X*_k);
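A minimal sketch of this resampling loop in Python (names are illustrative; the statistic T is taken to be the sample mean here):

```python
import random
import statistics

def bootstrap(sample, T, N, seed=0):
    """Draw N resamples of the same size as `sample`, with replacement,
    and return the statistic T evaluated on each resample."""
    rng = random.Random(seed)
    n = len(sample)
    return [T([rng.choice(sample) for _ in range(n)]) for _ in range(N)]

# Example: bootstrap distribution of the mean of a small sample.
data = [1.57, 0.22, 19.67, 0.00, 0.22, 3.12]
theta_star = bootstrap(data, statistics.mean, N=1000)
print(len(theta_star))  # one statistic per resample
```

Each element of `theta_star` is one Θ̂*_k; their empirical distribution is the bootstrap distribution of the statistic.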
❖ Choosing N: the number of distinct resamples of size n grows combinatorially with n:

n    distinct resamples
5    1.26E+02
10   9.24E+04
20   6.89E+10
30   5.913E+16
40   5.38E+22
50   5.04E+28

❖ Monte Carlo simulation: since full enumeration is infeasible, draw N random combinations.
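The counts in the table match C(2n − 1, n), the number of distinct multisets of size n drawn with replacement from n observations, which is why enumeration quickly becomes infeasible. A quick check:

```python
from math import comb

# Number of distinct resamples (multisets) of size n from n observations.
for n in (5, 10, 20, 30, 40, 50):
    print(n, comb(2 * n - 1, n))
```

Already at n = 20 there are on the order of 10^10 distinct resamples, so in practice one settles for N Monte Carlo draws.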
❖ Shape: the shape of the distribution obtained by bootstrapping approximates the shape of the distribution of the statistic under consideration.
❖ Central tendency: the estimate tends to be biased when the sample distribution is not centered on the true value of the parameter.
❖ Dispersion: the standard error of the statistic is the standard deviation of the bootstrap distribution.
❖ Bootstrapping cannot generate new data! The method estimates how the sample quantity varies given the n available samples.
❖ Similar to the mean / standard error approach.
❖ Does not depend on normality or on the CLT.
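A sketch of the dispersion estimate: the bootstrap standard error is simply the standard deviation of the resampled statistics. For the sample mean it should land close to the classical s/√n (illustrative code, synthetic data):

```python
import random
import statistics

rng = random.Random(42)
sample = [rng.gauss(10.0, 2.0) for _ in range(30)]   # one observed sample
n = len(sample)

# Bootstrap distribution of the mean.
boot_means = [
    statistics.mean(rng.choices(sample, k=n)) for _ in range(1000)
]

se_boot = statistics.stdev(boot_means)            # bootstrap standard error
se_formula = statistics.stdev(sample) / n ** 0.5  # classical s / sqrt(n)
print(round(se_boot, 3), round(se_formula, 3))    # should be close
```

The agreement with s/√n only holds for the mean; the point of the bootstrap is that the same recipe works for statistics with no standard-error formula.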
Premissas
❖ Simplicidade;
❖ Aplicação à casos complexos;
❖ Independência de distribuição.
Principais Desvantagens
Exemplos
drawn. So resamples from this sample represent what we would get
if we took many samples from the population. The bootstrap distribu-
tion of a statistic, based on many resamples, represents the sampling
distribution of the statistic, based on many samples.
1.57 0.22 19.67 0.00 0.22 3.12 → mean = 4.13
0.00 2.20 2.20 2.20 19.67 1.57 → mean = 4.64
0.22 3.12 1.57 3.12 2.20 0.22 → mean = 1.74
FIGURE 14.2 The resampling idea. The top box is a sample of size n = 6 from the Verizon
data. The three lower boxes are three resamples from this original sample. Some values from
the original are repeated in the resamples because each resample is formed by sampling with
replacement. We calculate the statistic of interest—the sample mean in this example—for the
original sample and each resample.
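The means in Figure 14.2 can be reproduced directly (the rows of values are taken from the figure):

```python
from statistics import mean

rows = [
    [1.57, 0.22, 19.67, 0.00, 0.22, 3.12],
    [0.00, 2.20, 2.20, 2.20, 19.67, 1.57],
    [0.22, 3.12, 1.57, 3.12, 2.20, 0.22],
]
for row in rows:
    print(round(mean(row), 2))  # 4.13, 4.64, 1.74
```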
FIGURE 14.12 Five random samples (n = 50) from the same population, with a bootstrap
distribution for the sample mean formed by resampling from each of the five samples. At the
right are five more bootstrap distributions from the first sample.
❖ Uniform distribution [0.00; 1.00];
❖ 1,000 bootstrap iterations.
❖ 10 samples:
❖ 30 samples:
❖ 100 samples:
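The experiment above can be sketched as follows: uniform [0, 1] data, 1,000 bootstrap iterations, and sample sizes 10, 30, and 100. The spread of the bootstrap distribution should shrink as the sample size grows:

```python
import random
import statistics

def bootstrap_sd(sample, iterations=1000, seed=0):
    """Standard deviation of the bootstrap distribution of the mean."""
    rng = random.Random(seed)
    n = len(sample)
    means = [statistics.mean(rng.choices(sample, k=n)) for _ in range(iterations)]
    return statistics.stdev(means)

rng = random.Random(1)
results = {}
for n in (10, 30, 100):
    sample = [rng.random() for _ in range(n)]  # uniform [0, 1]
    results[n] = bootstrap_sd(sample)
    print(n, round(results[n], 3))
```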
Confidence Intervals
❖ Percentile:
(1 − α)·100% confidence intervals:
• [F̂_b⁻¹(α/2) ; F̂_b⁻¹(1 − α/2)]
• [−∞ ; F̂_b⁻¹(1 − α)]
• [F̂_b⁻¹(α) ; +∞]
• [−∞ ; x̄_b + t_{α,n−1} · s_b]
• [x̄_b − t_{α,n−1} · s_b ; +∞]
• [Q(α) ; +∞]
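A sketch of the percentile interval: the endpoints are empirical quantiles of the bootstrap distribution, so a two-sided (1 − α)·100% interval is just the α/2 and 1 − α/2 order statistics (illustrative code, synthetic data):

```python
import random
import statistics

def percentile_ci(boot, alpha=0.05):
    """Two-sided percentile interval from a bootstrap distribution."""
    s = sorted(boot)
    lo = s[int((alpha / 2) * (len(s) - 1))]
    hi = s[int((1 - alpha / 2) * (len(s) - 1))]
    return lo, hi

rng = random.Random(7)
sample = [rng.gauss(5.0, 1.0) for _ in range(50)]
boot = [statistics.mean(rng.choices(sample, k=len(sample)))
        for _ in range(2000)]
lo, hi = percentile_ci(boot, alpha=0.05)
print(round(lo, 2), round(hi, 2))  # 95% percentile interval for the mean
```

Dropping one side of the interval (replacing an endpoint by ±∞ and using α instead of α/2) gives the one-sided versions listed above.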
[Illustration: hypothesized value µ0 relative to the bootstrap interval [L_b, U_b]]
❖ Two samples:
  H0: µ1 = µ2    vs    H1: µ1 ≠ µ2
or, equivalently,
  H0: µ1 − µ2 = 0    vs    H1: µ1 − µ2 ≠ 0
[Illustration: bootstrap intervals [L_b, U_b] for µ1 − µ2, compared against 0]
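The two-sample test can be sketched by bootstrapping the difference of means and checking whether the interval [L_b, U_b] contains 0 (illustrative code, synthetic groups):

```python
import random
import statistics

def diff_ci(x, y, iterations=2000, alpha=0.05, seed=0):
    """Bootstrap percentile CI for mu1 - mu2: resample each group independently."""
    rng = random.Random(seed)
    diffs = sorted(
        statistics.mean(rng.choices(x, k=len(x)))
        - statistics.mean(rng.choices(y, k=len(y)))
        for _ in range(iterations)
    )
    lo = diffs[int((alpha / 2) * (iterations - 1))]
    hi = diffs[int((1 - alpha / 2) * (iterations - 1))]
    return lo, hi

rng = random.Random(3)
a = [rng.gauss(10.0, 1.0) for _ in range(40)]
b = [rng.gauss(12.0, 1.0) for _ in range(40)]
lo, hi = diff_ci(a, b)
print(round(lo, 2), round(hi, 2))
print("reject H0" if not (lo <= 0 <= hi) else "fail to reject H0")
```

Here the groups differ by 2, so the interval for µ1 − µ2 lies well below 0 and H0 is rejected.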
❖ Multiple samples:
  H0: µi = µj ∀ i, j ∈ {1, ..., n}
  H1: ∃ i, j ∈ {1, ..., n} | µi ≠ µj
broken into pairwise tests:
  H0¹: µ1 = µ2    H0²: µ1 = µ3    ...    H0^m: µ(n−1) = µn
  H1¹: µ1 ≠ µ2    H1²: µ1 ≠ µ3    ...    H1^m: µ(n−1) ≠ µn
❖ Multiple samples:
❖ Multiple comparisons:
❖ Adjust the significance level when constructing the intervals.
❖ Correction methods: Bonferroni, Holm-Bonferroni, Šidák, Dunnett, Tukey-Kramer, Nemenyi, Bonferroni-Dunn, Scheffé, etc.
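A minimal sketch of two of the corrections listed above: with m = n(n − 1)/2 pairwise comparisons, Bonferroni uses α/m per interval, while Holm-Bonferroni applies a step-down threshold to the sorted p-values:

```python
def bonferroni_alpha(alpha, m):
    """Per-comparison significance level for m simultaneous comparisons."""
    return alpha / m

def holm(pvalues, alpha=0.05):
    """Holm-Bonferroni step-down: return which hypotheses are rejected."""
    order = sorted(range(len(pvalues)), key=lambda i: pvalues[i])
    reject = [False] * len(pvalues)
    for rank, i in enumerate(order):
        if pvalues[i] <= alpha / (len(pvalues) - rank):
            reject[i] = True
        else:
            break  # once one test fails, all larger p-values fail too
    return reject

# Example: 5 algorithms -> m = 5*4/2 = 10 pairwise comparisons.
print(bonferroni_alpha(0.05, 10))   # 0.005 per comparison
print(holm([0.001, 0.02, 0.04]))    # [True, True, True]
```

Holm is uniformly less conservative than plain Bonferroni while still controlling the family-wise error rate.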
Critical Cases
FIGURE 14.13 Five random samples (n = 9) from the same population, with a bootstrap
distribution for the sample mean formed by resampling from each of the five samples. At the
right are five more bootstrap distributions from the first sample.
FIGURE 14.14 Five random samples (n = 15) from the same population, with a bootstrap
distribution for the sample median formed by resampling from each of the five samples. At
the right are five more bootstrap distributions from the first sample.
❖ Smoothed Bootstrapping
❖ Low-magnitude Gaussian noise is added to each resampled observation:
  noise: N(0, σ²), with σ = 1/√n
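A sketch of the smoothed bootstrap: each resampled observation gets N(0, σ²) noise with σ = 1/√n added, which is useful for statistics such as the median that otherwise only take values from the original sample (illustrative code, synthetic data):

```python
import random
import statistics

def smoothed_bootstrap(sample, T, iterations=1000, seed=0):
    """Resample with replacement, then add N(0, sigma^2) noise,
    sigma = 1/sqrt(n), to every resampled observation."""
    rng = random.Random(seed)
    n = len(sample)
    sigma = 1.0 / n ** 0.5
    return [
        T([rng.choice(sample) + rng.gauss(0.0, sigma) for _ in range(n)])
        for _ in range(iterations)
    ]

rng = random.Random(5)
sample = [rng.gauss(0.0, 1.0) for _ in range(25)]
boot = smoothed_bootstrap(sample, statistics.median)
print(round(statistics.stdev(boot), 3))  # smoothed bootstrap SE of the median
```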
Case Study: Algorithm Comparison
Abstract—This paper presents a statistical based comparison methodology for performing evolutionary algorithm comparison under multiple merit criteria. The analysis of each criterion is based on the progressive construction of a ranking of the algorithms under analysis, with the determination of significance levels for each ranking step. The multicriteria analysis is based on the aggregation of the different criteria rankings via a non-dominance analysis which indicates the algorithms which constitute the efficient set. In order to avoid correlation effects, a principal component analysis pre-processing is performed. Bootstrapping techniques allow the evaluation of merit criteria data with arbitrary probability distribution functions. The algorithm ranking in each criterion is built progressively, using either ANOVA or first order stochastic dominance. The resulting ranking is checked using a permutation test which detects possible inconsistencies in the ranking—leading to the execution of more algorithm runs which refine the ranking confidence. As a by-product, the permutation test also delivers p-values for the ordering between each two algorithms which have adjacent rank positions. A comparison of the proposed method with other methodologies has been performed using reference probability distribution functions (PDFs). The proposed methodology has always reached the correct ranking with less samples and, in the case of non-Gaussian PDFs, the proposed methodology has worked well, while the other methods have not been able even to detect some PDF differences. The application of the proposed method is illustrated in benchmark problems.

Index Terms—Algorithm evaluation, evolutionary algorithms, multicriteria statistical comparison.

I. Introduction

THE COMPARISON between optimization algorithms which constitute alternative candidate methods for dealing with specific problems or classes of problems usually involves some kind of tradeoff between the computational effort associated with algorithm execution and the solution quality. In the case of deterministic algorithms, such a comparison is performed on the basis of algorithm results which are deterministic for each given problem instance. It is guaranteed under some assumptions that, starting from a given initial point, these algorithms always perform the same sequence of deterministic steps, and the algorithm converges (i.e., reaches a stop criterion) in a fixed number of algorithm iterations [1]. As a consequence, such algorithms are often evaluated on the basis of single-run results performed on sets of different problem instances.

The performance evaluation of non-deterministic algorithms, such as evolutionary algorithms, cannot be performed using such a kind of procedure. The stochastic nature of these methods introduces some random variability in the answer provided by the algorithm: the solution obtained by the algorithm can vary considerably from one run to another, and even when the same solution is reached, the computational time required for achieving such a solution is usually different for different runs of the same algorithm [1].

The flexible structure of the evolutionary algorithms makes it possible to build them in several different ways. Each operator variation inside an algorithm leads to a different algorithm version with its own associated performance. This combinatorial scenario of possible algorithms which are originated from variations of several operators leads to the need of methods for the evaluation of such groups of algorithms, allowing well-founded choices of one or few ones that should be considered for being applied. This motivates the development of methods for evolutionary algorithm performance

Manuscript received September 6, 2009; revised May 3, 2010 and January 7, 2010; accepted July 16, 2010. Date of publication January 10, 2011; date of
❖ Compares K evolutionary algorithms on a problem, considering C quality criteria (factors).
❖ Outputs a ranking of the methods and the p-values associated with that ranking.
❖ Allows algorithm comparisons either "a posteriori" or iteratively.
❖ Repeat:
A4 A2 A5 A1 A3 .
IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION
[Torn fragments of the paper's tables of reference distribution functions and parameters (Gaussian1, Gaussian2): only the columns for algorithms A4 and A5 and parts of the ranking and repeatability rows survive.]
CARRANO et al.: A MULTICRITERIA STATISTICAL BASED COMPARISON METHODOLOGY FOR EVALUATING
TABLE IV
Results for Gaussian2 Reference Problem: Means and Standard Deviations
TABLE V
Means and Standard Deviations for Beta Reference Problem
[Only a fragment of Table V survives:]
Ranking          A1    A2    A3    A4    A5
Repeatability µ  0.56  0.45  0.65  0.35  0.45
Beta