
At one level of analysis at least, statisticians and philosophers of science ask many of the same questions:

- What should be observed, and what may justifiably be inferred from the resulting data?
- How well do data confirm or fit a model?
- What is a good test?
- Must predictions be novel in some sense? (selection effects, double counting, data mining)
- How can spurious relationships be distinguished from genuine regularities? From causal regularities?
- How can we infer more accurate and reliable observations from less accurate ones?
- When does a fitted model account for regularities in the data?

That these very general questions are entwined with long-standing debates in philosophy of science helps to explain why the field of statistics tends to cross over so often into philosophical territory.

Statistics → Philosophy

3 ways statistical accounts are used in philosophy of science:

(1) Model Scientific Inference: to capture either the actual or rational ways to arrive at evidence and inference.

(2) Resolve Philosophical Problems about scientific inference, observation, and experiment (the problem of induction, the objectivity of observation, reliable evidence, Duhem's problem, underdetermination).

(3) Perform a Metamethodological Critique: scrutinize methodological rules, e.g., accord special weight to "novel" facts, avoid ad hoc hypotheses, avoid "data mining", require randomization.

Philosophy → Statistics

Central job: to help resolve the conceptual, logical, and methodological discomforts of scientists as to how to make reliable inferences despite uncertainties and errors.

Philosophy of statistics and the goal of a philosophy of science relevant for philosophical problems in scientific practice.

Fresh methodological problems arise in practice surrounding a panoply of methods and models relied on to learn from incomplete, and often non-experimental, data.

Examples abound:

- Disputes over hypothesis testing in psychology (e.g., the recently proposed significance-test ban);
- Disputes over the proper uses of regression in applied statistics;
- Disputes over dose-response curves in estimating risks;
- Disputes about the use of computer simulations in observational sciences;
- Disputes about external validity in experimental economics; and,
- Across the huge landscape of fields using the latest, high-powered computer methods, there are disputes about data mining, algorithmic searches, and model validation.

Equally important are the methodological presuppositions that are not, but perhaps ought to be, disputed, debated, or at least laid out in the open: often, ironically, in the very fields in which philosophers of science immerse themselves.

I used to teach a course in this department: philosophy of science and economic methodology.

We read how many economic methodologists questioned the value of philosophy of science:

"If philosophers and others within science theory can't agree about the constitution of the scientific method (or even whether asking about a scientific method makes any sense), doesn't it seem a little dubious for economists to continue blithely taking things off the shelf and attempting to apply them to economics?" (Hands, 2001, p. 6).

Deciding that it is, methodologists of economics increasingly look to the sociology of science, rhetoric, and evolutionary psychology.

The problem is not merely how this cuts philosophers of science out of being engaged in methodological practice; equally serious is how it encourages practitioners to assume there are no deep epistemological problems with the ways they collect and base inferences on data.

Professional agreement on statistical philosophy is not on the immediate horizon, but this should not stop us from agreeing on methodology, as if what is correct methodologically does not depend on what is correct philosophically (Berger, 2003, p. 2).

In addition to the resurgence of the age-old controversies (significance tests vs. confidence intervals, frequentist vs. Bayesian measures), the latest statistical modeling techniques have introduced brand new methodological issues.

High-powered computer-science packages offer a welter of algorithms for automatically selecting among this explosion of models, but as each boasts different, and incompatible, selection criteria, we are thrown back to the basic question of inductive inference: what is required to severely discriminate among well-fitting models such that, when a claim (or hypothesis or model) survives a test, the resulting data count as good evidence for the claim's correctness or dependability or adequacy?

A romp through 4 "waves" in philosophy of statistics

History and philosophy of statistics is a huge territory marked by 70 years of debates widely known for reaching unusual heights both of passion and of technical complexity.

Wave I:   ~1930-1955/60
Wave II:  ~1955/60-1980
Wave III: ~1980-2005 and beyond
Wave IV:  ~2006 and beyond

A core question: What is the nature and role of probabilistic concepts, methods, and models in making inferences in the face of limited data, uncertainty, and error?

1. Two Roles for Probability: Degrees of Confirmation and Degrees of Well-Testedness

a. To provide a post-data assignment of degree of probability, confirmation, support, or belief in a hypothesis;

b. To assess the probativeness, reliability, trustworthiness, or severity of a test or inference procedure.

These two contrasting philosophies of the role of probability in statistical inference are very much at the heart of the central points of controversy in the three waves of philosophy of statistics.

Having conceded loss in the battle for justifying induction, philosophers appeal to logic to capture scientific method.

Inductive Logics (Confirmation Theory)
- Rules to assign degrees of probability or confirmation to hypotheses given evidence e; Carnap: C(H, e).
- Inductive Logicians: we can build and try to justify inductive logics; "straight rule": assign degrees of confirmation/credibility.
- Statistical affinity: Bayesian (and likelihoodist) accounts.

Logic of Falsification (Methodological Falsification)
- Rules to decide when to prefer or accept hypotheses; Popper.
- Deductive Testers: we can reject induction and uphold the rationality of preferring or accepting H if it is well tested.
- Statistical affinity: Fisherian and Neyman-Pearson methods; probability enters to ensure the reliability and severity of tests with these methods.

I. Philosophy of Statistics: The First Wave

WAVE I, circa 1930-1955: Fisher, Neyman, Pearson, Savage, and Jeffreys.

Statistical inference tools use data x0 to probe aspects of the data-generating source.

In statistical testing, these aspects are framed in terms of statistical hypotheses about parameters governing a statistical distribution.

H tells us the probability of x under H, written P(x;H) (a probabilistic assignment under a model).

It is important to avoid confusing this with the conditional probability in Bayes's theorem, P(x|H).

Testing model assumptions is extremely important, though it will not be discussed here.

Modern Statistics Begins with Fisher: Simple Significance Tests

Example. Let the sample X = (X1, ..., Xn) be IID from a Normal distribution (NIID) with σ = 1.

1. A null hypothesis H0: μ = 0, e.g., 0 mean concentration of lead, no difference in mean survival in a given group, in mean risk, in mean deflection of light.

2. A function of the sample, d(X), the test statistic, which reflects the difference between the data x0 = (x1, ..., xn) and H0. The larger d(x0), the further the outcome is from what is expected under H0, with respect to the particular question being asked.

3. The p-value is the probability of a difference larger than d(x0), under the assumption that H0 is true:

p(x0) = P(d(X) > d(x0); H0)

The observed significance level (p-value) with observed sample mean X̄ = .1:

p(x0) = P(d(X) > d(x0); H0).

The relevant test statistic d(X) is:

d(X) = (X̄ - μ0)/σx   [Observed - Expected (under H0), in σx units],

where X̄ is the sample mean with standard deviation σx = σ/√n.

Since σx = σ/√n = 1/5 = .2 (here n = 25), the difference X̄ - μ0 = .1 - 0 in units of σx yields

d(x0) = .1/.2 = .5.

Under the null, d(X) is distributed as standard Normal, denoted d(X) ~ N(0,1).

The area to the right of .5 is ~.3, i.e., not very significant.
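To make the arithmetic concrete, here is a minimal Python sketch (mine, not part of the original slides) that reproduces the p-value just computed; it assumes the same setup, σ = 1 and n = 25, and uses scipy's Normal tail function.

```python
from scipy import stats

sigma, n, mu0 = 1.0, 25, 0.0        # slide setup: known sigma, n = 25 (so sigma_x = .2)
xbar = 0.1                          # observed sample mean from the slide

sigma_x = sigma / n ** 0.5          # standard deviation of the sample mean
d_obs = (xbar - mu0) / sigma_x      # test statistic d(x0) = .5
p_value = stats.norm.sf(d_obs)      # P(d(X) > d(x0); H0): area to the right of .5

print(f"d(x0) = {d_obs:.2f}, p-value = {p_value:.2f}")   # d(x0) = 0.50, p-value = 0.31
```

A p-value of about .31 is the "~.3" of the slide: a difference of this size is the sort of thing chance variability under H0 readily produces.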

Logic of Simple Significance Tests: Statistical Modus Tollens

"Every experiment may be said to exist only in order to give the facts a chance of disproving the null hypothesis" (Fisher, 1956, p. 160).

Statistical analogy to the deductively valid pattern modus tollens:

If the hypothesis H0 is correct then, with high probability 1 - p, the data would not be statistically significant at level p.
x0 is statistically significant at level p.
____________________________
Thus, x0 is evidence against H0, or x0 indicates the falsity of H0.

Fisher described the significance test as a procedure for rejecting the null hypothesis and inferring that the phenomenon has been "experimentally demonstrated" once one is able to generate at will a statistically significant effect (Fisher, 1935a, p. 14).

The Alternative or Non-Null Hypothesis

Evidence against H0 seems to indicate evidence for some alternative.

Fisherian significance tests strictly consider only the null H0.

Neyman and Pearson (N-P) tests introduce an alternative H1 (even if only to serve as a direction of departure).

Example. X = (X1, ..., Xn), NIID with σ = 1:

H0: μ = 0 vs. H1: μ > 0

Despite the bitter disputes with Fisher that were to erupt soon after ~1935, Neyman and Pearson at first saw their work as merely placing Fisherian tests on firmer logical footing.

Much of Fisher's hostility toward N-P methods reflects professional and personality conflicts more than philosophical differences.

Neyman-Pearson (N-P) Tests

An N-P hypothesis test maps each outcome x = (x1, ..., xn) into either the null hypothesis H0 or an alternative hypothesis H1 (where the two exhaust the parameter space), so as to ensure that the probabilities of erroneous rejections (type I errors) and erroneous acceptances (type II errors) are controlled at prespecified values, e.g., 0.05 or 0.01, the significance level of the test.

Test T(α): X = (X1, ..., Xn), NIID with σ = 1,

H0: μ = μ0 vs. H1: μ > μ0

If d(x0) > cα, "reject" H0 (or declare the result statistically significant at the α level);
if d(x0) ≤ cα, "accept" H0;

e.g., cα = 1.96 for α = .025.

"Accept"/"Reject" are uninterpreted parts of the mathematical apparatus.

Type I error probability: P(d(X) > cα; H0) ≤ α.

Type II error probability: P(Test T(α) does not reject H0; μ = μ1) = P(d(X) ≤ cα; μ = μ1) = β(μ1), for any μ1 > μ0.

The "best" test at level α at the same time minimizes the value of β(μ1) for all μ1 > μ0, or equivalently, maximizes the power:

POW(T(α); μ1) = P(d(X) > cα; μ = μ1).

T(α) is a Uniformly Most Powerful (UMP) level α test.
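A short sketch (again mine, with n = 25 assumed purely for illustration) of how the type I error probability and the power function POW(T(α); μ1) = P(d(X) > cα; μ = μ1) are computed for this one-sided Normal test:

```python
from scipy import stats

sigma, n, mu0, alpha = 1.0, 25, 0.0, 0.025   # n = 25 is an assumption for illustration
sigma_x = sigma / n ** 0.5
c_alpha = stats.norm.isf(alpha)              # cut-off c_alpha ~ 1.96

def power(mu1):
    """POW(T(alpha); mu1): probability that d(X) > c_alpha when mu = mu1."""
    # d(X) > c_alpha  iff  the sample mean exceeds mu0 + c_alpha * sigma_x
    return stats.norm.sf(mu0 + c_alpha * sigma_x, loc=mu1, scale=sigma_x)

print(f"type I error probability = {power(mu0):.3f}")    # equals alpha = 0.025
for mu1 in (0.1, 0.2, 0.4):
    print(f"POW(T(.025); mu1={mu1}) = {power(mu1):.3f}")
```

The type II error probability at μ1 is just the complement of this power: β(μ1) = 1 - POW(T(α); μ1).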

Inductive Behavior Philosophy

Philosophical issues and debates arise once one begins to consider the interpretations of the formal apparatus.

"Accept"/"Reject" are identified with deciding to take specific actions, e.g., publishing a result, announcing a new effect.

The justification for optimal tests is that "it may often be proved that if we behave according to such a rule ... we shall reject H when it is true not more, say, than once in a hundred times, and in addition we may have evidence that we shall reject H sufficiently often when it is false."

Neyman: Tests are not rules of inductive inference but rules of behavior. The goal is not to adjust our beliefs but rather to adjust our behavior to limited amounts of data.

Is he just drawing a stark contrast between N-P tests and Fisherian as well as Bayesian methods? Or is the behavioral interpretation essential to the tests?

The "inductive behavior vs. inductive inference" battle commingles philosophical, statistical, and personality clashes.

Fisher (1955) denounced the way that Neyman and Pearson transformed his significance tests into "acceptance procedures":

- They've turned my tests into mechanical rules or "recipes" for deciding to accept or reject statistical hypothesis H0.
- The concern has more to do with speeding up production or making money than with learning about phenomena.

N-P followers are like:

"Russians (who) are made familiar with the ideal that research in pure science can and should be geared to technological performance, in the comprehensive organized effort of a five-year plan for the nation." (1955, p. 70)

"In the U.S. also the great importance of organized technology has I think made it easy to confuse the process appropriate for drawing correct conclusions, with those aimed rather at ... speeding production, or saving money."

Pearson distanced himself from Neyman's "inductive behavior" jargon, calling it "Professor Neyman's field rather than mine."

But the most impressive mathematical results were in the decision-theoretic framework of Neyman-Pearson-Wald.

Many of the qualifications by Neyman and Pearson in the first wave are overlooked in the philosophy of statistics literature.

Admittedly, these evidential practices were not made explicit.* (Had they been, the subsequent waves of philosophy of statistics might have looked very different.)

*Mayo's goal in ~1978

The Second Wave: ~1955/60-1980

Post-data criticisms of N-P methods:

Ian Hacking (1965) framed the main lines of criticism by philosophers: Neyman-Pearson tests are suitable for before-trial betting, but not for after-trial evaluation (p. 99).

Battles: initial precision vs. final precision; before-data vs. after-data.

After the data, he claimed, the relevant measure of support is the (relative) likelihood.

Two data sets x and y may afford the same "support" to H, yet warrant different inferences [on significance-test reasoning] because x and y arose from tests with different error probabilities.

- This is just what error statisticians want!

- But (at least early on) Hacking (1965) held to the Law of Likelihood: x0 supports hypothesis H1 more than H2 if

P(x0; H1) > P(x0; H2).

Yet, as Barnard notes, there always is such a rival hypothesis: that things just had to turn out the way they actually did.

Since such a maximally likely alternative H2 can always be constructed, H1 may always be found less well supported, even if H1 is true: no error control.

Hacking soon rejected the likelihood approach on such grounds; likelihoodist accounts are advocated by others.

Perhaps THE key issue of controversy in the philosophy of statistics battles:

The (strong) likelihood principle (LP): likelihoods suffice to convey all that the data have to say.

"According to Bayes's theorem, P(x|μ) ... constitutes the entire evidence of the experiment, that is, it tells all that the experiment has to tell. More fully and more precisely, if y is the datum of some other experiment, and if it happens that P(x|μ) and P(y|μ) are proportional functions of μ (that is, constant multiples of each other), then each of the two data x and y have exactly the same thing to say about the values of μ" (Savage 1962, p. 17).

The error probabilist needs to consider, in addition, the sampling distribution of the likelihoods.

Significance levels and other error probabilities all violate the likelihood principle (Savage 1962).
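A standard illustration of what the LP rules out (a textbook example, not from these slides): 9 successes and 3 failures give likelihoods proportional to θ^9 (1-θ)^3 whether the design fixed n = 12 trials in advance or sampled until the 3rd failure, so the LP counts the evidence as identical; the significance-test assessment differs because the two designs have different sampling distributions.

```python
from scipy import stats

# Observed: 9 successes and 3 failures. Test H0: theta = 0.5 against theta > 0.5.
theta0, successes, failures = 0.5, 9, 3

# Design A: n = 12 trials fixed in advance (binomial sampling).
# p-value = P(X >= 9; n = 12, theta = .5)
p_binomial = stats.binom.sf(successes - 1, n=12, p=theta0)

# Design B: sample until the 3rd failure (negative binomial on the success count).
# In scipy's parameterization, treat "failure" as the nbinom success with prob 1 - theta.
# p-value = P(at least 9 successes before the 3rd failure; theta = .5)
p_negbinomial = stats.nbinom.sf(successes - 1, n=failures, p=1 - theta0)

print(f"fixed-n p-value:       {p_binomial:.3f}")     # ~0.073
print(f"stop-at-3-failures p:  {p_negbinomial:.3f}")  # ~0.033
# Same likelihood function, different error-probability assessments.
```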

Paradox of Optional Stopping

Instead of fixing the sample size n in advance, in some tests n is determined by a stopping rule:

In Normal testing, two-sided H0: μ = 0 vs. H1: μ ≠ 0:

Keep sampling until H0 is rejected at the .05 level (i.e., keep sampling until |X̄| > 1.96 σ/√n).

Nominal vs. actual significance levels: with n fixed, the type I error probability is .05.

With this stopping rule the actual significance level differs from, and will be greater than, .05.

By contrast, since likelihoods are unaffected by the stopping rule, the LP follower denies there really is an evidential difference between the two cases (i.e., n fixed and n determined by the stopping rule).

Should it matter if I decided to toss the coin 100 times and happened to get 60% heads, or if I decided to keep tossing until I could reject at the .05 level (two-sided) and this happened to occur on trial 100?

Should it matter if I kept going until I found statistical significance?

Error statistical principles: Yes! A penalty for perseverance!

The LP says NO!
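A rough Monte Carlo sketch of the nominal-vs-actual point (my illustration; the specific caps on n are arbitrary): sampling from a true null and applying the "keep going until |X̄| > 1.96 σ/√n" rule inflates the probability of ever rejecting well above .05, and the inflation grows with how long one is prepared to persevere.

```python
import numpy as np

rng = np.random.default_rng(0)

def ever_rejects(n_max, sigma=1.0):
    """Simulate one try-and-try-again experiment under a true null (mu = 0)."""
    total = 0.0
    for n in range(1, n_max + 1):
        total += rng.normal(0.0, sigma)                   # one more observation
        if abs(total / n) > 1.96 * sigma / np.sqrt(n):    # nominal .05 two-sided cut-off
            return True                                   # "reject" H0 and stop
    return False

for n_max in (10, 100, 1000):
    rate = np.mean([ever_rejects(n_max) for _ in range(2000)])
    print(f"stop by n = {n_max:4d}: P(reject true H0) ~ {rate:.2f}  (nominal .05)")
```

The realized likelihood function is the same whether n was fixed or arrived at by this rule, which is why the LP follower sees no evidential difference; the "penalty for perseverance" lives entirely in the sampling distribution.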

Savage Forum 1959: Savage audaciously declares that the lesson to draw from the optional stopping effect is that optional stopping is no sin, so the problem must lie with the use of significance levels. But why accept the likelihood principle (LP)? (Simplicity and freedom?)

"The likelihood principle emphasized in Bayesian statistics implies ... that the rules governing when data collection stops are irrelevant to data interpretation. It is entirely appropriate to collect data until a point has been proved or disproved" (p. 193). "This irrelevance of stopping rules to statistical inference restores a simplicity and freedom to experimental design that had been lost by classical emphasis on significance levels (in the sense of Neyman and Pearson)" (Edwards, Lindman, and Savage 1963, p. 239).

For frequentists this only underscores the point raised years before by Pearson and Neyman: a likelihood ratio (LR) may be a criterion of relative fit, but "it is still necessary to determine its sampling distribution in order to control the error involved in rejecting a true hypothesis, because a knowledge of [LR] alone is not adequate to insure control of this error" (Pearson and Neyman, 1930, p. 106).

The key difference: likelihood fixes the actual outcome, i.e., just d(x), while error statistics considers outcomes other than the one observed in order to assess the error properties.

The LP implies the irrelevance of, and no control over, error probabilities.

("Why you cannot be just a little bit Bayesian," EGEK 1996)

Update: A famous argument (Birnbaum, 1962) purports to show that plausible error statistical principles entail the LP!

"Radical!" "Breakthrough!" (since the LP entails the irrelevance of error probabilities)

But the "proof" is flawed! (Mayo 2010; see blog.)

The Statistical Significance Test Controversy (Morrison and Henkel, 1970): contributors chastise social scientists for slavish use of significance tests.

- Focus is on simple Fisherian significance tests.
- Philosophers direct criticisms mostly to N-P tests.

Fallacies of Rejection: Statistical vs. Substantive Significance

(i) Take statistical significance as evidence of a substantive theory that explains the effect.
(ii) Infer a discrepancy from the null beyond what the test warrants.

(i) Paul Meehl: It is fallacious to go from a statistically significant result, e.g., at the .001 level, to infer that one's substantive theory T, which entails the [statistical] alternative H1, has received "quantitative support of magnitude around .999."

A statistically significant difference (e.g., in child rearing) is not automatically evidence for a Freudian theory.

T is subjected to only a feeble risk, violating Popper.

Fallacies of Rejection:

(i) Take statistical significance as evidence of a substantive theory that explains the effect.
(ii) Infer a discrepancy from the null beyond what the test warrants.

Finding a statistically significant effect, d(x0) > cα (the cut-off for rejection), need not be indicative of a large or meaningful effect size: the test may be too sensitive.

Large n Problem: an α-significant rejection of H0 can be very probable, even with a substantively trivial discrepancy from H0.

This is often taken as a criticism because it is assumed that statistical significance at a given level is more evidence against the null the larger the sample size (n). Fallacy!

"The thesis implicit in the [N-P] approach [is] that a hypothesis may be rejected with increasing confidence or reasonableness as the power of the test increases" (Howson and Urbach 1989 and later editions).

In fact, it is indicative of less of a discrepancy from the null than if it had resulted from a smaller sample size.

(Analogy with a smoke detector: an alarm from one that often goes off from merely burnt toast (overly powerful or sensitive), vs. an alarm from one that rarely goes off unless the house is ablaze.)

This also comes in the form of the Jeffreys-Good-Lindley paradox:

Even a highly statistically significant result can, with n sufficiently large, correspond to a high posterior probability for the null hypothesis.

Fallacy of Non-Statistically Significant Results

Test T(α) fails to reject the null when the test statistic fails to reach the cut-off point for rejection, i.e., d(x0) ≤ cα.

A classic fallacy is to construe such a negative result as evidence FOR the correctness of the null hypothesis (common in risk-assessment contexts).

No evidence against is not evidence for.

Merely surviving the statistical test is too easy; it occurs too frequently, even when the null is false. It results from tests lacking sufficient sensitivity or power.

The Power Analytic Movement of the 60s in psychology

Jacob Cohen: By considering ahead of time the power of the test, select a test capable of detecting discrepancies of interest.

A pre-data use of power (for planning).

A multitude of power tables were supplied (Cohen, 1988), but until his death he bemoaned their all-too-rare use.

(Power is a feature of N-P tests, but apparently the prevalence of Fisherian tests in the social sciences, coupled, perhaps, with the difficulty of calculating power, resulted in power being ignored. There was also the fact that researchers were not able to get decent power in psychology; they turned to meta-analysis.)

Post-data use of power to avoid fallacies of insensitive tests

If there is a low probability of a statistically significant result even when a non-trivial discrepancy δ is present (low power against μ0 + δ), then a non-significant difference is not good evidence that a non-trivial discrepancy is absent.

Still too coarse: power is always calculated relative to the cut-off point cα for rejecting H0.

Consider test T(α = .025), σ = 1, n = 25, and let the non-trivial discrepancy be δ = .2.

No matter what the non-significant outcome, the power to detect a discrepancy of .2 is only .16!

So we'd have to deny the data were good evidence that μ < .2.

This suggested to me (in writing my dissertation around 1978) that rather than calculating

(1) P(d(X) > cα; μ = .2)   [power],

one should calculate

(2) P(d(X) > d(x0); μ = .2)   ["observed power" (severity)].

Even if (1) is low, (2) may be high. We return to this in the developments of Wave III.

III. The Third Wave: Relativism, Reformulations, Reconciliations ~1980-2005+

(skip) Rational Reconstruction and Relativism in Philosophy of Science

With Kuhnian battles being fought over the very idea of a unified method of scientific inference, statistical inference became less prominent in philosophy:

- it was largely used in rational reconstructions of scientific episodes,
- in appraising methodological rules,
- in classic philosophical problems, e.g., Duhem's problem: reconstruct a given assignment of blame so as to be warranted by Bayesian probability assignments;
- with no normative force.

With the recognition that science involves subjective judgments and values, reconstructions often appeal to a subjective Bayesian account (Salmon's "Tom Kuhn Meets Tom Bayes").

(Kuhn thought this was confused: there is no reason to suppose an algorithm remains through theory change.)

Naturalisms, HPS

Wave III in Scientific Practice

Statisticians turn to eclecticism.

Non-statistician practitioners (e.g., in psychology, ecology, medicine) bemoan unholy hybrids:

"a mixture of ideas from N-P methods, Fisherian tests, and Bayesian accounts that is inconsistent from both perspectives and burdened with conceptual confusion" (Gigerenzer, 1993, p. 323).

- Faced with foundational questions, non-statistician practitioners raise anew the questions from the first and second waves.
- Finding the automaticity and fallacies still rampant, most, if they are not calling for an outright ban on significance tests in research, insist on reforms and reformulations of statistical tests.

Task Force to consider a Test Ban in Psychology: 1990s

Reforms and Reinterpretations Within Error Probability Statistics

Any adequate reformulation must:

(i) show how to avoid classic fallacies (of acceptance and of rejection) on principled grounds;
(ii) show that it provides an account of inductive inference.

Avoiding Fallacies

To quickly note my own recommendation (for test T(α)):

Move away from the coarse accept/reject rule; use the specific result (significant or insignificant) to infer those discrepancies from the null that are well ruled out, and those which are not.

e.g., Interpretation of non-significant results:

If d(x) is not statistically significant, and the test had a very high probability of a more statistically significant difference if μ > μ0 + δ, then d(x) is good grounds for inferring μ ≤ μ0 + δ.

Use the specific outcome to infer an upper bound μ* (values beyond μ* are ruled out with the given severity).

If d(x) is not statistically significant, but the test had a very low probability of a more statistically significant difference if μ > μ0 + δ, then d(x) is poor evidence for inferring μ ≤ μ0 + δ.

The test had too little probative power to have detected such discrepancies even if they existed!

Takes us back to the post-data version of power:

Rather than construe "a miss as good as a mile," parity of logic suggests that the post-data power assessment should replace the usual calculation of power against μ1:

POW(T(α), μ1) = P(d(X) > cα; μ = μ1),

with what might be called the power actually attained or, to have a distinct term, the severity (SEV):

SEV(T(α), μ1) = P(d(X) > d(x0); μ = μ1),

where d(x0) is the observed (non-statistically significant) result.

Figure 1 compares power and severity for different outcomes.

Figure 1. POW(T(.025), μ1 = .2) = .168, irrespective of the value of d(x0) (the solid curve); the severity evaluations are data-specific.

The severity for the inference μ < .2:

Both X̄ = .39 and X̄ = -.2 fail to reject H0, but:
- with X̄ = .39, SEV(μ < .2) is low (.17);
- with X̄ = -.2, SEV(μ < .2) is high (.97).
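The Figure 1 numbers can be reproduced in a few lines (my sketch of the standard calculation, using the slides' setup: T(.025), σ = 1, n = 25): the power against μ = .2 is one fixed number, whereas SEV(μ < .2) varies with the observed mean.

```python
from scipy import stats

sigma, n, mu0, alpha, mu1 = 1.0, 25, 0.0, 0.025, 0.2
sigma_x = sigma / n ** 0.5                       # .2
c_alpha = stats.norm.isf(alpha)                  # ~1.96

# Power against mu1 = .2: the same whatever (non-significant) outcome occurred.
power = stats.norm.sf(mu0 + c_alpha * sigma_x, loc=mu1, scale=sigma_x)
print(f"POW(T(.025), mu1=.2) = {power:.3f}")     # ~0.169 (the slides' .168)

# Severity for inferring mu < .2 is data-specific:
# SEV(mu < .2) = P(d(X) > d(x0); mu = .2) = P(X-bar > observed mean; mu = .2)
for xbar in (0.39, -0.2):
    sev = stats.norm.sf(xbar, loc=mu1, scale=sigma_x)
    print(f"observed mean {xbar:5.2f}: SEV(mu < .2) = {sev:.3f}")
# 0.39 -> ~.17 (poor grounds for mu < .2); -0.20 -> ~.98 (the slides' .97)
```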

Fallacies of Rejection: The Large-n Problem

While with a non-significant result the concern is erroneously inferring that a discrepancy from μ0 is absent, with a significant result x0 the concern is erroneously inferring that it is present.

Utilizing the severity assessment: an α-significant difference with sample size n1 passes μ > μ1 less severely than with n2, where n1 > n2.

Figure 2 compares test T(α) with three different sample sizes, n = 25, n = 100, n = 400, denoted by T(α, n); in each case d(x0) = 1.96, a rejection right at the cut-off point.

In this way we solve the problems of tests that are too sensitive or not sensitive enough, but there is one more thing: showing how this supplies an account of inductive inference.

Many argue in Wave III that error statistical methods cannot supply an account of inductive inference because error probabilities conflict with posterior probabilities.

Figure 2. In test T(α) (H0: μ ≤ 0 against H1: μ > 0, with σ = 1), α = .025, cα = 1.96, and d(x0) = 1.96 for each of n = 25, n = 100, n = 400.

The severity for the inference μ > .1:
- n = 25:  SEV(μ > .1) = .93
- n = 100: SEV(μ > .1) = .83
- n = 400: SEV(μ > .1) = .5
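The same machinery reproduces the Figure 2 entries (again, my sketch): fixing a just-significant result d(x0) = 1.96 and varying n shows the severity with which μ > .1 passes falling as the sample size grows.

```python
from scipy import stats

sigma, mu0, d_obs, mu1 = 1.0, 0.0, 1.96, 0.1

for n in (25, 100, 400):
    sigma_x = sigma / n ** 0.5
    xbar_obs = mu0 + d_obs * sigma_x      # observed mean sitting right at the cut-off
    # SEV(mu > .1) = P(d(X) <= d(x0); mu = .1) = P(X-bar <= observed mean; mu = .1)
    sev = stats.norm.cdf(xbar_obs, loc=mu1, scale=sigma_x)
    print(f"n = {n:3d}: observed mean = {xbar_obs:.3f}, SEV(mu > .1) = {sev:.2f}")
# n = 25 -> .93, n = 100 -> .83, n = 400 -> .48 (the slides round to .5):
# the same cut-off rejection licenses a smaller discrepancy as n grows.
```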

P-values vs. Bayesian Posteriors

A statistically significant difference from H0 can correspond to a large posterior for H0. From the Bayesian perspective, it follows that p-values come up short as a measure of inductive evidence;

- the significance tester balks at the recommended priors, which result in highly significant results being construed as no evidence against the null, or even as evidence for it!

The conflict is usually posed with the two-sided test T(2α):

H0: μ = 0 vs. H1: μ ≠ 0.

(The differences between p-values and posteriors are far less marked with one-sided tests.)

Assuming a prior of .5 on H0, with n = 50 one can classically reject H0 at significance level p = .05, although P(H0|x) = .52 (which would actually indicate that the evidence favors H0).

This is taken as a criticism of p-values only because it is assumed that the .52 posterior is the appropriate measure of the belief-worthiness of the null.

As the sample size increases, the conflict becomes more noteworthy.

If n = 1000, a result statistically significant at the .05 level leads to a posterior for the null of .82!

SEV(H1) = .95, while the corresponding posterior has gone from .5 to .82. What warrants such a prior?

                      n (sample size)
  p       t       n=10    n=20    n=50    n=100   n=1000
 .10    1.645     .47     .56     .65     .72     .89
 .05    1.960     .37     .42     .52     .60     .82
 .01    2.576     .14     .16     .22     .27     .53
 .001   3.291     .024    .026    .034    .045    .124

(Entries are P(H0|x), with prior P(H0) = .5.)
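The table can be reproduced under one conventional set of assumptions that the slide does not spell out, so treat them as mine: a point mass of .5 on H0: μ = 0, with the remaining .5 spread over the alternative as μ ~ N(0, σ²) (a Jeffreys-type spiked prior), and the test statistic fixed at the listed cut-off. A sketch:

```python
from math import exp, sqrt

def posterior_null(z, n, prior_null=0.5):
    """P(H0 | z) for a spiked prior: P(H0) = .5 at mu = 0, and mu ~ N(0, sigma^2) under H1.

    With that prior the z statistic has marginal variance n + 1 under H1, giving the
    Bayes factor B01 = sqrt(n + 1) * exp(-z^2 * n / (2 * (n + 1))).
    """
    b01 = sqrt(n + 1) * exp(-z * z * n / (2 * (n + 1)))
    return prior_null * b01 / (prior_null * b01 + 1 - prior_null)

for p, z in ((.10, 1.645), (.05, 1.960), (.01, 2.576), (.001, 3.291)):
    row = "  ".join(f"{posterior_null(z, n):.2f}" for n in (10, 20, 50, 100, 1000))
    print(f"p = {p:5}: {row}")
# The .05 row comes out ~.37 .42 .52 .60 .82, matching the table's entries.
```

Different choices of prior over the alternative give different numbers, which is part of what the surrounding question "what warrants such a prior?" is about.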

(1) Some claim the prior of .5 is a warranted frequentist assignment:

"H0 was randomly selected from an urn in which 50% are true."

(*) Therefore P(H0) = .5.

H0 may be: 0 change in extinction rates, 0 lead concentration, etc.

What should go in the urn of hypotheses?

For the frequentist, either H0 is true or false; the probability in (*) is fallacious and results from an unsound instantiation.

We are very interested in how false H0 might be, which is what we can assess by means of a severity assessment.

(2) Subjective degree-of-belief assignments will not ensure the error probabilities, and thus the severity assessments, we need.

(3) Some suggest an "impartial" or "uninformative" Bayesian prior gives .5 to H0, the remaining .5 probability being spread out over the alternative parameter space (Jeffreys).

This spiked concentration of belief in the null is at odds with the prevailing view that "we know all nulls are false."

The Bayesian recently co-opts the term 'error probability' to describe a posterior, but it is not a frequentist error probability; it is measuring something very different.

Fisher: The Function of the p-Value Is Not Capable of Finding Expression

Faced with conflicts between error probabilities and Bayesian posterior probabilities, the error probabilist would conclude that the flaw lies with the latter measure.

Fisher: Discussing a test of the hypothesis that the stars are distributed at random, Fisher takes the low p-value (about 1 in 33,000) to "exclude at a high level of significance any theory involving a random distribution" (Fisher, 1956, p. 42).

Even if one were to imagine that H0 had an extremely high prior probability, Fisher continues, never minding "what such a statement of probability a priori could possibly mean," the resulting high posterior probability for H0, he thinks, would only show that "reluctance to accept a hypothesis strongly contradicted by a test of significance" (ibid., p. 44) "... is not capable of finding expression in any calculation of probability a posteriori" (ibid., p. 43).

Wave IV? 2006+: The Reference Bayesians Abandon Coherence and the LP, and Strive to Match Frequentist Error Probabilities!

Contemporary Impersonal Bayesianism

Because of the difficulty of eliciting subjective priors, and because of the reluctance among scientists to allow subjective beliefs to be conflated with the information provided by data, much current Bayesian work in practice favors conventional "default," "uninformative," or "reference" priors.

1. What do reference posteriors measure?

- A classic conundrum: there is no unique noninformative prior. (Supposing there is one leads to inconsistencies in calculating posterior marginal probabilities.)
- Any representation of ignorance or lack of information that succeeds for one parameterization will, under a different parameterization, entail having knowledge.

Contemporary reference Bayesians seek priors that are simply conventions to serve as weights for reference posteriors.

- Such priors are not to be considered expressions of uncertainty, ignorance, or degree of belief.
- They may not even be probabilities; flat priors may not sum to one (improper priors). If priors are not probabilities, what then is the interpretation of a posterior? (A serious problem I would like to see Bayesian philosophers tackle.)

2. Priors for the same hypothesis change according to what experiment is to be done! Bayesian incoherence.

If the prior is to represent information, why should it be influenced by the sample space of a contemplated experiment?

This violates the likelihood principle, the cornerstone of Bayesian coherency.

Reference Bayesians: it is the price of objectivity.

It seems to wreak havoc with basic Bayesian foundations, but without the payoff of an objective, interpretable output; even subjective Bayesians object.

3. Reference posteriors with good frequentist properties

Reference priors are touted as having some good frequentist properties, at least in one-dimensional problems.

They are deliberately designed to match frequentist error probabilities.

If you want error probabilities, why not use techniques that provide them directly?

Note: using conditional probability, which is part and parcel of probability theory, as in Bayes nets, does not make one a Bayesian (no priors to hypotheses).
