
Probability versus Statistical Science

Capture Recapture

Some Details for this Course

Text: Rice - Mathematical Statistics and Data Analysis


Coverage: At least Chapters 8, 9 & 13 [skip Bayesian sections; Bootstrap (?)]
Exposure to some R
Evaluation: Weekly Quizzes [20%], Mid-term [30%], Final [50%]
Weekly Assignments: Textbook questions on which quiz is based
Weekly tutorials: Q&A for assigned questions, review of material, quiz
My office hours: SS 6019


Review

Mathematics: calculus and algebra


Special subset of mathematics:

Probability
Axioms, random variables,
Densities, distributions
Expected value, variance, skewness
Normal theory

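Axioms: for the sample space $\Omega$ and events $A, A_1, A_2$
$P(\Omega) = 1$
$P(A) \ge 0$ for all $A$
$A_1, A_2$ disjoint $\Rightarrow P(A_1 \cup A_2) = P(A_1) + P(A_2)$
Example: density function $f(x)$
$f(x) \ge 0$ for all $x \in \mathcal{X}$
$\int_{\mathcal{X}} f(u)\,du = 1$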

Is it helpful?

Limit theorems: WLLN, CLT


Improving bit transmission

Bits [0-1] transmitted over a noisy channel
Each bit is flipped with probability p
To improve communication the receiver uses a majority decoder
Bit is sent an odd # of times, say n = 5
Let X be the number of times the bit is flipped
So the bit is communicated correctly if $X \le 2$
Say we know from the physical properties of the channel that p = 0.1
Then $X \sim \text{Bin}(5, 0.1)$ and $P(X \le 2) = .9914$
Transmission of the bit has improved from a 90% to an over 99% success rate
Simple use of probability has been helpful in improving communication
No guessing: we're not answering anything that was previously unknown
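The success rate quoted above can be checked in R. This is a minimal sketch, assuming each copy of the bit is flipped independently with the same probability p (the binomial model on this slide); the variable names are illustrative only.

```r
# Majority decoding: the bit is sent n times and each copy is flipped
# independently with probability p (assumption: the binomial model above).
n <- 5
p <- 0.1

# The majority vote is correct when at most (n - 1) / 2 copies are flipped.
max_flips <- (n - 1) / 2
p_correct <- pbinom(max_flips, size = n, prob = p)

# Compare with sending the bit only once.
p_single <- 1 - p
print(c(single = p_single, majority = p_correct))
```

Changing `n` to another odd number shows how the success rate responds to more repetitions under the same assumptions.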


Probability vs. Statistical Science


Parameter of scientific interest: $\theta$
Data $X$ from a process modelled using $\theta$
Probability:
$\theta$: fixed and known
$X$: random and unknown
Straightforward [no inference]

Statistical Inference:
$\theta$: unknown (fixed or random?)
$X$: observed, fixed and known
Inference for $\theta$: guess in a formal or disciplined way
Philosophical question: if $\theta$ is unknown, is it fixed or random?
Subjective or objective reality; Bayesian or frequentist
Inference required through [statistical/quantitative] reasoning/thinking
Struggle with the nature of evidence and truth in the face of uncertainty


A Wildlife Study

Wildlife population: caribou
Population size: N
Capture caribou and tag them: T
Release the caribou
Wait
Recapture n caribou
Count how many are tagged: t
[Diagram: the population of N splits into T tagged and N - T untagged; the n recaptured caribou split into t tagged and n - t untagged]
What's the distribution of t?
$t \sim \text{Hypergeometric}[N, T, n]$


The HyperGeometric Distribution

$t \sim \text{Hypergeometric}[N, T, n]$

$$\Pr_N(t) = \frac{\binom{T}{t}\binom{N-T}{n-t}}{\binom{N}{n}}$$

Suppose N = 100
We tag T = 20
Recapture n = 10
Have $t \in \{0, 1, \dots, 10\}$
$P(t \le t_0)$ = phyper(t0, T, N - T, n)

> phyper(c(2,6),20,80,10)
[1] 0.6812201 0.9996083
> phyper(2,20,80,10)-phyper(1,20,80,10)
[1] 0.3181706
> phyper(6,20,80,10)-phyper(5,20,80,10)
[1] 0.00354136
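The differences of `phyper` values above are point probabilities, so they should agree with the binomial-coefficient formula for the hypergeometric probability. A small sketch of that check follows; `pr_tagged` is a hypothetical helper name introduced here for illustration, and `n_tag` stands in for T to avoid masking R's `T` shorthand.

```r
# Hypergeometric probability written out from binomial coefficients.
# pr_tagged is a hypothetical helper, not part of base R.
pr_tagged <- function(t, N, n_tag, n) {
  choose(n_tag, t) * choose(N - n_tag, n - t) / choose(N, n)
}

N <- 100      # population size
n_tag <- 20   # number tagged (T on the slide)
n <- 10       # number recaptured

by_formula <- pr_tagged(0:n, N, n_tag, n)

# Built-in equivalent: dhyper(x, m, n, k) with m tagged, n untagged, k drawn.
built_in <- dhyper(0:n, n_tag, N - n_tag, n)

print(all.equal(by_formula, built_in))
print(by_formula[c(3, 7)])  # entries for t = 2 and t = 6
```

The last line picks out t = 2 and t = 6, the two cases computed with `phyper` differences in the transcript.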


The Density
Plot $\Pr_N(t)$

[Figure: HyperGeometric Probability $\Pr_N(t)$ against t (# tagged)]

Suppose $t_{obs} = 6$
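A plot of this kind could be produced along the following lines. This is a sketch only, assuming the values N = 100, T = 20, n = 10 from the previous slide; the plotting call is guarded so the script also runs in batch mode, and the variable names are illustrative.

```r
N <- 100; n_tag <- 20; n_recap <- 10   # assumed from the earlier example
t_vals <- 0:n_recap

# Probability of each possible number of tagged animals in the recapture.
prob <- dhyper(t_vals, n_tag, N - n_tag, n_recap)

# Draw only in an interactive session.
if (interactive()) {
  plot(t_vals, prob, type = "h", lwd = 3,
       xlab = "t # tagged", ylab = "HyperGeometric Probability")
}

print(round(prob, 4))
```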


Suppose $t_{obs} = 6$: do we really know N?

[Figure: four panels of HyperGeometric Probability against t (# tagged), for N = 40, N = 60, N = 80 and N = 100]
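To compare candidate population sizes side by side, the same probabilities can be tabulated for each N. The sketch below assumes T = 20 and n = 10 as before and uses the four N values from the panels; the object names are illustrative.

```r
n_tag <- 20; n_recap <- 10; t_obs <- 6   # assumed from the earlier slides
t_vals <- 0:n_recap
N_candidates <- c(40, 60, 80, 100)

# One column of probabilities per candidate population size.
probs <- sapply(N_candidates,
                function(N) dhyper(t_vals, n_tag, N - n_tag, n_recap))
colnames(probs) <- paste0("N=", N_candidates)
rownames(probs) <- t_vals

# Draw the 2 x 2 grid of panels only in an interactive session.
if (interactive()) {
  op <- par(mfrow = c(2, 2))
  for (j in seq_along(N_candidates)) {
    plot(t_vals, probs[, j], type = "h", lwd = 3, main = colnames(probs)[j],
         xlab = "t # tagged", ylab = "HyperGeometric Probability")
  }
  par(op)
}

# Probability of the observed count under each candidate N.
print(probs[as.character(t_obs), ])
```

Reading along the printed row shows how plausible the observed count of 6 is under each candidate N.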


[Figure: four panels of HyperGeometric Probability against t (# tagged), for N = 25, N = 30, N = 35 and N = 40]


The Likelihood
Wait a minute!!
Why are we plotting t vs $\Pr_N(t)$ for various N?
Let's plot N vs $\Pr_N(t)$ at $t = t_{obs} = 6$

[Figure: HyperGeometric Probability [Likelihood] against N (Population Size), N from 40 to 100]

This is the likelihood: $L(N) = \Pr_N(t_{obs})$



The Likelihood

Repeat the plot for $N \in \{20, \dots, 100\}$

[Figure: HyperGeometric Probability [Likelihood] against N (Population Size), N from 20 to 100]
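The likelihood curve can be sketched by holding the observed count fixed and letting N vary. This assumes T = 20, n = 10 and an observed count of 6 as in the earlier slides; `N_grid`, `lik` and `N_peak` are illustrative names.

```r
n_tag <- 20; n_recap <- 10; t_obs <- 6   # assumed from the earlier slides
N_grid <- 20:100

# L(N) = Pr_N(t_obs): the data are held fixed and N varies.
lik <- dhyper(t_obs, n_tag, N_grid - n_tag, n_recap)

# Draw only in an interactive session.
if (interactive()) {
  plot(N_grid, lik, type = "h",
       xlab = "N Population Size",
       ylab = "HyperGeometric Probability [Likelihood]")
}

# Population size at which the likelihood is largest on this grid.
N_peak <- N_grid[which.max(lik)]
print(N_peak)
```

Population sizes below 24 get likelihood zero here, since 20 tagged animals plus the 4 untagged ones seen in the recapture already account for 24.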


Evidence of discrimination

48 files: 24 women, 24 men
Randomly assigned to 48 male supervisors
Assessed as promote or hold
However all files are identical
Just labelled male or female

          Promote   Hold   Total
Male         21       3      24
Female       14      10      24
Total        35      13      48

21 men promoted versus 14 women
Is there a bias against women?
Is there any evidence of bias?
Could 21 & 14 happen by chance?


Evidence of discrimination

Could 21 & 14 happen by chance?
Sure! In fact, so could 24 & 0
But unlikely in the absence of bias
But is 21 & 14 unlikely?
We need to quantify this
Attach probabilities to outcomes
In the absence of bias

Assume there's no bias
Then we simply have 35 promoters
And 21 of them were given male files
What's the probability of this?
And from this probability what can we infer about the presence or absence of bias?
This is Statistical Inference


Modelling

Have a population of 48 assessors
Of these 35 are promoters
Assign assessors to male files
[Diagram: 48 assessors split into 35 promoters and 13 holders; 24 assessors receive male files, of whom X = 21 are promoters]
What's the distribution of X?
$X \sim \text{Hypergeometric}[48, 35, 24]$
But only if there's no bias!!!
So we can now compute probabilities and, in a disciplined or formal way, consider the question:
Could 21 & 14 happen by chance?

x          18     19     20
P(X=x)    0.24   0.16   0.07

$P(X = 21) = .021$
Evidence of bias
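These probabilities can be reproduced with `dhyper`. A minimal sketch, assuming the no-bias model on this slide (35 promoters among 48 assessors, 24 of whom receive male files); the variable names are illustrative.

```r
n_files <- 48     # assessors, one file each
n_promote <- 35   # assessors who recommend promotion
n_male <- 24      # files labelled male

# Under the no-bias assumption, X = number of promoters given male files.
x <- 18:21
p_x <- dhyper(x, n_promote, n_files - n_promote, n_male)

print(data.frame(x = x, prob = round(p_x, 3)))
```

The argument order follows `dhyper(x, m, n, k)`: `m` promoters, `n` holders, and `k` assessors drawn to receive male files.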
