
BIOMETRY – MINOR I

Name – Vikash Kumar Entry No – 2016CH10117

1. Poisson Distribution

a. We have

$$P_s(t) = \frac{e^{-\lambda t} (\lambda t)^s}{s!}$$

and $\lambda_f = 9$ substitutions per billion years.

The probability of no mutations of a fibrinopeptide in a billion years is $P_s(t)$ evaluated at $s = 0$ with $\lambda = \lambda_f$ and $t = 10^9$ years:

$$P_0(t) = \frac{e^{-\lambda_f t} (\lambda_f t)^0}{0!} = e^{-\frac{9}{10^9} \cdot 10^9} \cdot \frac{1}{1} = e^{-9} = 1.234 \times 10^{-4}$$

b. We have

$$P_s(t) = \frac{e^{-\lambda t} (\lambda t)^s}{s!}$$

and $\lambda_l = 1.0$ substitutions per billion years.

The probability of three mutations of a lysozyme in 100 million years ($t = 10^8$ years, the value used in the calculation) is $P_s(t)$ evaluated at $s = 3$ with $\lambda = \lambda_l$:

$$P_3(t) = \frac{e^{-\lambda_l t} (\lambda_l t)^3}{3!} = e^{-\frac{1}{10^9} \cdot 10^8} \cdot \frac{\left(\frac{1}{10^9} \cdot 10^8\right)^3}{3!} = e^{-0.1} \cdot \frac{(0.1)^3}{6} = 1.508 \times 10^{-4}$$
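The two results above can be checked numerically. The following is a minimal sketch, not part of the original solution; the function name `poisson_pmf` is made up for illustration, and the time for part b is taken as $10^8$ years because that is the value used in the calculation.

```python
import math

def poisson_pmf(s: int, rate: float, t: float) -> float:
    """Probability of exactly s events in time t for a Poisson process with the given rate."""
    mean = rate * t
    return math.exp(-mean) * mean**s / math.factorial(s)

# Rates are converted to "per year"; times are in years.
p_a = poisson_pmf(0, 9 / 1e9, 1e9)    # part a: no substitutions, lambda * t = 9
p_b = poisson_pmf(3, 1.0 / 1e9, 1e8)  # part b: three substitutions, lambda * t = 0.1

print(f"part a: {p_a:.4g}")
print(f"part b: {p_b:.4g}")
```

Both values should agree with the hand calculations to the four significant figures quoted.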
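c. Treat the mutations of the three proteins as three separate sample sets, each following a Poisson distribution, with rates $\lambda_f$, $\lambda_l$ and $\lambda_H$.

For fibrinopeptides,

$$\sum_{s=0}^{\infty} P_s(t) = 1, \quad \text{or} \quad \sum_{s=0}^{\infty} \frac{e^{-\lambda_f t} (\lambda_f t)^s}{s!} = 1$$

Here we can take $e^{-\lambda_f t}$ out of the summation, as it is independent of $s$. Hence the equation changes to

$$\sum_{s=0}^{\infty} \frac{(\lambda_f t)^s}{s!} = \frac{1}{e^{-\lambda_f t}} \qquad (1)$$

So we get

$$\alpha = \sum_{s=0}^{\infty} \frac{(\lambda_f t)^s}{s!} = \frac{1}{e^{-\lambda_f t}} = e^{\lambda_f t}$$

Similarly, for lysozyme,

$$\alpha = \sum_{s=0}^{\infty} \frac{(\lambda_l t)^s}{s!} = \frac{1}{e^{-\lambda_l t}} = e^{\lambda_l t}$$

And for histones,

$$\alpha = \sum_{s=0}^{\infty} \frac{(\lambda_H t)^s}{s!} = \frac{1}{e^{-\lambda_H t}} = e^{\lambda_H t}$$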

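d. In general, for any $\lambda$,

$$\langle s \rangle = \sum_{s=0}^{\infty} s \, P_s(t) = \sum_{s=0}^{\infty} s \, \frac{e^{-\lambda t} (\lambda t)^s}{s!} = \frac{1}{e^{\lambda t}} \sum_{s=0}^{\infty} s \, \frac{(\lambda t)^s}{s!}$$

The $s = 0$ term vanishes. For $s \geq 1$, take one factor of $\lambda t$ out of $(\lambda t)^s$ and cancel the $s$ in the numerator against the $s$ in $s!$:

$$\langle s \rangle = \frac{\lambda t}{e^{\lambda t}} \sum_{s=1}^{\infty} \frac{(\lambda t)^{s-1}}{(s-1)!}$$

Now replace $s - 1$ with $k$. As $s$ runs from 1 to $\infty$, $k$ runs from 0 to $\infty$, so we have

$$\langle s \rangle = \frac{\lambda t}{e^{\lambda t}} \sum_{k=0}^{\infty} \frac{(\lambda t)^k}{k!}$$

Using relation (1), which holds for any rate, the sum equals $e^{\lambda t}$, so

$$\langle s \rangle = \frac{\lambda t}{e^{\lambda t}} \cdot e^{\lambda t} = \lambda t$$

For fibrinopeptide, $\langle s \rangle = \lambda_f t$

For lysozyme, $\langle s \rangle = \lambda_l t$

For histone, $\langle s \rangle = \lambda_H t$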
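The normalization and the mean $\langle s \rangle = \lambda t$ can also be confirmed with truncated sums. This is a rough sketch under stated assumptions: the helper name `normalization_and_mean` is hypothetical, the cutoff of 100 terms is an arbitrary choice that is ample for small $\lambda t$, and the three values of $\lambda t$ are only illustrative (rates of 9, 1.0 and 0.01 per billion years combined with the times used in this problem).

```python
import math

def normalization_and_mean(mean: float, max_s: int = 100):
    """Return (sum of P_s, sum of s * P_s) for s = 0..max_s of a Poisson distribution."""
    p = math.exp(-mean)      # P_0
    total = 0.0
    average = 0.0
    for s in range(max_s + 1):
        total += p
        average += s * p
        p *= mean / (s + 1)  # P_{s+1} = P_s * mean / (s + 1)
    return total, average

for lam_t in (9.0, 0.1, 0.01):
    total, average = normalization_and_mean(lam_t)
    print(f"lambda*t = {lam_t}: sum of P_s = {total:.10f}, <s> = {average:.10f}")
```

The first column should be 1 and the second should equal $\lambda t$, up to floating-point rounding.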


e. $$\frac{\langle s \rangle_{fib}}{\langle s \rangle_{His}} = \frac{\lambda_f t}{\lambda_H t} = \frac{\lambda_f}{\lambda_H} = \frac{9}{0.01} = 900$$

a. Since the three events are statistically independent, the probability that a suspect's DNA will match all three of these fragment characteristics by chance is
$$P(A \cap B \cap C) = P(A) P(B) P(C)$$
where A: Occurrence of DNA fragment A
B: Occurrence of DNA fragment B
C: Occurrence of DNA fragment C
The question gives
$P(A) = 1\%$, i.e. 0.01
$P(B) = 4\%$, i.e. 0.04
$P(C) = 2.5\%$, i.e. 0.025
Therefore $P(A) P(B) P(C) = 0.01 \times 0.04 \times 0.025 = 10^{-5}$

b. Using the same nomenclature as above, we have

$$P(A \cap B \cap C) = P((A \cap B) \cap C) = P(C \mid A \cap B) \, P(A \cap B) = P(C \mid A \cap B) \, P(B \mid A) \, P(A)$$

Here A and B are not independent events, nor are A and C, and we cannot deduce any relation between B and C.

To solve this further we would need the conditional probabilities $P(B \mid A)$ and $P(C \mid A \cap B)$, or some other additional relationship or information.
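The chain rule above can be sketched in a few lines to show why the extra information matters. The function name `joint_probability` and the conditional probabilities in the dependent case are hypothetical values chosen purely for illustration; they are not given in the question.

```python
import math

def joint_probability(p_a: float, p_b_given_a: float, p_c_given_ab: float) -> float:
    """Chain rule: P(A and B and C) = P(A) * P(B | A) * P(C | A and B)."""
    return p_a * p_b_given_a * p_c_given_ab

# Independent case: each conditional equals the marginal from part a.
independent = joint_probability(0.01, 0.04, 0.025)

# Dependent case: HYPOTHETICAL conditionals, not taken from the question.
dependent = joint_probability(0.01, 0.5, 0.5)

print(f"independent: {independent:.3g}")
print(f"dependent (hypothetical conditionals): {dependent:.3g}")
```

With independent fragments the result reduces to the product from part a. With strongly correlated fragments the chance match probability can be orders of magnitude larger, which is why it cannot be evaluated without the conditional probabilities.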

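3. Exponential Distribution

$$f_X(x) = \lambda e^{-\lambda x}$$

a. Assuming a continuous distribution,

$$E(X) = \int_0^{\infty} \lambda x e^{-\lambda x} \, dx$$

We can solve this integral using Feynman's technique (differentiation under the integral sign). Start from

$$\int_0^{\infty} e^{-\lambda x} \, dx = \frac{1}{\lambda}$$

Taking the derivative of both sides with respect to $\lambda$ gives $-\int_0^{\infty} x e^{-\lambda x} \, dx = -\frac{1}{\lambda^2}$, that is,

$$\int_0^{\infty} x e^{-\lambda x} \, dx = \frac{1}{\lambda^2}$$

Multiplying both sides by $\lambda$, we get

$$\int_0^{\infty} \lambda x e^{-\lambda x} \, dx = \frac{1}{\lambda}$$

Hence we have $E(X) = \frac{1}{\lambda}$.

Now $\mathrm{Var}(X) = E(X^2) - E(X)^2$, where

$$E(X^2) = \int_0^{\infty} \lambda x^2 e^{-\lambda x} \, dx$$

Using the same relation and differentiating once more with respect to $\lambda$, we have

$$\int_0^{\infty} x^2 e^{-\lambda x} \, dx = \frac{2}{\lambda^3}$$

or

$$\int_0^{\infty} \lambda x^2 e^{-\lambda x} \, dx = \lambda \cdot \frac{2}{\lambda^3} = \frac{2}{\lambda^2}$$

Now $E(X)^2 = \left(\frac{1}{\lambda}\right)^2 = \frac{1}{\lambda^2}$.

Hence $\mathrm{Var}(X) = E(X^2) - E(X)^2 = \frac{2}{\lambda^2} - \frac{1}{\lambda^2} = \frac{1}{\lambda^2}$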
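As a sanity check on the derivation, the two moments can be approximated by direct numerical integration. This is only a sketch: the helper `exponential_moment`, the midpoint rule, the upper cutoff of 200 and the step count are all choices made here for illustration, and $\lambda = 0.2$ is the value used in part b.

```python
import math

def exponential_moment(order: int, lam: float, upper: float = 200.0, steps: int = 200_000) -> float:
    """Midpoint-rule approximation of the integral of x**order * lam * exp(-lam * x) over [0, upper]."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h  # midpoint of the i-th sub-interval
        total += x**order * lam * math.exp(-lam * x)
    return total * h

lam = 0.2
mean = exponential_moment(1, lam)           # E(X), expected to be close to 1 / lam
second_moment = exponential_moment(2, lam)  # E(X^2), expected to be close to 2 / lam**2
variance = second_moment - mean**2

print(f"E(X) = {mean:.4f}, Var(X) = {variance:.4f}")
```

The truncation at 200 is harmless here because the integrand decays like $e^{-0.2x}$, so the neglected tail is of order $e^{-40}$.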

b. Given $E(X) = \frac{1}{\lambda}$ and $\mathrm{Var}(X) = \frac{1}{\lambda^2}$, and taking $\lambda = 0.2$, we have $E(X) = 5$ and $\mathrm{Var}(X) = 25$.

Random numbers were generated using the Random Number Generation tool in Data Analysis in Excel. To generate the probability density function for the normal distribution, NORM.DIST(RANDOM NUMBER GENERATOR, MEAN, STANDARD DEVIATION, FALSE) was used. The argument FALSE makes the output the pdf.
To generate the probability density function for the exponential distribution, EXPON.DIST(RAND(), λ, FALSE) was used. Again, the argument FALSE makes the output the pdf.
The histograms were plotted using the Data Analysis tool in Excel.
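The spreadsheet steps described above can be mirrored in Python as a rough sketch. Several things here are assumptions rather than facts from the original workbook: the inputs are taken to be uniform on [0, 1) (as RAND() produces), the normal pdf is given mean 5 and standard deviation 5 to match $E(X) = 5$ and $\mathrm{Var}(X) = 25$, and the equal-width binning in the hypothetical helper `histogram` only approximates what Excel's histogram tool does.

```python
import math
import random

def normal_pdf(x: float, mean: float, sd: float) -> float:
    """Normal density, the quantity NORM.DIST(x, mean, sd, FALSE) returns."""
    return math.exp(-((x - mean) ** 2) / (2 * sd**2)) / (sd * math.sqrt(2 * math.pi))

def exponential_pdf(x: float, lam: float) -> float:
    """Exponential density, the quantity EXPON.DIST(x, lam, FALSE) returns."""
    return lam * math.exp(-lam * x)

def histogram(values, num_bins):
    """Equal-width bins between min and max; returns (upper bin edge, count) pairs."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for v in values:
        index = min(int((v - lo) / width), num_bins - 1)  # put the maximum in the last bin
        counts[index] += 1
    return [(lo + (i + 1) * width, c) for i, c in enumerate(counts)]

random.seed(0)
n = 100
lam = 0.2
xs = [random.random() for _ in range(n)]                 # ASSUMED uniform inputs on [0, 1)
exp_pdf_values = [exponential_pdf(x, lam) for x in xs]
norm_pdf_values = [normal_pdf(x, 5.0, 5.0) for x in xs]  # ASSUMED mean 5, sd 5

for edge, count in histogram(exp_pdf_values, 10):
    print(f"{edge:.6f}  {count}")
```

Under these assumptions the exponential pdf values are confined to the narrow interval between $0.2 e^{-0.2} \approx 0.164$ and 0.2, because the pdf is evaluated at points in [0, 1) rather than sampled from.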

For sample size of 10:

Histogram from normal distribution is

Bin         Frequency
0.048911    0
0.05167     4
0.054429    2
More        4

[Figure: histogram titled "Normal Distribution", Frequency vs. Bin]

Histogram from exponential distribution is

Bin         Frequency
0.168352    2
0.177237    3
0.186122    3
More        2

[Figure: histogram titled "Exponential Distribution", Frequency vs. Bin]

For sample size of 50:

Normal distribution:

Bin         Frequency
0.048749    2
0.050059    7
0.051369    6
0.052679    4
0.053988    13
0.055298    7
0.056608    8
More        3

[Figure: histogram titled "Normal Distribution", Frequency vs. Bin]

Exponential distribution:

Bin         Frequency
0.164065    2
0.169111    6
0.174157    10
0.179203    12
0.184249    2
0.189296    6
0.194342    6
More        6

[Figure: histogram titled "Exponential Distribution", Frequency vs. Bin]

For sample size of 100:

Normal distribution:

Bin         Frequency
0.048798    2
0.049708    8
0.050619    11
0.05153     7
0.052441    12
0.053352    9
0.054262    12
0.055173    7
0.056084    8
0.056995    15
More        9

[Figure: histogram titled "Normal Distribution", Frequency vs. Bin]

Exponential distribution:

Bin         Frequency
0.163814    0
0.167417    10
0.171019    12
0.174622    12
0.178225    9
0.181827    8
0.18543     11
0.189032    13
0.192635    6
0.196238    7
More        12

[Figure: histogram titled "Exponential Distribution", Frequency vs. Bin]

As we can see, when the sample size is small there are larger differences between the frequency versus bin histograms for the normal and exponential distributions, but as the number of values in the sample increases the plots become almost the same, as seen for the sample size of 100.
When I checked much larger sample sizes (10,000), the plots were much the same for both distributions, highly skewed towards the right side of the plot, that is, the upper end of the x-axis. The similarity is presumably because the probability density functions of both the exponential distribution and the normal distribution are exponential functions.

c. As per the question, using the same assumptions as above and taking $\lambda = 0.2$, we have $E(X) = 5$ and $\mathrm{Var}(X) = 25$.
The random number generator in Excel was used to generate the random numbers.

Now for 10,000 samples with 10 values each:

Mean of means comes out to be 4.969628
Mean variance comes out to be 24.90519

[Figure: histogram of the sample means, n = 10 (Frequency vs. Bin; bins from -0.978833997 to 10.70866969)]

[Figure: histogram of the sample variances, n = 10 (Frequency vs. Bin; bins from 1.424446958 to 264.956574)]

For 10,000 samples with 100 values each:

Mean of means comes out to be 5.002944
Mean variance comes out to be 25.11218

[Figure: histogram of the sample means, n = 100 (Frequency vs. Bin; bins from 3.07545305 to 6.960395569)]

[Figure: histogram of the sample variances, n = 100 (Frequency vs. Bin; bins from 13.76736853 to 53.3493544)]

d.

Here $\bar{X}$ is the mean of means, of which there is only one for each sample size (i.e. n = 10 and n = 100).
Therefore the expected value of $\bar{X}$ is $\bar{X}$ itself and $\mathrm{Var}(\bar{X})$ is zero, since there is only one value.
For n = 10, $\bar{X} = 4.969628$, therefore $E(\bar{X}) = 4.969628$
Similarly, for n = 100, $\bar{X} = 5.002944$, therefore $E(\bar{X}) = 5.002944$

However, taking $\bar{X}$ to be the mean of each individual sample, we have the following results:

            E(X̄)        Var(X̄)      E(S_x^2)    Var(S_x^2)
n = 10      4.969628    2.50304     24.90519    159.6396
n = 100     5.002944    0.250528    25.11218    13.99591

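Here

$S_x^2 = (\text{standard deviation})^2$ = variance of one sample

$E(\bar{X})$ = mean of the sample means $\bar{X}_i$

$$\mathrm{Var}(\bar{X}) = \frac{1}{N} \sum_{i=1}^{N} \left(\bar{X}_i - E(\bar{X})\right)^2$$

$E(S_x^2)$ = mean of the sample variances $S_{x,i}^2$

$$\mathrm{Var}(S_x^2) = \frac{1}{N} \sum_{i=1}^{N} \left(S_{x,i}^2 - E(S_x^2)\right)^2$$

where $N = 10{,}000$ is the number of samples and $i$ indexes the samples.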
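The four quantities defined above can be estimated with a short simulation. This is a minimal sketch and not the Excel workflow used for the table: it assumes the individual values are drawn from an exponential distribution with $\lambda = 0.2$ via `random.expovariate`, uses the $n - 1$ denominator for each sample variance, and the function names are made up for illustration. Its output is therefore not expected to reproduce the tabulated numbers.

```python
import random

def mean(values):
    return sum(values) / len(values)

def variance(values, ddof):
    """Variance with denominator len(values) - ddof."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - ddof)

def simulate(num_samples: int, n: int, lam: float = 0.2, seed: int = 0):
    """Draw num_samples samples of size n and summarise their means and variances."""
    rng = random.Random(seed)
    means, variances = [], []
    for _ in range(num_samples):
        sample = [rng.expovariate(lam) for _ in range(n)]  # ASSUMED exponential draws
        means.append(mean(sample))
        variances.append(variance(sample, ddof=1))         # sample variance S_x^2
    return {
        "E(Xbar)": mean(means),
        "Var(Xbar)": variance(means, ddof=0),      # 1/N denominator, as in the definitions
        "E(S2)": mean(variances),
        "Var(S2)": variance(variances, ddof=0),
    }

for n in (10, 100):
    print(n, simulate(10_000, n))
```

For independent draws, theory gives $E(\bar{X}) = 1/\lambda = 5$, $\mathrm{Var}(\bar{X}) = 1/(n\lambda^2) = 25/n$ and $E(S_x^2) = 25$, which the first three columns of the table follow closely.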

e. The central limit theorem states that a random variable that is the sum of a number of other independent random variables tends towards a normal distribution as the number of summed variables increases, even if the variables being summed are not themselves normally distributed.
From the histogram plots of part d (for means and variances), we can clearly see that the variables follow trends like that of a normal distribution.
In part c, the trend (normal distribution) is not observed when the sample size is small, i.e. 10; however, as the sample size increases from 10 to 50 to 100, the exponential and normal distributions appear to follow roughly the same trend, although the bell-shaped plot characteristic of a normal distribution is not clearly observable.
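One way to make the central limit theorem statement quantitative is to track the skewness of the sample mean as the sample size grows, since a normal distribution has zero skewness. The sketch below is an illustration added under stated assumptions (exponential draws with $\lambda = 0.2$, 5,000 samples per size, and a hypothetical helper `sample_mean_skewness`); it is not part of the original analysis.

```python
import random

def sample_mean_skewness(n: int, num_samples: int = 5000, lam: float = 0.2, seed: int = 1) -> float:
    """Skewness of the distribution of means of n exponential draws."""
    rng = random.Random(seed)
    means = [sum(rng.expovariate(lam) for _ in range(n)) / n for _ in range(num_samples)]
    m = sum(means) / num_samples
    m2 = sum((x - m) ** 2 for x in means) / num_samples  # second central moment
    m3 = sum((x - m) ** 3 for x in means) / num_samples  # third central moment
    return m3 / m2**1.5

for n in (1, 10, 100):
    print(n, round(sample_mean_skewness(n), 3))
```

For exponential data the skewness of the mean of $n$ values is $2/\sqrt{n}$ in theory, so it should shrink from about 2 at $n = 1$ to about 0.2 at $n = 100$ as the distribution of the mean approaches a bell shape.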
