Biometry - Minor I: Name - Vikash Kumar Entry No - 2016CH10117

1. Poisson Distribution
a. We have
\[ P_s(t) = \frac{e^{-\lambda t} (\lambda t)^s}{s!} \]
and $\lambda_f = 9$ substitutions per billion years.
The probability of no mutations in a fibrinopeptide in a billion years is $P_s(t)$ evaluated at $s = 0$ and $\lambda = \lambda_f$. With $t = 10^9$ years, $\lambda_f t = \frac{9}{10^9} \times 10^9 = 9$, so the required probability is
\[ P_0(t) = \frac{e^{-\lambda_f t} (\lambda_f t)^0}{0!} = e^{-9} \times \frac{1}{1} = e^{-9} = 1.234 \times 10^{-4} \]
b. We have
\[ P_s(t) = \frac{e^{-\lambda t} (\lambda t)^s}{s!} \]
and $\lambda_l = 1.0$ substitutions per billion years.

The probability of three mutations in a lysozyme in 100 million years ($t = 10^8$ years, the value used in the calculation) is $P_s(t)$ evaluated at $s = 3$ and $\lambda = \lambda_l$. Here $\lambda_l t = \frac{1}{10^9} \times 10^8 = 0.1$, so the required probability is
\[ P_3(t) = \frac{e^{-\lambda_l t} (\lambda_l t)^3}{3!} = e^{-0.1} \times \frac{(0.1)^3}{6} = 1.508 \times 10^{-4} \]
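The probabilities in parts a and b can be checked numerically. The following is a minimal sketch, not part of the original work; the function name `poisson_pmf` is an illustrative choice, and time is measured in billions of years so that the rates can be used as given.

```python
import math

def poisson_pmf(s: int, rate: float, t: float) -> float:
    """P_s(t) = exp(-rate*t) * (rate*t)**s / s!"""
    mean = rate * t
    return math.exp(-mean) * mean**s / math.factorial(s)

# Part a: fibrinopeptide, 9 substitutions per billion years, t = 1 billion years
p_a = poisson_pmf(0, rate=9.0, t=1.0)
# Part b: lysozyme, 1.0 substitutions per billion years, t = 0.1 billion years
p_b = poisson_pmf(3, rate=1.0, t=0.1)

print(f"{p_a:.3e}")  # 1.234e-04
print(f"{p_b:.3e}")  # 1.508e-04
```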
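c. We treat the mutations of the three proteins as three separate sample sets, each following a Poisson distribution, with rates $\lambda_f$, $\lambda_l$ and $\lambda_H$.
For fibrinopeptides, the probabilities must sum to one:
\[ \sum_{s=0}^{\infty} P_s(t) = 1, \quad \text{or} \quad \sum_{s=0}^{\infty} \frac{e^{-\lambda_f t} (\lambda_f t)^s}{s!} = 1 \]
Here we can take $e^{-\lambda_f t}$ out of the summation, as it is independent of $s$.

Hence the equation changes to
\[ \sum_{s=0}^{\infty} \frac{(\lambda_f t)^s}{s!} = \frac{1}{e^{-\lambda_f t}} = e^{\lambda_f t} \qquad (1) \]
So we get
\[ \alpha = \sum_{s=0}^{\infty} \frac{(\lambda_f t)^s}{s!} = \frac{1}{e^{-\lambda_f t}} \]
Similarly, for lysozyme,
\[ \alpha = \sum_{s=0}^{\infty} \frac{(\lambda_l t)^s}{s!} = \frac{1}{e^{-\lambda_l t}} \]
and for histones,
\[ \alpha = \sum_{s=0}^{\infty} \frac{(\lambda_H t)^s}{s!} = \frac{1}{e^{-\lambda_H t}} \]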
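Relation (1) can be checked numerically by comparing partial sums of the series with the exponential. A minimal sketch follows; the helper name `exp_series` and the cutoff of 60 terms are illustrative assumptions, and the three values of λt correspond to t = 1 billion years with rates of 9, 1.0 and 0.01 substitutions per billion years.

```python
import math

def exp_series(x: float, terms: int = 60) -> float:
    """Partial sum of sum_{s>=0} x**s / s!, built term by term."""
    total, term = 0.0, 1.0  # term starts at x**0 / 0!
    for s in range(terms):
        total += term
        term *= x / (s + 1)  # next term: x**(s+1) / (s+1)!
    return total

# Compare the truncated series with exp(x) for each value of lambda*t
for x in (9.0, 1.0, 0.01):
    print(x, exp_series(x), math.exp(x))
```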
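d. In general, for any $\lambda$,
\[ \langle s \rangle = \sum_{s=0}^{\infty} s \, P_s(t) = \sum_{s=0}^{\infty} s \, \frac{e^{-\lambda t} (\lambda t)^s}{s!} = \frac{1}{e^{\lambda t}} \sum_{s=0}^{\infty} s \, \frac{(\lambda t)^s}{s!} \]
The $s = 0$ term is zero, so the sum effectively starts at $s = 1$. Taking one factor of $\lambda t$ out of $(\lambda t)^s$ and one factor of $s$ out of $s! = s\,(s-1)!$, which cancels the $s$ already present in the numerator, we get
\[ \langle s \rangle = \frac{1}{e^{\lambda t}} \sum_{s=1}^{\infty} \frac{\lambda t \, (\lambda t)^{s-1}}{(s-1)!} \]
For $s > 0$ we replace $s - 1$ with $k$; as $s$ runs from 1 to $\infty$, $k$ runs from 0 to $\infty$. So we have
\[ \langle s \rangle = \frac{\lambda t}{e^{\lambda t}} \sum_{k=0}^{\infty} \frac{(\lambda t)^k}{k!} \]
Using relation (1), the sum equals $e^{\lambda t}$, so
\[ \langle s \rangle = \frac{\lambda t}{e^{\lambda t}} \times e^{\lambda t} = \lambda t \]
For fibrinopeptide, $\langle s \rangle = \lambda_f t$
For lysozyme, $\langle s \rangle = \lambda_l t$
For histone, $\langle s \rangle = \lambda_H t$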

e. \[ \frac{\langle s \rangle_{fib}}{\langle s \rangle_{His}} = \frac{\lambda_f t}{\lambda_H t} = \frac{\lambda_f}{\lambda_H} = \frac{9}{0.01} = 900 \]
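The result that the mean equals λt, and the ratio in part e, can be checked numerically. This is a sketch only: the function name `poisson_mean` and the truncation at 200 terms are assumptions made for illustration, and t is taken as 1 billion years so that the rates 9, 1.0 and 0.01 can be used directly.

```python
import math

def poisson_mean(rate: float, t: float, terms: int = 200) -> float:
    """Approximate <s> = sum_s s * P_s(t) by truncating the series."""
    mean = rate * t
    total = 0.0
    p = math.exp(-mean)  # P_0(t)
    for s in range(terms):
        total += s * p
        p *= mean / (s + 1)  # P_{s+1}(t) obtained from P_s(t)
    return total

rates = {"fibrinopeptide": 9.0, "lysozyme": 1.0, "histone": 0.01}
means = {name: poisson_mean(rate, t=1.0) for name, rate in rates.items()}
ratio = means["fibrinopeptide"] / means["histone"]

print(means)
print(ratio)  # close to 900
```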
2.
a. The three events are statistically independent, so the probability that a suspect's DNA will match all three of these fragment characteristics by chance is
\[ P(A \cap B \cap C) = P(A)\,P(B)\,P(C) \]
where A: occurrence of DNA fragment A
B: occurrence of DNA fragment B
C: occurrence of DNA fragment C
The question gives
$P(A) = 1\%$, i.e. 0.01
$P(B) = 4\%$, i.e. 0.04
$P(C) = 2.5\%$, i.e. 0.025
Therefore $P(A)\,P(B)\,P(C) = 0.01 \times 0.04 \times 0.025 = 10^{-5}$
b. Using the same nomenclature as above, we have
\[ P(A \cap B \cap C) = P((A \cap B) \cap C) = P(C \mid A \cap B)\,P(A \cap B) = P(C \mid A \cap B)\,P(B \mid A)\,P(A) \]
Here A and B, and A and C, are not independent events, and we cannot deduce any relation between B and C.
To solve this further we need additional information, namely the conditional probabilities $P(B \mid A)$ and $P(C \mid A \cap B)$.
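The difference between parts a and b can be illustrated numerically. In this sketch the marginal frequencies are those given in the question, while the conditional probabilities `p_b_given_a` and `p_c_given_ab` are hypothetical values chosen only to show how the chain rule is applied; they are not data from the problem.

```python
# Marginal frequencies from the question
p_a, p_b, p_c = 0.01, 0.04, 0.025

# Part a: independent fragments, so the joint probability is the product
p_independent = p_a * p_b * p_c

# Part b: chain rule P(A) * P(B|A) * P(C|A and B).
# The two conditional values below are HYPOTHETICAL placeholders.
p_b_given_a = 0.5
p_c_given_ab = 0.5
p_dependent = p_a * p_b_given_a * p_c_given_ab

print(p_independent)
print(p_dependent)
```

With these placeholder values the dependent case is far more likely than the independent one, which is why the conditional probabilities are needed before part b can be evaluated.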
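3. Exponential Distribution
\[ f_X(x) = \lambda e^{-\lambda x} \]
a. Assuming a continuous distribution,
\[ E(X) = \int_0^{\infty} \lambda x e^{-\lambda x} \, dx \]
We can solve this integral using Feynman's technique of differentiating under the integral sign. We start from
\[ \int_0^{\infty} e^{-\lambda x} \, dx = \frac{1}{\lambda} \]
Taking the derivative with respect to $\lambda$ on both sides (each side picks up a minus sign, which cancels), we get
\[ \int_0^{\infty} x e^{-\lambda x} \, dx = \frac{1}{\lambda^2} \]
Multiplying both sides by $\lambda$, we get
\[ \int_0^{\infty} \lambda x e^{-\lambda x} \, dx = \frac{1}{\lambda} \]
Hence we have $E(X) = \frac{1}{\lambda}$.
Now $\mathrm{Var}(X) = E(X^2) - E(X)^2$, where
\[ E(X^2) = \int_0^{\infty} \lambda x^2 e^{-\lambda x} \, dx \]
Using the same technique and differentiating one step further, we have
\[ \int_0^{\infty} x^2 e^{-\lambda x} \, dx = \frac{2}{\lambda^3} \]
or
\[ \int_0^{\infty} \lambda x^2 e^{-\lambda x} \, dx = \lambda \times \frac{2}{\lambda^3} = \frac{2}{\lambda^2} \]
Now $E(X)^2 = \left(\frac{1}{\lambda}\right)^2 = \frac{1}{\lambda^2}$.
Hence
\[ \mathrm{Var}(X) = E(X^2) - E(X)^2 = \frac{2}{\lambda^2} - \frac{1}{\lambda^2} = \frac{1}{\lambda^2} \]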
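The mean and variance derived in part a can be checked by numerical integration. The sketch below uses a simple midpoint rule on a truncated interval; the upper limit of 200, the step count, and the function name `moment` are illustrative assumptions, and λ = 0.2 is the value used in part b.

```python
import math

def moment(k: int, lam: float, upper: float = 200.0, steps: int = 200_000) -> float:
    """Midpoint-rule estimate of the integral of x**k * lam * exp(-lam*x) on [0, upper]."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h  # midpoint of the i-th subinterval
        total += x**k * lam * math.exp(-lam * x)
    return total * h

lam = 0.2
mean = moment(1, lam)
var = moment(2, lam) - mean**2

print(mean, 1 / lam)     # both close to 5
print(var, 1 / lam**2)   # both close to 25
```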
b. Given $E(X) = \frac{1}{\lambda}$ and $\mathrm{Var}(X) = \frac{1}{\lambda^2}$, and taking $\lambda = 0.2$, we have $E(X) = 5$ and $\mathrm{Var}(X) = 25$.
Random numbers were generated using the Random Number Generation tool in Data Analysis in Excel. To generate the probability density function for the normal distribution, NORM.DIST(RANDOM NUMBER GENERATOR, MEAN, STANDARD DEVIATION, FALSE) was used. The argument FALSE makes the function return the pdf.
To generate the probability density function for the exponential distribution, EXPON.DIST(RAND(), 𝜆, FALSE) was used. Again, the argument FALSE makes the function return the pdf.
The histograms were plotted using the Data Analysis tool in Excel.
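For comparison, here is a sketch of one common way to draw exponentially distributed random numbers, inverse-transform sampling, which is a different technique from the EXPON.DIST call described above. EXPON.DIST(RAND(), λ, FALSE) evaluates the density λe^(-λu) at a uniform number u in [0, 1), so for λ = 0.2 its outputs fall between 0.2e^(-0.2) ≈ 0.1637 and 0.2. Inverse-transform sampling instead returns x = -ln(1 - u)/λ, whose distribution is exponential with mean 1/λ. The function names and the seed below are illustrative assumptions.

```python
import math
import random

def expon_pdf(x: float, lam: float) -> float:
    """Density of the exponential distribution, as in EXPON.DIST(x, lam, FALSE)."""
    return lam * math.exp(-lam * x)

def expon_sample(rng: random.Random, lam: float) -> float:
    """Inverse-transform sample: solve u = 1 - exp(-lam*x) for x."""
    u = rng.random()  # uniform on [0, 1)
    return -math.log(1.0 - u) / lam

rng = random.Random(0)  # fixed seed, chosen arbitrarily
lam = 0.2
pdf_values = [expon_pdf(rng.random(), lam) for _ in range(10_000)]
samples = [expon_sample(rng, lam) for _ in range(10_000)]
sample_mean = sum(samples) / len(samples)

print(min(pdf_values), max(pdf_values))  # stays within (0.1637, 0.2]
print(sample_mean)                       # near 1/lam = 5
```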
For a sample size of 10:

Histogram from the normal distribution:

Bin         Frequency
0.048911    0
0.05167     4
0.054429    2
More        4

[Figure: histogram "Normal Distribution"; x-axis: Bin, y-axis: Frequency]

Histogram from the exponential distribution:

Bin         Frequency
0.168352    2
0.177237    3
0.186122    3
More        2

[Figure: histogram "Exponential Distribution"; x-axis: Bin, y-axis: Frequency]
For a sample size of 50:

Normal distribution:

Bin         Frequency
0.048749    2
0.050059    7
0.051369    6
0.052679    4
0.053988    13
0.055298    7
0.056608    8
More        3

[Figure: histogram "Normal Distribution"; x-axis: Bin, y-axis: Frequency]

Exponential distribution:

Bin         Frequency
0.164065    2
0.169111    6
0.174157    10
0.179203    12
0.184249    2
0.189296    6
0.194342    6
More        6

[Figure: histogram for the exponential distribution; x-axis: Bin, y-axis: Frequency]
For a sample size of 100:

Normal distribution:

Bin         Frequency
0.048798    2
0.049708    8
0.050619    11
0.05153     7
0.052441    12
0.053352    9
0.054262    12
0.055173    7
0.056084    8
0.056995    15
More        9

[Figure: histogram "Normal Distribution"; x-axis: Bin, y-axis: Frequency]

Exponential distribution:

Bin         Frequency
0.163814    0
0.167417    10
0.171019    12
0.174622    12
0.178225    9
0.181827    8
0.18543     11
0.189032    13
0.192635    6
0.196238    7
More        12

[Figure: histogram for the exponential distribution; x-axis: Bin, y-axis: Frequency]
When the sample size is small, the frequency versus bin histograms for the normal and exponential distributions differ considerably, but as the number of values in the sample increases the plots become almost the same, as can be seen for the sample size of 100. When I checked much larger sample sizes (10,000), the histograms for the two distributions were much the same, both highly skewed towards the right-hand end of the x-axis. The similarity must be because the probability density functions of both the exponential distribution and the normal distribution are exponential functions.
c. As per the question, using the same assumption as above:

Taking $\lambda = 0.2$, we have $E(X) = 5$ and $\mathrm{Var}(X) = 25$.
The random number generator in Excel was used to generate the random numbers.

For 10,000 samples with 10 values each:

Mean of means comes out to be 4.969628
Mean variance comes out to be 24.90519

[Figure: histogram of the sample means; x-axis: Bin (mean of samples), from about -0.98 to 10.71; y-axis: Frequency]
[Figure: histogram of the sample variances; x-axis: Bin (variance), from about 1.42 to 264.96; y-axis: Frequency]

For 10,000 samples with 100 values each:

Mean of means comes out to be 5.002944
Mean variance comes out to be 25.11218

[Figure: histogram of the sample means; x-axis: Bin (mean), from about 3.08 to 6.96; y-axis: Frequency]
[Figure: histogram of the sample variances; x-axis: Bin (variance), from about 13.77 to 53.35; y-axis: Frequency]

d.
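Here $\bar{X}$ is the mean of means, of which there is only one for each set of samples (i.e. one for n = 10 and one for n = 100). Therefore the expected value of $\bar{X}$ is $\bar{X}$ itself, and $\mathrm{Var}(\bar{X})$ is zero since there is only one $\bar{X}$.
For n = 10, $\bar{X} = 4.969628$, therefore $E(\bar{X}) = 4.969628$.
Similarly, for n = 100, $\bar{X} = 5.002944$, therefore $E(\bar{X}) = 5.002944$.
However, considering $\bar{X}$ to be the mean of each sample, we have the following results:

           E(X̄)        Var(X̄)      E(S_x²)     Var(S_x²)
n = 10     4.969628    2.50304     24.90519    159.6396
n = 100    5.002944    0.250528    25.11218    13.99591

Here, with the sums running over the N = 10,000 samples,
$S_x^2 = (\text{standard deviation})^2$ = variance of a sample
$E(\bar{X})$ = mean of the sample means $\bar{X}_i$
\[ \mathrm{Var}(\bar{X}) = \frac{1}{N} \sum_{i=1}^{N} \left(\bar{X}_i - E(\bar{X})\right)^2 \]
$E(S_x^2)$ = mean of the sample variances $S_{x,i}^2$
\[ \mathrm{Var}(S_x^2) = \frac{1}{N} \sum_{i=1}^{N} \left(S_{x,i}^2 - E(S_x^2)\right)^2 \]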
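The simulation described in parts c and d can be sketched in Python as follows. This is not the Excel procedure used above: it assumes exponential variates drawn with `random.expovariate`, an arbitrary seed, the unbiased (divide by n - 1) variance within each sample, and divide-by-N averages across samples. The function names are illustrative, and the numbers printed will differ from those in the table.

```python
import random

def mean(values):
    return sum(values) / len(values)

def variance(values, ddof=0):
    """Variance with divisor len(values) - ddof."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - ddof)

def simulate(n, num_samples=10_000, lam=0.2, seed=1):
    """Draw num_samples samples of size n; summarise their means and variances."""
    rng = random.Random(seed)
    means, variances = [], []
    for _ in range(num_samples):
        sample = [rng.expovariate(lam) for _ in range(n)]
        means.append(mean(sample))
        variances.append(variance(sample, ddof=1))  # sample variance S_x^2
    return {
        "E(Xbar)": mean(means),
        "Var(Xbar)": variance(means),
        "E(S2)": mean(variances),
        "Var(S2)": variance(variances),
    }

for n in (10, 100):
    print(n, simulate(n))
```

For independent values, theory gives Var(X̄) = Var(X)/n, i.e. 2.5 for n = 10 and 0.25 for n = 100 when Var(X) = 25.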
e. The central limit theorem states that a random variable that is the sum of a number of other independent random variables tends towards a normal distribution as the number of summed variables increases, even if the summed variables are not themselves normally distributed.
From the histograms of the sample means and variances (parts c and d), we can clearly see that these variables follow a trend like that of a normal distribution.
In the single-sample histograms of part b, the normal trend is not observed when the sample size is small, i.e. 10. As the sample size increases from 10 to 50 to 100, the exponential and normal histograms appear to follow roughly the same trend, although the bell shape characteristic of the normal distribution is not clearly observable.
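The statement of the central limit theorem can be illustrated with a short simulation. The sketch below is an illustration under assumptions not in the original: it takes sample means of exponential variates (λ = 0.2) and reports the fraction lying within one standard error of the true mean, which for a normal distribution is about 0.683. The sample sizes, number of samples, seed and function name are arbitrary choices.

```python
import math
import random

def fraction_within_one_se(n, num_samples=5_000, lam=0.2, seed=2):
    """Fraction of sample means within one standard error of the true mean 1/lam."""
    rng = random.Random(seed)
    mu = 1.0 / lam                     # true mean of the exponential
    se = (1.0 / lam) / math.sqrt(n)    # standard deviation of the sample mean
    hits = 0
    for _ in range(num_samples):
        xbar = sum(rng.expovariate(lam) for _ in range(n)) / n
        if abs(xbar - mu) <= se:
            hits += 1
    return hits / num_samples

for n in (1, 10, 100):
    print(n, fraction_within_one_se(n))
```

For a single exponential value the exact fraction is 1 - e^(-2) ≈ 0.865; as n grows, the fraction moves towards the normal value of about 0.683.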
