1. Poisson Distribution
a. We have
$$P_s(t) = \frac{e^{-\lambda t} (\lambda t)^s}{s!}$$
The probability of no mutations of a fibrinopeptide in a billion years is $P_s(t)$ evaluated at $s = 0$ (no mutations) with $\lambda = \lambda_f$.
Therefore the required probability is
$$P_0(t) = \frac{e^{-\lambda_f t} (\lambda_f t)^0}{0!} = e^{-\frac{9}{1000000000} \times 1000000000} \times \frac{1}{1} = e^{-9} = 1.234 \times 10^{-4}$$
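b. We have
$$P_s(t) = \frac{e^{-\lambda t} (\lambda t)^s}{s!} = e^{-\frac{1}{1000000000} \times 100000000} \times \frac{\left(\frac{1}{1000000000} \times 100000000\right)^3}{3!} = e^{-0.1} \times \frac{(0.1)^3}{6} = 1.508 \times 10^{-4}$$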
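The arithmetic in parts a and b can be checked numerically. The following is a minimal sketch, assuming the rates read off the worked solution (9 per billion years in part a, 1 per billion years in part b); the function name `poisson_pmf` is illustrative and not part of the original solution.

```python
import math

def poisson_pmf(s: int, rate: float, t: float) -> float:
    """Return P_s(t) = exp(-rate*t) * (rate*t)**s / s!."""
    mean = rate * t
    return math.exp(-mean) * mean**s / math.factorial(s)

# Part a: s = 0 mutations, rate 9 per 1e9 years, over t = 1e9 years.
p_a = poisson_pmf(0, 9 / 1e9, 1e9)

# Part b: s = 3 mutations, rate 1 per 1e9 years, over t = 1e8 years.
p_b = poisson_pmf(3, 1 / 1e9, 1e8)

print(f"part a: {p_a:.4g}")
print(f"part b: {p_b:.4g}")
```

Both values should agree with the hand calculation to the four significant figures quoted there.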
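c. Assume the three mutation processes are three different sample sets, each following a Poisson distribution, with rates $\lambda_f$, $\lambda_l$, $\lambda_H$.
For fibrinopeptides, normalization gives
$$\sum_{s=0}^{\infty} P_s(t) = 1, \quad \text{or} \quad \sum_{s=0}^{\infty} \frac{e^{-\lambda_f t} (\lambda_f t)^s}{s!} = 1.$$
The mean is $\langle s \rangle = \sum_{s=0}^{\infty} s \, P_s(t)$. Taking one factor of $\lambda t$ out of $(\lambda t)^s$ and one factor of $s$ out of $s!$, which cancels the $s$ already present in the numerator (the $s = 0$ term vanishes), we get
$$\langle s \rangle = e^{-\lambda t} \sum_{s=1}^{\infty} \lambda t \, \frac{(\lambda t)^{s-1}}{(s-1)!}.$$
Substituting $k = s - 1$, we have
$$\langle s \rangle = e^{-\lambda t} \, \lambda t \sum_{k=0}^{\infty} \frac{(\lambda t)^k}{k!} = e^{-\lambda t} \, \lambda t \, e^{\lambda t},$$
or $\langle s \rangle = \lambda t$.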
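The result that the mean equals λt can be checked by summing s·P_s(t) directly. This is a rough sketch; the helper name `poisson_mean_by_sum` and the cutoff of 200 terms are assumptions chosen for illustration, and the cutoff is only adequate for modest values of λt.

```python
import math

def poisson_mean_by_sum(mean: float, terms: int = 200) -> float:
    """Approximate <s> = sum_s s * P_s by truncating the series after `terms` terms."""
    total = 0.0
    prob = math.exp(-mean)          # P_0 = exp(-mean)
    for s in range(terms):
        total += s * prob
        prob *= mean / (s + 1)      # recurrence: P_{s+1} = P_s * mean / (s + 1)
    return total

# Compare the truncated sum with the closed form <s> = lambda * t.
for lam_t in (0.1, 9.0):
    print(lam_t, poisson_mean_by_sum(lam_t))
```

The recurrence avoids computing large factorials and powers separately, which keeps the sum stable in floating point.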
a. The three events are statistically independent, so the probability that a suspect's DNA will match all three of these fragment characteristics by chance is
$$P(A \cap B \cap C) = P(A) P(B) P(C)$$
where
A: occurrence of DNA fragment A
B: occurrence of DNA fragment B
C: occurrence of DNA fragment C
The question gives
$P(A) = 1\%$, i.e. 0.01
$P(B) = 4\%$, i.e. 0.04
$P(C) = 2.5\%$, i.e. 0.025
Therefore $P(A) P(B) P(C) = 0.01 \times 0.04 \times 0.025 = 10^{-5}$.
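As a quick check of the product rule for independent events, the three probabilities can be multiplied directly. The variable names below are illustrative only.

```python
import math

# Probabilities of each fragment characteristic occurring, as given in the question.
fragment_probs = {"A": 0.01, "B": 0.04, "C": 0.025}

# For independent events the joint probability is the product of the individual ones.
p_all_three = math.prod(fragment_probs.values())

print(f"P(A and B and C) = {p_all_three:.1e}")
```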
Here A and B, and A and C, are not independent events, and we cannot deduce any relation between B and C.
To solve this further, we would need some additional relationship or more information.
3. Exponential Distribution
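$$f_X(x) = \lambda e^{-\lambda x}$$
a. Assuming a continuous distribution,
$$E(X) = \int_0^{\infty} \lambda x e^{-\lambda x} \, dx = \frac{1}{\lambda}$$
Now $E(X)^2 = \left(\frac{1}{\lambda}\right)^2 = \frac{1}{\lambda^2}$, and similarly $E(X^2) = \int_0^{\infty} \lambda x^2 e^{-\lambda x} \, dx = \frac{2}{\lambda^2}$.
Hence $\mathrm{Var}(X) = E(X^2) - E(X)^2 = \frac{2}{\lambda^2} - \frac{1}{\lambda^2} = \frac{1}{\lambda^2}$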
b. Given $E(X) = \frac{1}{\lambda}$ and $\mathrm{Var}(X) = \frac{1}{\lambda^2}$, and taking $\lambda = 0.2$, we have $E(X) = 5$ and $\mathrm{Var}(X) = 25$.
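The values of the mean and variance for λ = 0.2 can be confirmed by integrating the moments numerically. This is only a sketch: the helper `exp_moment`, the midpoint rule, the truncation point of 400, and the step count are all assumptions made for illustration, not part of the original solution.

```python
import math

def exp_moment(k: int, lam: float, upper: float = 400.0, steps: int = 200_000) -> float:
    """Midpoint-rule estimate of E[X^k] for the density lam * exp(-lam * x).

    The integral over [0, infinity) is truncated at `upper`, where the
    integrand is negligible for the rate used here.
    """
    h = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h                      # midpoint of the i-th subinterval
        total += x**k * lam * math.exp(-lam * x)
    return total * h

lam = 0.2
mean = exp_moment(1, lam)                      # should be close to 1 / lam
variance = exp_moment(2, lam) - mean**2        # E(X^2) - E(X)^2
print(f"E(X) ~ {mean:.4f}, Var(X) ~ {variance:.4f}")
```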
Random numbers were generated using the Random Number Generator tool in Data Analysis in Excel. To generate the probability density function for the normal distribution, NORM.DIST(RANDOM NUMBER GENERATOR, MEAN, STANDARD DEVIATION, FALSE) was used. The argument FALSE makes the function return the pdf.
To generate the probability density function for the exponential distribution, EXPON.DIST(RAND(), $\lambda$, FALSE) was used. Again, the argument FALSE makes the function return the pdf.
The histograms were plotted using the Data Analysis tool in Excel.
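For readers without Excel, a comparable experiment can be sketched in Python. Note that this is a different technique from the Excel workflow described above: it draws exponential and normal variates directly with the standard library's `random.expovariate` and `random.gauss` rather than evaluating a pdf at random inputs. The normal parameters (mean 5, standard deviation 5), the seed, and the choice of 5 bins are assumptions, since the text does not state them.

```python
import random

def histogram(values, n_bins):
    """Count values in n_bins equal-width bins; returns (upper bin edges, counts)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value would index one past the end, so clamp it into the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    edges = [lo + (i + 1) * width for i in range(n_bins)]
    return edges, counts

rng = random.Random(0)    # fixed seed so the run is repeatable
lam = 0.2                 # rate used in part b
mu, sigma = 5.0, 5.0      # ASSUMPTION: the text does not give the normal parameters

results = {}
for n in (10, 50, 100):
    expo = [rng.expovariate(lam) for _ in range(n)]
    norm = [rng.gauss(mu, sigma) for _ in range(n)]
    results[n] = (histogram(expo, 5), histogram(norm, 5))
    print(n, "exponential:", results[n][0][1], "normal:", results[n][1][1])
```

Each printed list holds the frequency per bin, which corresponds to the Frequency column of an Excel histogram table.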
[Figure: histogram "Normal Distribution"; x-axis: Bin, y-axis: Frequency]
The histogram data for the exponential distribution are:
Bin         Frequency
0.168352    2
0.177237    3
0.186122    3
More        2
[Figure: histogram "Exponential Distribution"; x-axis: Bin, y-axis: Frequency]
[Figure: untitled histogram; x-axis: Bin, y-axis: Frequency]
Exponential Distribution
Bin         Frequency
0.164065    2
0.169111    6
0.174157    10
0.179203    12
0.184249    2
0.189296    6
0.194342    6
More        6
[Figure: histogram "Exponential Distribution"; x-axis: Bin, y-axis: Frequency]
[Figure: histogram "Normal Distribution"; x-axis: Bin, y-axis: Frequency]
[Figure: untitled histogram; x-axis: Bin, y-axis: Frequency]
When the sample size is small, the frequency-versus-bin histograms of the normal and exponential distributions differ considerably, but as the number of values in the sample increases the plots become almost the same, as can be seen for the sample size of 100.
When I checked much larger sample sizes (10000), the distribution plots were much the same for both distributions, highly skewed towards the right side of the plot, that is, the end of the x-axis. The similarity must be because the probability density functions of both the exponential and the normal distribution are exponential functions.
[Figure: two histograms; y-axis: Frequency; bin labels run from 1.424446958 to 264.956574 and from -0.978833997 to 10.70866969]
d.
[Figure: two histograms, one labelled "Mean"; x-axis: Bin, y-axis: Frequency; bin labels run from 13.76736853 to 36.38564617 and from 3.07545305 to 5.295420204]
Here $\bar{X}$ is the mean of means, of which there is only one for each sample (i.e. n=10 and n=100). Therefore the expected value of $\bar{X}$ will be $\bar{X}$ itself, and $\mathrm{Var}(\bar{X})$ will be zero since there is only one value.
For n=10, $\bar{X} = 4.969628$, therefore $E(\bar{X}) = 4.969628$.
Similarly, for n=100, $\bar{X} = 5.002944$, therefore $E(\bar{X}) = 5.002944$.
However, considering that $\bar{X}$ was the mean for each sample, we will have the following results:
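Here
$S_x^2 = (\text{standard deviation})^2 = \text{variance}$
$E(\bar{X}) = \text{mean of } \bar{X}$
$$\mathrm{Var}(\bar{X}) = \frac{1}{n} \sum_{i=1}^{n} \left(\bar{X}_i - E(\bar{X})\right)^2$$
$$\mathrm{Var}(S_x^2) = \frac{1}{n} \sum_{i=1}^{n} \left(S_{x,i}^2 - E(S_x^2)\right)^2$$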
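The statistics defined above can be computed for repeated samples with a short script. This is a hedged sketch, not the original Excel procedure: the helper name `sample_stats`, the number of repeated samples (1000), and the seed are assumptions, and the exponential rate of 0.2 is carried over from part b. `statistics.pvariance` divides by the number of values, which matches the 1/n form of the variance formulas.

```python
import random
import statistics

def sample_stats(n, n_samples, lam=0.2, seed=0):
    """Draw n_samples exponential samples of size n; return their means and variances."""
    rng = random.Random(seed)
    means, variances = [], []
    for _ in range(n_samples):
        sample = [rng.expovariate(lam) for _ in range(n)]
        means.append(statistics.fmean(sample))
        variances.append(statistics.variance(sample))   # sample variance S_x^2
    return means, variances

for n in (10, 100):
    means, variances = sample_stats(n, n_samples=1000)   # 1000 is an assumed count
    print(
        f"n={n}: E(Xbar)={statistics.fmean(means):.3f}, "
        f"Var(Xbar)={statistics.pvariance(means):.3f}, "
        f"E(S^2)={statistics.fmean(variances):.3f}, "
        f"Var(S^2)={statistics.pvariance(variances):.3f}"
    )
```

With many repeated samples, the mean of the sample means should sit near 1/λ = 5 and the variance of the sample means should shrink roughly in proportion to 1/n.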
e. The central limit theorem states that a random variable that is the sum of a number of other random variables will tend towards a normal distribution as the number of summed variables increases, even if the variables being summed are not themselves normally distributed.
From the histogram plots of part d (for means and variances), we can clearly see that the variables follow trends like that of a normal distribution.
In part c, the normal-distribution trend is not observed when the sample size is small, i.e. 10. However, as the sample size increases from 10 to 50 to 100, the exponential and normal distributions appear to follow roughly the same trend, although the bell-shaped plot characteristic of the normal distribution is not clearly observable.
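One way to see the central limit theorem at work numerically is to track the skewness of the sample means, which is zero for a normal distribution. The sketch below is an illustration under stated assumptions: the helper names `skewness` and `mean_skewness`, the 2000 repeated samples, and the seed are all hypothetical choices, and the rate 0.2 is taken from part b.

```python
import random
import statistics

def skewness(values):
    """Population skewness: the mean of the cubed standardized deviations."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([((v - m) / sd) ** 3 for v in values])

def mean_skewness(n, n_samples=2000, lam=0.2, seed=1):
    """Skewness of the means of n_samples exponential samples, each of size n."""
    rng = random.Random(seed)
    means = [
        statistics.fmean([rng.expovariate(lam) for _ in range(n)])
        for _ in range(n_samples)
    ]
    return skewness(means)

# Skewness should move towards 0 as the sample size grows.
for n in (1, 10, 100):
    print(f"n={n}: skewness of sample means = {mean_skewness(n):.3f}")
```

A single exponential draw is strongly right-skewed, so the value for n=1 should be well above zero, while the value for n=100 should be much closer to the normal distribution's zero.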