
# Ch 4. Statistics

## Quantitative analysis requires

* sound knowledge of chemistry
* awareness of possible interferences

## WHY do we need to use STATISTICS in Anal. Chem. ?

Uncertainty always exists. Will we always accept it? If not, on what basis will we disregard data?

→ by statistical treatment


## 1) mean value & standard deviation

* mean : $\bar{x}$ : or average

$$\bar{x} = \frac{\sum_i x_i}{n}$$

* standard dev. : $s$ : measures how closely the data are clustered around the mean

$$s = \sqrt{\frac{\sum_i (x_i - \bar{x})^2}{n - 1}}$$

## for an infinite set of data:

* $\bar{x}$ (mean) → $\mu$ (mu, population mean)
* $s$ → $\sigma$ (sigma, population standard deviation)
* or $\sigma^2$ : variance


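## 2) std.dev. & probability

Gaussian curve:

$$y = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$

## $\sigma$ tells the broadness of the Gaussian curve

In a Gaussian curve, the area under the curve within

* $\mu \pm 1\sigma$ = 68.3 %
* $\mu \pm 2\sigma$ = 95.5 %
* $\mu \pm 3\sigma$ = 99.7 %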
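The areas quoted above can be checked numerically. The sketch below is only an illustration: it uses the error function from Python's standard library, and the helper name `area_within` is hypothetical, not part of the lecture.

```python
import math

def area_within(k: float) -> float:
    """Fraction of a Gaussian population lying within mu ± k*sigma."""
    return math.erf(k / math.sqrt(2.0))

for k in (1, 2, 3):
    print(f"within ±{k} sigma: {100 * area_within(k):.2f} %")
```

The ±2σ area comes out as 95.45 %, which the slide quotes as 95.5 %.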


## 4-1 Gaussian Distribution (Cont.)

3) std.dev. of mean

more measurements → more confidence in the average (nearly the true value)

uncertainty decreases by $1/\sqrt{n}$ : $n$ = number of meas.

standard deviation of mean = $\frac{s}{\sqrt{n}}$ : $s$ = std.dev.

* relative standard deviation (RSD) = $\frac{s}{\bar{x}}$

or into percentage = $\frac{s}{\bar{x}} \times 100$ = C.V.
x
precision of mean =
n
average deviation of mean = $\frac{\bar{d}}{\sqrt{n}}$ ( $\bar{d} = \frac{\sum |x_i - \bar{x}|}{n}$ )
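A minimal sketch of these summary quantities, assuming a small set of hypothetical replicate values (the numbers are invented for illustration and do not come from the lecture):

```python
import math
import statistics

# Hypothetical replicate measurements, for illustration only.
data = [10.1, 10.3, 9.9, 10.2, 10.0]

n = len(data)
mean = statistics.mean(data)
s = statistics.stdev(data)            # sample std. dev. (n - 1 in the denominator)
s_mean = s / math.sqrt(n)             # standard deviation of the mean
rsd = s / mean                        # relative standard deviation
cv_percent = 100 * rsd                # coefficient of variation, %
avg_dev = sum(abs(x - mean) for x in data) / n
avg_dev_mean = avg_dev / math.sqrt(n) # average deviation of the mean

print(f"mean = {mean:.3f}, s = {s:.3f}, s/sqrt(n) = {s_mean:.3f}")
print(f"RSD = {rsd:.4f}, C.V. = {cv_percent:.2f} %")
```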


## 4-2 Confidence Intervals

1) confidence interval : an expression stating that the true mean, $\mu$, is likely to lie within a certain distance from the measured mean, $\bar{x}$

our measurements → $\bar{x}$, $s$ (instead of $\mu$, $\sigma$)

Confidence interval ($t$ : Student's t for $n - 1$ degrees of freedom):

$$\mu = \bar{x} \pm \frac{t s}{\sqrt{n}}$$


## Ex.

The content of carbohydrate in a glycoprotein (a protein with sugars attached to it) is determined to be 12.6, 11.9, 13.0, 12.7, and 12.5 g per 100 g of protein in replicate analyses. Find the 50% and 90% confidence intervals for the carbohydrate content.
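One way to work this example is sketched below. The Student's t values for 4 degrees of freedom (0.741 at 50%, 2.132 at 90%) are assumed from a standard t table; they are not listed on the slide.

```python
import math
import statistics

data = [12.6, 11.9, 13.0, 12.7, 12.5]   # g carbohydrate per 100 g protein

# Student's t for n - 1 = 4 degrees of freedom (assumed from a standard t table).
T_TABLE_DF4 = {50: 0.741, 90: 2.132}

n = len(data)
mean = statistics.mean(data)
s = statistics.stdev(data)

half_widths = {}
for level, t in T_TABLE_DF4.items():
    half_widths[level] = t * s / math.sqrt(n)   # t*s/sqrt(n)
    print(f"{level}% confidence interval: {mean:.2f} ± {half_widths[level]:.2f}")
```

The 90% interval is wider than the 50% interval, as expected: a higher confidence level needs a larger range around the same mean.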


## 4-3 Comparison of means with Student's t

(from different measurements)

Student's t : a tool for expressing confidence intervals and for comparing results from different experimental techniques.

## Normally, 95% confidence level

: We conclude that two results differ from each other only IF there is a 95% chance that our conclusion is correct.


## Case 1. when we test a new analytical method, we want to see if it agrees with a known value.

ex) Ni content; known value : 0.0319% (from a standard reference material)
measured values : 0.0329, 0.0322, 0.0330, 0.0323 %

The 95% confidence interval ? ($\bar{x}$ = 0.0326, $s$ = 0.0004, $t$ = 3.182 for 3 degrees of freedom)

$$\bar{x} \pm \frac{t s}{\sqrt{n}} = 0.0326 \pm \frac{3.182 \times 0.0004}{\sqrt{4}} = 0.0326 \pm 0.0006$$

This interval doesn't cover 0.0319; thus, the measured value is different from the known value.

## Not within the random error boundary.

(it implies that a systematic error exists)
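The Ni example can be reproduced with a few lines of Python. This is a sketch only: it recomputes the mean and standard deviation from the four measured values and uses the t value 3.182 quoted in the example.

```python
import math
import statistics

known = 0.0319                                # certified Ni content, %
measured = [0.0329, 0.0322, 0.0330, 0.0323]   # replicate results, %

n = len(measured)
mean = statistics.mean(measured)
s = statistics.stdev(measured)
t_95 = 3.182   # Student's t, 95% confidence, n - 1 = 3 degrees of freedom

half_width = t_95 * s / math.sqrt(n)
lower, upper = mean - half_width, mean + half_width
covers_known = lower <= known <= upper

print(f"95% confidence interval: {mean:.4f} ± {half_width:.4f} %")
print("known value inside interval:", covers_known)
```

The known value falls just outside the lower limit, so the conclusion is sensitive to rounding of $s$ and $t$.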

## Anal. Chem. by Prof. Myeong Hee Moon

1. <t-test> You are developing a procedure for determining traces of copper in
biological materials using a wet digestion followed by measurement by atomic
absorption spectrophotometry. In order to test the validity of the method, you
obtain a NIST orchard leaves standard reference material and analyze this
material. Five replicates are sampled and analyzed, and the mean of the results is
found to be 10.08 ppm with a standard deviation of 0.7 ppm. The listed value is
11.7 ppm. Does your method give a statistically correct value at the 95%
confidence level?


## Case 2. t test: comparing replicate measurements

(test of two sets of measurements)
: test whether the two techniques are statistically the SAME or NOT
for two sets of data with $n_1$ and $n_2$ measurements

$$t_{cal} = \frac{|\bar{x}_1 - \bar{x}_2|}{s_{pooled}} \sqrt{\frac{n_1 n_2}{n_1 + n_2}}$$

$$s_{pooled} = \sqrt{\frac{\sum_i (x_i - \bar{x}_1)^2 + \sum_j (x_j - \bar{x}_2)^2}{n_1 + n_2 - 2}} = \sqrt{\frac{s_1^2 (n_1 - 1) + s_2^2 (n_2 - 1)}{n_1 + n_2 - 2}}$$

## If $t_{cal} > t_{table}$ (at 95%)

this difference is significant
(out of random error range)
→ a systematic error exists

Ex) The average mass of nitrogen from air in Table 4-3 is $\bar{x}_1$ = 2.31011 g, with
a standard deviation of $s_1$ = 0.00014 (for $n_1$ = 7 measurements). The average
mass from chemical sources is $\bar{x}_2$ = 2.29947 g, with a standard deviation of
$s_2$ = 0.00138 (for $n_2$ = 8 measurements).
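A sketch of the Case 2 calculation for these nitrogen data, using the pooled standard deviation. The function name `pooled_t` is hypothetical, and the critical value 2.160 (95%, 13 degrees of freedom) is assumed from a standard t table rather than taken from the slide.

```python
import math

def pooled_t(mean1, s1, n1, mean2, s2, n2):
    """Return (t_cal, s_pooled, degrees_of_freedom) for two sets of measurements."""
    dof = n1 + n2 - 2
    s_pooled = math.sqrt((s1**2 * (n1 - 1) + s2**2 * (n2 - 1)) / dof)
    t_cal = abs(mean1 - mean2) / s_pooled * math.sqrt(n1 * n2 / (n1 + n2))
    return t_cal, s_pooled, dof

t_cal, s_pooled, dof = pooled_t(2.31011, 0.00014, 7, 2.29947, 0.00138, 8)

# Student's t at the 95% level for 13 degrees of freedom (assumed table value).
T_TABLE_95_DF13 = 2.160
significant = t_cal > T_TABLE_95_DF13

print(f"s_pooled = {s_pooled:.5f}, t_cal = {t_cal:.1f}, d.o.f. = {dof}")
print("difference significant at 95%:", significant)
```

Here $t_{cal}$ is far above the table value, so the two nitrogen sources give masses that differ by much more than random error.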

2. <t-test> A new gravimetric method is developed for iron (II) in which the iron
is precipitated in crystalline form with an organocarbon "cage" compound. The
accuracy of the method is checked by analyzing the iron in an ore sample and
comparing with the results using the standard precipitation with ammonia and
weighing of Fe2O3. The results, reported as % Fe for each analysis, were as
follows.

| Test method | Reference method |
|---|---|
| 20.10% | 18.89% |
| 20.50 | 19.20 |
| 18.65 | 19.00 |
| 19.25 | 19.70 |
| 19.40 | 19.40 |
| 19.99 | 19.40 |
| $\bar{x}_1$ = 19.65% | $\bar{x}_2$ = 19.24% |

Is there a difference between the two methods?


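## Two different methods on several different samples (no duplication)

Cholesterol content (g/L)

| Plasma sample | Method A | Method B | Difference ($d_i$) |
|---|---|---|---|
| 1 | 1.46 | 1.42 | 0.04 |
| 2 | 2.22 | 2.38 | -0.16 |
| 3 | 2.84 | 2.67 | 0.17 |
| 4 | 1.97 | 1.80 | 0.17 |
| 5 | 1.13 | 1.09 | 0.04 |
| 6 | 2.35 | 2.25 | 0.10 |
| | | | $\bar{d}$ = +0.06 |

$$t_{cal} = \frac{|\bar{d}|}{s_d} \sqrt{n}, \qquad s_d = \sqrt{\frac{\sum (d_i - \bar{d})^2}{n - 1}}$$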
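The paired comparison for the cholesterol data can be sketched as follows: $t$ is computed from the mean and standard deviation of the per-sample differences. The critical value 2.571 (95%, 5 degrees of freedom) is assumed from a standard t table.

```python
import math
import statistics

method_a = [1.46, 2.22, 2.84, 1.97, 1.13, 2.35]   # g/L
method_b = [1.42, 2.38, 2.67, 1.80, 1.09, 2.25]   # g/L

# Differences d_i for each plasma sample (A - B).
d = [a - b for a, b in zip(method_a, method_b)]
n = len(d)
d_mean = statistics.mean(d)
s_d = statistics.stdev(d)
t_cal = abs(d_mean) / s_d * math.sqrt(n)

# Student's t at the 95% level for n - 1 = 5 degrees of freedom (assumed table value).
T_TABLE_95_DF5 = 2.571
significant = t_cal > T_TABLE_95_DF5

print(f"mean difference = {d_mean:+.2f}, s_d = {s_d:.3f}, t_cal = {t_cal:.2f}")
print("methods differ at 95%:", significant)
```

With these data $t_{cal}$ is below the table value, so the two methods are not shown to differ at the 95% level.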


## Red cell counts on five “normal” days

: 5.1, 5.3, 4.8, 5.4, and 5.2 × $10^6$ cells/µL → $\bar{x}$ = 5.16, $s$ = 0.23

Today’s value = 5.6 × $10^6$ cells/µL

$$t_{cal} = \frac{|\text{today's count} - \bar{x}|}{s} \sqrt{n} = \frac{|5.6 - 5.16|}{0.23} \sqrt{5} = 4.28$$

## What is the probability of finding t = 4.28 for 4 degrees of freedom ?

See Table 4-2: at 4 degrees of freedom, 4.28 lies between the 98% and 99% confidence levels.

→ There is less than a 2% probability of observing a count of 5.6 × $10^6$ cells/µL on normal days.
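The same arithmetic can be sketched in Python. The two bracketing t values for 4 degrees of freedom (3.747 at 98%, 4.604 at 99%) are assumed from a standard t table.

```python
import math
import statistics

normal_days = [5.1, 5.3, 4.8, 5.4, 5.2]   # red cell counts, x 10^6 cells per microliter
today = 5.6

n = len(normal_days)
mean = statistics.mean(normal_days)
s = statistics.stdev(normal_days)
t_cal = abs(today - mean) / s * math.sqrt(n)

# Student's t for 4 degrees of freedom (assumed table values).
T_98, T_99 = 3.747, 4.604
between_98_and_99 = T_98 < t_cal < T_99

print(f"mean = {mean:.2f}, s = {s:.2f}, t_cal = {t_cal:.2f}")
print("t_cal lies between the 98% and 99% levels:", between_98_and_99)
```

Using the unrounded standard deviation gives a t value of about 4.27, marginally below the 4.28 obtained with $s$ rounded to 0.23; the conclusion is unchanged.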


## 4-4 Comparison of st.dev. with the F test

F test : checks whether two std. devs. are significantly different from each other.

$$F_{calc} = \frac{s_1^2}{s_2^2}$$

If $F_{calc} > F_{table}$, then the difference is significant.
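As an illustration, the F test can be applied to the two standard deviations of the nitrogen example earlier in this chapter (0.00014 for 7 measurements, 0.00138 for 8). By convention the larger standard deviation goes in the numerator so that $F \geq 1$. The helper name `f_calc` is hypothetical, and the critical value 4.21 (95% level, 7 and 6 degrees of freedom) is assumed from a standard F table.

```python
def f_calc(s_a: float, s_b: float) -> float:
    """F = s1^2 / s2^2 with the larger standard deviation in the numerator."""
    s1, s2 = max(s_a, s_b), min(s_a, s_b)
    return (s1 / s2) ** 2

F = f_calc(0.00014, 0.00138)

# Critical F for 7 (numerator) and 6 (denominator) degrees of freedom
# at the 95% level (assumed table value).
F_TABLE = 4.21
significant = F > F_TABLE

print(f"F_calc = {F:.1f}, significant: {significant}")
```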


## 4-6. Grubbs test for an outlier

During measurements of the mass loss of zinc, we may need to decide whether to discard questionable data:

10.2, 10.8, 11.6, 9.9, 9.4, 7.8, 10.0, 9.2, 11.3, 9.5, 10.6, 11.6
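A sketch of the Grubbs test for these twelve values. The statistic is $G_{calc} = |x_{questionable} - \bar{x}| / s$, and the questionable point is discarded only if $G_{calc} > G_{table}$. The critical value 2.285 (95% level, 12 observations) is assumed from a standard Grubbs table; it does not appear on the slide.

```python
import statistics

data = [10.2, 10.8, 11.6, 9.9, 9.4, 7.8, 10.0, 9.2, 11.3, 9.5, 10.6, 11.6]

mean = statistics.mean(data)
s = statistics.stdev(data)

# The questionable value is the one farthest from the mean.
suspect = max(data, key=lambda x: abs(x - mean))
g_calc = abs(suspect - mean) / s

# Critical G for 12 observations at the 95% level (assumed table value).
G_TABLE_N12 = 2.285
discard = g_calc > G_TABLE_N12

print(f"suspect = {suspect}, G_calc = {g_calc:.2f}, discard: {discard}")
```

For this data set the suspect value 7.8 gives a $G_{calc}$ below the table value, so it should be retained.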


## 4-7. Method of Least Squares

1. Finding the BEST STRAIGHT LINE

## 1) Method of Least Squares

y = mx + b
m: slope, b: y-intercept

## each data point --- ( $x_i$, $y_i$ )

vertical deviation
= $d_i = y_i - y$
= $y_i - (m x_i + b)$


## we want to MINIMIZE $d_i$ (whether positive or neg.)

-- direct summation of each $d_i$ ? no good (positive and negative deviations cancel)

## method of maximum likelihood

: Assume a Gaussian distribution with std.dev. $\sigma_i$ for the observations about the actual value $y(x_i)$ at $x = x_i$

the probability $P_i$:

$$P_i = \frac{1}{\sigma_i \sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{y_i - y}{\sigma_i}\right)^2\right]$$

## → maximize the probability ?

→ minimize the sum in the exponential…


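## 4-7. Method of Least Squares (Cont.)

$$\chi^2 = \sum_i \left(\frac{d_i}{\sigma_i}\right)^2, \qquad d_i^2 = (y_i - y)^2 = (y_i - m x_i - b)^2$$

minimizing $\chi^2$ (assume all $\sigma_i$ are equal):

$$\frac{\partial \chi^2}{\partial m} = 0, \qquad \frac{\partial \chi^2}{\partial b} = 0$$

METHOD OF LEAST SQUARES

$$m = \frac{n \sum (x_i y_i) - \sum x_i \sum y_i}{n \sum (x_i^2) - (\sum x_i)^2}$$

$$b = \frac{\sum (x_i^2) \sum y_i - \sum (x_i y_i) \sum x_i}{n \sum (x_i^2) - (\sum x_i)^2}$$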

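## 2) How reliable are least-squares parameters ?

estimate UNCERTAINTY in slope & intercept

std. dev. of $y$ (degrees of freedom = $n - 2$):

$$\sigma_y \approx s_y = \sqrt{\frac{\sum (d_i)^2}{n - 2}}$$

$$\sigma_m^2 = \frac{\sigma_y^2 \, n}{n \sum (x_i^2) - (\sum x_i)^2}$$

$$\sigma_b^2 = \frac{\sigma_y^2 \sum (x_i^2)}{n \sum (x_i^2) - (\sum x_i)^2}$$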


## Std. Solution : solutions with known concentrations

How to build a calibration curve ?

1. Prepare a series of std. solutions (varying conc.) and measure their absorbance.

## 3. Plot the absorbances vs. concentration

→ then do least squares.


## 4-8. Calibration Curves

Uncertainty Propagation in Calibration curve

m : slope

## Depends on # of calibration points.

The error is lowest for data near the center of the calibration range.
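The dependence on the number of calibration points and on the position within the calibration range can be illustrated with the commonly used expression for the uncertainty in $x$ read from a linear calibration line, $s_x = \frac{s_y}{|m|} \sqrt{\frac{1}{k} + \frac{1}{n} + \frac{(y - \bar{y})^2}{m^2 \sum (x_i - \bar{x})^2}}$. This formula is an assumption here, since the slide's own equation did not survive. In it, $k$ is the number of replicate measurements of the unknown and $n$ is the number of calibration points. The data and the name `calibration_x` are illustrative only.

```python
import math

def calibration_x(y_meas, xs, ys, k=1):
    """Return (x, s_x) for a measured response y_meas read from a linear calibration."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    denom = n * sum_xx - sum_x ** 2
    m = (n * sum_xy - sum_x * sum_y) / denom
    b = (sum_xx * sum_y - sum_xy * sum_x) / denom
    s_y = math.sqrt(sum((y - m * x - b) ** 2 for x, y in zip(xs, ys)) / (n - 2))

    y_bar = sum_y / n
    sxx_centered = denom / n                  # equals sum((x_i - x_bar)^2)
    x = (y_meas - b) / m
    s_x = (s_y / abs(m)) * math.sqrt(
        1 / k + 1 / n + (y_meas - y_bar) ** 2 / (m ** 2 * sxx_centered)
    )
    return x, s_x

# Illustrative calibration data only (assumed, not from the lecture).
xs, ys = [1, 3, 4, 6], [2, 3, 4, 5]
x_unknown, s_x = calibration_x(2.72, xs, ys)
_, s_x_center = calibration_x(3.5, xs, ys)    # response at the mean of ys
_, s_x_edge = calibration_x(5.0, xs, ys)      # response at the top of the range
print(f"x = {x_unknown:.2f} ± {s_x:.2f}")
print(f"s_x at center = {s_x_center:.3f}, at edge = {s_x_edge:.3f}")
```

The uncertainty is smallest when the measured response equals the mean calibration response, which matches the statement that the error is lowest near the center of the calibration.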


Homework


## Additional Problems Set

1. The following replicate calcium determinations on a blood sample using AAS
and a new colorimetric method were reported. Is there a significant difference in
the precision of the two methods?

* AAS (mg/dL): 10.9, 10.1, 10.6, 11.2, 9.7, 10.0
* Colorimetric (mg/dL): 9.2, 10.5, 9.7, 11.5, 11.6, 9.3, 10.1, 11.2

2. The concentration of HCl in a solution was measured by titrations using different indicators to find the end point. Is the
difference between indicators 1 and 2 significant at the 95%
confidence level? Answer the same question for indicators 2 and 3.

| Indicator | Mean HCl concentration (M) (± std. dev.) | Number of measurements |
|---|---|---|
| 1. Bromothymol blue | 0.09565 ± 0.00225 | 28 |
| 2. Methyl red | 0.08686 ± 0.00098 | 18 |
| 3. Bromocresol green | 0.08641 ± 0.00113 | 29 |


3. A Standard Reference Material is certified to contain 94.6 ppm of an organic
contaminant in soil. Your analysis gives values of 98.6, 98.4, 97.2, 94.6, and
96.2 ppm. Do your results differ from the expected result at the 95%
confidence level? If you made one more measurement and found 94.5, would
your conclusion change?